Welcome to the 4th evaluation workshop for sentiment analysis focused on Spanish. TASS 2015 will be held as part of the 31st SEPLN Conference in Alicante, Spain, on September 15th, 2015. You are invited to attend the workshop, take part in the proposed tasks and visit this beautiful city!
|15:30 - 16:00||Opening and overview|
|16:00 - 17:00||Participant reports (I)|
|17:00 - 17:30||Coffee break|
|17:30 - 18:30||Participant reports (II)|
|18:30 - 19:00||Discussion and closing|
TASS is an experimental evaluation workshop for sentiment analysis and online reputation analysis focused on the Spanish language, organized as a satellite event of the annual conference of the Spanish Society for Natural Language Processing (SEPLN). After three successful previous editions, TASS 2015 will take place on September 15th, 2015 at the University of Alicante, Spain.
The aim of TASS is to provide a forum for discussion and communication where the latest research work and developments in the field of sentiment analysis in social media, specifically focused on the Spanish language, can be shown and discussed by the scientific and business communities. The main objective is to promote the application of state-of-the-art algorithms and techniques for sentiment analysis of short text opinions extracted from social media messages (specifically Twitter).
Several challenge tasks are proposed, intended to provide a benchmark forum for comparing the latest approaches in these fields. In addition, with the creation and release of the fully tagged corpus, we aim to provide a benchmark dataset that enables researchers to compare their algorithms and systems.
First of all, we are interested in evaluating how the different approaches to sentiment analysis and text classification in Spanish have evolved over these years. So, the traditional sentiment analysis at global level task will be repeated, reusing the same corpus, to compare results. Moreover, we want to foster research on fine-grained polarity analysis at aspect level (aspect-based sentiment analysis), one of the new demands of the natural language processing market in these areas.
Thus the following two tasks are proposed this year.
Participants are expected to submit up to 3 results of different experiments for one or both of these tasks, in the appropriate format described below.
Along with the submission of experiments, participants will be invited to submit a paper to the workshop in order to describe their experiments and discuss the results with the audience in a regular workshop session. More information about format and requirements will be provided soon.
Submissions must be done through the following page, using the provided user and password:
There you must select the task and fill in your group name, the run ID and the run file. The system will automatically check and evaluate your submission according to the defined metrics, and will keep a history of all submissions.
If you want to resubmit an experiment, just use the same group name and run ID.
Please notice that the list of submissions is public and open to all participants.
You may submit any experiment at any moment, but the valid official runs are those submitted up to and including July 2nd.
All participants are invited to submit a paper describing the main features of their systems and discussing their results. The papers will be reviewed by a scientific committee, and only the accepted papers will be published at CEUR.
Depending on the final number of participants and the time slot allocated for the workshop, all or a selected group of papers will be chosen to be presented and discussed in the workshop session.
The manuscripts must satisfy the following rules:
Instead of describing the task and/or the corpus, focus on the description of your experiments and the analysis of your results, and include a citation to the Overview paper (more information will be provided soon).
Submissions can be made by email to tass AT sngularmeaning.team. The deadline for submissions is July 20th. Notification of acceptance is expected by July 25th, and publication by the end of July.
This task consists of performing automatic sentiment analysis to determine the global polarity of each message in the provided test sets (complete set and 1k set) of the General corpus (see below). This task is a re-run of the task from previous years. Participants will be provided with the training set of the General corpus so that they may train and validate their models.
There will be two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).
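Since the 4-labels scenario collapses the intensity distinctions of the 6-labels one, a system trained on the fine-grained tags can be reused directly. A minimal sketch of such a mapping, assuming (as is usual but not stated explicitly here) that P+ folds into P and N+ into N:

```python
# Illustrative mapping from the 6-label scheme to the 4-label scheme.
# Assumption: P+ collapses into P and N+ into N; NEU and NONE are unchanged.
SIX_TO_FOUR = {"P+": "P", "P": "P", "NEU": "NEU", "N": "N", "N+": "N", "NONE": "NONE"}

def to_four_labels(label: str) -> str:
    """Map a 6-label polarity tag to its 4-label counterpart."""
    return SIX_TO_FOUR[label]
```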
Participants are expected to submit (up to 3) experiments for the 6-labels evaluation, but are also allowed to submit (up to 3) specific experiments for the 4-labels scenario.
Accuracy (correct tweet polarity according to the gold standard) will be used for ranking the systems. The confusion matrix will be generated and then used to evaluate the precision, recall and F1-measure for each individual category (polarity). Macroaveraged precision, recall and F1-measure will be also calculated for the whole run.
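The metrics above are standard; as a sketch (not the official scorer), accuracy and the macroaveraged measures can be computed from per-label true positives, false positives and false negatives like this:

```python
from collections import Counter

def evaluate(gold, pred, labels):
    """Accuracy plus macroaveraged precision/recall/F1 over the given labels.

    `gold` and `pred` are parallel lists of polarity labels. Illustrative
    sketch only; the official evaluation is run by the submission system.
    """
    assert len(gold) == len(pred)
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted p, but wrong
            fn[g] += 1   # missed gold label g
    precs, recs, f1s = [], [], []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec); recs.append(rec); f1s.append(f1)
    n = len(labels)
    return acc, sum(precs) / n, sum(recs) / n, sum(f1s) / n
```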
Results must be submitted in a plain text file with the following format:
tweetid \t polarity
where polarity can be:
P+, P, NEU, N, N+ or NONE for the 6-labels case
P, N, NEU or NONE for the 4-labels case.
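A small sketch of producing a run file in this format, with a validity check on the labels before writing (the helper name is illustrative, not part of the task tooling):

```python
# Labels for the 6-labels evaluation, as defined by the task.
VALID_6 = {"P+", "P", "NEU", "N", "N+", "NONE"}

def format_run(predictions):
    """Render (tweet_id, polarity) pairs as tab-separated run-file lines.

    Raises ValueError on labels outside the 6-label set, so malformed runs
    are caught before submission.
    """
    lines = []
    for tweet_id, polarity in predictions:
        if polarity not in VALID_6:
            raise ValueError(f"unknown polarity label: {polarity!r}")
        lines.append(f"{tweet_id}\t{polarity}")
    return "\n".join(lines) + "\n"
```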
The same test corpus of previous years will be used for the evaluation, to allow for comparison among systems. Obviously, participants are not allowed to use any test data to train their systems.
Notice that there are two test sets: the complete set and the 1k set, a subset of the first one. To deal with the imbalanced distribution of labels between the training and test sets, a subset of 1,000 tweets with a label distribution similar to the training corpus was extracted, to be used for an alternate evaluation of system performance.
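The 1k set is essentially a distribution-matched sample. How such a subset could be drawn is sketched below, under the assumption of simple per-label proportional sampling (the actual extraction procedure is not specified here):

```python
import random
from collections import defaultdict

def stratified_subset(items, size, seed=0):
    """Sample `size` items while keeping the label proportions of `items`.

    `items` is a list of (tweet_id, label) pairs. Illustrative sketch of how
    a distribution-matched test subset like the 1k set could be extracted.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)
    subset = []
    for label, group in sorted(by_label.items()):
        # Allocate slots to each label proportionally to its frequency.
        k = round(size * len(group) / len(items))
        subset.extend(rng.sample(group, min(k, len(group))))
    return subset
```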
Participants will be provided with a corpus tagged with a series of aspects, and systems must identify the polarity at aspect level. Two corpora will be provided: the Social-TV corpus, used last year, and the new STOMPOL corpus, collected this year (both described later). Both corpora have been split into a training set, for building and validating the systems, and a test set, for evaluation.
Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format:
tweetid \t aspect \t polarity (for the Social-TV corpus)
tweetid \t aspect-entity \t polarity (for the STOMPOL corpus)
Allowed polarity values are P, NEU and N.
For evaluation, a single label combining "aspect-polarity" will be considered. Similarly to the first task, accuracy will be used for ranking the systems; precision, recall and F1-measure will be used to evaluate each individual category ("aspect-polarity" label); and macroaveraged precision, recall and F1-measure will be also calculated for the global result.
The General corpus contains over 68,000 Twitter messages, written in Spanish by about 150 well-known personalities and celebrities from the worlds of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the extraction context has a Spain-focused bias, the diverse nationalities of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, the USA and many other countries, give the corpus global coverage of the Spanish-speaking world.
The General corpus has been divided into two sets: training (about 10%) and test (90%). The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results produced by the different systems. Obviously, it is not allowed to use the test data from previous years to train the systems.
Each message in both the training and test set is tagged with its global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no-sentiment tag (NONE).
In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to make out whether a neutral sentiment comes from neutral keywords or from a text that contains positive and negative sentiments at the same time.
Moreover, the polarity at entity level, i.e., the polarity values related to the entities mentioned in the text, is also included where applicable. These values are similarly tagged with the 6 possible values and include the level of agreement with respect to each entity.
On the other hand, a set of topics has been selected based on the thematic areas covered by the corpus, such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") or "entretenimiento" ("entertainment"). Each message in both the training and test set has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).
All tagging has been done semiautomatically: a baseline machine learning model is first run, and then all tags are manually checked by human experts. In the case of the polarity at entity level, due to the high volume of data to check, this tagging has only been done for the training set.
The following figure shows the information of two sample tweets. The first tweet is tagged only with the global polarity, as the text contains no mentions of any entity, but the second one is tagged with both the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro_Asturias).
<tweet>
  <tweetid>0000000000</tweetid>
  <user>usuario0</user>
  <content><![CDATA['Conozco a alguien q es adicto al drama! Ja ja ja te suena d algo!]]></content>
  <date>2011-12-02T02:59:03</date>
  <lang>es</lang>
  <sentiments>
    <polarity><value>P+</value><type>AGREEMENT</type></polarity>
  </sentiments>
  <topics>
    <topic>entretenimiento</topic>
  </topics>
</tweet>
<tweet>
  <tweetid>0000000001</tweetid>
  <user>usuario1</user>
  <content><![CDATA['UPyD contará casi seguro con grupo gracias al Foro Asturias.]]></content>
  <date>2011-12-02T00:21:01</date>
  <lang>es</lang>
  <sentiments>
    <polarity><value>P</value><type>AGREEMENT</type></polarity>
    <polarity><entity>UPyD</entity><value>P</value><type>AGREEMENT</type></polarity>
    <polarity><entity>Foro_Asturias</entity><value>P</value><type>AGREEMENT</type></polarity>
  </sentiments>
  <topics>
    <topic>política</topic>
  </topics>
</tweet>
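A minimal sketch of reading this XML with the Python standard library, assuming the distributed files wrap the tweets in a single root element (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

def parse_general(xml_text):
    """Extract (tweetid, global polarity, topics) triples from General-corpus XML.

    Assumes a single root element wrapping the <tweet> elements. The global
    polarity is the <polarity> without an <entity> child; entity-level
    polarities are skipped here.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for tweet in root.iter("tweet"):
        tweet_id = tweet.findtext("tweetid")
        global_pol = None
        for pol in tweet.find("sentiments").findall("polarity"):
            if pol.find("entity") is None:
                global_pol = pol.findtext("value")
                break
        topics = [t.text for t in tweet.find("topics").findall("topic")]
        rows.append((tweet_id, global_pol, topics))
    return rows
```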
This corpus was collected during the 2014 Copa del Rey final in Spain between Real Madrid and F.C. Barcelona, played on 16 April 2014 at the Mestalla Stadium in Valencia. Over 1 million tweets were collected, from 15 minutes before to 15 minutes after the match. After filtering out useless information and tweets in languages other than Spanish, a subset of 2,773 tweets was selected.
All tweets were manually tagged with the aspects mentioned in the message and their sentiment polarity. Tweets may cover more than one aspect. The list of aspects is:
Equipo (any other team)
Jugador (any other player)
Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. No distinction is made between cases where the author does not express any sentiment and cases where he/she expresses a neither-positive-nor-negative sentiment.
The Social-TV corpus was randomly divided into two sets: training (1 773 tweets) and test (1 000 tweets), with a similar distribution of both aspects and sentiments. The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results provided by the different systems.
The following figure shows the information of three sample tweets in the training set.
<tweet id="456544898791907328"><sentiment aspect="Equipo-Real_Madrid" polarity="P">#HalaMadrid</sentiment> ganamos sin <sentiment aspect="Jugador-Cristiano_Ronaldo" polarity="NEU">Cristiano</sentiment>. .perdéis con <sentiment aspect="Jugador-Lionel_Messi" polarity="N">Messi</sentiment>. Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>! !!!!!</tweet>
<tweet id="456544898942906369">@nevermind2192 <sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment> por siempre!!</tweet>
<tweet id="456544898951282688"><sentiment aspect="Partido" polarity="NEU">#FinalCopa</sentiment> Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, campeón de la <sentiment aspect="Partido" polarity="P">copa del rey</sentiment></tweet>
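The annotations are inline XML spans mixed with tweet text, so the (aspect, polarity) pairs can be pulled out of each <tweet> element directly. A minimal sketch with the standard library (the helper name is illustrative):

```python
import xml.etree.ElementTree as ET

def aspect_polarities(tweet_xml):
    """Extract (aspect, polarity) pairs from one annotated <tweet> element.

    Assumes the inline annotation style shown above: <sentiment> spans with
    `aspect` and `polarity` attributes inside the tweet text.
    """
    tweet = ET.fromstring(tweet_xml)
    return [(s.get("aspect"), s.get("polarity")) for s in tweet.iter("sentiment")]
```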
STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of Spanish tweets prepared for research on the challenging task of opinion mining at aspect level. The tweets were gathered from the 23rd to the 24th of April 2015, and each is related to one of the following political aspects that appear in political campaigns:
Economia (Economics): taxes, infrastructure, markets, labor policy...
Sanidad (Health System): hospitals, public/private health system, drugs, doctors...
Educacion (Education): state school, private school, scholarships...
Propio_partido (Political party): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
Otros_aspectos (Other aspects): electoral system, environmental policy...
Each aspect is related to one or several entities (separated by a pipe, |) that correspond to one of the main political parties in Spain.
Each tweet in the corpus has been manually tagged with the sentiment polarity at aspect level by two different annotators, with a third one in case of disagreement. Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. Again, no difference is made between no sentiment and a neutral (neither positive nor negative) sentiment.
Each political aspect is linked to its corresponding political party and its polarity.
Some examples are shown in the following figure:
<tweet id="591267548311769088">@ahorapodemos @Pablo_Iglesias_ @SextaNocheTV Que alguien pregunte si habrá cambios en las <sentiment aspect="Educacion" entity="Podemos" polarity="NEU">becas</sentiment> MEC para universitarios, por favor.</tweet>
<tweet id="591192167944736769">#Arroyomolinos lo que le interesa al ciudadano son Políticos cercanos que se interesen y preocupen por sus problemas <sentiment aspect="Propio_partido" entity="Union_Progreso_y_Democracia" polarity="P">@UPyD</sentiment> VECINOS COMO TU</tweet>
The corpus is composed of 1,284 tweets and has been split into a training set (784 tweets), provided for building and validating the systems, and a test set (500 tweets) that will be used for evaluation.
|Release of tasks.|
|Release of training and test corpora (General and Social-TV).|
|Release of training STOMPOL corpus.|
|Release of test STOMPOL corpus.|
|Experiment submissions by participants.|
|Submission of papers.|
|September 15th, 2015||Workshop.|
Please send an email to tass AT sngularmeaning.team with the TASS Corpus License agreement filled in with your email and affiliation (institution, company or any other kind of organization). You will be given a password to download the files from the password-protected area.
All corpora will be made freely available to the community after the workshop.
If you use the corpus in your research (papers, articles, presentations for conferences or educational purposes), please include a citation to one of the following publications:
ATTOS: Análisis de Tendencias y Temáticas a través de Opiniones y Sentimientos (TIN2012-38536-C03-0)
Ciudad 2020: Hacia un nuevo modelo de ciudad inteligente sostenible (INNPRONTA IPT-20111006)