The workshop and shared task "Sentiment Analysis at SEPLN (TASS)" has been held since 2012 under the umbrella of the International Conference of the Spanish Society for Natural Language Processing (SEPLN). TASS was the first shared task on sentiment analysis of tweets in Spanish. Spanish is the second most used language on Facebook and Twitter, which calls for the development and availability of language-specific methods and resources for sentiment analysis. The initial aim of TASS was to further research on sentiment analysis in Spanish, with a special interest in the language used on Twitter.
Although sentiment analysis is still an open problem, the Organization Committee would like to foster research on other tasks related to the processing of the semantics of texts written in Spanish. Consequently, the name of the workshop/shared task has been changed to "Workshop on Semantic Analysis at SEPLN (TASS)".
As in previous years, TASS-2017 proposes two evaluation tasks related to polarity classification at tweet level and at aspect level. The novelty of this year is the proposal of a new dataset for the task of sentiment analysis at document (tweet) level.
Moreover, the Organization Committee appeals to the research community to propose and organize evaluation tasks related to other semantic tasks in the Spanish language. New tasks provide an opportunity to create linguistic resources, evaluate their usefulness, and promote the consolidation of a community of researchers interested in the addressed topics. Thus, we encourage the semantic processing community to propose and submit evaluation tasks, with the support of the Organization Committee of TASS.
TASS-2017 will be the 6th event of the series and will be held in conjunction with the 33rd International Conference of the Spanish Society for Natural Language Processing (SEPLN), in Murcia, Spain, on September 19th, 2017.
A Google Group has been set up for this year’s TASS Shared Task where announcements will be made. Do send your questions and feedback to (firstname.lastname@example.org).
Proposal of Tasks
Semantic analysis has given rise to new tasks that attempt to further improve natural language understanding systems. In the context of sentiment analysis, some such tasks are cross- and multi-domain sentiment analysis, as well as aspect-based sentiment analysis. Outside the sentiment analysis arena, other tasks attracting the interest of the research community are stance classification, negation handling, rumour identification, fake news identification, open information extraction, argumentation mining, classification of semantic relations, and question answering of non-factoid questions, to name a few. We encourage the research community to propose evaluation tasks related to such semantic analysis processes in Spanish. The above list is by no means closed, so feel free to submit any evaluation task proposal that you consider interesting for the research community.
Proposals must include the following:
- Title of the task
- Description of the evaluation task
- Linguistic resources available or resources to be created
- Important dates
- Organization committee
- Contact person (name and email)
Proposals must be sent to email@example.com by April 15, 2017. Notification of acceptance will follow on April 21, 2017.
Although TASS-2017 will include tasks related to several types of semantic processing, sentiment analysis is still the main target of the workshop. Two tasks address the performance of polarity classification systems for tweets written in Spanish.
Task 1: Sentiment Analysis at Tweet level
This task focuses on the evaluation of polarity classification systems at tweet level in Spanish. Training, development and test datasets will be provided to train and evaluate the systems. The new dataset is called InterTASS and is composed of tweets written in Spanish. For more details, read the Datasets section.
InterTASS is annotated with 4 polarity labels (P, N, NEU, NONE), and the submitted systems will have to identify the intensity of the opinion expressed in each tweet.
The submitted systems can use any data as training data: the training set of InterTASS, training sets from previous editions, or other sets of tweets. However, using the test set of InterTASS or the test sets of previous editions as training data is forbidden. Participants can use any kind of linguistic resource to develop their classification model. The systems must be evaluated on the test set of InterTASS and the two test sets of the General Corpus of TASS (see previous editions). Participants are expected to submit up to three experiments per evaluation set, so each team can submit a maximum of 9 result files. For the two test sets of the General Corpus of TASS, only the classification at 4 levels of polarity intensity (P, N, NEU, NONE) has to be evaluated.
Accuracy and the macro-averaged versions of Precision, Recall and F1 will be used as evaluation measures. Systems will be ranked by the Macro-F1 and Accuracy measures.
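The measures named above can be sketched in a few lines of Python. This is only an illustration, not the official scorer: the function name `evaluate` is hypothetical, and Macro-F1 is computed here as the mean of the per-class F1 scores, which is an assumption (the official evaluation may instead take the harmonic mean of macro-Precision and macro-Recall).

```python
from collections import Counter

# The four polarity labels used in Task 1.
LABELS = ["P", "N", "NEU", "NONE"]


def evaluate(gold, predicted, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: macro-F1 is the unweighted mean of per-class F1 scores.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold) if gold else 0.0,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

Because every class contributes equally to the macro average, a system that ignores a rare label such as NEU is penalised more heavily here than it would be under Accuracy alone.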
Results must be submitted in a plain text file with the following format:
tweet_id \t polarity
Where polarity can be P, N, NEU or NONE.
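A results file for Task 1 could be produced with a small helper such as the one below. It is a minimal sketch: the function name `write_submission`, the UTF-8 encoding, and the reading of the format as exactly one tab character between the two fields are assumptions rather than official requirements. The label set is the one given in the Task 1 description (P, N, NEU, NONE).

```python
# Polarity labels named in the Task 1 description.
ALLOWED = {"P", "N", "NEU", "NONE"}


def write_submission(predictions, path):
    """Write (tweet_id, polarity) pairs as one 'tweet_id<TAB>polarity' line each.

    All rows are validated before the file is opened, so an invalid label
    never leaves a half-written file behind.
    """
    rows = list(predictions)
    for tweet_id, polarity in rows:
        if polarity not in ALLOWED:
            raise ValueError(f"invalid polarity {polarity!r} for tweet {tweet_id}")

    with open(path, "w", encoding="utf-8", newline="") as handle:
        for tweet_id, polarity in rows:
            handle.write(f"{tweet_id}\t{polarity}\n")
```

For example, `write_submission([("768224728049999872", "N")], "run1.tsv")` would write a single line; the file name `run1.tsv` is only a placeholder.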
Task 2: Aspect-based Sentiment Analysis
The second task proposes the development of aspect-based polarity classification systems. Two datasets are provided to evaluate the systems: Social-TV and STOMPOL. Both datasets are annotated with the aspect, the main category of the aspect, and the polarity of the opinion about the aspect. The systems have to classify the opinion about the given aspect on a three-level scale: Positive, Neutral and Negative.
Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format:
tweetid \t aspect \t polarity
Allowed polarity values are P, NEU and N.
For evaluation, a single label combining "aspect-polarity" will be considered. As in Task 1, the macro-averaged version of Precision, Recall and F1, and Accuracy are the evaluation measures, and Macro-F1 will be used for ranking the systems.
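The idea of scoring a single combined "aspect-polarity" label can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the combined label is represented as an (aspect, polarity) tuple, and gold and predicted files are assumed to list the same opinions in the same order, which the official scorer may handle differently.

```python
# Polarity values used in the aspect-level annotations.
ALLOWED_POLARITIES = {"P", "NEU", "N"}


def parse_line(line):
    """Split one 'tweetid<TAB>aspect<TAB>polarity' line.

    Returns (tweet_id, combined_label), where combined_label is the
    (aspect, polarity) pair treated as a single class.
    """
    tweet_id, aspect, polarity = (field.strip() for field in line.split("\t"))
    if polarity not in ALLOWED_POLARITIES:
        raise ValueError(f"invalid polarity {polarity!r}")
    return tweet_id, (aspect, polarity)


def combined_accuracy(gold_lines, predicted_lines):
    """Fraction of lines where tweet id, aspect and polarity all match."""
    gold = [parse_line(line) for line in gold_lines]
    predicted = [parse_line(line) for line in predicted_lines]
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return hits / len(gold) if gold else 0.0
```

With a combined label, a prediction with the right aspect but the wrong polarity counts as an error, exactly like a prediction with the wrong aspect.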
The participants of TASS-2017 will use the following corpora for developing their systems.
International TASS Corpus (InterTASS) is a new corpus released this year for Task 1.
The sentiment of the tweets in the corpus is annotated on a scale of 4 levels of polarity: P, NEU, N and NONE.
The corpus has three datasets:
- Training: it is composed of 1008 tweets.
- Development: it is composed of 506 tweets.
- Test: it is composed of 1899 tweets.
The three datasets of the corpus are provided as three XML files. An example of an InterTASS tweet is the following:
<tweet>
  <tweetid>768224728049999872</tweetid>
  <user>caval100</user>
  <content>Se ha terminado #Rio2016 Lamentablemente no arriendo las ganancias al pueblo brasileño por la penuria que les espera Suerte y solidaridad</content>
  <date>2016-08-23 23:13:42</date>
  <lang>es</lang>
  <sentiment>
    <polarity><value>N</value></polarity>
  </sentiment>
</tweet>
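A tweet in this layout can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the `<tweets>` wrapper element and the function name `read_tweets` are assumptions, since the root element of the distributed files is not shown here. The code looks for `tweet` elements anywhere in the tree, so it does not depend on the root's name.

```python
import xml.etree.ElementTree as ET

# The sample tweet from the corpus description, wrapped in a hypothetical root.
SAMPLE = """<tweets>
  <tweet>
    <tweetid>768224728049999872</tweetid>
    <user>caval100</user>
    <content>Se ha terminado #Rio2016 Lamentablemente no arriendo las ganancias al pueblo brasileño por la penuria que les espera Suerte y solidaridad</content>
    <date>2016-08-23 23:13:42</date>
    <lang>es</lang>
    <sentiment>
      <polarity><value>N</value></polarity>
    </sentiment>
  </tweet>
</tweets>"""


def read_tweets(xml_text):
    """Return one dict per <tweet> with its id, text and polarity label."""
    root = ET.fromstring(xml_text)
    records = []
    for tweet in root.iter("tweet"):
        records.append(
            {
                "tweetid": tweet.findtext("tweetid"),
                "content": tweet.findtext("content"),
                # None when the tweet carries no annotation (e.g. a test set).
                "polarity": tweet.findtext("sentiment/polarity/value"),
            }
        )
    return records
```

Calling `read_tweets(SAMPLE)` yields a one-element list whose `polarity` field is the label stored in the nested `<value>` element.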
The General Corpus of TASS is still available. Please visit this link for details on how to obtain it.
The Social-TV corpus was collected during the 2014 Copa del Rey final in Spain between Real Madrid and F.C. Barcelona, played on 16 April 2014 at Mestalla Stadium in Valencia. Over 1 million tweets were collected from 15 minutes before to 15 minutes after the match. Irrelevant tweets were filtered out and a subset of 2,773 was selected.
All tweets were manually annotated at aspect level and more than one aspect may be in each tweet. The list of aspects is:
Equipo (any other team)
Jugador (any other player)
Sentiment polarity was annotated from the point of view of the Twitter user, using 3 tags: P, NEU and N.
No distinction is made between cases where the author does not express any sentiment and cases where the author expresses a sentiment that is neither positive nor negative.
The Social-TV corpus was randomly divided into two sets: training (1,773 tweets) and test (1,000 tweets), with a similar distribution of both aspects and sentiments. The training set will be released so that participants may train and validate their models. The test corpus will be provided without any annotation and will be used to evaluate the results provided by the different systems.
Three sample tweets from the training set are shown here:
<tweet id="456544898791907328"> <sentiment aspect="Equipo-Real_Madrid" polarity="P">#HalaMadrid</sentiment> ganamos sin <sentiment aspect="Jugador-Cristiano_Ronaldo" polarity="NEU">Cristiano</sentiment>. .perdéis con <sentiment aspect="Jugador-Lionel_Messi" polarity="N">Messi</sentiment>. Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>! !!!!! </tweet>
<tweet id="456544898942906369"> @nevermind2192 <sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment> por siempre!! </tweet>
<tweet id="456544898951282688"> <sentiment aspect="Partido" polarity="NEU">#FinalCopa</sentiment> Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, campeón de la <sentiment aspect="Partido" polarity="P">copa del rey</sentiment> </tweet>
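In these samples the annotations are inline: each `<sentiment>` element wraps the text span that mentions the aspect. One way to recover both the plain tweet text and the list of annotated opinions is sketched below with the standard `xml.etree.ElementTree` module. The function name `extract_opinions` is hypothetical, and the sketch assumes each tweet is available as a well-formed XML string.

```python
import xml.etree.ElementTree as ET

# One of the sample tweets from the training set.
SAMPLE = (
    '<tweet id="456544898942906369"> @nevermind2192 '
    '<sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment>'
    " por siempre!! </tweet>"
)


def extract_opinions(tweet_xml):
    """Return (tweet_id, plain_text, opinions) for one annotated tweet.

    opinions is a list of (aspect, polarity, annotated_span) tuples,
    one per inline <sentiment> element.
    """
    tweet = ET.fromstring(tweet_xml)
    # itertext() walks the mixed content, so the markup is dropped
    # while the words inside and around each span are kept in order.
    plain_text = "".join(tweet.itertext()).strip()
    opinions = [
        (span.get("aspect"), span.get("polarity"), span.text)
        for span in tweet.iter("sentiment")
    ]
    return tweet.get("id"), plain_text, opinions
```

Keeping the annotated span alongside the aspect makes it possible to build context windows around each mention, which matters when one tweet carries several opinions with different polarities.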
STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of tweets written in Spanish annotated at aspect level. The topic of the tweets is the political campaign of the 2015 regional and local elections in Spain. The tweets were gathered April 23-24, and are related to one of the following political aspects:
Economía (Economy): taxes, infrastructure, markets, labor policy...
Sanidad (Health System): hospitals, public/private health system, drugs, doctors...
Educación (Education): state school, private school, scholarships...
Propio_partido (Political party): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
Otros_aspectos (Other aspects): electoral system, environmental policy...
Each aspect is related to one or several entities (separated by the pipe symbol |) that correspond to one of the main political parties in Spain:
Each tweet in the corpus was manually annotated by two different annotators, plus a third one in case of disagreement, with the sentiment polarity at aspect level. Sentiment polarity was annotated from the point of view of the Twitter user, using 3 levels: P, NEU and N. No difference is made between no sentiment and neutral sentiment (neither positive nor negative).
Each political aspect is linked to its corresponding political party and its polarity.
Some examples are shown in the following figure:
<tweet id="591267548311769088"> @ahorapodemos @Pablo_Iglesias_ @SextaNocheTV Que alguien pregunte si habrá cambios en las <sentiment aspect="Educacion" entity="Podemos" polarity="NEU">becas</sentiment> MEC para universitarios, por favor. </tweet>
<tweet id="591192167944736769"> #Arroyomolinos lo que le interesa al ciudadano son Políticos cercanos que se interesen y preocupen por sus problemas <sentiment aspect="Propio_partido" entity="Union_Progreso_y_Democracia" polarity="P">@UPyD</sentiment> VECINOS COMO TU </tweet>
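Since an aspect may be linked to several entities separated by the pipe symbol, it can be convenient to expand each annotation into one row per entity. The sketch below shows one possible way to do so. The function name `aspect_entity_rows` is hypothetical, and treating a multi-entity annotation as several independent rows is an assumption about how a system might use the data, not something the task prescribes.

```python
import xml.etree.ElementTree as ET

# One of the STOMPOL sample tweets.
SAMPLE = (
    '<tweet id="591267548311769088"> @ahorapodemos @Pablo_Iglesias_ @SextaNocheTV '
    "Que alguien pregunte si habrá cambios en las "
    '<sentiment aspect="Educacion" entity="Podemos" polarity="NEU">becas</sentiment> '
    "MEC para universitarios, por favor. </tweet>"
)


def aspect_entity_rows(tweet_xml):
    """Return (tweet_id, aspect, entity, polarity) rows, one per entity."""
    tweet = ET.fromstring(tweet_xml)
    rows = []
    for span in tweet.iter("sentiment"):
        # The entity attribute may hold several parties separated by "|".
        entities = [e for e in span.get("entity", "").split("|") if e]
        for entity in entities:
            rows.append(
                (tweet.get("id"), span.get("aspect"), entity, span.get("polarity"))
            )
    return rows
```

For the sample above this produces a single row for Podemos; an annotation whose entity attribute lists two parties would produce two rows sharing the same aspect and polarity.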
The corpus is made up of 1,284 tweets and has been divided into a training set (784 tweets), which is provided for building and validating the systems, and a test set (500 tweets), which will be used for evaluation.
Downloading any of these datasets requires signing the TASS Corpus Licence Agreement, which can be done by filling in the form at this link. After submitting the form you will receive an email with the link to download the data.
If you use the corpus for your research (papers, articles, presentations for conferences or educational purposes), please cite one of the following publications:
- Martínez-Cámara, E., García-Cumbreras, M.A., Villena-Román, J., & García-Morera, J. (2016). TASS 2015 - The Evolution of the Spanish Opinion Mining Systems. Procesamiento del Lenguaje Natural, 56.
- Villena-Román, J., Martínez-Cámara, E., García-Morera, J., & Jiménez-Zafra, S. (2015). TASS 2014 - The Challenge of Aspect-based Sentiment Analysis. Procesamiento del Lenguaje Natural, 54.
- Villena-Román, J., García-Morera, J., Lana-Serrano, S., & González-Cristóbal, J.C. (2014). TASS 2013 - A Second Step in Reputation Analysis in Spanish. Procesamiento del Lenguaje Natural, 52.
- Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., & González-Cristóbal, J.C. (2013). TASS - Workshop on Sentiment Analysis at SEPLN. Procesamiento del Lenguaje Natural, 50.
You have to fill in the Registration Form to register for TASS 2017.
You can access the evaluation page at this link.
Datasets downloads
Participants must use the following datasets for developing and evaluating their systems. The Licence has to be signed in order to download the data.
General Corpus of TASS
The Organization Committee of TASS encourages participants to submit a description paper of their systems. Submitted papers will be reviewed by a scientific committee, and only accepted papers will be published at CEUR, as in previous years (2015 and 2016).
The manuscripts must satisfy the following rules:
- Up to 6 pages plus references formatted according to the SEPLN template.
- Articles can be written in English or Spanish. The title, abstract and keywords must be written in both languages.
- The document format must be Word or LaTeX, but the submission must be in PDF format.
- Instead of describing the task and/or the corpus, you should focus on the description of your experiments and the analysis of your results, and include a citation to the Overview paper.
Depending on the final number of participants and the time allocated for the workshop, all or a selected group of papers will be presented and discussed in the Workshop session.
The proceedings are published in CEUR and you can read them here.
TASS 2017 will be held on Tuesday, September 19. The venue is currently the Faculty of Letters of the University of Murcia. The workshop will start at 15:30 and finish at about 19:00. We recommend checking the official program of the Conference in case there are any last-minute changes to the location or time of TASS.
The program of TASS is the following:
- All the papers are going to be orally presented.
- The language of the presentation can be Spanish or English.
- The duration of the presentation is 10 minutes. This is a strong requirement, and the chair will stop the presentation at minute ten.
- There will be some time for questions after each presentation.
Task proposal deadline
April 15, 2017
April 21, 2017
Release of training and development corpora
May 1, 2017
Release of test corpora
June 20, 2017
June 30, 2017
Experiment submission and evaluation
July 1, 2017; July 4, 2017
July 15, 2017; July 18, 2017
July 31, 2017
Camera ready submission
August 31, 2017
September 15, 2017
September 19, 2017
Edgar Casasola Murillo, University of Costa Rica, Costa Rica
Fermín Cruz Mata, University of Sevilla, Spain
Yoan Gutiérrez Vázquez, University of Alicante, Spain
Lluís F. Hurtado, Polytechnic University of Valencia, Spain
Salud María Jiménez Zafra, University of Jaén, Spain
Mª. Teresa Martín Valdivia, University of Jaén, Spain
Manuel Montes Gómez, National Institute of Astrophysics, Optics and Electronics, Mexico
Antonio Moreno Ortíz, University of Málaga, Spain
Preslav Nakov, Qatar Computing Research Institute, Qatar
José Manuel Perea Ortega, University of Extremadura, Spain
Ferrán Pla, Universidad Politécnica de Valencia, Spain
Sara Rosenthal, IBM Research, U.S.A.
Maite Taboada, Simon Fraser University, Canada
L. Alfonso Ureña López, University of Jaén, Spain