Good Or Bad News? Emotional categorization of news articles

About

This task focuses on emotional categorization in the domain of news articles. A corpus is currently being built from RSS feeds of online newspapers written in different varieties of Spanish, namely those of Argentina, Chile, Colombia, Cuba, Spain, the USA, Mexico, Peru and Venezuela. The plan is then to classify the information provided by each feed (at least a headline and, in many cases, a brief summary of the article) into one of two emotional categories, SAFE or UNSAFE, from the point of view of the general public of the corresponding country.

This task can be considered a kind of stance classification, focused on the positioning of the editor or the public with respect to the content of a news item. The task is a strong challenge because it has to deal with the polarity of feeling (safe vs. unsafe) in combination with a (pseudo) thematic classification in order to determine the meaning of the news. For example, a reduction in traffic accidents carries a negative feeling because of the accidents themselves, but the context of falling numbers makes it, in the end, good news.

Tasks

Two subtasks are proposed. The first one aims at evaluating the performance of the systems without taking into account the varieties of the Spanish language. The second one is a local multilingual challenge, because the training set is composed of texts written in the Spanish spoken in Spain, while the test sets are texts written in the varieties of Spanish spoken in different countries of America.

Subtask-1: Monolingual classification

The aim of the task is the classification of a news headline as SAFE or UNSAFE for the incorporation of an ad. If a headline arouses a positive or neutral emotion, it is safe for incorporating ads; if it arouses a negative emotion, it is unsafe. The submitted systems will have to face the following challenges:

  1. Lack of context. The participants will work with news headlines, which are usually very short and lack any contextual information.
  2. Generalization. The topics of the headlines are very diverse, which makes the classification task more difficult.
  3. Lexical diversity. The training and test data include utterances written in the Spanish spoken in Spain and in different countries of America.

The participants will be provided with the training and development sets of the SANSE corpus (see Datasets section), and with the test set for the evaluation. In this task, the three sets are composed of news headlines written in different varieties of Spanish, but the country of origin of each text is not relevant. The three sets are annotated with two levels of safety, SAFE and UNSAFE, so the task is a binary classification task.

The evaluation of Subtask-1 is organized in two levels:

  1. (L1) The test data will be composed of the headlines of the test subset of the SANSE corpus.
  2. (L2) The test set will be larger than in L1, about 13,000 headlines, written in all the varieties of the Spanish language.
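To make the binary SAFE/UNSAFE decision concrete, a trivial keyword baseline might look like the sketch below. This is purely illustrative and not an official baseline: the lexicon and the function name are our own assumptions.

```python
# Illustrative heuristic baseline (NOT an official baseline of the task):
# label a headline UNSAFE when it contains a word from a small hand-picked
# Spanish lexicon of negative terms; everything else is labelled SAFE.
UNSAFE_TERMS = {"riesgo", "accidente", "muerte", "crisis", "violencia"}

def classify(headline: str) -> str:
    tokens = headline.lower().split()
    return "UNSAFE" if UNSAFE_TERMS & set(tokens) else "SAFE"
```

For instance, `classify("Casi 300 municipios de Colombia en riesgo electoral.")` returns `UNSAFE` because of the token "riesgo". Real submissions would of course replace this heuristic with a trained classifier.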

Subtask-2: Multilingual classification

The aim is similar to that of Subtask-1, but in this case the goal is to evaluate the generalization capacity of the submitted systems. The participants will be provided with training and development sets of SANSE containing news headlines written only in the Spanish spoken in Spain. Several test sets will be provided, composed of news headlines written in the varieties of Spanish spoken in different countries of America.

Evaluation

The submitted systems will be evaluated using Macro-Precision, Macro-Recall, Macro-F1 and Accuracy.
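As a sketch, the four measures can be computed over the two labels as follows. We assume here that Macro-F1 is the harmonic mean of Macro-Precision and Macro-Recall (another common convention averages per-class F1 instead); the function name is illustrative.

```python
def macro_scores(gold, pred, labels=("SAFE", "UNSAFE")):
    """Macro-Precision, Macro-Recall, Macro-F1 and Accuracy for two label lists."""
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / len(labels)
    macro_r = sum(recalls) / len(labels)
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    return macro_p, macro_r, macro_f1, accuracy
```

Macro-averaging gives both classes equal weight, which matters here because SAFE and UNSAFE headlines need not be balanced in the test sets.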

Datasets

The participants will use the Spanish brANd Safe Emotion corpus (SANSE).

SANSE corpus

The SANSE corpus is composed of 2,000 news headlines written in the Spanish spoken in Spain and in several American countries, specifically Mexico, Cuba, Chile, Colombia, Argentina, Venezuela, Peru and the USA. Therefore, SANSE is a representative corpus of news headlines written in Spanish all over the Spanish-speaking world.

The annotation was carried out by two human annotators, namely the two organizers of the task. A safe headline was defined as an utterance that arouses a positive or neutral emotion in the reader, or that is not related to a controversial topic: religion, extreme-wing political topics, or topics that may arouse strong positive emotions in some readers but strong negative emotions in others. An unsafe headline was defined as an utterance that arouses negative emotions in the reader. Some examples:

Así será el nuevo pan integral en España, según una nueva ley en marcha.

This is what the new wholemeal bread in Spain will be like, according to a new law underway.

SAFE

Casi 300 municipios de Colombia en riesgo electoral.

Almost 300 Colombian municipalities are at electoral risk.

UNSAFE

The inter-annotator agreement was 0.58 according to both Scott's Π [1] and Cohen's Κ [2], which may be considered moderate according to Landis and Koch [3]. Although the agreement is only moderate, it is close to substantial, and it must also be taken into account that this is a new classification task dealing with strongly subjective content. In all cases of disagreement between the two annotators, a third annotator broke the tie. We will work on making the annotation guidelines more precise in order to improve the agreement, and we also hope that the participants will give us insights that help improve the annotation of the data.
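For reference, Cohen's Κ compares the observed agreement p_o against the agreement p_e expected by chance, as Κ = (p_o - p_e) / (1 - p_e). A minimal sketch over two annotators' label lists (the function name is illustrative):

```python
def cohens_kappa(ann1, ann2, labels=("SAFE", "UNSAFE")):
    """Cohen's kappa for two parallel lists of categorical annotations."""
    n = len(ann1)
    # Observed agreement: fraction of items both annotators labelled alike.
    p_o = sum(1 for a, b in zip(ann1, ann2) if a == b) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    p_e = sum((ann1.count(lab) / n) * (ann2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)
```

Scott's Π differs only in how p_e is estimated (it pools both annotators' label distributions into one); on this corpus the two coefficients happened to coincide at 0.58.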

The SANSE corpus is divided into three subsets for Subtask-1, specifically training, development and test. The statistics of the three subsets are the following:

  • Training: 1,250
  • Development: 250
  • Test: (L1) 500; (L2) 13,152

The statistics of the SANSE corpus for Subtask-2 are the following:

  • Training (Spain): 300
  • Development (Spain): 48
  • Test (Mexico): 144
  • Test (Cuba): 194
  • Test (Chile): 194
  • Test (Colombia): 195
  • Test (Argentina): 198
  • Test (Venezuela): 233
  • Test (Peru): 234
  • Test (USA): 260

Shared Task

Evaluation

The evaluation web page is available at: http://www.sepln.org/workshops/tass/2018/task-4/private/evaluation/evaluate.php

Results must be submitted in a plain text file with the following format:

ID_Headline\tLABEL
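For example, a run file in this format could be written as follows. The IDs, labels and file name below are illustrative, not prescribed by the task.

```python
# Write predictions in the required "ID_Headline\tLABEL" format,
# one tab-separated pair per line. IDs and labels here are examples.
predictions = {"1": "SAFE", "2": "UNSAFE"}

with open("run1.tsv", "w", encoding="utf-8") as out:
    for headline_id, label in predictions.items():
        out.write(f"{headline_id}\t{label}\n")
```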

Official results

The official results web page will be released after the evaluation period.

Award

The best system of Subtask-1 will receive the best system award, a cash prize of €100 sponsored by MeaningCloud.

Datasets downloads

The use of the SANSE corpus requires agreeing to the terms of use of the data by signing the TASS Data License.

Subtask-1

  • Training & Development: You must sign the license of the terms of use in order to download the data. The license is at: http://www.sepln.org/workshops/tass/tass_data/download.php
  • Test: It will be released in the download section of TASS. You must use the same URL that you used for downloading the Training and Development datasets.

Subtask-2

  • Training & Development: You must sign the license of the terms of use in order to download the data. The license is at: http://www.sepln.org/workshops/tass/tass_data/download.php
  • Test: It will be released in the download section of TASS. You must use the same URL that you used for downloading the Training and Development datasets.

Proceedings

The same as for Task-1 and Task-2. See main webpage of TASS-2018.

Take into account that the content of the paper must be 6 pages plus references. You should focus strongly on the description of your system, and you should not waste space describing the details of the task or the corpora. The details of the task and the corpora will be published in an "Overview" paper, which we recommend you cite in your paper. The provisional BibTeX entry for the Overview paper is:

@inproceedings{overview_tass2018,
	author		= "Mart\'{i}nez-C\'{a}mara, Eugenio and Almeida-Cruz, Yudivi\'{a}n and D\'{i}az-Galiano, Manuel C. and Est\'{e}vez-Velarde, Suilan and Garc\'{i}a-Cumbreras, Miguel \'{A}. and Garc\'{i}a-Vega, Manuel and Guti\'{e}rrez, Yoan and Montejo R\'{a}ez, Arturo and Montoyo, Andr\'{e}s and Mu\~{n}oz, Rafael and Piad-Morffis, Alejandro and Villena-Rom\'{a}n, Julio",
	title		= "Overview of TASS 2018: Opinions, Health and Emotions",
	booktitle	= "Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018)",
	editor		= "Mart\'{i}nez-C\'{a}mara, Eugenio and Almeida Cruz, Yudivi\'{a}n and D\'{i}az-Galiano, Manuel C. and Est\'{e}vez Velarde, Suilan and Garc\'{i}a-Cumbreras, Miguel \'{A}. and Garc\'{i}a-Vega, Manuel and Guti\'{e}rrez V\'{a}zquez, Yoan and Montejo R\'{a}ez, Arturo and Montoyo Guijarro, Andr\'{e}s and Mu\~{n}oz Guillena, Rafael and Piad Morffis, Alejandro and Villena-Rom\'{a}n, Julio",
	volume		= "",
	series		= "CEUR Workshop Proceedings",
	pages		= "1-X",
	address	= "Sevilla, Spain",
	publisher	= "CEUR-WS",
	year		= "2018",
	month		= "September",
}
					

Because of the CEUR rules for the assignment of volume numbers, we kindly ask you to update the reference to the "Overview" paper before the submission of the camera-ready.

You have to send your paper to the email address tass-sepln@googlegroups.com. If you have any problem, please let us know (emcamara@decsai.ugr.es).

Program

To be announced.

Presentation instructions

To be announced.

Important dates

  • Release of training and development corpora: May 2, 2018
  • Release of test corpora: June 25, 2018
  • Deadline for evaluation: June 27, 2018
  • Deadline for evaluation: July 3, 2018
  • Paper submission: July 16, 2018
  • Paper submission: July 24, 2018
  • Review notification: August 7, 2018
  • Camera ready submission: September 5, 2018
  • Publication: September 17, 2018
  • Workshop: September 18, 2018

Organization

References

  1. Scott, William A. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3):321–325.
  2. Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.
  3. Landis, J. Richard and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.