Call for Participation: DSTC11 - Track 4 - Baseline Model Available

Luis Fernando D'Haro
Tue, Jan 31, 2023 8:24 PM

Track 4: Robust and Multilingual Automatic Evaluation Metrics for
Open-Domain Dialogue Systems - Eleventh Dialog System Technology Challenge
(DSTC11.T4)

Call for Participation


*************** Baseline Model is now available!! ***************


TRACK GOALS AND DETAILS: The track has two main goals and tasks:
•    Task 1: Propose and develop effective Automatic Metrics for the
evaluation of open-domain multilingual dialogues.
•    Task 2: Propose and develop Robust Metrics for dialogue systems
trained with back-translated and paraphrased dialogues in English.
EXPECTED PROPERTIES OF THE PROPOSED METRICS:
•    High correlation with human-annotated assessments (a minimal
sketch of this evaluation follows this list).
•    Explainable metrics in terms of the quality of the model-generated
responses.
•    Participants can propose their own metric or optionally improve the
baseline evaluation metric, Deep AM-FM (Zhang et al., 2020).
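
As a minimal sketch of how such correlation is typically measured,
assuming per-turn metric scores and human quality ratings (the numbers
below are illustrative, not track data):

# Compare an automatic metric's scores against human ratings using the
# standard correlation coefficients; scipy is assumed to be installed.
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.71, 0.42, 0.88, 0.15, 0.63]   # hypothetical metric outputs
human_ratings = [4.0, 2.5, 4.5, 1.0, 3.5]        # e.g., a 1-5 quality scale

pearson, _ = pearsonr(metric_scores, human_ratings)
spearman, _ = spearmanr(metric_scores, human_ratings)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")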

DATASETS:
For training: up to 18 curated human-human multilingual datasets (over 3M
turns), with turn/dialogue-level automatic annotations such as toxicity or
sentiment, among others (see the sketch at the end of this section).
Dev/Test: up to 10 curated human-chatbot multilingual datasets (over 150k
turns), with turn/dialogue-level human annotations, including QE metrics
and cosine similarity.
The data are translated and back-translated into several languages
(English, Spanish, and Chinese). In addition, each dataset includes
several annotated paraphrases.
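
To illustrate the kind of automatic turn-level annotation mentioned
above, here is a hedged sketch using an off-the-shelf sentiment
classifier (the model name is an assumption, toxicity annotation works
analogously with a different model, and this is not necessarily the
pipeline used to build the track data):

# Annotate dialogue turns with sentiment labels via a Hugging Face pipeline.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

turns = ["That was really helpful, thank you!", "This is the worst answer ever."]
for turn, result in zip(turns, sentiment(turns)):
    print(f"{turn!r} -> {result['label']} ({result['score']:.2f})")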

BASELINE MODEL:
The default choice is Deep AM-FM (Zhang et al., 2020). The model has been
adapted to evaluate multilingual datasets and to work with paraphrased and
back-translated sentences.
GitHub: https://github.com/karthik19967829/DSTC11-Benchmark
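
For intuition, a minimal sketch of the AM-FM idea: an Adequacy Metric
(AM) scores semantic similarity to a reference response with sentence
embeddings, and a Fluency Metric (FM) scores the response with a
language model; the two are combined with a weight alpha. The specific
models and the alpha value below are illustrative assumptions, not the
configuration of the official baseline (see the repository above for that):

# Sketch of an AM-FM style score: embedding similarity plus LM fluency.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")

def am_score(response: str, reference: str) -> float:
    # Adequacy: cosine similarity between sentence embeddings.
    emb = embedder.encode([response, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def fm_score(response: str) -> float:
    # Fluency: inverse perplexity, so lower perplexity gives a higher score.
    ids = lm_tok(response, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss   # mean negative log-likelihood
    return 1.0 / torch.exp(loss).item()

def am_fm(response: str, reference: str, alpha: float = 0.5) -> float:
    # Convex combination of adequacy and fluency, as in AM-FM.
    return alpha * am_score(response, reference) + (1 - alpha) * fm_score(response)

print(am_fm("I love hiking on weekends.", "I really enjoy hiking every weekend."))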

REGISTRATION AND FURTHER INFORMATION:
ChatEval: https://chateval.org/dstc11
GitHub: https://github.com/Mario-RC/dstc11_track4_robust_multilingual_metrics

PROPOSED SCHEDULE:
Training/Validation data release: November-December 2022
Test data release: Mid-March 2023
Entry submission deadline: Mid-March 2023
Submission of final results: End of March 2023
Final result announcement: Early April 2023
Paper submission: March-May 2023
Workshop: July-September 2023, at a venue to be announced with DSTC11

ORGANIZATIONS:
Universidad Politécnica de Madrid (Spain)
National University of Singapore (Singapore)
Tencent AI Lab (China)
New York University (USA)
Carnegie Mellon University (USA)

Mario Rodríguez Cantelar
Postgraduate Non-Doctoral Researcher / PhD student
Centre for Automation and Robotics (UPM-CSIC)

--
Luis Fernando D'Haro
Profesor Contratado Doctor / Associate Professor
Grupo de Tecnología del Habla y Aprendizaje Automático / Speech Technology
and Machine Learning Group
Dpto. de Ingeniería Electrónica / Dept. of Electronics Engineering
Escuela Técnica Superior de Ingeniería de Telecomunicación
Universidad Politécnica de Madrid
Avenida Complutense nº 30, Ciudad Universitaria, 28040 - Madrid (España).
Despacho/Room: B-108
Teléfono/Phone: (+34) 910672174
Homepage: http://gth.die.upm.es/~lfdharo
