INLG 2024 Tutorial on Human Evaluation of NLP System Quality
24th September 2024 at INLG 2024, Tokyo
Available to both in-person and remote attendees. Registration for INLG is
open now: https://amarys-jtb.jp/INLG2024/
We will release all slides and Colab notebooks from the tutorial:
https://human-evaluation-tutorial.github.io/
Anya Belz†
João Sedoc⚘
Craig Thomson†
Simon Mille†
Rudali Huidrom†
†ADAPT Research Centre, Dublin City University, Ireland
⚘New York University, USA
Description:
Human evaluation has always been considered the most reliable form of
evaluation in Natural Language Processing (NLP), but recent research has
revealed a number of concerning issues, both in the design (Belz et
al., 2020; Howcroft et al., 2020) and in the execution (Thomson et al., 2024)
of human evaluation experiments. Standardisation and comparability across
different experiments are low, as is reproducibility, in the sense that
repeat runs of the same evaluation often do not support the same main
conclusions, quite apart from not producing similar scores.
The current situation is likely due in part to how human evaluation
is viewed in NLP: not as something that needs to be studied and learnt
before venturing into conducting an evaluation experiment, but as something
that anyone can throw together without prior knowledge by pulling in a
couple of students from the lab next door.
Our aim with this tutorial is primarily to inform participants about the
range of options available and the choices that need to be made when creating
human evaluation experiments, and about the implications of different
decisions. Moreover, we will present best-practice principles and
practical tools that help researchers design scientifically rigorous,
informative and reliable experiments.
As the schedule below indicates, we are planning a morning of
presentations and brief exercises, followed by a practical session in the
afternoon in which participants will be supported in creating evaluation
experiments and analysing their results, using tools and other
resources provided by the tutorial team.
We aim to address all aspects of human evaluation of system outputs in a
research setting, equipping participants with the knowledge, tools,
resources and hands-on experience needed to design and execute rigorous and
reliable human evaluation experiments. Take-home materials and online
resources will continue to support participants in conducting experiments
after the tutorial.
Schedule:
Time Unit #: Topic
09:30—10:00 Unit 1: Introduction
10:00—10:30 Unit 2: Development and Components of Human Evaluations
10:30—10:45 Break
10:45—11:45 Unit 3: Quality Criteria and Evaluation Modes
11:45—12:30 Unit 4: Experiment Design
12:30—14:00 Lunch
14:00—15:15 Unit 5: Statistical Analysis of Results
15:15—15:30 Break
15:30—16:15 Unit 6: Experiment Implementation
16:15—16:40 Unit 7: Experiment Execution
16:40—16:55 Break
16:55—18:30 Unit 8: Practical Session
Summary paper:
Anya Belz, João Sedoc, Craig Thomson, Simon Mille and Rudali Huidrom. 2024.
The INLG 2024 Tutorial on Human Evaluation of NLP System Quality: Background,
Overall Aims, and Summaries of Taught Units. In Proceedings of the
17th International Conference on Natural Language Generation, Tokyo, Japan.
https://aclanthology.org/2024.inlg-tutorials.1