GUM 13 Corpus Survey

Lauren Levine
Thu, Apr 16, 2026 5:52 PM

(Apologies for cross-postings)

*** The GUM Corpus - Public Survey ***

*** Georgetown University Multilayer Corpus ***

The Corpling Lab at Georgetown University (https://gucorpling.org/corpling/)
would like your participation in this survey:
https://docs.google.com/forms/d/e/1FAIpQLSfDyzbvHHvQKTcwC9Ym9bp-_DlEV8XvBX-6_Bxu5mr7CeHWBA/viewform?usp=header
Your responses will help us better understand GUM usage and preferences
regarding current and potential new genres in the GUM corpus, and will
inform our future selection of genres and the availability of formats and
annotation layers.

Survey Link: https://forms.gle/SQkfN8MTHNXo32Z3A

GUM is an open source corpus of richly annotated English texts from
multiple genres: academic, bio, fiction, interview, news, travel, how-to,
Reddit forum discussions, conversations, political speeches, CC vlogs,
textbooks, podcasts, letters, L1 essays, and oral court arguments. The
corpus is created by students as part of the Computational Linguistics
curriculum at Georgetown University and is available under Creative Commons
licenses. As of now, the GUM Corpus has released 12 series containing over
291K tokens annotated for multiple layers. For more information and to
search or download the corpus online, see: https://gucorpling.org/gum/

We value your opinions and appreciate your participation and help! For full
consideration, please respond to the survey by the end of July.

Our lab will be attending the ACL 2026 main conference, CODI-CRAC, and LAW
XX in San Diego, so please feel free to come talk to us if you are in
attendance as well!

Best,

Lauren Levine

--
Lauren Levine
Ph.D. Student | Computational Linguistics
Department of Linguistics
Georgetown University
