*** First Workshop on Information Extraction from Scientific Publications
(WIESP) at AACL-IJCNLP 2022 ***
The number of scientific papers published per year has exploded in recent
years. Indexing the full text of articles in search engines helps researchers
discover and retrieve vital scientific information so they can keep building
on the shoulders of giants, inform policy, and make evidence-based decisions.
Nevertheless, it is difficult to navigate this ocean of data. Using simple
string matching has substantial limitations: human language is ambiguous in
nature, context matters, and we frequently use the same words and acronyms
to represent a multitude of different meanings. Extracting structured and
semantically relevant information from scientific publications (e.g.,
named-entity recognition, summarization, citation intention, linkage to
knowledge graphs) allows for better selection and filtering of articles.
The First Workshop on Information Extraction from Scientific Publications
(WIESP) will create the necessary forum to foster discussion and research
using Natural Language Processing and Machine Learning. WIESP will
specifically focus on topics related to information extraction from
scientific publications, including (but not limited to):
In addition to research paper presentations, WIESP will also feature
keynote talks, a panel discussion, and a shared task. We will update the
details on our website as and when they become available. We especially
welcome participation from academic and research institutions, government
and industry labs, publishers, and information service providers. Projects
and organizations using NLP/ML techniques in their text mining and
enrichment efforts are also welcome to participate.
Call for Papers
We invite papers in the following categories:
Long papers must describe substantial, original, completed, and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Papers must not exceed eight (8) pages of content, plus
unlimited pages of references. The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.
Short papers must describe original and unpublished work. Please note
that a short paper is not a shortened long paper. Instead, short papers
should have a point that can be made in a few pages, such as a small,
focused contribution, a negative result, or an interesting application
nugget. Short papers must not exceed four (4) pages, plus unlimited pages
of references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments can
be taken into account.
Position papers will give voice to authors who wish to take a position on a
topic listed above or the field of scholarly information extraction.
Submissions need not present original work and should be two to four pages
in length, including title, text, figures and tables, and references.
Demo papers should be no more than four (4) pages in length,
including references, and should describe implemented systems that are of
relevance to the theme of the workshop. Authors of demo papers should be
willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.
Extended Abstracts: We welcome submissions of extended abstracts (2
pages max) related to the research topics mentioned above. Submissions may
include previously published results, late-breaking results, or a
description of ongoing projects in the broad field of information
extraction and mining from scientific publications. Extended abstracts can
also summarize existing work, work in progress, or a collection of works
under a unified theme (e.g., a series of closely related papers that build
on each other or tackle a common problem).
Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)
A large share of astrophysics research uses data from
missions and facilities such as ground observatories in remote locations or
space telescopes, as well as digital archives that hold large amounts of
observed and simulated data. These missions and facilities are frequently
named after historical figures or given ingenious acronyms that,
unfortunately, are easily confused with other uses of the same string when
searching the literature via simple string matching. For instance, Planck can refer to
the person, the mission, the constant, or several institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g., URLs) to
highly unstructured (e.g., Formula), and from useful to researchers (e.g.,
Telescope) to more useful to archivists and administrators (e.g., Grant).
Overall, 31 different labels are included, and their distribution is highly
imbalanced (e.g., ~100x more Citations than Proposals). Submissions will be
scored using both the entity-level F1 score computed by seqeval (the
CoNLL-2000 shared task metric) and the token-level Matthews correlation
coefficient as implemented in scikit-learn. We also encourage authors to
propose their own evaluation
metrics. A sample dataset and more instructions can be found at:
Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. The
best-performing or most interesting approaches may be invited to
collaborate further with the NASA Astrophysics Data System (
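
To illustrate the difference between the two scoring levels named in the
shared task description, here is a minimal pure-Python sketch. It is not the
official scorer, which relies on seqeval and scikit-learn. The BIO tag format,
the example tags, and the function names extract_entities, entity_f1, and
token_mcc are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return the set of (label, start, end) spans in one BIO-tagged sentence."""
    spans, label, start = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.add((label, start, i - 1))
            label = None
        # An I- tag that does not continue the open span starts a new one.
        if prefix == "B" or (prefix == "I" and not continues):
            label, start = name, i
    return spans


def entity_f1(gold, pred):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    gold_spans = {(n, *s) for n, tags in enumerate(gold) for s in extract_entities(tags)}
    pred_spans = {(n, *s) for n, tags in enumerate(pred) for s in extract_entities(tags)}
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


def token_mcc(gold, pred):
    """Multiclass Matthews correlation coefficient over flattened token tags."""
    y_true = [t for tags in gold for t in tags]
    y_pred = [t for tags in pred for t in tags]
    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    cov_tp = correct * total - sum(true_counts[k] * pred_counts[k] for k in true_counts)
    cov_tt = total ** 2 - sum(v ** 2 for v in true_counts.values())
    cov_pp = total ** 2 - sum(v ** 2 for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical example: the prediction finds the telescope but misses the grant.
gold = [["B-Telescope", "I-Telescope", "O", "B-Grant"]]
pred = [["B-Telescope", "I-Telescope", "O", "O"]]
print("entity-level F1:", entity_f1(gold, pred))
print("token-level MCC:", token_mcc(gold, pred))
```

In this example a single mislabeled token costs one whole entity at the
entity level but only one of four tokens at the token level, which is one
reason the two scores can diverge on a label set as imbalanced as this one.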
All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
Submission Website and Format
Submissions will be made via Softconf. We will post the submission link on
our website shortly. Submissions should follow the ACLPUB formatting guidelines (
https://acl-org.github.io/ACLPUB/formatting.html) and template files (
https://github.com/acl-org/acl-style-files/tree/master). Submissions (Long
and Short Papers) will be subject to a double-blind peer-review process.
Position papers, Demo papers, and Extended Abstracts need not be
anonymized. The authors will present accepted papers at the workshop either
as a talk or a poster. All accepted papers will be published in the
We follow the same policies as AACL-IJCNLP 2022 regarding preprints and
double submissions. The anonymity period for WIESP 2022 is from July 15 to
Researcher at UFAL, Charles University, CZ