sigdial@list.sigdial.org

SIGdial Mailing List

Call for Papers: First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022

Tirthankar Ghosal

Wed, May 11, 2022 2:38 PM

*** First Workshop on Information Extraction from Scientific Publications
(WIESP) at AACL-IJCNLP 2022 ***

*** Website: https://ui.adsabs.harvard.edu/WIESP/
*** Twitter: https://twitter.com/wiesp_nlp

The number of scientific papers published per year has exploded in recent
years. Indexing the full text of articles in search engines helps readers
discover and retrieve vital scientific information, so that they can keep
building on the shoulders of giants, inform policy, and make evidence-based
decisions. Nevertheless, it is difficult to navigate this ocean of data.
Simple string matching has substantial limitations: human language is
inherently ambiguous, context matters, and the same words and acronyms are
frequently used with many different meanings. Extracting structured and
semantically relevant information from scientific publications (e.g.,
named-entity recognition, summarization, citation intention, linkage to
knowledge graphs) allows for better selection and filtering of articles.
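
To make the string-matching limitation concrete, here is a minimal Python
sketch. The example sentences are invented for illustration and are not
taken from any dataset; they use the name Planck, which can denote a person,
a space mission, a physical constant, or an institution. The helper name
`string_match` is likewise hypothetical.

```python
import re

# Hypothetical sentences written for this illustration only.
sentences = [
    "Max Planck introduced the quantum of action in 1900.",
    "We adopt the Planck 2018 cosmological parameters.",
    "The Planck constant relates photon energy to frequency.",
    "The authors are affiliated with the Max Planck Institute for Astronomy.",
]

def string_match(term, texts):
    """Return every text that contains `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b")
    return [t for t in texts if pattern.search(t)]

hits = string_match("Planck", sentences)
print(f"{len(hits)} of {len(sentences)} sentences match")
```

Every sentence matches, although each one uses the word in a different
sense, so a query for the mission alone cannot be answered by matching
strings.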

The First Workshop on Information Extraction from Scientific Publications
(WIESP) will create the necessary forum to foster discussion and research
using Natural Language Processing and Machine Learning. WIESP will
specifically focus on topics related to information extraction from
scientific publications, including (but not limited to):

- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures,
  bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying
  scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from
  scientific articles and metadata
- Information extraction from COVID-19 articles to inform public health
  policy

In addition to research paper presentations, WIESP will also feature
keynote talks, a panel discussion, and a shared task. We will update the
details on our website as and when they become available. We especially
welcome participation from academic and research institutions, government
and industry labs, publishers, and information service providers. Projects
and organizations using NLP/ML techniques in their text mining and
enrichment efforts are also welcome to participate.

Call for Papers

We invite papers of the following categories:

Long papers must describe substantial, original, completed, and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Papers must not exceed eight (8) pages of content, plus
unlimited pages of references. The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.

Short papers must describe original and unpublished work. Please note
that a short paper is not a shortened long paper. Instead, short papers
should have a point that can be made in a few pages, such as a small,
focused contribution, a negative result, or an interesting application
nugget. Short papers must not exceed four (4) pages, plus unlimited pages
of references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments can
be taken into account.

Position papers will give voice to authors who wish to take a position on a
topic listed above or the field of scholarly information extraction.
Submissions need not present original work and should be two to four pages
in length, including title, text, figures and tables, and references.

Demo papers should be no more than four (4) pages in length,
including references, and should describe implemented systems that are of
relevance to the theme of the workshop. Authors of demo papers should be
willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.

Extended abstracts: We welcome submissions of extended abstracts (2
pages max) related to the research topics mentioned above. Submissions may
include previously published results, late-breaking results, or a
description of ongoing projects in the broad field of information
extraction and mining from scientific publications. Extended abstracts can
also summarize existing work, work in progress, or a collection of works
under a unified theme (e.g., a series of closely related papers that build
on each other or tackle a common problem).

Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)

Much astrophysics research relies on data from missions and facilities such
as ground observatories in remote locations or space telescopes, as well as
digital archives that hold large amounts of observed and simulated data.
These missions and facilities are frequently named after historical figures
or use an ingenious acronym, which unfortunately makes them easy to confuse
when searching the literature via simple string matching. For instance,
Planck can refer to the person, the mission, the constant, or several
institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.

The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g., URLs) to
highly unstructured (e.g., Formula), and from useful to researchers (e.g.,
Telescope) to more useful to archivists and administrators (e.g., Grant).
Overall, 31 different labels are included, and their distribution is highly
unbalanced (e.g., ~100x more Citations than Proposals). Submissions will be
scored using both the seqeval F1 score (as in the CoNLL-2000 shared task)
at the entity level and scikit-learn's Matthews correlation coefficient at
the token level. We also encourage authors to propose their own evaluation
metrics. A sample dataset and more instructions can be found at:

https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
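
For readers unfamiliar with the two scoring levels, the following
dependency-free Python sketch shows what an entity-level F1 and a
token-level Matthews correlation coefficient measure. It is not the official
scorer: the BIO tag scheme, the lenient span handling, and the function
names (`bio_spans`, `entity_f1`, `token_mcc`) are assumptions made for this
illustration, and results from seqeval and scikit-learn may differ in edge
cases. The toy tags reuse the Telescope and Grant labels named above.

```python
from collections import Counter
from math import sqrt

def bio_spans(tags):
    """Collect (label, start, end) spans from one BIO-tagged sequence (assumed scheme)."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            spans.add((label, start, i))
            start, label = None, None
        # A stray I- tag with no open span is leniently treated as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans

def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over exact (label, start, end) span matches."""
    tp = n_true = n_pred = 0
    for t, p in zip(true_seqs, pred_seqs):
        ts, ps = bio_spans(t), bio_spans(p)
        tp += len(ts & ps)
        n_true += len(ts)
        n_pred += len(ps)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_true
    return 2 * precision * recall / (precision + recall)

def token_mcc(true_seqs, pred_seqs):
    """Multiclass Matthews correlation coefficient over flattened token tags."""
    true = [tag for seq in true_seqs for tag in seq]
    pred = [tag for seq in pred_seqs for tag in seq]
    s = len(true)                                  # number of tokens
    c = sum(t == p for t, p in zip(true, pred))    # correctly tagged tokens
    t_k, p_k = Counter(true), Counter(pred)        # per-tag true / predicted counts
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0

# Toy example: the Grant entity is missed by the prediction.
y_true = [["B-Telescope", "I-Telescope", "O", "B-Grant"]]
y_pred = [["B-Telescope", "I-Telescope", "O", "O"]]
f1 = entity_f1(y_true, y_pred)
mcc = token_mcc(y_true, y_pred)
print(f"entity F1 = {f1:.3f}, token MCC = {mcc:.3f}")
```

The entity-level score counts an entity as correct only when its label and
both boundaries match, whereas the token-level score judges each tag on its
own, which is why the two numbers can diverge on the same prediction.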

Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. Authors of the
best-performing or most interesting approaches may be invited to
collaborate further with the NASA Astrophysics Data System
(https://ui.adsabs.harvard.edu/).

Important Dates

- Paper/Abstract Submission Deadline: August 25, 2022
- Notification of workshop paper/abstract acceptance: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Workshop: November 24, 2022 (online)

All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
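
As a convenience, an "Anywhere on Earth" deadline can be converted to UTC
with Python's standard library. This is a minimal sketch that uses the
paper submission deadline listed above; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")

# Paper/abstract submission deadline: August 25, 2022, 11:59 pm AoE.
deadline_aoe = datetime(2022, 8, 25, 23, 59, tzinfo=AOE)
deadline_utc = deadline_aoe.astimezone(timezone.utc)

print(deadline_utc.isoformat())  # 2022-08-26T11:59:00+00:00
```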

Submission Website and Format

Submissions will be made via Softconf. We will post the submission link on
our website shortly. Submissions should follow the ACLPUB formatting
guidelines (https://acl-org.github.io/ACLPUB/formatting.html) and use the
template files (https://github.com/acl-org/acl-style-files/tree/master).
Long and short papers will be subject to a double-blind peer-review
process. Position papers, demo papers, and extended abstracts need not be
anonymized. Authors will present accepted papers at the workshop either
as a talk or a poster. All accepted papers will be published in the
workshop proceedings.

We follow the same policies as AACL-IJCNLP 2022 regarding preprints and
double submissions. The anonymity period for WIESP 2022 is from July 15 to
September 25.

Organizers

- Tirthankar Ghosal, Charles University, CZ
- Sergi Blanco-Cuaresma, Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton, Oak Ridge National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Tirthankar Ghosal

Researcher at UFAL, Charles University, CZ

https://member.acm.org/~tghosal

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
