*** First Workshop on Information Extraction from Scientific Publications
(WIESP) at AACL-IJCNLP 2022 ***
*** Website: https://ui.adsabs.harvard.edu/WIESP/
*** Twitter: https://twitter.com/wiesp_nlp
The number of scientific papers published per year has exploded in recent
years. Indexing the full text of articles in search engines helps discover
and retrieve vital scientific information to continue building on the
shoulders of giants, inform policy, and make evidence-based decisions.
Nevertheless, it is difficult to navigate this ocean of data. Using simple
string matching has substantial limitations: human language is ambiguous in
nature, context matters, and we frequently use the same words and acronyms
to represent a multitude of different meanings. Extracting structured and
semantically relevant information from scientific publications (e.g.,
named-entity recognition, summarization, citation intention, linkage to
knowledge graphs) allows for better selection and filtering of articles.
The First Workshop on Information Extraction from Scientific Publications
(WIESP) will create the necessary forum to foster discussion and research
using Natural Language Processing and Machine Learning. WIESP will
specifically focus on topics related to information extraction from
scientific publications, including (but not limited to):
- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures,
 bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying
 scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from
 scientific articles and metadata
- Information extraction from COVID-19 articles to inform public health
 policy
In addition to research paper presentations, WIESP will also feature
keynote talks, a panel discussion, and a shared task. We will update the
details on our website as and when they become available. We especially
welcome participation from academic and research institutions, government
and industry labs, publishers, and information service providers. Projects
and organizations using NLP/ML techniques in their text mining and
enrichment efforts are also welcome to participate.
*** Call for Papers ***
We invite papers of the following categories:
Long papers must describe substantial, original, completed, and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Papers must not exceed eight (8) pages of content, plus
unlimited pages of references. The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.
Short papers must describe original and unpublished work. Please note
that a short paper is not a shortened long paper. Instead, short papers
should have a point that can be made in a few pages, such as a small,
focused contribution, a negative result, or an interesting application
nugget. Short papers must not exceed four (4) pages, plus unlimited pages
of references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments can
be taken into account.
Position papers will give voice to authors who wish to take a position on a
topic listed above or the field of scholarly information extraction.
Submissions need not present original work and should be two to four pages
in length, including title, text, figures and tables, and references.
Demo papers should be no more than four (4) pages in length,
including references, and should describe implemented systems that are of
relevance to the theme of the workshop. Authors of demo papers should be
willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.
Extended Abstracts: We welcome submissions of extended abstracts (2
pages max) related to the research topics mentioned above. Submissions may
include previously published results, late-breaking results, or a
description of ongoing projects in the broad field of information
extraction and mining from scientific publications. Extended abstracts can
also summarize existing work, work in progress, or a collection of works
under a unified theme (e.g., a series of closely related papers that build
on each other or tackle a common problem).
*** Shared Task: Detecting Entities in the Astrophysics Literature (DEAL) ***
A good amount of astrophysics research makes use of data coming from
missions and facilities such as ground observatories in remote locations or
space telescopes, as well as digital archives that hold large amounts of
observed and simulated data. These missions and facilities are frequently
named after historical figures or use some ingenious acronym which,
unfortunately, can be easily confused when searching for them in the
literature via simple string matching. For instance, Planck can refer to
the person, the mission, the constant, or several institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g., URLs) to
highly unstructured (e.g., Formula), and from useful to researchers (e.g.,
Telescope) to more useful to archivists and administrators (e.g., Grant).
Overall, 31 different labels are included, and their distribution is highly
imbalanced (e.g., ~100x more Citations than Proposals). Submissions will be
scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity
level and scikit-learn's Matthews correlation coefficient method at the
token level. We also encourage authors to propose their own evaluation
metrics. A sample dataset and more instructions can be found at:
https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
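As a rough illustration of the two scoring levels described above, here is a
minimal pure-Python sketch. It is not the official scorer: the task uses
seqeval and scikit-learn, and this stand-in only mirrors the general idea. It
assumes BIO-style tags (e.g., B-Telescope, I-Telescope, O), which is an
assumption about the data format; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return a set of (label, start, end) spans from one BIO-tagged sentence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, name = tag.partition("-")
        # An open span ends on "O", on a new "B-", or when the label changes.
        if label is not None and (prefix != "I" or name != label):
            spans.add((label, start, i))
            start, label = None, None
        # Open a span on "B-", or on a stray "I-" when no span is open.
        if prefix in ("B", "I") and label is None:
            start, label = i, name
    return spans


def entity_f1(true_sents, pred_sents):
    """Micro-averaged F1 over exact-match entity spans (entity level)."""
    true_spans, pred_spans = set(), set()
    for n, (t, p) in enumerate(zip(true_sents, pred_sents)):
        true_spans |= {(n, *s) for s in extract_entities(t)}
        pred_spans |= {(n, *s) for s in extract_entities(p)}
    correct = len(true_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(true_spans)
    return 2 * precision * recall / (precision + recall)


def token_mcc(true_sents, pred_sents):
    """Multi-class Matthews correlation coefficient over all tokens."""
    y_true = [t for sent in true_sents for t in sent]
    y_pred = [t for sent in pred_sents for t in sent]
    s = len(y_true)                                   # total tokens
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly tagged tokens
    t_k, p_k = Counter(y_true), Counter(y_pred)       # per-tag counts
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Toy example: the prediction finds the Telescope span but misses the Grant.
true = [["B-Telescope", "I-Telescope", "O", "B-Grant"]]
pred = [["B-Telescope", "I-Telescope", "O", "O"]]
print(round(entity_f1(true, pred), 3), round(token_mcc(true, pred), 3))
```

The two numbers can diverge: a single missed one-token entity costs one of two
entities at the entity level but only one of four tokens at the token level,
which is one reason the task reports both.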
Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. The
best-performing or most interesting approaches might be invited to further
collaborate with the NASA Astrophysical Data System (
https://ui.adsabs.harvard.edu/).
*** Important Dates ***
- Paper/Abstract Submission Deadline: August 25, 2022
- Notification of workshop paper/abstract acceptance: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Workshop: November 24, 2022 (online)
All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
*** Submission Website and Format ***
Submissions will be made via Softconf. We will update the submission link shortly
on our website. Submissions should follow the ACLPUB formatting guidelines (
https://acl-org.github.io/ACLPUB/formatting.html) and template files (
https://github.com/acl-org/acl-style-files/tree/master). Submissions (Long
and Short Papers) will be subject to a double-blind peer-review process.
Position papers, Demo papers, and Extended Abstracts need not be
anonymized. The authors will present accepted papers at the workshop either
as a talk or a poster. All accepted papers will be published in the
workshop proceedings.
We follow the same policies as AACL-IJCNLP 2022 regarding preprints and
double submissions. The anonymity period for WIESP 2022 is from July 15 to
September 25.
*** Organizers ***
- Tirthankar Ghosal, Charles University, CZ
- Sergi Blanco-Cuaresma, Center for Astrophysics | Harvard & Smithsonian,
 USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton, Oak Ridge National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++