Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, Karin Verspoor (2020). Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In Proceedings of CLEF 2020: 237–254.

@InProceedings{10.1007/978-3-030-58219-7_18,
author="He, Jiayuan
and Nguyen, Dat Quoc
and Akhondi, Saber A.
and Druckenbrodt, Christian
and Thorne, Camilo
and Hoessel, Ralph
and Afzal, Zubair
and Zhai, Zenan
and Fang, Biaoyan
and Yoshikawa, Hiyori
and Albahem, Ameer
and Cavedon, Lawrence
and Cohn, Trevor
and Baldwin, Timothy
and Verspoor, Karin",
editor="Arampatzis, Avi
and Kanoulas, Evangelos
and Tsikrika, Theodora
and Vrochidis, Stefanos
and Joho, Hideo
and Lioma, Christina
and Eickhoff, Carsten
and N{\'e}v{\'e}ol, Aur{\'e}lie
and Cappellato, Linda
and Ferro, Nicola",
title="Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents",
booktitle="Experimental IR Meets Multilinguality, Multimodality, and Interaction",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="237--254",
abstract="In this paper, we provide an overview of the Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020). The ChEMU evaluation lab focuses on information extraction over chemical reactions from patent texts. Using the ChEMU corpus of 1500 ``snippets'' (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 addresses chemical named entity recognition, the identification of chemical compounds and their specific roles in chemical reactions. Task 2 focuses on event extraction, the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. Herein, we describe the resources created for these tasks and the evaluation methodology adopted. We also provide a brief summary of the participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than our baseline methods.",
isbn="978-3-030-58219-7"
}

Abstract

In this paper, we provide an overview of the Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020). The ChEMU evaluation lab focuses on information extraction over chemical reactions from patent texts. Using the ChEMU corpus of 1500 "snippets" (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 addresses chemical named entity recognition, the identification of chemical compounds and their specific roles in chemical reactions. Task 2 focuses on event extraction, the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. Herein, we describe the resources created for these tasks and the evaluation methodology adopted. We also provide a brief summary of the participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than our baseline methods.