Abstract: Following the digitization of health reports into electronic health records (EHRs), the need for techniques and algorithms that extract and leverage the information in these records has increased significantly. Natural language processing (NLP), and specifically subfields such as named entity recognition (NER) and named entity linking (NEL) to different standards, is one of the main solutions. However, because these records are rarely shared due to privacy concerns, the available samples are scarce or virtually non-existent. In recent years, several shared tasks, mainly organized by the Barcelona Supercomputing Center (BSC), have emerged, providing researchers with new annotated records to improve the state of the art in NER and NEL. With this increase in data, NER systems can now accurately detect the entities present in domain-specific texts. Linking remains harder: the knowledge bases (KBs) used as annotation standards, such as SNOMED-CT or the Unified Medical Language System (UMLS), are vast, with the latter encompassing over 3.5 million concepts in Spanish. This, combined with the diversity and heterogeneity of language, whether in a single language or across several, renders traditional approaches such as classification ineffective. This has led me to explore alternative solutions that maximize performance on these tasks through contrastive learning, the enrichment of knowledge bases, and their potential combination with more traditional approaches. In this talk, I will explain developments in natural language processing, mainly in named entity recognition and subsequent linking, with a focus on the latest advances in the clinical domain. Furthermore, I’ll discuss how these innovations stand to improve the accessibility and utility of health data, paving the way for more personalized and efficient healthcare solutions.
Bio: Fernando is a Ph.D. candidate at the University of Malaga and a member of the Computational Intelligence in Biomedicine (ICB) group. His work focuses on extracting the information present in electronic health records and leveraging it to improve decision-making in the medical sector. To that end, natural language processing techniques such as named entity recognition and linking to different standards are essential.