Clinical analogy resolution performance for foundation language models
Date
2024
Abstract
Using extensive data sources to create foundation language models has revolutionized the performance of deep learning-based architectures. This remarkable improvement has led to state-of-the-art results on various downstream NLP tasks, including clinical tasks. However, more research is needed to measure model performance intrinsically, especially in the clinical domain. We revisit the use of analogy questions as an effective method to measure the intrinsic performance of language models in the English clinical domain. We tested multiple Transformer-based language models on analogy questions constructed from the Unified Medical Language System (UMLS), a massive knowledge graph of clinical concepts. Our results show that large language models significantly outperform small language models at analogy resolution. Similarly, domain-specific language models perform better than general-domain language models. We also found a correlation between intrinsic and extrinsic performance, validated through the PubMedQA extrinsic task. Creating clinical-specific and language-specific language models is essential for advancing biomedical and clinical NLP and will ensure valid application in clinical practice. Finally, because our proposed intrinsic test is based on a term graph available in multiple languages, the dataset can be rebuilt to measure the performance of models in languages other than English.
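As context for the analogy-resolution setup described above, a common way to score a question of the form "a is to b as c is to ?" is the vector-offset method: pick the candidate term whose embedding is closest to vec(b) - vec(a) + vec(c). The sketch below illustrates this with toy random embeddings and hypothetical clinical terms; it is not the paper's exact protocol, and in practice the embeddings would come from the evaluated language model.

```python
# Minimal sketch of vector-offset analogy resolution (illustrative only).
# Embeddings here are random placeholders, not real model outputs.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def resolve_analogy(a, b, c, candidates, emb):
    """Return the candidate that best completes 'a : b :: c : ?'."""
    target = emb[b] - emb[a] + emb[c]
    return max(candidates, key=lambda term: cosine(emb[term], target))

# Toy embeddings for hypothetical clinical terms.
rng = np.random.default_rng(0)
terms = ["insulin", "diabetes", "levothyroxine", "hypothyroidism", "asthma"]
emb = {t: rng.normal(size=8) for t in terms}

print(resolve_analogy("insulin", "diabetes", "levothyroxine",
                      ["hypothyroidism", "asthma"], emb))
```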
Keywords
Information systems, Language models, Applied computing, Health informatics, Computing methodologies, Natural language processing