Ngoc Tram Luu,
"A recommender system for research papers in medicine at Bloom Diagnostics"
, Master's thesis at the Institute of Business Informatics - Data & Knowledge Engineering, supervised by Assoz. Univ.-Prof. Mag. Dr. Christoph G. Schütz, under the guidance of Simon Staudinger, MSc, 8-2023
A recommender system for research papers in medicine at Bloom Diagnostics
The medical field is rapidly evolving due to advancements in Artificial Intelligence, which allows machines to perform cognitive activities to achieve specific objectives using data as input. As a result, there is a growing demand for text-mining methods that extract useful insights from vast volumes of medical text. However, applying Natural Language Processing techniques in medicine faces several challenges, including the need to adapt to medical terminology and the differences between ordinary corpora and medical corpora. Deep learning approaches have made advances in text-mining methods feasible, but they still face challenges such as the difficulty of scaling efficiently and the lack of domain-specific data.
An actual use case of Bloom Diagnostics GmbH, a start-up in the field of digital health that offers home access to blood tests and health advice, served as the basis for the thesis. To validate the suggestions given to users on the basis of their blood tests, the medical team must review a vast amount of relevant scientific literature. The team's systematic literature search for extracting valuable information is manual and time-consuming. To accelerate this process, the thesis proposes a general concept for building a recommender system for research papers in the medical field and focuses on ranking and re-ranking passages by their relevance to natural language questions.
Various NLP models, including BM25, BERT, and BioBERT, are compared to determine the most efficient setting. Unlike BM25 and BERT, which can already perform ranking and re-ranking tasks, BioBERT must first be fine-tuned on medical data to be comparable with the other models. The models are compared in different combinations and settings and evaluated using various criteria. The main contribution of this thesis is a solution that helps researchers save time by recommending the most relevant passages returned by these techniques. The recommender system, which focuses on recommending passages relevant to natural language questions, is built using the current literature search web application for medical researchers at Bloom Diagnostics.
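As an illustration of the first-stage ranking step mentioned above, the following is a minimal sketch of BM25 passage ranking for a natural language question. It is not the thesis's implementation: the function names `tokenize` and `bm25_rank`, the whitespace tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration. In a two-stage setup, a neural model such as BERT or a fine-tuned BioBERT would then re-score only the top candidates returned by a ranker like this one.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; a real system
    # would likely handle punctuation and medical terminology).
    return text.lower().split()


def bm25_rank(question, passages, k1=1.5, b=0.75):
    """Return (passage_index, score) pairs sorted by descending BM25 score."""
    if not passages:
        return []
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    # Average passage length; fall back to 1.0 if all passages are empty.
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question))

    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((i, score))
    return sorted(scored, key=lambda pair: -pair[1])
```

Calling `bm25_rank(question, passages)` with a question string and a list of passage strings returns the passage indices in ranked order, so the top entries can be shown to the reader directly or handed to a re-ranking model.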