LLM-IE: a python package for biomedical generative information extraction with large language models

Bibliographic Details
Published in: JAMIA Open, Vol. 8, no. 2, p. ooaf012
Main Authors: Hsu, Enshuo; Roberts, Kirk
Format: Journal Article
Language: English
Published: United States: Oxford University Press, 01.04.2025
ISSN: 2574-2531
DOI: 10.1093/jamiaopen/ooaf012

Summary:
Objectives: Despite the recent adoption of large language models (LLMs) for biomedical information extraction (IE), challenges in prompt engineering and algorithms persist, with no dedicated software available. To address this, we developed LLM-IE: a Python package for building complete IE pipelines.
Materials and Methods: LLM-IE supports named entity recognition, entity attribute extraction, and relation extraction tasks. We benchmarked it on the i2b2 clinical datasets.
Results: The sentence-based prompting algorithm achieved the best 8-shot performance, with over 70% strict F1 for entity extraction and about 60% F1 for entity attribute extraction.
Discussion: We developed a Python package, LLM-IE, highlighting (1) an interactive LLM agent to support schema definition and prompt design, (2) state-of-the-art prompting algorithms, and (3) visualization features.
Conclusion: LLM-IE provides essential building blocks for developing robust information extraction pipelines. Future work will aim to expand its features and further optimize computational efficiency.
Lay Summary: In the biomedical field, there is a significant need to process large numbers of documents (eg, clinical notes and literature) by extracting entities (eg, drug names, procedure names, and clinical trials), attributes (eg, drug dosage and frequency), and relations (eg, disease-treatment relations and drug-adverse event relations) into a structured format (eg, JSON, XML, or tabular). Large language models (LLMs) have shown great promise for automating such processes with minimal human labor while achieving high performance. Despite this exciting advancement, there is currently no dedicated software to implement it, making application challenging. We therefore publish LLM-IE, an open-source Python package that provides a general framework for LLM-based information extraction (“IE”). We introduce comprehensive APIs that include an LLM agent to help users write prompts, extractors that apply prompting algorithms to extract structured data, and visualization tools that serve web applications and render HTML. We benchmark LLM-IE on three popular clinical IE datasets and show results comparable to previous literature. Moving forward, we will improve computational performance, optimize post-processing, and continue to implement cutting-edge prompting algorithms and support emerging inference engines.
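To make the sentence-based prompting idea in the abstract concrete, the sketch below shows one way such a pipeline can be wired up: the document is split into sentences, each sentence is prompted with an extraction schema, and the per-sentence JSON outputs are merged into frames. This is an illustrative sketch only, assuming a user-supplied completion function; the names (PROMPT_TEMPLATE, split_sentences, extract_frames, fake_llm) are hypothetical and are not the LLM-IE API.

# Illustrative sketch of sentence-based prompting for entity and attribute
# extraction. All names here are hypothetical examples, not the LLM-IE API.
import json
import re
from typing import Callable, Dict, List

PROMPT_TEMPLATE = """You are a clinical information extraction system.
Extract medication mentions from the sentence below.
Return a JSON list. Each item must have keys:
  "entity_text" (exact span from the sentence),
  "dosage" (string or null), "frequency" (string or null).
Sentence: {sentence}
JSON:"""


def split_sentences(text: str) -> List[str]:
    # Naive splitter for illustration; a real pipeline would use a
    # clinical-text-aware sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_frames(document: str, complete: Callable[[str], str]) -> List[Dict]:
    """Prompt the LLM one sentence at a time and merge the JSON outputs."""
    frames: List[Dict] = []
    for sentence in split_sentences(document):
        raw = complete(PROMPT_TEMPLATE.format(sentence=sentence))
        try:
            items = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip sentences where the model output is not valid JSON
        for item in items:
            item["sentence"] = sentence  # keep provenance for review/visualization
            frames.append(item)
    return frames


if __name__ == "__main__":
    # Dummy backend so the sketch runs end to end; in practice this would be a
    # real completion call (e.g., to an OpenAI- or Ollama-served model).
    def fake_llm(prompt: str) -> str:
        if "metformin" in prompt:
            return ('[{"entity_text": "metformin", '
                    '"dosage": "500 mg", "frequency": "BID"}]')
        return "[]"

    note = "Patient was started on metformin 500 mg BID. Follow up in 3 months."
    print(json.dumps(extract_frames(note, fake_llm), indent=2))

In a real deployment, the placeholder fake_llm would be replaced by a call to an actual inference backend, and the merged frames would feed downstream attribute- and relation-extraction steps as well as the visualization tools described in the abstract.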