LLM-IE: a python package for biomedical generative information extraction with large language models
| Published in | JAMIA open Vol. 8; no. 2; p. ooaf012 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | United States, Oxford University Press, 01.04.2025 |
| ISSN | 2574-2531 |
| DOI | 10.1093/jamiaopen/ooaf012 |
Summary:
Objectives
Despite the recent adoption of large language models (LLMs) for biomedical information extraction (IE), challenges in prompt engineering and algorithms persist, with no dedicated software available. To address this, we developed LLM-IE: a Python package for building complete IE pipelines.
Materials and Methods
The LLM-IE supports named entity recognition, entity attribute extraction, and relation extraction tasks. We benchmarked it on the i2b2 clinical datasets.
Results
The sentence-based prompting algorithm resulted in the best 8-shot performance of over 70% strict F1 for entity extraction and about 60% F1 for entity attribute extraction.
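For readers unfamiliar with the metric, the following is a minimal sketch of how a strict F1 score can be computed. It assumes that "strict" means a predicted entity counts only when its start offset, end offset, and type all match a gold annotation exactly, which is the usual convention but is not spelled out in this record. The `Span` type and the `strict_f1` function are illustrative names, not part of LLM-IE.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type, e.g. "problem" or "treatment"


def strict_f1(gold: set[Span], predicted: set[Span]) -> float:
    """F1 where a prediction counts only if offsets and label all match (assumed definition)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy annotations, invented for illustration only.
gold = {Span(0, 9, "treatment"), Span(25, 33, "problem"), Span(40, 48, "test")}
predicted = {Span(0, 9, "treatment"), Span(25, 30, "problem")}  # second span is cut short

score = strict_f1(gold, predicted)
print(f"{score:.2f}")  # 0.40: one exact match, precision 1/2, recall 1/3
```

Under this definition the truncated span earns no partial credit, which is why strict scores are typically lower than relaxed (overlap-based) ones.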
Discussion
We developed a Python package, LLM-IE, highlighting (1) an interactive LLM agent to support schema definition and prompt design, (2) state-of-the-art prompting algorithms, and (3) visualization features.
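The general idea behind sentence-based prompting, as named in the Results, can be sketched as follows: split the document into sentences, prompt the model once per sentence, and map each returned entity back to character offsets in the full text. This is only an illustration under stated assumptions and does not use or reproduce the LLM-IE API. The functions `split_sentences`, `extract_entities`, and `fake_llm` are hypothetical, the sentence splitter is deliberately naive, and `fake_llm` is a hard-coded stand-in for a real model call.

```python
import json
import re
from typing import Callable


def split_sentences(text: str) -> list[tuple[int, str]]:
    """Return (start_offset, sentence) pairs using a naive punctuation rule."""
    pattern = r"[^.!?]+[.!?]?"
    return [(m.start(), m.group()) for m in re.finditer(pattern, text) if m.group().strip()]


def extract_entities(text: str, llm: Callable[[str], str]) -> list[dict]:
    """Prompt the model sentence by sentence and locate each answer in the document."""
    frames = []
    for offset, sentence in split_sentences(text):
        prompt = (
            "List every drug name in the sentence as a JSON array of strings.\n"
            f"Sentence: {sentence.strip()}"
        )
        for entity_text in json.loads(llm(prompt)):
            position = sentence.find(entity_text)
            if position == -1:
                continue  # skip outputs that do not appear verbatim in the sentence
            start = offset + position
            frames.append({"text": entity_text, "start": start, "end": start + len(entity_text)})
    return frames


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: 'recognizes' two hard-coded drug names."""
    sentence = prompt.split("Sentence: ", 1)[1]
    return json.dumps([drug for drug in ("aspirin", "metformin") if drug in sentence])


note = "Patient took aspirin 81 mg daily. She stopped metformin last week."
frames = extract_entities(note, fake_llm)
print(json.dumps(frames, indent=2))
```

Working one sentence at a time keeps each prompt short and makes it straightforward to recover character offsets, at the cost of one model call per sentence.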
Conclusion
The LLM-IE provides essential building blocks for developing robust information extraction pipelines. Future work will aim to expand its features and further optimize computational efficiency.
Lay Summary
In the biomedical field, there is a significant need to process large volumes of documents (eg, clinical notes and literature) by extracting entities (eg, drug names, procedure names, and clinical trials), attributes (eg, drug dosage and frequency), and relations (eg, disease-treatment relations and drug-adverse event relations) into a structured format (eg, JSON, XML, or tabular). Large language models (LLMs) have shown great promise for automating such processes with minimal human labor while achieving high performance. Despite this advance, no dedicated software has been available to implement it, which makes the approach hard to apply. Therefore, we publish LLM-IE, an open-source Python package that provides a general framework for LLM-based information extraction (IE). We introduce comprehensive APIs that include an LLM agent to help users write prompts, extractors that apply prompting algorithms to extract structured data, and visualization tools that serve web applications and render HTML. We benchmark LLM-IE on three popular clinical IE datasets and show results comparable to previous literature. Moving forward, we will improve computational performance, optimize post-processing, continue to implement cutting-edge prompting algorithms, and support emerging inference engines.