Transforming Healthcare with AI: Multimodal VQA Systems and LLMs
Published in | International Conference on Signal Processing and Communication (Online) pp. 584 - 590 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 20.02.2025 |
ISSN | 2643-444X |
DOI | 10.1109/ICSC64553.2025.10968734 |
Summary: | In multimodal Large Language Models (LLMs), medical Visual Question Answering (VQA) is an essential task that provides clinically acceptable answers to questions about medical imagery. This could alleviate pressure on healthcare systems, especially in resource-limited countries. However, the medical VQA datasets available today are small, mostly focused on basic classification tasks, and lack semantic reasoning and clinical expertise. Our previous work proposed a VQA technique using three distinct relationship graphs (implicit, spatial, and semantic) based on the Medical-CXR-VQA dataset, which focuses mainly on chest X-ray images, and achieved 62% accuracy. By training an LLM-based technique, we improved label extraction accuracy to 80%. The labels were also thoroughly reviewed with two clinical specialists for greater accuracy. We then introduced a larger dataset, the Medical VQA-RAD dataset (VQA-Radiology), which focuses on radiology images (including X-ray, CT, MRI, and ultrasound) and includes detailed questions about anomalies, locations, severity, and types. Based on this dataset, we created a novel biomedical multimodal VQA technique that uses chain-of-thought reasoning and prompt engineering, refined from the Llama-3-8B model using the bespoke dataset. This approach combines curated and synthesized biological data, offering significant benefits to researchers, doctors, and biomedical professionals by improving the comprehension and generation of content on a wide range of biological topics. |
---|---|