Transforming Healthcare with AI: Multimodal VQA Systems and LLMs

Bibliographic Details
Published in International Conference on Signal Processing and Communication (Online), pp. 584-590
Main Authors Vengatachalam, Sowthika Semmandampalayam, Rakkiannan, Thamilselvan
Format Conference Proceeding
Language English
Published IEEE 20.02.2025
ISSN 2643-444X
DOI 10.1109/ICSC64553.2025.10968734

Summary: Medical Visual Question Answering (VQA) is an essential task for multimodal Large Language Models (LLMs): it aims to give clinically acceptable answers to questions about medical images. This could relieve pressure on healthcare systems, especially in resource-limited countries. However, the medical VQA datasets available today are small, mostly focused on basic classification tasks, and lacking in semantic reasoning and clinical expertise. Our previous work proposed a VQA technique using three distinct relationship graphs (implicit, spatial, and semantic) on the Medical-CXR-VQA dataset, which focuses mainly on chest X-ray images, and achieved 62% accuracy. By training an LLM technique, we improved label extraction accuracy to 80%. The labels were also thoroughly reviewed with two clinical specialists for greater accuracy. We then introduced a larger dataset, the Medical VQA-RAD dataset (VQA-Radiology), which focuses on radiology images (including X-ray, CT, MRI, and ultrasound) and includes detailed questions about anomalies, locations, severity, and types. Based on this dataset, we created a novel biomedical multimodal VQA technique that uses chain-of-thought and prompt engineering, refined from the Llama-3-8B model using the bespoke dataset. This approach combines curated and synthesized biological data, offering significant benefits to researchers, doctors, and biomedical professionals by improving the comprehension and generation of content related to a wide range of biological topics.