Transforming Healthcare with AI: Multimodal VQA Systems and LLMs

Bibliographic Details
Published in	International Conference on Signal Processing and Communication (Online) pp. 584 - 590
Main Authors	Vengatachalam, Sowthika Semmandampalayam; Rakkiannan, Thamilselvan
Format	Conference Proceeding
Language	English
Published	IEEE 20.02.2025
Subjects	Accuracy; Biological system modeling; Biomedical imaging; Chain-of-thought; Graph Learning; Large Language Model; Large language models; Medical Dataset; Medical services; Multimodal model; Prompt engineering; Question answering (information retrieval); Ultrasonic imaging; Visual Question Answering; Visualization; X-ray imaging
ISSN	2643-444X
DOI	10.1109/ICSC64553.2025.10968734

Summary:	Medical Visual Question Answering (VQA) is an essential task for multimodal Large Language Models (LLMs): it aims to produce clinically acceptable answers to questions about medical images. This could relieve pressure on healthcare systems, especially in resource-limited countries. However, the medical VQA datasets available today are small, mostly focused on basic classification tasks, and lacking in semantic reasoning and clinical expertise. Our previous work proposed a VQA technique using three distinct relationship graphs (implicit, spatial, and semantic) based on the Medical-CXR-VQA dataset, which mainly focuses on chest X-ray images; it achieved 62% accuracy. By training an LLM-based technique, we improved label extraction accuracy to 80%. The labels were also thoroughly reviewed with two clinical specialists for greater accuracy. We then introduced a larger dataset, the Medical VQA-RAD dataset (VQA-Radiology), which focuses on radiology images (including X-ray, CT, MRI, and ultrasound) and includes detailed questions about anomalies, locations, severity, and types. Based on this dataset, we created a novel biomedical multimodal VQA technique that uses chain-of-thought and prompt engineering, fine-tuned from the Llama-3-8B model on the bespoke dataset. This approach combines curated and synthesized biological data, offering significant benefits to researchers, doctors, and biomedical professionals by improving the comprehension and generation of content related to a wide range of biological topics.