Context-Guided Medical Visual Question Answering

  • Wafa Arsalane
  • Philip Chikontwe
  • Miguel Luna
  • Myeongkyun Kang
  • Sang Hyun Park

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Given a medical image and a question in natural language, medical VQA systems are required to predict clinically relevant answers. Integrating information from visual and textual modalities requires complex fusion techniques due to the semantic gap between images and text, as well as the diversity of medical question types. To address this challenge, we propose aligning image and text features in VQA models by using text from medical reports to provide additional context during training. Specifically, we introduce a transformer-based alignment module that learns to align the image with the textual context, thereby incorporating supplementary medical features that can enhance the VQA model’s predictive capabilities. During the inference stage, VQA operates robustly without requiring any medical report. Our experiments on the Rad-Restruct dataset demonstrate a significant impact of the proposed strategy and show promising improvements, positioning our approach as competitive with state-of-the-art methods in this task.

Original language: English
Title of host publication: Medical Information Computing - First MICCAI Meets Africa Workshop, MImA 2024, and First MICCAI Student Board Workshop on Empowering Medical Information Computing and Research through Early-Career Expertise, EMERGE 2024, Held in Conjunction with MICCAI 2024, Revised Selected Papers
Editors: Udunna Anazodo, Naren Akash, Moritz Fuchs, Celia Cintas, Alessandro Crimi, Tinashe Mutsvangwa, Farouk Dako, William Ogallo
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 245-255
Number of pages: 11
ISBN (Print): 9783031791024
State: Published - 2025
Event: 1st MICCAI Meets Africa Workshop, MImA 2024 and 1st MICCAI Student Board Workshop on Empowering Medical Information Computing and Research through Early-Career Expertise, EMERGE 2024, Held in Conjunction with MICCAI 2024 - Marrakesh, Morocco
Duration: 6 Oct 2024 to 6 Oct 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2240
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 1st MICCAI Meets Africa Workshop, MImA 2024 and 1st MICCAI Student Board Workshop on Empowering Medical Information Computing and Research through Early-Career Expertise, EMERGE 2024, Held in Conjunction with MICCAI 2024
Country/Territory: Morocco
City: Marrakesh
Period: 6/10/24 to 6/10/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords

  • Medical Image Interpretation
  • Medical Visual Question Answering
  • Radiology
  • VQA
