Adapting MiniGPT-4 (Vicuna 7B) for Tumor Detection with Bounding Box Prediction Using 4 MRI Modalities

In this post, we explain how to adapt MiniGPT-4 (Vicuna 7B) to predict bounding box coordinates for tumors across four MRI modalities. We'll walk through creating a custom dataloader, training the model, and evaluating it.


Fusion Model

The goal is to train a Large Language Model (LLM) that accepts four MRI image inputs and predicts bounding box coordinates for tumor locations. Each image corresponds to a different MRI modality.

For training, we will use prompts such as:

"<Img1><ImageHere></Img1><Img2><ImageHere></Img2><Img3><ImageHere></Img3><Img4><ImageHere></Img4> where is the tumor?"

We will introduce a dedicated special token so the model can predict the bounding box coordinates, which it outputs in the order [maxx, maxy, minx, miny].
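To make the target format concrete, here is a minimal sketch of how a training target string could be built from a ground-truth box. The <bbox> token name is an assumption for illustration; use whichever special token you actually register with the tokenizer.

def box_to_target(box):
    # box is [maxx, maxy, minx, miny], matching the order described above.
    # "<bbox>"/"</bbox>" are hypothetical special tokens; substitute the
    # token(s) you add to the tokenizer's vocabulary.
    maxx, maxy, minx, miny = box
    return f"<bbox>[{maxx}, {maxy}, {minx}, {miny}]</bbox>"

# box_to_target([96, 84, 68, 68]) -> "<bbox>[96, 84, 68, 68]</bbox>"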

Example ‘summary.jsonl’ entry:

{
  "t2f Sagittal_path": "BraTS-GLI-00000-000_2_0.jpg",
  "t2w Sagittal_path": "BraTS-GLI-00000-000_2_2.jpg",
  "t1n Sagittal_path": "BraTS-GLI-00000-000_2_3.jpg",
  "t1c Sagittal_path": "BraTS-GLI-00000-000_2_4.jpg",
  "box": [[96, 84, 68, 68]]
}

Code Implementation
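Below is a minimal sketch of the custom dataloader, assuming a summary.jsonl in the format shown above, images stored under a common root directory, and a vis_processor callable that mirrors MiniGPT-4's image preprocessing (the exact interface is an assumption). It reuses the box_to_target helper sketched earlier.

import json
import os

from PIL import Image
from torch.utils.data import Dataset

# Fixed modality order, using the keys from summary.jsonl.
MODALITY_KEYS = [
    "t2f Sagittal_path",
    "t2w Sagittal_path",
    "t1n Sagittal_path",
    "t1c Sagittal_path",
]

PROMPT = (
    "<Img1><ImageHere></Img1><Img2><ImageHere></Img2>"
    "<Img3><ImageHere></Img3><Img4><ImageHere></Img4> "
    "where is the tumor?"
)

class BraTSBoxDataset(Dataset):
    """Loads the four MRI modality slices and the tumor bounding box."""

    def __init__(self, jsonl_path, image_root, vis_processor):
        # One JSON object per line, as in the summary.jsonl example above.
        with open(jsonl_path) as f:
            self.entries = [json.loads(line) for line in f]
        self.image_root = image_root
        self.vis_processor = vis_processor  # image transform, e.g. resize + normalize

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        entry = self.entries[idx]
        # Load and preprocess one image per modality, in a fixed order.
        images = [
            self.vis_processor(
                Image.open(os.path.join(self.image_root, entry[key])).convert("RGB")
            )
            for key in MODALITY_KEYS
        ]
        # "box" holds a list of boxes; this sketch uses the first one.
        target = box_to_target(entry["box"][0])
        return {"images": images, "prompt": PROMPT, "target": target}

A standard torch.utils.data.DataLoader can batch these samples; each of the four image tensors is then fed into the slot marked by the corresponding <ImageHere> placeholder in the prompt.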



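For evaluation, a common choice is Intersection over Union (IoU) between the predicted and ground-truth boxes. The sketch below assumes the model emits its coordinates in the [maxx, maxy, minx, miny] format described above; the regular-expression parsing is illustrative only.

import re

def parse_box(text):
    # Pull the first four integers out of the model's output string,
    # interpreted as [maxx, maxy, minx, miny].
    nums = re.findall(r"-?\d+", text)
    return [int(n) for n in nums[:4]] if len(nums) >= 4 else None

def iou(box_a, box_b):
    # Both boxes are [maxx, maxy, minx, miny].
    maxx_a, maxy_a, minx_a, miny_a = box_a
    maxx_b, maxy_b, minx_b, miny_b = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0, min(maxx_a, maxx_b) - max(minx_a, minx_b))
    ih = max(0, min(maxy_a, maxy_b) - max(miny_a, miny_b))
    inter = iw * ih
    area_a = (maxx_a - minx_a) * (maxy_a - miny_a)
    area_b = (maxx_b - minx_b) * (maxy_b - miny_b)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# iou([96, 84, 68, 68], [96, 84, 68, 68]) == 1.0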
