
Add vision only mode #537

Merged
gkumbhat merged 13 commits into foundation-model-stack:main from gkumbhat:add_vision_only_mode on Jun 24, 2026


@gkumbhat gkumbhat commented Jun 22, 2026

Collaborator

Changes

This PR adds the ability to load only the vision part of a given multimodal model. This lets sendnn-inference load the model's different parts differently, for more efficient processing.
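As a rough illustration of what loading only the vision part of a multimodal checkpoint can involve, the sketch below filters a state dict down to the vision encoder and projector weights and drops the language-model weights. This is not the FMS API added in this PR: the helper name `extract_vision_state_dict` is hypothetical, and the key prefixes `vision_tower.` and `multi_modal_projector.` are assumptions borrowed from Hugging Face LLaVA-style naming.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")

# Assumed (hypothetical) key prefixes: the vision encoder, plus the projector
# that maps image features into the language model's hidden size.
DEFAULT_VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")


def extract_vision_state_dict(
    state_dict: Dict[str, T],
    prefixes: Iterable[str] = DEFAULT_VISION_PREFIXES,
) -> Dict[str, T]:
    """Hypothetical helper: keep only entries whose key starts with a prefix.

    Dropping the text-decoder weights means a vision-only module can be
    built and loaded without materializing the language model.
    """
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {k: v for k, v in state_dict.items() if k.startswith(prefixes)}


# Usage with placeholder values standing in for tensors.
full = {
    "vision_tower.patch_conv.weight": 1,
    "multi_modal_projector.linear_1.weight": 2,
    "language_model.layers.0.attn.weight": 3,
}
vision_only = extract_vision_state_dict(full)
print(sorted(vision_only))
# ['multi_modal_projector.linear_1.weight', 'vision_tower.patch_conv.weight']
```

Filtering by key prefix keeps the split purely at load time; the actual PR may instead construct a separate vision-only model class.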

gkumbhat added 9 commits June 19, 2026 18:57
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@gkumbhat gkumbhat force-pushed the add_vision_only_mode branch from d14e05d to 2d6eee5 on June 22, 2026 17:07
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@gkumbhat gkumbhat marked this pull request as ready for review June 22, 2026 18:21
@gkumbhat gkumbhat requested a review from flaviabeo June 22, 2026 18:22
gkumbhat added 2 commits June 22, 2026 19:20
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>

@flaviabeo flaviabeo left a comment

Collaborator

Tested locally with mistralai/Ministral-3-3B-Instruct-2512; everything seems to be working.

Image size: (547, 365)

Embeddings shape: torch.Size([1, 834, 3072])
  (batch=1, seq_len=834, hidden_dim=3072) 

Let me know if there are any other models of interest that would be nice to test with this mode.

@flaviabeo flaviabeo left a comment

Collaborator

So, when testing with ibm-granite/granite-vision-3.2-2b, I am getting an error. I used almost the same script as for Mistral3, so maybe this is an issue specific to granite-vision?

Comment thread fms/models/llava_next.py
if not isinstance(original_image_size, (list, tuple)):
    original_size = original_image_size.tolist()

original_height, original_width = original_size

Collaborator

When I tested with granite-vision, I got an error that traces to this line:

  File "/home/flaviabeodebayser/fms/foundation-model-stack/fms/models/llava_next.py", line 270, in select_best_resolution
    original_height, original_width = original_size
                                      ^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'original_size' where it is not associated with a value
INFO:torch._dynamo.eval_frame:TorchDynamo attempted to trace the following frames: [

]
INFO:torch._dynamo.utils:TorchDynamo compilation metrics:
Function, Runtimes (s)
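The error follows from the control flow in the excerpt above: `original_size` is only assigned inside the `if not isinstance(...)` branch, so when `original_image_size` already arrives as a list or tuple, the unpacking line reads a local name that was never bound. The standalone reproduction below reduces the function to just those lines; `select_best_resolution_buggy` and `FakeTensor` are hypothetical names, with `FakeTensor` standing in for a `torch.Tensor` so that only `.tolist()` is needed.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor: only provides .tolist()."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


def select_best_resolution_buggy(original_image_size):
    # Mirrors the reviewed lines: original_size is only bound when the
    # input is NOT already a list/tuple.
    if not isinstance(original_image_size, (list, tuple)):
        original_size = original_image_size.tolist()
    original_height, original_width = original_size
    return original_height, original_width


# Tensor-like input takes the branch, so the name is bound.
print(select_best_resolution_buggy(FakeTensor([365, 547])))  # (365, 547)

# A plain tuple skips the branch, so the unpacking line fails.
try:
    select_best_resolution_buggy((365, 547))
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```

Because Python treats any name assigned anywhere in a function body as local to that function, the failure only appears at runtime on the path that skips the assignment, which is why it surfaced only for a specific input type.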

Collaborator Author

This seems to be an existing bug that only surfaces in a specific scenario. I have pushed a fix now.
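The pushed fix itself is not shown in this thread. A minimal sketch of one way to avoid the error, under stated assumptions, is to normalize the input before unpacking so that every input type binds the same name. The selection loop follows the LLaVA-NeXT "any resolution" heuristic as implemented in Hugging Face `transformers` (maximize the effective image area kept, then minimize wasted padding); treat it as an approximation of `select_best_resolution` in `fms/models/llava_next.py`, not a copy of it. The grid of `(height, width)` candidates in the usage lines is illustrative (LLaVA-NeXT-style pinpoints); granite-vision's own configuration may differ.

```python
from typing import Optional, Sequence, Tuple


def select_best_resolution(
    original_image_size, possible_resolutions: Sequence[Tuple[int, int]]
) -> Optional[Tuple[int, int]]:
    """Pick the (height, width) candidate that best fits the image.

    Sketch only: accepts a list/tuple or any object with .tolist()
    (e.g. a torch.Tensor) and always binds `original_size` before use.
    """
    # Normalize first, so every input type ends up in the same variable.
    if isinstance(original_image_size, (list, tuple)):
        original_size = list(original_image_size)
    else:
        original_size = original_image_size.tolist()

    original_height, original_width = original_size

    best_fit = None
    max_effective_resolution = 0
    min_wasted_resolution = float("inf")

    for height, width in possible_resolutions:
        # Scale the image to fit inside this candidate without distortion.
        scale = min(width / original_width, height / original_height)
        downscaled_width = int(original_width * scale)
        downscaled_height = int(original_height * scale)

        # Image pixels actually kept (upscaling adds no real detail).
        effective_resolution = min(
            downscaled_width * downscaled_height, original_width * original_height
        )
        # Canvas area left over as padding.
        wasted_resolution = width * height - effective_resolution

        if effective_resolution > max_effective_resolution or (
            effective_resolution == max_effective_resolution
            and wasted_resolution < min_wasted_resolution
        ):
            max_effective_resolution = effective_resolution
            min_wasted_resolution = wasted_resolution
            best_fit = (height, width)

    return best_fit


grid = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]
print(select_best_resolution((365, 547), grid))  # (672, 672)
```

Normalizing up front, rather than adding an `else` that duplicates the unpacking, keeps a single code path after the type check, so any later use of `original_size` is safe for every accepted input.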

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>

@flaviabeo flaviabeo left a comment

Collaborator

Now granite-vision also works :)
LGTM!

@gkumbhat gkumbhat merged commit 771a197 into foundation-model-stack:main Jun 24, 2026
4 checks passed
@gkumbhat gkumbhat deleted the add_vision_only_mode branch June 24, 2026 14:19

2 participants