Add vision only mode by gkumbhat · Pull Request #537 · foundation-model-stack/foundation-model-stack

gkumbhat · 2026-06-22T16:09:52Z

Changes

This PR adds the ability to load only the vision part of a given multimodal model. This allows sendnn-inference to load the model's different parts separately for more efficient processing.
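A minimal sketch of what loading only the vision part could look like at the checkpoint level, assuming the weights arrive as a flat state dict and that vision tensors can be recognized by key prefix. The prefixes below (vision_tower., multi_modal_projector., language_model.embed_tokens.) and the function name filter_vision_only are hypothetical; the key names used by the actual FMS model adapters may differ. The text embedding prefix is included on the assumption, hinted at by the later commit "🐛 Fix text embedding loading for vision only case", that the text embedding table is still needed when image features are merged into the input embeddings.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical key prefixes; real checkpoint names depend on the model and adapter.
VISION_ONLY_PREFIXES: Tuple[str, ...] = (
    "vision_tower.",                 # image encoder weights
    "multi_modal_projector.",        # projection from vision to text hidden size
    "language_model.embed_tokens.",  # assumed: text embeddings used when merging image features
)


def filter_vision_only(
    state_dict: Dict[str, Any],
    prefixes: Iterable[str] = VISION_ONLY_PREFIXES,
) -> Dict[str, Any]:
    """Return only the entries whose key starts with one of the given prefixes."""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {k: v for k, v in state_dict.items() if k.startswith(prefix_tuple)}


if __name__ == "__main__":
    checkpoint = {
        "vision_tower.patch_conv.weight": "...",
        "multi_modal_projector.linear_1.weight": "...",
        "language_model.embed_tokens.weight": "...",
        "language_model.layers.0.attention.wq.weight": "...",
    }
    # The decoder layer is dropped; encoder, projector and embeddings are kept.
    print(sorted(filter_vision_only(checkpoint)))
```

Under these assumptions, filtering before the weights are materialized means the decoder layers are never loaded, which is where the memory savings for a separately scheduled vision encoder would come from.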

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>

flaviabeo

Tested locally with mistralai/Ministral-3-3B-Instruct-2512; everything seems to be working.

Image size: (547, 365)

Embeddings shape: torch.Size([1, 834, 3072])
  (batch=1, seq_len=834, hidden_dim=3072)

Let me know if there are any other models of interest that would be nice to test with this mode.

flaviabeo

Testing with ibm-granite/granite-vision-3.2-2b, I am getting an error. I used almost the same script as for Mistral3, so maybe there is an issue specific to granite-vision?

flaviabeo · 2026-06-23T20:11:31Z

        if not isinstance(original_image_size, (list, tuple)):
            original_size = original_image_size.tolist()

        original_height, original_width = original_size


Testing with granite-vision, I got an error that traces back to this line:

  File "/home/flaviabeodebayser/fms/foundation-model-stack/fms/models/llava_next.py", line 270, in select_best_resolution
    original_height, original_width = original_size
                                      ^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'original_size' where it is not associated with a value
INFO:torch._dynamo.eval_frame:TorchDynamo attempted to trace the following frames: [ ]
INFO:torch._dynamo.utils:TorchDynamo compilation metrics:
Function, Runtimes (s)

This seems to be an existing bug that only surfaces in a specific scenario: when the image size is already a list or tuple, original_size is never assigned. I have pushed a fix now.

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
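The failure is visible in the snippet quoted earlier in this thread: when original_image_size is already a list or tuple, the isinstance branch is skipped and original_size is never bound. A minimal sketch of the normalization, assuming the size arrives either as a Python sequence or as an object with a .tolist() method such as a torch.Tensor; the helper name normalize_image_size is hypothetical, and the actual change in commit fe8a0ea may be structured differently.

```python
from typing import Any, Tuple


def normalize_image_size(original_image_size: Any) -> Tuple[int, int]:
    """Return (height, width) from a list/tuple or a tensor-like object."""
    if isinstance(original_image_size, (list, tuple)):
        # Already a Python sequence: use it directly (the case that was unhandled).
        original_size = list(original_image_size)
    else:
        # Tensor-like input (e.g. torch.Tensor): convert to a Python list first.
        original_size = original_image_size.tolist()

    original_height, original_width = original_size
    return int(original_height), int(original_width)
```

With both branches assigning original_size, a list such as [2, 3] and a tensor holding the same values both yield (2, 3) instead of raising UnboundLocalError.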

flaviabeo

Now granite-vision also works :)
LGTM!

gkumbhat added 9 commits June 19, 2026 18:57

📝 Update design doc to add impact to other model like llava_next

f561fb3

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

✨ Add vision only implementation to mistral and ministral3 model

5262eaf

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

✨ Add vision only mode to llava next as well

d9ecc6d

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

✅ Add and update unit tests

f92c1db

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🐛 Fix merge image vector device error

0b9579d

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

📝 Update design doc

9eab938

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🎨 Fix formatting

f4e294f

Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>

🚸 Improve logging for vision only case

0733261

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🎨 Fix formatting

2d6eee5

Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat force-pushed the add_vision_only_mode branch from d14e05d to 2d6eee5 on June 22, 2026 17:07

🐛 Fix text embedding loading for vision only case

2414c54

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat marked this pull request as ready for review June 22, 2026 18:21

gkumbhat requested a review from flaviabeo June 22, 2026 18:22

gkumbhat added 2 commits June 22, 2026 19:20

🐛 Fix filter key logging issue

1a5864e

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🎨 Fix formatting

36cc671

Signed-off-by: gkumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat mentioned this pull request Jun 23, 2026

Separate mm encoder scheduling torch-spyre/sendnn-inference#1015 (Open, 5 tasks)

flaviabeo approved these changes Jun 23, 2026

flaviabeo requested changes Jun 23, 2026

🐛 Handle case when image_size is already a list

fe8a0ea

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>

flaviabeo approved these changes Jun 24, 2026

gkumbhat merged commit 771a197 into foundation-model-stack:main Jun 24, 2026
4 checks passed

gkumbhat deleted the add_vision_only_mode branch June 24, 2026 14:19

gkumbhat merged 13 commits into foundation-model-stack:main from gkumbhat:add_vision_only_mode
