:bug: Fix vision features attention calculation by gkumbhat · Pull Request #507 · foundation-model-stack/foundation-model-stack

gkumbhat · 2026-02-15T00:14:58Z

Changes

This PR fixes the discrepancies we were seeing in the vision features. Because of its layout, pixtral was being treated as a causal model, so is_causal was set to True when calling scaled_dot_product_attention, even though mistral3.py uses it only as an encoder to extract image features. To resolve this, we now explicitly set attn_name=sdpa_bidirectional when calling vision_tower in the get_image_feature function.

With this change, the FMS output closely matches the output obtained by using transformers directly.
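
To illustrate why the causal flag matters for an encoder, here is a minimal plain-Python sketch of single-head scaled dot-product attention with and without a causal mask. It is not the FMS or PyTorch implementation; the `attention` helper and the toy `patches` values are hypothetical and exist only for this example.

```python
import math

def attention(q, k, v, is_causal):
    """Toy single-head scaled dot-product attention over lists of vectors."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # With a causal mask, position i may only attend to positions <= i.
        # Without it (bidirectional), every position sees every other one.
        visible = range(i + 1) if is_causal else range(len(k))
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in visible
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[j][c] for w, j in zip(weights, visible))
            for c in range(len(v[0]))
        ])
    return out

# Three toy "image patch" embeddings (hypothetical values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

causal = attention(patches, patches, patches, is_causal=True)
bidirectional = attention(patches, patches, patches, is_causal=False)

# Under the causal mask the first patch can only attend to itself,
# so its output is just its own value vector.
print(causal[0])
# Bidirectionally, the first patch also mixes in the later patches.
print(bidirectional[0])
```

In this sketch only the last position produces the same output in both modes, because it is the only one that sees every patch under the causal mask. Image patches have no left-to-right ordering that should hide later patches from earlier ones, which is why an encoder-style vision tower is expected to run bidirectionally.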

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

flaviabeo

LGTM!

🐛 Fix vision features attention calculation

8aaf4d3

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat requested a review from flaviabeo February 16, 2026 13:12

flaviabeo approved these changes Feb 16, 2026


flaviabeo merged commit 9124614 into foundation-model-stack:main Feb 16, 2026
4 checks passed

flaviabeo merged 1 commit into foundation-model-stack:main from gkumbhat:fix_mistral_vision_features
