
🐛 Fix vision features attention calculation #507

Merged
flaviabeo merged 1 commit into foundation-model-stack:main from gkumbhat:fix_mistral_vision_features
Feb 16, 2026

Conversation

@gkumbhat

Collaborator

Changes

This PR fixes the discrepancy we were seeing in the vision features. Because of how the model is laid out, Pixtral was being treated as a causal model, so `is_causal` was set to `True` when calling `scaled_dot_product_attention`. In `mistral3.py`, however, we only use Pixtral as an encoder to extract image features, which calls for bidirectional attention. To resolve this, we now explicitly set `attn_name=sdpa_bidirectional` when calling `vision_tower` in the `get_image_feature` function.
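To illustrate why the `is_causal` flag changes the image features, here is a minimal standalone sketch using PyTorch's `torch.nn.functional.scaled_dot_product_attention` directly. It does not use FMS or its `attn_name` mechanism; the random tensors are hypothetical stand-ins for the query/key/value projections of image patch tokens. With `is_causal=True`, each patch attends only to itself and earlier patches, so every row except the last differs from full (bidirectional) attention, which is what an encoder like Pixtral expects.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy shapes: (batch, heads, num_patches, head_dim).
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Causal (decoder-style): patch i attends only to patches 0..i.
causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Bidirectional (encoder-style): every patch attends to every patch.
bidirectional = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# Reference: unmasked softmax(Q K^T / sqrt(d)) V.
scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
reference = torch.softmax(scores, dim=-1) @ v

print(torch.allclose(bidirectional, reference, atol=1e-5))  # True
print(torch.allclose(causal, bidirectional, atol=1e-5))     # False
# Only the last patch sees the full sequence under the causal mask.
print(torch.allclose(causal[..., -1, :], bidirectional[..., -1, :], atol=1e-5))  # True
```

Under a causal mask, the first patch can only attend to itself, so its output is just its own value vector; this is why the mismatch against transformers showed up in the extracted image features rather than as an error.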

With this change, the vision features produced by FMS closely match those obtained by running the model directly with transformers.

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@gkumbhat gkumbhat requested a review from flaviabeo February 16, 2026 13:12

@flaviabeo flaviabeo left a comment

Collaborator


LGTM!

@flaviabeo flaviabeo merged commit 9124614 into foundation-model-stack:main Feb 16, 2026
4 checks passed