Skip to content

🐛 Fix pad token id resolution in padding function#512

Merged
Ssukriti merged 4 commits into
foundation-model-stack:mainfrom
gkumbhat:fix_pad_token_id_resolution
Apr 29, 2026
Merged

🐛 Fix pad token id resolution in padding function#512
Ssukriti merged 4 commits into
foundation-model-stack:mainfrom
gkumbhat:fix_pad_token_id_resolution

Conversation

@gkumbhat

@gkumbhat gkumbhat commented Mar 3, 2026

Copy link
Copy Markdown
Collaborator

Changes

Currently pad_input_ids doesn't take into account padding token id for model and assumes it to be 0 for all users. However, this can sometimes be not the case, for example for mistral-3 models, the padding token id is actually 11. This can cause very confusing problems. This PR fixes it by taking optional pad_token_id parameter and using that instead of doing torch.zeros

gkumbhat added 2 commits March 3, 2026 16:28
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@gkumbhat gkumbhat marked this pull request as ready for review March 31, 2026 18:30
@gkumbhat gkumbhat requested a review from flaviabeo April 3, 2026 17:50
Comment thread fms/utils/generation.py
use_cache: bool = False,
contiguous_cache: bool = False,
eos_token_id: Optional[int] = None,
pad_token_id: Optional[int] = None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used in generate @gkumbhat ?

@Ssukriti

Copy link
Copy Markdown
Contributor

can you update PR @gkumbhat ?

@gkumbhat

Copy link
Copy Markdown
Collaborator Author

@Ssukriti yep. I'll update the PR

Comment thread fms/utils/generation.py
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@Ssukriti Ssukriti merged commit 476e18e into foundation-model-stack:main Apr 29, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants