Skip to content

Added granite_v4.py to support 8b model#497

Merged
JRosenkranz merged 10 commits into
mainfrom
granite-v4-moe-hybrid
Jan 12, 2026
Merged

Added granite_v4.py to support 8b model#497
JRosenkranz merged 10 commits into
mainfrom
granite-v4-moe-hybrid

Conversation

@rzbhatti

Copy link
Copy Markdown
Contributor

Adds granite_v4.py to support Granite4 family of models. This PR addresses only the granite4-8b model for now.
Mamba layers are not implemented yet in this PR.

Basic test run with AFTU:

python3 ~/git/foundation-model-stack/aiu-fms-testing-utils/scripts/inference.py     \
--architecture=hf_pretrained   \
--model_path=~/models/ibm-granite/granite-4.0-8b/r251118a   \
--tokenizer=~/models/ibm-granite/granite-4.0-8b/r251118a    \
--device_type=aiu     \
--unfuse_weights     \
--compile_dynamic     \
--compile     --default_dtype=fp16     \
--fixed_prompt_length=64     \
--max_new_tokens=20     \
--timing=per-token     \
--batch_size=1

Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>
@rzbhatti rzbhatti linked an issue Dec 11, 2025 that may be closed by this pull request
@rzbhatti rzbhatti requested review from ani300 and kaoutar55 December 11, 2025 20:49
Comment thread fms/models/granite_v4.py Outdated

@kaoutar55 kaoutar55 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rzbhatti for adding granite_v4 and wiring GraniteMoeHybrid* into FMS. This is the missing piece we need to start enabling Granite 4.0 8B in FMS and on AIU.

this is a good first enabling step. I have a few comments around configuration correctness (alignment with HF), future MoE support, code reuse, and tests.

Comment thread fms/models/granite_v4.py Outdated
Comment thread fms/models/__init__.py Outdated
"bamba",
"gpt_bigcode",
"granite",
"granite_v4",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have:
Architecture name: granite_v4
Class names: GraniteMoeHybrid*
HF architecture: GraniteMoeHybridForCausalLM

This creates confusion. Consider either:

Renaming classes to GraniteV4* to match the file/architecture name
Or explain in this PR why "MoeHybrid" is used for a dense model

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done deliberately to ensure consistency, as all Granite 4.0 models use a single "superset architecture" name GraniteMoeHybridForCausalLM, for simplified model management during training and loading, even for those variants (like the non-Hybrid/non-MoE models) where the name is technically a superset of the functional architecture.
https://huggingface.co/collections/ibm-granite/granite-40-language-models

The HF transformer implementation here uses the granitemoehybrid. If we do not want to associate this architecture with version 4, a better move could be to use granitemoehybrid everywhere.

Comment thread fms/models/granite_v4.py Outdated


@dataclass
class GraniteMoeHybridConfig(ModelConfig):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is supposed to support MoE models (even if 8B is dense), the config lacks.
num_experts
num_experts_per_tok
router_aux_loss_coef
Expert-specific parameters

We should at least add a comment to mention that this will be addressed in a future PR or clarity that the scope of this PR is to support only the the dense 8B; MoE variants will be added later.

Comment thread fms/models/granite_v4.py Outdated
attention_multiplier=0.0078125,
)

_3_1_2b_config = GraniteMoeHybridConfig(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_3_1_2b_config is defined but not registered.
Register it: models.register_model(_architecture_name, "3.1-2b", ...)
or remove it from this PR

Comment thread fms/models/hf/utils.py Outdated
"FMS model implementations currently only support LlamaForCausalLM, GPTBigCodeForCausalLM, MixtralForCausalLM, RobertaForMaskedLM, RobertaForQuestionAnswering, RobertaForSequenceClassification, GraniteForCausalLM, MistralForCausalLM, BambaForCausalLM, SiglipModel, LlavaNextForConditionalGeneration, MPNetForMaskedLM, BertForMaskedLM, and BertForSequenceClassification"
"FMS model implementations currently only support LlamaForCausalLM, GPTBigCodeForCausalLM, MixtralForCausalLM, RobertaForMaskedLM, RobertaForQuestionAnswering, RobertaForSequenceClassification, GraniteForCausalLM, GraniteMoeHybridForCausalLM, MistralForCausalLM, BambaForCausalLM, SiglipModel, LlavaNextForConditionalGeneration, MPNetForMaskedLM, BertForMaskedLM, and BertForSequenceClassification"
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR should include:

  • Unit tests for the new model architecture
  • HF weight loading tests
  • Output verification tests

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docstrings for classes and key methods
Comments explaining Granite 4.0-specific features (embedding_multiplier, residual_multiplier, attention_multiplier, logits_scaling)

Comment thread fms/models/granite_v4.py Outdated
return preds


_8b_config = GraniteMoeHybridConfig(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm that _8b_config matches the actual HF config for ibm-granite/granite-4.0-8b (or 8b-base)

Right now, some values (notably src_vocab_size=49155, pad_id=0, and max_expected_seq_len=8192) look more like the older “tiny-preview” tokenizer / config than the final Granite 4.0 configs, which generally use a 100k vocab and long contexts. I suggest to populate _8b_config directly from HF config
Use config.vocab_size and config.pad_token_id instead of hard-coding the 49k/0 pair unless this is the behavior we want here.

Comment thread fms/models/granite_v4.py Outdated
)

# hf -> fms requires a transpose operation for the query and key
# weight and bias parameters for Llama models

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now applied to Granite 4 as well, it might be clearer to make that comment architecture-agnostic or explicitly say “Llama/Granite”.

Comment thread fms/models/hf/utils.py Outdated
config, "head_dim", config.hidden_size // config.num_attention_heads
)
elif architecture == "GraniteMoeHybridForCausalLM":
inner_dim = config.intermediate_size

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner_dim is computed and never used. We can use it to sanity-check or derive hidden_grow_factor from intermediate_size:
hidden_grow_factor = config.intermediate_size / config.hidden_size

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually already happening here

@rzbhatti rzbhatti self-assigned this Jan 6, 2026
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Comment thread fms/models/granite_moe_hybrid.py Outdated
activation_fn=str_to_activation(self.config.activation_fn),
p_dropout=self.config.p_dropout,
use_bias=False, # Granite 4 does not define MLP bias
fused=True, # Granite 4 comes with fused weights

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to default to fused, this can be handled through an adapter

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide more details on what kind of adapter you are referring here?

@JRosenkranz JRosenkranz left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for correctness for this model against hf outputs?

…comment

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
for pattern, repl in replacements:
new_name = re.sub(pattern, repl, new_name)

if "shared_mlp.input_linear.weight" in new_name:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this fused logic to let the GraniteBlock accept both fused and unfused weight.

While we could handle this with wg1_fused and allow FMS to work with fused weights, vllm-spyre actually sets fused_weights=False, which creates a problem with mixed configuration.

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

@JRosenkranz JRosenkranz left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
@gkumbhat gkumbhat force-pushed the granite-v4-moe-hybrid branch from 5f0401d to 3a77ba6 Compare January 12, 2026 19:38
@JRosenkranz JRosenkranz merged commit 3be7476 into main Jan 12, 2026
4 checks passed
@gkumbhat gkumbhat deleted the granite-v4-moe-hybrid branch January 12, 2026 20:36
alex-jw-brooks pushed a commit to alex-jw-brooks/foundation-model-stack that referenced this pull request Feb 4, 2026
* Added granite_v4.py to support 8b model

Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>

* ♻️✅ Refactor for granite_v4 to inherit from granite and add unit tests

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* 🚚 Rename granite_v4 to granite_moe_hybrid

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* 🎨 Fix linting issues

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* 🎨 Fix ruff warning

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* ♻️🐛 Fix glu unit fused weight processing

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* ♻️ Move fused weights logic outside of GraniteBlock class per review comment

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* 📝 Add comment for unfusing, fused weight logic for naming function

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

* 🐛 Fix initialization issue with GraniteMoe

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

---------

Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Co-authored-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for GraniteMoeHybridForCausalLM architecture

4 participants