Added granite_v4.py to support 8b model by rzbhatti · Pull Request #497 · foundation-model-stack/foundation-model-stack

rzbhatti · 2025-12-11T20:45:19Z

Adds granite_v4.py to support Granite4 family of models. This PR addresses only the granite4-8b model for now.
Mamba layers are not implemented yet in this PR.

Basic test run with AFTU:

python3 ~/git/foundation-model-stack/aiu-fms-testing-utils/scripts/inference.py \
    --architecture=hf_pretrained \
    --model_path=~/models/ibm-granite/granite-4.0-8b/r251118a \
    --tokenizer=~/models/ibm-granite/granite-4.0-8b/r251118a \
    --device_type=aiu \
    --unfuse_weights \
    --compile_dynamic \
    --compile \
    --default_dtype=fp16 \
    --fixed_prompt_length=64 \
    --max_new_tokens=20 \
    --timing=per-token \
    --batch_size=1

Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>

kaoutar55

Thanks @rzbhatti for adding granite_v4 and wiring GraniteMoeHybrid* into FMS. This is the missing piece we need to start enabling Granite 4.0 8B in FMS and on AIU.

This is a good first enabling step. I have a few comments on configuration correctness (alignment with HF), future MoE support, code reuse, and tests.

kaoutar55 · 2025-12-12T03:42:43Z

    "bamba",
    "gpt_bigcode",
    "granite",
+    "granite_v4",


We have:
Architecture name: granite_v4
Class names: GraniteMoeHybrid*
HF architecture: GraniteMoeHybridForCausalLM

This creates confusion. Consider either:

Renaming classes to GraniteV4* to match the file/architecture name
Or explain in this PR why "MoeHybrid" is used for a dense model

This is done deliberately to ensure consistency, as all Granite 4.0 models use a single "superset architecture" name GraniteMoeHybridForCausalLM, for simplified model management during training and loading, even for those variants (like the non-Hybrid/non-MoE models) where the name is technically a superset of the functional architecture.
https://huggingface.co/collections/ibm-granite/granite-40-language-models

The HF transformers implementation here uses the name granitemoehybrid. If we do not want to associate this architecture with version 4, a better move could be to use granitemoehybrid everywhere.

kaoutar55 · 2025-12-12T03:47:18Z

+
+
+@dataclass
+class GraniteMoeHybridConfig(ModelConfig):


If this is supposed to support MoE models (even if 8B is dense), the config lacks:
num_experts
num_experts_per_tok
router_aux_loss_coef
Expert-specific parameters

We should at least add a comment noting that this will be addressed in a future PR, or clarify that the scope of this PR is to support only the dense 8B and that MoE variants will be added later.
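One way to make the dense-only scope explicit is sketched below. This is a hypothetical stand-in, not the GraniteMoeHybridConfig class from this PR: the class name, the default values, and the validate method are assumptions made for illustration, and only the three MoE field names come from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DenseOrMoeConfigSketch:
    """Hypothetical config sketch; not the class added in this PR."""

    emb_dim: int = 4096  # illustrative value only
    nheads: int = 32  # illustrative value only
    # MoE-only fields. They stay None for the dense 8B variant, which
    # documents that MoE support is out of scope for now.
    num_experts: Optional[int] = None
    num_experts_per_tok: Optional[int] = None
    router_aux_loss_coef: Optional[float] = None

    def is_moe(self) -> bool:
        return self.num_experts is not None and self.num_experts > 0

    def validate(self) -> None:
        """Reject half-specified MoE settings instead of ignoring them."""
        if not self.is_moe():
            if self.num_experts_per_tok is not None:
                raise ValueError("num_experts_per_tok set on a dense config")
            return
        if self.num_experts_per_tok is None:
            raise ValueError("MoE config needs num_experts_per_tok")
        if self.num_experts_per_tok > self.num_experts:
            raise ValueError("num_experts_per_tok exceeds num_experts")
```

With defaults like these, a dense config validates silently, while a partially filled MoE config fails early rather than loading with missing router settings.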

kaoutar55 · 2025-12-12T03:48:37Z

+    attention_multiplier=0.0078125,
+)
+
+_3_1_2b_config = GraniteMoeHybridConfig(


_3_1_2b_config is defined but not registered.
Either register it: models.register_model(_architecture_name, "3.1-2b", ...)
or remove it from this PR.

kaoutar55 · 2025-12-12T03:49:37Z

-            "FMS model implementations currently only support LlamaForCausalLM, GPTBigCodeForCausalLM, MixtralForCausalLM, RobertaForMaskedLM, RobertaForQuestionAnswering, RobertaForSequenceClassification, GraniteForCausalLM, MistralForCausalLM, BambaForCausalLM, SiglipModel, LlavaNextForConditionalGeneration, MPNetForMaskedLM, BertForMaskedLM, and BertForSequenceClassification"
+            "FMS model implementations currently only support LlamaForCausalLM, GPTBigCodeForCausalLM, MixtralForCausalLM, RobertaForMaskedLM, RobertaForQuestionAnswering, RobertaForSequenceClassification, GraniteForCausalLM, GraniteMoeHybridForCausalLM, MistralForCausalLM, BambaForCausalLM, SiglipModel, LlavaNextForConditionalGeneration, MPNetForMaskedLM, BertForMaskedLM, and BertForSequenceClassification"
        )



The PR should include:

Unit tests for the new model architecture

HF weight loading tests

Output verification tests

Please add docstrings for classes and key methods
Comments explaining Granite 4.0-specific features (embedding_multiplier, residual_multiplier, attention_multiplier, logits_scaling)
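As a starting point for those comments, here is a minimal pure-Python sketch of where the four scalars are usually applied, based on my understanding of the Hugging Face Granite implementations. The function names are hypothetical and the code works on plain lists rather than tensors; it is not the FMS implementation.

```python
import math


def scale_embedding(vec, embedding_multiplier):
    """Token embeddings are multiplied by embedding_multiplier."""
    return [x * embedding_multiplier for x in vec]


def attention_weights(query, keys, attention_multiplier):
    """attention_multiplier replaces the usual 1/sqrt(head_dim) score scale."""
    scores = [
        attention_multiplier * sum(q * k for q, k in zip(query, key))
        for key in keys
    ]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_residual(residual, block_out, residual_multiplier):
    """Each block output is scaled before being added to the residual."""
    return [r + residual_multiplier * o for r, o in zip(residual, block_out)]


def scale_logits(logits, logits_scaling):
    """Final logits are divided by logits_scaling."""
    return [value / logits_scaling for value in logits]
```

The attention_multiplier=0.0078125 value in the diff equals 1/128, so it is a fixed score scale rather than one derived from the head dimension at runtime.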

kaoutar55 · 2025-12-12T04:01:15Z

+            return preds
+
+
+_8b_config = GraniteMoeHybridConfig(


Can you confirm that _8b_config matches the actual HF config for ibm-granite/granite-4.0-8b (or 8b-base)?

Right now, some values (notably src_vocab_size=49155, pad_id=0, and max_expected_seq_len=8192) look more like the older “tiny-preview” tokenizer / config than the final Granite 4.0 configs, which generally use a 100k vocab and long contexts. I suggest populating _8b_config directly from the HF config:
use config.vocab_size and config.pad_token_id instead of hard-coding the 49k/0 pair, unless that is the behavior we want here.
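A rough sketch of that suggestion follows. The helper name fms_kwargs_from_hf is hypothetical, and the SimpleNamespace stands in for a loaded HF config so the snippet runs without transformers. Using max_position_embeddings as the source for max_expected_seq_len is my assumption; the numbers are placeholders, not Granite 4.0 values.

```python
from types import SimpleNamespace


def fms_kwargs_from_hf(hf_config):
    """Derive the three fields in question from an HF-style config object."""
    return {
        "src_vocab_size": hf_config.vocab_size,
        # May be None if the HF config defines no pad token; the caller
        # would need to decide how to handle that case.
        "pad_id": hf_config.pad_token_id,
        # Assumption: the HF context length maps to this FMS field.
        "max_expected_seq_len": hf_config.max_position_embeddings,
    }


# Placeholder values for illustration only.
fake_hf_config = SimpleNamespace(
    vocab_size=1000,
    pad_token_id=3,
    max_position_embeddings=2048,
)
kwargs = fms_kwargs_from_hf(fake_hf_config)
```

Reading the values at load time keeps the registered config from drifting when the upstream checkpoint changes its tokenizer or context length.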

kaoutar55 · 2025-12-12T04:03:05Z

+        )
+
+        # hf -> fms requires a transpose operation for the query and key
+        # weight and bias parameters for Llama models


Since this is now applied to Granite 4 as well, it might be clearer to make that comment architecture-agnostic or explicitly say “Llama/Granite”.
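For readers unfamiliar with that step, the sketch below shows the kind of reordering the code comment appears to describe, assuming it converts each head from a "two halves" rotary layout to an interleaved one. That assumption, and the helper name, are mine; the exact operation in FMS may differ. Rows are represented as list items instead of tensor rows.

```python
def interleave_rotary_rows(rows, nheads):
    """Reorder rows per head from [first half, second half] to interleaved pairs.

    Equivalent, under the stated assumption, to viewing a weight as
    (nheads, 2, head_dim // 2, cols), swapping axes 1 and 2, and flattening.
    """
    total = len(rows)
    if nheads <= 0 or total % nheads != 0:
        raise ValueError("row count must be a positive multiple of nheads")
    head_dim = total // nheads
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for rotary pairs")
    half = head_dim // 2
    reordered = []
    for head in range(nheads):
        base = head * head_dim
        for i in range(half):
            reordered.append(rows[base + i])  # element from the first half
            reordered.append(rows[base + half + i])  # its rotary partner
    return reordered
```

For two heads of dimension four, row indices 0..7 come out as 0, 2, 1, 3, 4, 6, 5, 7. The same permutation applies to the bias, which is why the comment mentions both weight and bias parameters.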

kaoutar55 · 2025-12-12T04:05:16Z

            config, "head_dim", config.hidden_size // config.num_attention_heads
        )
+    elif architecture == "GraniteMoeHybridForCausalLM":
+        inner_dim = config.intermediate_size


inner_dim is computed and never used. We can use it to sanity-check or derive hidden_grow_factor from intermediate_size:
hidden_grow_factor = config.intermediate_size / config.hidden_size

This is actually already happening here.
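The derivation under discussion is small enough to sketch with a sanity check attached. The function name and the example sizes are illustrative assumptions, not values taken from the Granite 4.0 config.

```python
def hidden_grow_factor(intermediate_size, hidden_size):
    """Ratio FMS uses to size the MLP inner dimension from the embedding width."""
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    factor = intermediate_size / hidden_size
    # Sanity check: the factor must reproduce the original inner dimension,
    # otherwise the converted model would allocate the wrong MLP shape.
    if int(hidden_size * factor) != intermediate_size:
        raise ValueError("grow factor does not round-trip to intermediate_size")
    return factor


# Illustrative sizes only.
example_factor = hidden_grow_factor(12800, 4096)
```

If the factor is already derived this way elsewhere in the adapter, the unused inner_dim assignment in this branch can simply be dropped.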

JRosenkranz · 2026-01-09T19:55:04Z

+            activation_fn=str_to_activation(self.config.activation_fn),
+            p_dropout=self.config.p_dropout,
+            use_bias=False,  # Granite 4 does not define MLP bias
+            fused=True,  # Granite 4 comes with fused weights


I don't think we have to default to fused, this can be handled through an adapter

Can you please provide more details on what kind of adapter you are referring to here?

JRosenkranz

Can we add a test for correctness for this model against hf outputs?
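A correctness check of this kind usually compares per-position logits from both implementations. The helper below is a hypothetical sketch on plain nested lists (one row of logits per position) and does not use the FMS test utilities; the tolerance is an arbitrary placeholder.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def compare_logits(fms_logits, hf_logits, atol=1e-3):
    """Return (within_tolerance, same_greedy_tokens, max_abs_diff)."""
    if len(fms_logits) != len(hf_logits):
        raise ValueError("different number of positions")
    max_diff = 0.0
    same_tokens = True
    for fms_row, hf_row in zip(fms_logits, hf_logits):
        if len(fms_row) != len(hf_row):
            raise ValueError("different vocabulary sizes")
        row_diff = max(abs(a - b) for a, b in zip(fms_row, hf_row))
        max_diff = max(max_diff, row_diff)
        # Greedy decoding agrees only if the top token matches per position.
        same_tokens = same_tokens and argmax(fms_row) == argmax(hf_row)
    return max_diff <= atol, same_tokens, max_diff
```

Reporting both numbers is useful in fp16, where small logit differences are expected but a change in the greedy token usually points to a real conversion bug.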

gkumbhat · 2026-01-10T00:37:56Z

+        for pattern, repl in replacements:
+            new_name = re.sub(pattern, repl, new_name)
+
+        if "shared_mlp.input_linear.weight" in new_name:


Implemented this fused logic to let the GraniteBlock accept both fused and unfused weights.

While we could handle this with wg1_fused and allow FMS to work with fused weights, vllm-spyre actually sets fused_weights=False, which creates a problem with a mixed configuration.
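For context, unfusing here amounts to splitting the fused input projection into its two halves. The sketch below works on a list of weight rows. It assumes the first half of the rows is the gate projection and the second half the up projection, which matches how the HF module chunks its output as far as I know; that ordering and the helper names are assumptions.

```python
def split_fused_rows(fused_rows):
    """Split a fused (2 * inner_dim, hidden) weight into two (inner_dim, hidden) parts."""
    if len(fused_rows) % 2 != 0:
        raise ValueError("fused weight must have an even number of rows")
    half = len(fused_rows) // 2
    gate_rows = fused_rows[:half]  # assumed: gate projection comes first
    up_rows = fused_rows[half:]  # assumed: up projection comes second
    return gate_rows, up_rows


def fuse_rows(gate_rows, up_rows):
    """Inverse operation: stack the gate rows on top of the up rows."""
    if len(gate_rows) != len(up_rows):
        raise ValueError("gate and up projections must have the same shape")
    return gate_rows + up_rows
```

Doing this at checkpoint-conversion time lets the same block definition serve callers that load with fused weights and callers such as vllm-spyre that load unfused.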

JRosenkranz

lgtm

* Added granite_v4.py to support 8b model
  Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>
* ♻️✅ Refactor for granite_v4 to inherit from granite and add unit tests
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* 🚚 Rename granite_v4 to granite_moe_hybrid
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* 🎨 Fix linting issues
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* 🎨 Fix ruff warning
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* ♻️🐛 Fix glu unit fused weight processing
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* ♻️ Move fused weights logic outside of GraniteBlock class per review comment
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* 📝 Add comment for unfusing, fused weight logic for naming function
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
* 🐛 Fix initialization issue with GraniteMoe
  Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
---------
Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Co-authored-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

Added granite_v4.py to support 8b model

33de9bf

Signed-off-by: Rashed Z. Bhatti, PhD <rzbhatti@gmail.com>

rzbhatti linked an issue Dec 11, 2025 that may be closed by this pull request

Add support for GraniteMoeHybridForCausalLM architecture #496

Closed

rzbhatti requested review from ani300 and kaoutar55 December 11, 2025 20:49

gkumbhat reviewed Dec 11, 2025

Comment thread fms/models/granite_v4.py Outdated

kaoutar55 reviewed Dec 12, 2025

rzbhatti self-assigned this Jan 6, 2026

gkumbhat added 2 commits January 8, 2026 15:30

♻️✅ Refactor for granite_v4 to inherit from granite and add unit tests

88638d1

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🚚 Rename granite_v4 to granite_moe_hybrid

8aa99fb

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat mentioned this pull request Jan 8, 2026

✨ Add granite moe hybrid for causalLM support #495

Closed

gkumbhat added 4 commits January 8, 2026 16:00

Merge branch 'main' into granite-v4-moe-hybrid

97ef5e7

🎨 Fix linting issues

c3b1a6a

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

🎨 Fix ruff warning

3754ee8

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

♻️🐛 Fix glu unit fused weight processing

8f01154

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

JRosenkranz reviewed Jan 9, 2026

JRosenkranz requested changes Jan 9, 2026

♻️ Move fused weights logic outside of GraniteBlock class per review comment

a0ab69b

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat reviewed Jan 10, 2026

📝 Add comment for unfusing, fused weight logic for naming function

a86d855

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

JRosenkranz approved these changes Jan 12, 2026

🐛 Fix initialization issue with GraniteMoe

3a77ba6

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>

gkumbhat force-pushed the granite-v4-moe-hybrid branch from 5f0401d to 3a77ba6 Compare January 12, 2026 19:38

JRosenkranz merged commit 3be7476 into main Jan 12, 2026
4 checks passed

gkumbhat deleted the granite-v4-moe-hybrid branch January 12, 2026 20:36

4 participants