Skip to content

Refactor HF Config -> FMS Model Kwarg Building#494

Merged
JRosenkranz merged 18 commits into
foundation-model-stack:mainfrom
alex-jw-brooks:refactor_kwarg_building
Jan 7, 2026
Merged

Refactor HF Config -> FMS Model Kwarg Building#494
JRosenkranz merged 18 commits into
foundation-model-stack:mainfrom
alex-jw-brooks:refactor_kwarg_building

Conversation

@alex-jw-brooks

Copy link
Copy Markdown
Contributor

Breaks up the big HF -> FMS _map_model_config chain into a global registry with more well-patterned & reusable param builders. After this change, when adding a new model architecture, we should:

  1. Add a new build_<new_arch>_params callable, which takes as input the HF config, and return the dict of FMS config_params.
  2. Register the HF -> FMS architecture and builder func with the _FMS_MODEL_REGISTRY in fms/models/hf/config_utils/__init__.py.
  3. Add an example model to the test config mapping in Add Tests for FMS Model Kwarg Correctness #493 and run the file (currently as main, although probably should be consolidated into the internal testing sub package, or wrapped in scripts later on) generate the corresponding JSON dict for the params.
  4. Add the file, which can be run with those tests ^ to ensure future alignment in the produced kwargs.

In some cases, some architectures may have one or two extra kwargs that need to be pulled out of the common config based on the task, e.g., classification for Bert / Roberta - currently these cases are pretty limited, so for now, this is handled by setting the a new flag, (i.e., is_classify) with a default to the more common case, and wrapping it in a partial when mapping the less common case in the registry, which allows us to patch the kwarg while keeping the signature standardized. Hopefully this will make things easier to look at!

In the future, I think there are a couple of things that could be added to make this more useful as well:

  • We should support recursive construction of params for models like granite vision in a more generic way; it's still hardcode to granite / siglip since the llava next builder just calls these directly
  • It would be nice to add some utils for mapping model config classes <-> param dicts, which is a thin layer, but can be a point of awkwardness with the recursive case, since we need to make the model config classes for the subconfigs at the moment 🙂

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
hf_arch_name, *fms_info
)

def register_model_arch_info(self, hf_arch_name, fms_arch_name, param_builder):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docstring for this method as well as type hints

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added doc strings and type hints for everything 🙂


class ModelConfigRegistry:
"""Wrapper class that handles converting hf config -> FMS kwargs."""
def __init__(self, registry_map=None):

@JRosenkranz JRosenkranz Nov 25, 2025

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add type hint for registry_map. In the docs include what the mapping for registry_map is


class ModelConfigRegistry:
"""Wrapper class that handles converting hf config -> FMS kwargs."""
def __init__(self, registry_map=None):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the registry_map optional here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making explicit why registry_map is optional in the ModelConfigRegistry constructor. If we don’t expect external callers to omit it, mark it as non-optional and remove the guard.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I made it optional because I wanted the registry to support out of tree use-cases in case we end up needing it one day for whatever reason, and the map was saved directly instead of split apart (i.e., was None to not have the ref type as a default value).

I agree though - I'll make it not optional and remove the guard, since ultimately someone could just pass an empty dict if they want to do the same thing 🙂

self.model_param_builders[hf_arch_name] = param_builder
self.model_arch_mappings[hf_arch_name] = fms_arch_name

def map_hf_to_fms_arch(self, architecture):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docstring for this method as well as type hints

return fms_arch
raise KeyError(f"HF architecture {architecture} is unsupported! Registered architectures: {list(self.model_arch_mappings.keys())}")

def map_hf_arch_to_fms_params(self, architecture, config):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add type hints for this, this will help to differential the hf config from fms

Comment thread fms/models/hf/config_utils/__init__.py Outdated
@@ -0,0 +1,22 @@
from functools import partial
from fms.models.hf.config_utils.config_registry import ModelConfigRegistry
from fms.models.hf.config_utils.param_builders import *

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace the star import with explicit imports (e.g., import the builder functions individually or import the module as a namespace).
This will improve readability, auto-complete, and static analysis across the stack.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - done 🙂


class ModelConfigRegistry:
"""Wrapper class that handles converting hf config -> FMS kwargs."""
def __init__(self, registry_map=None):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making explicit why registry_map is optional in the ModelConfigRegistry constructor. If we don’t expect external callers to omit it, mark it as non-optional and remove the guard.

Comment thread fms/models/hf/utils.py
if linear_config:
config_params["linear_config"] = linear_config

config_params = _FMS_MODEL_CONFIG_REGISTRY.hf_config_to_fms_config_params(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _map_model_config is fully removed and replaced with registry-based param builders, can we confirm backward compatibility across all previously supported models (Granite, Mistral, Mixtral, Bamba, Roberta, LlavaNext, MPNet, etc.)?
A short regression test or note confirming parity would help ensure we don't have a regression.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! I have added a test for regression testing here, just in a separate PR to make the review easier: #493.

This PR has one example for every model architecture and validates that infer_model_configuration produces the same config param kwargs for each model arch, and I used it to validate the correctness of this refactor. We can also add new model architectures to this test as we go to prevent future regressions in the model param 😄

if architecture in self.model_arch_mappings:
fms_arch = self.model_arch_mappings[architecture]
return fms_arch
raise KeyError(f"HF architecture {architecture} is unsupported! Registered architectures: {list(self.model_arch_mappings.keys())}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before raising the KeyError, we might want to emit a warning log (logger.warning).
This makes debugging easier when users load partially-supported or custom HF configs.

@alex-jw-brooks alex-jw-brooks Nov 27, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me - I changed the two lookups to grab the arch & param builder to warnings, and moved the error to throw it after one or both warnings would log in case those mappings somehow get misaligned

"tie_heads": config.tie_word_embeddings,
}
# Should not have overlap
assert not any(common_params) in config_params

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this is an issue: any(common_params) returns a bool, so this check does not validate key collisions.
Should be:
assert not any(k in config_params for k in common_params).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 didn't realize that I left that in there, and nice catch, thanks! Rewrote the validation to only raise if there are intersecting keys whose values are conflicting, so it'll raise something like ValueError: Model param builder uses common params, but has conflicting values for key(s) ['src_vocab_size'] instead of a random assert

@@ -0,0 +1,22 @@
from functools import partial

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a top-level docstring explaining how HF → FMS config mapping works via the registry?
This is now a central extension point, so documenting “how to register a new model” would be very helpful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Added an in depth explanation for how to map the params here, and can also pull it out into more findable docs if we end up wanting steps for how to implement and test new models in general later on 😄

# If the recevied the outer siglip model config, pass only the vision
# encoder config, because we do not care about the text encoder here.
if hasattr(config, "vision_config"):
config = config.vision_config

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mutates the meaning of config inside the function and may cause confusion or accidental override.
Better to have:
vision_cfg = config.vision_config.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me 🙂 changed and added a bit of clarification in the doctstring to make it more explicit that the wrapping config (i.e., siglipconfig / llama next config) should be passed, and not the already unwrapped siglipvisionconfig

raise KeyError(f"HF architecture {architecture} is unsupported! Registered architectures: {list(self.model_arch_mappings.keys())}")

def hf_config_to_fms_config_params(self, config, model_path):
architecture = config.architectures[0]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HF configs sometimes include multiple architectures.
Should we validate or warn when len(config.architectures) > 1?
Relying on index 0 may not always be correct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Currently we do already do this (here) though, so the behavior is unchanged in this PR.

For now, I added a "The provided HF config supports multiple architectures; only the first, <ARCH_NAME>, will be used in building the FMS model's config params" warning to make the current behavior more clear!

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
@alex-jw-brooks alex-jw-brooks force-pushed the refactor_kwarg_building branch from 340feee to 00f6db8 Compare November 26, 2025 11:37
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
@alex-jw-brooks alex-jw-brooks force-pushed the refactor_kwarg_building branch from 6dd8b86 to e7cf3c9 Compare November 27, 2025 10:55
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
@alex-jw-brooks alex-jw-brooks force-pushed the refactor_kwarg_building branch from e7cf3c9 to 360d7cc Compare November 27, 2025 10:57
@alex-jw-brooks

Copy link
Copy Markdown
Contributor Author

Thanks for the quick reviews @JRosenkranz and @kaoutar55! I think things should be ready for another look when you have a moment

Comment thread fms/models/hf/config_utils/__init__.py Outdated

@JRosenkranz JRosenkranz left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

alex-jw-brooks and others added 3 commits December 18, 2025 08:53
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
@JRosenkranz JRosenkranz merged commit 43eaec0 into foundation-model-stack:main Jan 7, 2026
4 checks passed
alex-jw-brooks added a commit to alex-jw-brooks/foundation-model-stack that referenced this pull request Feb 4, 2026
…k#494)

* refactor hf utils to split kwarg building

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* split fms kwarg builders -> separate utils

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* refactor into global registry for model configs

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* run formatting

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* no wild imports in builder imports

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* raise conflicting builder/common params values

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Make registry map required

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* dont shadow config var name in siglip builder

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* warn if we have multiple archs

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add guidelines for mapping new architectures

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add docstrings and type hints

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* add config util types

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* move keyerror further in registry mapping

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add builder docstrings, pull out wrapped imports

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* clarify siglip param builder

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* fix mypy

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* fix tests, make registry private

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

---------

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants