Model Confusion Turns AI Model Loading Into a Supply-Chain Attack Surface
Model Confusion is a naming-based AI supply-chain attack that allows code intended to load a local model to silently fetch a malicious model from a public registry instead. Disclosed by Checkmarx, the issue exploits ambiguous model-loading behavior in widely used frameworks and can result in remote code execution or silent model compromise. The risk exists in application code paths that many enterprises already run in notebooks, pipelines, and internal tooling.
Scenario: A data scientist copies a notebook from an internal repository to fine-tune a sentiment analysis model. The notebook references a local path: checkpoints/sentiment-v2. On her machine, the directory exists and the code runs as expected. A colleague clones the same notebook but skips downloading the model artifacts. When he runs the code, the framework finds no local directory, interprets the path as a Hugging Face Hub identifier, and downloads a model from a public repository with a matching name. The model loads without error. If trust_remote_code=True is set, attacker-controlled code executes silently. If not, the application now runs on a model no one intended to use. Neither developer receives a warning.
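A minimal sketch of the loading call the scenario describes, using the Hugging Face transformers API; the path and flag mirror the scenario, and the specific model classes are illustrative:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_PATH = "checkpoints/sentiment-v2"  # intended to be a local directory

# If the directory is missing, the same string is also a plausible
# <user>/<model> Hub identifier, so resolution can silently go remote.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,  # allows repository-supplied Python to run at load time
)
```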
Modern machine learning frameworks reduce friction by allowing a single identifier to reference either a local directory or a model hosted in a public registry. In the Hugging Face ecosystem, APIs such as from_pretrained() support loading models from a local path or from the Hub using the same string.
This flexibility becomes dangerous when code assumes a local model exists but does not enforce local-only resolution. If the expected directory is missing or misnamed, the library can fall back to fetching a model from a public registry with a matching <user>/<model> pattern. This mirrors dependency confusion in traditional software supply chains, but at the model layer rather than the package layer.
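The ambiguity can be pictured as a simplified resolution order. This is an illustration of the behavior described above, not the library's actual implementation:

```python
import os

def resolve_model_source(name_or_path: str) -> str:
    """Simplified illustration: an existing local directory wins; otherwise the
    same string is treated as a <user>/<model> id on the public registry."""
    if os.path.isdir(name_or_path):
        return f"local directory: {name_or_path}"
    # No directory by that name in this environment, so the identifier
    # falls through to remote resolution.
    return f"remote registry id: {name_or_path}"

print(resolve_model_source("checkpoints/sentiment-v2"))
# Prints "local directory: ..." on the original author's machine,
# "remote registry id: ..." on a machine without the checkpoint.
```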
How the Attack Works
Model Confusion relies on a predictable sequence of behaviors:
A developer writes code that loads a model using a relative path such as checkpoints/some-model.
The local directory does not exist, is misnamed, or is absent in the execution environment.
The string matches a valid public registry namespace and model name.
The framework resolves the identifier remotely and downloads the model from the public registry.
If trust_remote_code=True is set, attacker-controlled Python code in the model repository executes during loading.
If trust_remote_code=False, no arbitrary code executes, but the application silently loads a compromised or backdoored model instead of the intended local one.
No exploit chain is required. There is no social engineering, no privilege escalation, and no network intrusion. The attack succeeds through name resolution alone.
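Because the failure mode is simply a directory that is not there, a pre-flight existence check is enough to make the fallback visible. A minimal sketch, assuming the model is always expected on local disk:

```python
from pathlib import Path

from transformers import AutoModelForSequenceClassification

def load_local_model(path: str):
    """Fail loudly if the expected checkpoint directory is absent instead of
    letting the identifier resolve against a public registry."""
    model_dir = Path(path)
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a local model directory at {model_dir.resolve()}; "
            "refusing to fall back to remote resolution."
        )
    return AutoModelForSequenceClassification.from_pretrained(str(model_dir))
```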
Analysis
Model Confusion exposes a structural weakness in how AI tooling balances convenience and safety. Model-loading APIs were designed to fail open to simplify experimentation and reuse. In enterprise environments, that design choice becomes a liability.
The attack does not exploit a flaw in the underlying framework. It exploits a mismatch between developer intent and actual resolution behavior. A missing directory, an incomplete repository clone, or a copied notebook that omits local artifacts is enough to redirect execution to the public internet.
This matters now because enterprises are operationalizing AI faster than they are standardizing ML development practices. Fine-tuned models are commonly stored in generic directories, reused across teams, and loaded through shared examples without consistent validation. In these conditions, Model Confusion turns routine development patterns into a supply-chain exposure that traditional security controls are not designed to detect.
Unlike earlier concerns about malicious models, this attack does not require users to knowingly download or trust a suspicious artifact. The model is fetched automatically as part of normal execution. When combined with trust_remote_code=True, the boundary between configuration and executable code disappears.
Implications for Enterprises
Model Confusion requires reassessment of how models are treated in enterprise systems.
Model loading becomes a security boundary. Model identifiers, paths, and resolution rules must be treated as trust decisions, not convenience features. Some organizations mitigate remote code execution risk by standardizing on non-executable model formats such as Safetensors, though this does not address model poisoning or integrity risks.
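As a sketch of that mitigation, transformers' from_pretrained accepts a use_safetensors option that restricts loading to the non-executable Safetensors serialization (the path below is hypothetical):

```python
from transformers import AutoModelForSequenceClassification

# Prefer non-executable Safetensors weights over pickle-based checkpoints.
# This limits deserialization-time code execution; it does not prove the
# weights themselves are the ones you intended to load.
model = AutoModelForSequenceClassification.from_pretrained(
    "./checkpoints/sentiment-v2",  # hypothetical, explicitly local path
    use_safetensors=True,
)
```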
Public model registries function like package ecosystems. Model hubs now carry risks analogous to dependency confusion, typosquatting, and malicious uploads. Model provenance must be managed with the same rigor applied to third-party libraries.
Controls must move earlier in the lifecycle. Network monitoring and runtime detection are insufficient. Mitigations need to exist at development time through explicit local-only enforcement, coding standards, and static analysis of model-loading paths. For internal-only pipelines, setting local_files_only=True in model-loading calls prevents remote fallback entirely.
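A minimal sketch of that enforcement with the Hugging Face API (the checkpoint path is hypothetical); with local_files_only=True, a missing directory raises an error rather than triggering a download:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "checkpoints/sentiment-v2",  # hypothetical local checkpoint
    local_files_only=True,       # never resolve the identifier against the public Hub
)
```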
Path syntax determines resolution behavior. A “naked” relative path like models/my-v1 is vulnerable because it resembles a Hub namespace. A path prefixed with ./ such as ./models/my-v1 is explicitly local and will not trigger remote resolution.
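An equivalent defensive habit is to resolve paths to an unambiguous form before loading; a short sketch using pathlib (the directory name is hypothetical):

```python
from pathlib import Path

from transformers import AutoModelForSequenceClassification

# An absolute (or "./"-prefixed) path cannot be mistaken for a
# <user>/<model> registry identifier.
model_dir = Path("models/my-v1").resolve()  # e.g. /home/user/project/models/my-v1
model = AutoModelForSequenceClassification.from_pretrained(str(model_dir))
```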
Operational blast radius increases through reuse. Shared notebooks, internal templates, and CI pipelines can all inherit the same ambiguous loading pattern, allowing a single unsafe convention to propagate widely.
Automated pipelines amplify risk. CI/CD or retraining pipelines that pull unpinned or “latest” model references can increase exposure to Model Confusion.
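Where a pipeline legitimately pulls from a public registry, pinning an exact revision narrows the exposure. A sketch using the revision parameter of from_pretrained; the repository name and commit hash are placeholders:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "example-org/approved-model",  # placeholder for an allow-listed upstream repo
    revision="0123456789abcdef0123456789abcdef01234567",  # placeholder commit hash
)
```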
This pattern affects notebooks, batch pipelines, CI jobs, and internal tools equally, wherever model identifiers are resolved dynamically.
Risks and Open Questions
Several risks remain unresolved despite available mitigations:
Namespace governance remains ad hoc. There is no systematic way to identify, reserve, or protect high-risk directory names that commonly collide with model-loading paths.
Model integrity lacks standardized signals. Models do not yet have widely adopted signing, attestation, or SBOM-style metadata that enterprises can verify at load time; a stopgap hash check is sketched after this list.
Trust controls are overly coarse. Boolean flags such as trust_remote_code can authorize more code than a developer intends when name resolution shifts from local to remote. Some advanced models require custom logic enabled via trust_remote_code, creating a practical tradeoff between functionality and security.
Detection capabilities are limited. Automated detection of malicious model behavior or embedded execution logic remains an open research problem.
Enterprise exposure is not well measured. While vulnerable enterprise code has been identified, there is no comprehensive data on how widespread this pattern is in production environments.
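For the integrity gap noted above, one stopgap some teams use is recording content hashes of approved model artifacts and re-checking them before loading. A minimal sketch; where the expected digests are stored and compared is left open:

```python
import hashlib
from pathlib import Path

def sha256_of_artifacts(model_dir: str) -> dict[str, str]:
    """Hash every file in a model directory so the digests can be compared
    against values recorded when the model was approved."""
    digests = {}
    for path in sorted(Path(model_dir).rglob("*")):
        if not path.is_file():
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
                h.update(chunk)
        digests[str(path.relative_to(model_dir))] = h.hexdigest()
    return digests
```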
Further Reading
Checkmarx disclosure on Model Confusion
Hugging Face from_pretrained() documentation
Research on malicious models in public registries
NIST guidance on AI supply-chain risk