Model authenticity and transparency with Sigstore


What is the Sigstore model transparency project?

Sigstore’s Model Transparency project is a Sigstore community project aimed at applying the software supply chain security practice of signing to machine learning (ML) models. Hosted on GitHub at sigstore/model-transparency, the project leverages Sigstore’s concepts, infrastructure, and tooling to sign ML models so that tampering can be detected after model training. The project’s goals are to:

  • Protect model integrity by signing models with Sigstore: a cryptographic hash is produced so that tampering can be detected when the signature is verified, confirming that the model you use is exactly what its creators intended.
  • Offer transparency by recording signing events in Rekor, Sigstore’s transparency log, so that anyone can verify the model’s signature against the log.
  • Be easier to use than traditional signing mechanisms like GPG by simplifying the signing process and offering a keyless signing option, making the process user-friendly for developers and security professionals.

Note: Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future. 

What is model signing?

Model signing is the cryptographic signing of an ML model to ensure:

  • Integrity by verifying that the model and its datasets have not been tampered with and come from a trusted source.
  • Authenticity through the traceability of its artifacts to a trusted source.
  • Security to help prevent attacks like model or dataset poisoning and unauthorized modifications.
  • Compliance by supporting regulatory requirements and security policies for ML deployments.

These facets are critical for securing the ML supply chain, where ML models are distributed, deployed in production environments, or used within, or as a dependency of, security-sensitive applications.

How Sigstore model signing works

Let’s go over the key features of the Model Transparency project.

Model hashing

When an ML model is ready for deployment, a cryptographic hash of the model and all its files and metadata is generated and written to a serialized manifest. The manifest describes all the files that are part of the ML model repository: each entry lists the path to a file and that file’s corresponding cryptographic hash. The format specification supports diverse use cases by allowing arbitrary metadata to be embedded, e.g. model version, dataset information, or the hardware and infrastructure used, along with other details such as those captured in what the industry calls a model card. A model card is a structured document that provides a standardized way of presenting details about an ML model.

The library currently supports two hashing algorithms, SHA-256 and BLAKE2, and can hash both in-memory content and files; files can be hashed whole, in shards, or from an already-open handle. In the future there may be support for GPU hashing or distributed hashing.
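To make the hashing step concrete, here is a minimal Python sketch, not the library’s actual serialization code, that hashes every file in a model directory and collects the kind of path-to-digest entries the serialized manifest records (the output shape is illustrative only):

import hashlib
import json
from pathlib import Path

def hash_model_directory(model_dir: str, algorithm: str = "sha256") -> dict:
    """Hash every file under model_dir and return manifest-style path/digest entries."""
    entries = []
    for path in sorted(Path(model_dir).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        with path.open("rb") as f:
            # Read in chunks so multi-gigabyte weight files never sit fully in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        entries.append({"path": str(path.relative_to(model_dir)), algorithm: digest.hexdigest()})
    # Illustrative shape only; the real manifest is serialized as an in-toto Statement.
    return {"files": entries}

print(json.dumps(hash_model_directory("granite-3.2-2b-instruct"), indent=2))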

Model Hashing Format

The resulting manifest makes up the innermost layer of the model signing format. That is, the manifest is serialized as an in-toto Statement from the in-toto Attestation Framework (in-toto is a CNCF graduated project), where each subject is a ResourceDescriptor, as shown in the diagram below.
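As an illustration of that innermost layer, an in-toto Statement wrapping the manifest entries might look roughly like the following sketch; the file names, digests, and predicateType value are placeholders rather than the project’s actual values:

import json

# Illustrative in-toto Statement; each subject acts as a ResourceDescriptor.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "config.json", "digest": {"sha256": "3b2c1a..."}},                        # placeholder digest
        {"name": "model-00001-of-00002.safetensors", "digest": {"sha256": "9f4e7d..."}},   # placeholder digest
    ],
    "predicateType": "https://example.com/model-manifest/v1",  # placeholder; see the project for the real value
    "predicate": {"model_version": "example"},                 # arbitrary metadata, e.g. model card fields
}
print(json.dumps(statement, indent=2))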

Once this manifest is generated, it is ready for signing.

Model signing

When the manifest is ready for signing, it can be signed using various supported methods:

  • Raw key: Uses a standalone key without any additional metadata, structure, or certificate wrapping. It is simply the raw key material, such as a private key, in its native form, without being embedded in a formal key management structure like a certificate (X.509).
  • Self-Signed Certificate: Signs the manifest using your self-signed certificate. Note that many organizations have security policies that frown upon self-signed certificates, so be sure to check yours!
  • Certificate Authority (CA) signed certificate: Signs the manifest using a trusted public or private CA.
  • Sigstore: Uses Sigstore’s Public Good Instance (PGI) for an easy-to-use signing solution based on keyless signing.

Each of these methods cryptographically signs the manifest so that future tampering can be detected, and stores the result as a detached signature: a separate file that can live alongside the rest of the model’s repository files.
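As a rough illustration of the raw key method, not the model-transparency library’s own code, the following sketch signs serialized manifest bytes with a standalone ECDSA key using the third-party cryptography package and writes the result out as a detached signature file (file names and payload are hypothetical):

import json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Stand-in for the serialized manifest (the in-toto Statement described earlier).
manifest_bytes = json.dumps({"_type": "https://in-toto.io/Statement/v1"}, sort_keys=True).encode()

# "Raw key" signing: a standalone private key with no certificate wrapping.
private_key = ec.generate_private_key(ec.SECP256R1())
signature = private_key.sign(manifest_bytes, ec.ECDSA(hashes.SHA256()))

# Detached signature: stored as its own file next to the model repository.
with open("manifest.sig", "wb") as f:  # hypothetical file name
    f.write(signature)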

Model Signing Format

The above diagram illustrates the current model signing format in its complete form. We’ve already covered the in-toto Statement layer in the Model hashing section above. The next layer wraps the in-toto Statement and its signature in a Dead Simple Signing Envelope (DSSE). This avoids having two files, one for the manifest and one for the signature: a single payload format contains both the manifest and its associated signature in one file.

The outermost and final layer is the Sigstore bundle. The resulting DSSE data is wrapped into a Sigstore bundle that also contains the verification material needed to verify the integrity of the model. For non-Sigstore signing methods, this verification material is simply the public key or the certificate chain, depending on the method used. For Sigstore-based signing, it is the certificate and the transparency log entry details, e.g. log entry ID, timestamp, and inclusion proof. The Sigstore bundle layer is created regardless of which signing method is used. If Sigstore’s “keyless” signing method is used, the bundle is created and returned entirely by the sigstore-python library after it is passed an in-toto Statement; when not using “keyless” signing, the model-transparency library creates the bundle through Sigstore’s protobuf specs library.
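To show how the DSSE layer fits together, here is a self-contained Python sketch, again conceptual rather than the library’s own code, that builds an envelope around a stand-in payload. Note that DSSE signs a pre-authentication encoding (PAE) of the payload, not the raw bytes:

import base64
import json
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def pae(payload_type: bytes, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    return b"DSSEv1 %d %s %d %s" % (len(payload_type), payload_type, len(payload), payload)

payload_type = b"application/vnd.in-toto+json"
payload = json.dumps({"_type": "https://in-toto.io/Statement/v1"}).encode()  # stand-in for the manifest Statement

private_key = ec.generate_private_key(ec.SECP256R1())  # stand-in for whichever signing method is used
signature = private_key.sign(pae(payload_type, payload), ec.ECDSA(hashes.SHA256()))

envelope = {
    "payloadType": payload_type.decode(),
    "payload": base64.b64encode(payload).decode(),
    "signatures": [{"sig": base64.b64encode(signature).decode()}],
}
print(json.dumps(envelope, indent=2))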

Model verification

Once a model has been signed, by signing its serialized manifest, the corresponding signature file can be hosted alongside the model itself, such as in an OCI registry. Verification can then happen in one of two ways:

  • If the model was signed using Sigstore’s “keyless” signing, verification involves two checks: that the signature over the model’s serialized manifest corresponds to the public key associated with the signer’s identity, which is attested by a certificate issued by Fulcio (Sigstore’s CA), and that the signature and related information are recorded in Rekor, a public transparency log. Together these confirm the authenticity and integrity of the signed artifact. See https://www.sigstore.dev/how-it-works for more details.
  • If the model was not signed using Sigstore’s “keyless” signing, verification happens more traditionally: the signature is verified using the public key of the known trusted identity (also embedded in the Sigstore bundle format described above), and the hash it attests to is checked against the hash of the serialized manifest.

This way, when users download the model, they can easily verify its integrity by checking that the signature corresponds to a known, trusted identity and that the model hasn’t been altered since it was signed.
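To make the non-Sigstore path concrete, here is a minimal verification sketch using the cryptography package rather than the library’s own code: it checks the DSSE signature with a known public key and returns the decoded manifest so the caller can re-hash local files and compare digests.

import base64
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def verify_envelope(envelope: dict, public_key: ec.EllipticCurvePublicKey) -> dict:
    """Verify a DSSE envelope's signature and return the decoded manifest Statement."""
    payload_type = envelope["payloadType"].encode()
    payload = base64.b64decode(envelope["payload"])
    # DSSE signs the pre-authentication encoding, not the raw payload.
    signed_bytes = b"DSSEv1 %d %s %d %s" % (len(payload_type), payload_type, len(payload), payload)
    signature = base64.b64decode(envelope["signatures"][0]["sig"])
    try:
        public_key.verify(signature, signed_bytes, ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        raise SystemExit("Verification failed: signature mismatch")
    # The caller then re-hashes the local model files and compares them against these digests.
    return json.loads(payload)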

Why traditional signing does not work well for ML models

Traditional approaches to signing software artifacts fall short when applied to ML models. Examples of traditional signing approaches are single blob signing (also known as monolithic signing or whole-package signing) and per-file signing (also known as individual artifact signing or file-level signing).

An example of single blob signing is how the industry has traditionally handled signing operating system ISOs where a user downloads the entire OS packages in a single ISO bundle along with its corresponding checksum and checksum signature files. A more modern example is signing container images that contain individual files or packages.

An example of per-file signing is GNU Privacy Guard (GPG) signing used in open-source software verification and package distribution such as RPM based systems where individual .rpm packages are signed using GPG keys before distribution.

Applying these traditional signing approaches to ML models has the following drawbacks:

  1. Small changes require full re-signing
    • If a minor update is needed e.g. modifying a configuration file or retraining a small part of the model, the entire blob needs to be resigned and redistributed even if most files remain unchanged.
    • This is costly for large models e.g. multi-gigabyte neural networks like GPT or vision models, or production ML environments in general where incremental updates are common.
    • This makes it inefficient for scenarios where only a subset of the files needs to be accessed or updated independently.
  2. High computational and storage overhead
    • Large models (e.g. LLMs or multi-modal models) contain multiple components: weights, preprocessing scripts, tokenizers, and metadata. Weight files are often tens of gigabytes in size, and the practical limitation today is frequently the file size limit imposed by the model’s Git hosting provider. GitHub’s largest supported file size is 5 GB, and only when using Git Large File Storage (LFS) with Enterprise Cloud; on GitHub Free or Pro the limit drops to 2 GB. Hugging Face has its own limits and recommendations: it suggests breaking model weight files into chunks of around 20 GB each for better performance and to help users with connection issues, and its hard limit for a single LFS file is 50 GB. By contrast, an entire OS ISO is typically no more than around 6 GB by today’s standards, at least an order of magnitude smaller than these model repositories.
    • Verifying a single monolithic signature requires loading the entire dataset or model into memory, a computationally expensive task.
    • This makes it impractical for resource-constrained environments e.g. edge devices, mobile deployments.
  3. Inefficient for distributed or modular workflows
    • Many ML workflows load model components dynamically (e.g. streaming weights, swapping out specific layers, using external vocabularies) to optimize resource usage, improve adaptability, and enable scalable AI deployments.
    • Some ML systems distribute components separately (e.g. edge devices fetching model weights but not preprocessing scripts) to improve efficiency, scalability, flexibility, and security.
    • With monolithic signing, users must download and verify the entire blob instead of just the necessary subset if only a small part of the model is needed e.g. for inference.
    • This slows down inference and increases network bandwidth costs for cloud-based deployments.
    • Per-file signing does not ensure the model components work together as originally intended – users can mix and match signed files, leading to potential inconsistencies.
  4. Federated and decentralized learning scales poorly
    • In federated learning or decentralized ML, models are updated across different devices or servers.
    • If the entire model blob is signed as one unit, even a minor contribution from a single client invalidates the entire signature, requiring a full resigning.

Example signing and verification flow

Here’s an example of a signing and verification flow, which you can also find in the project’s GitHub repository.

Install the model_signing package

pip install model_signing

Obtain an ML model

rm -rf granite-3.2-2b-instruct
git clone https://huggingface.co/ibm-granite/granite-3.2-2b-instruct

Remove .git directory to avoid including it in the signature

rm -rf granite-3.2-2b-instruct/.git
ls -lh granite-3.2-2b-instruct/

Check entire size of model

du -sh granite-3.2-2b-instruct/

Sign the model

This uses the Sigstore signing method by default. When prompted, follow the link and select your preferred identity provider for authentication.

model_signing sign granite-3.2-2b-instruct
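The project also exposes a Python API alongside the CLI. The module and method names in this sketch are an assumption based on the project’s README at the time of writing and may differ between releases, so check the repository for the current interface:

import model_signing

# Assumed API shape; consult sigstore/model-transparency for the current interface.
model_signing.signing.Config().use_sigstore_signer().sign(
    "granite-3.2-2b-instruct",  # model directory to sign
    "model.sig",                # where to write the detached Sigstore bundle
)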

Verify the model

IDENTITY="foo@example.com"
IDENTITY_PROVIDER="https://github.com/login/oauth"
model_signing verify sigstore --signature model.sig --identity "${IDENTITY}" --identity_provider "${IDENTITY_PROVIDER}" granite-3.2-2b-instruct
Verification succeeded

Alter the model

rm -f granite-3.2-2b-instruct/config.json

Verify the altered model again

model_signing verify sigstore --signature model.sig --identity "${IDENTITY}" --identity_provider "${IDENTITY_PROVIDER}" granite-3.2-2b-instruct
Verification failed with error: Signature mismatch

Inspect detached signature

The signature was saved in model.sig by default with the sign command. Let’s take a look:

ls -l model.sig
cat model.sig | jq .
cat model.sig | jq .dsseEnvelope.payload -r | base64 -d | jq .
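For a programmatic equivalent of the jq commands above, this short Python sketch loads the bundle, decodes the DSSE payload, and prints every file recorded in the manifest along with its digest:

import base64
import json

with open("model.sig") as f:
    bundle = json.load(f)

# The DSSE payload is the base64-encoded in-toto Statement (the serialized manifest).
statement = json.loads(base64.b64decode(bundle["dsseEnvelope"]["payload"]))

for subject in statement.get("subject", []):
    print(subject["name"], subject["digest"])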

Conclusion: Building a more transparent future for ML models

The Sigstore model transparency project is a significant step toward applying software supply chain signing to the ML supply chain, bringing enhanced security, verifiability, and trust to machine learning models. By leveraging cryptographic signing and Sigstore’s transparency logs, the project helps verify that models are not only authentic but also more tamper-resistant and auditable. It also begins to lay the security foundation for unlocking AI models in cloud-native environments by leveraging trusted techniques from supply chain security.

As the ML ecosystem continues to evolve, integrating Sigstore Model Transparency with other tools and platforms could unlock even greater potential. Imagine:

  • Seamless integration with model hubs like Hugging Face, providing every model with a verifiable lineage.
  • Enhanced security in MLOps pipelines, where CI/CD workflows (e.g. Kubeflow, Argo Workflows, etc.) automatically sign and verify models before deployment.
  • Supply chain security for AI applications, preventing unauthorized modifications in critical domains like healthcare, finance, and autonomous systems.
  • Improved performance of LLM signing through offloading to a GPU.
  • Containerized model security, where model artifacts and their detached signatures are embedded within OCI containers and are signed and verified just like software supply chain components.
  • Kubernetes-native model verification, where ML models deployed inside Kubernetes clusters are automatically validated by admission controllers, allowing only trusted models to reach production.
  • Trusted AI in cloud marketplaces by having cloud providers use this solution to verify the lineage of models before they are published or consumed in enterprise AI applications.

It’s still early days. Whether you’re an ML researcher, developer, security professional, or simply someone passionate about ethical AI, you can contribute to shaping the future of model transparency. So let’s work together to make the machine learning supply chain transparent and secure. Check out the GitHub repository, experiment with the tools, and join the conversation to help build a more secure, trustworthy AI ecosystem!