It’s almost impossible to count the number of security breaches that were caused by a leaked password or API key. Secret management in software is a tricky thing to get right. Deploying secrets securely to only the places where they are needed, rotating them to reduce exposure while maintaining system uptime, and revoking compromised secrets to limit a leak’s blast radius are all nontrivial tasks. Shared secrets might seem like a simple way to protect resources, but this simplicity is deceptive and makes for a less robust system. But do we even really need shared secrets?
Red Hat’s Emerging Technologies blog includes posts that discuss technologies that are under active development in upstream open source communities and at Red Hat. We believe in sharing early and often the things we’re working on, but we want to note that unless otherwise stated the technologies and how-tos shared here aren’t part of supported products, nor promised to be in the future.
When software communicates with other software, we typically use these secrets to affirm the identity of the connecting resource, but a secret is only an approximation of identity. Nothing prevents multiple distinct programs from reusing the same shared secret. Not only does this increase the likelihood that the secret leaks, but it also creates confusion as to what is actually being authenticated. And if you ever need to adjust authorization policies for that secret, you are constrained by whatever other pieces of code might be reusing it.
What if we could do better? What if instead of using a brittle approximation for identity we could use the actual identity of code? And be able to cryptographically verify that identity? And what if the underlying host gets compromised? Can we prevent compromised code from accessing sensitive resources? The answer to all of these questions is yes, using SPIFFE/SPIRE and Keylime.
Because the combination of these technologies is still pretty nascent and fast moving, this article is not a tutorial, as many specifics would soon be outdated. But the principles and integrations discussed here are applicable for use and ready for your exploration.
Software Identity
SPIFFE (the Secure Production Identity Framework for Everyone) and SPIRE (the SPIFFE Runtime Environment) are two graduated CNCF (Cloud Native Computing Foundation) projects that aim to solve the software identity problem. This article is not intended to provide a full introduction to those projects (there are much better introductions already out there), but given that these concepts and tools are new, a short introduction is warranted. SPIFFE is a specification for using cryptographic certificates for software identity in a vendor-agnostic manner. SPIRE is the production-ready implementation of SPIFFE. SPIRE agents run on hosts that wish to access other systems, and those agents connect to a SPIRE server. Using a highly customizable plugin system, the SPIRE agents and the SPIRE server use various workflows to identify the software that is asking for access and to attest the identity of the host the software is running on. This combined software and host identity (along with other plugin-specific metadata) is bundled into a short-lived cryptographic certificate called an SVID (SPIFFE Verifiable Identity Document). The software in question then presents that certificate to the services it wishes to connect to so that it may be authenticated.
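To make the idea of a software identity more concrete: the identity carried in an SVID is a URI of the form spiffe://trust-domain/path. The following is a minimal sketch, using only the Go standard library, of how a service might split such an ID into its parts. The function name parseSPIFFEID and the example ID are illustrative assumptions, and the checks are deliberately basic rather than a full implementation of the SPIFFE ID rules.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseSPIFFEID is a hypothetical helper that splits a SPIFFE ID such as
// "spiffe://example.org/billing/api" into its trust domain and path.
// It performs only basic structural checks.
func parseSPIFFEID(id string) (trustDomain, path string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" {
		return "", "", errors.New("scheme must be spiffe")
	}
	if u.Host == "" {
		return "", "", errors.New("trust domain must not be empty")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return "", "", errors.New("user info, port, query and fragment are not allowed")
	}
	return u.Host, u.Path, nil
}

func main() {
	// The ID below is an invented example, not one issued by a real SPIRE server.
	td, p, err := parseSPIFFEID("spiffe://example.org/billing/api")
	if err != nil {
		fmt.Println("invalid SPIFFE ID:", err)
		return
	}
	fmt.Println("trust domain:", td)
	fmt.Println("path:", p)
}
```

In a real deployment you would more likely rely on an existing SPIFFE library than on hand-written parsing; the sketch is only meant to show what the identity looks like.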
SPIRE has many built-in plugins for workload attestation (Unix processes, Kubernetes, Docker, etc.) and node attestation (AWS, Azure, GCP, Kubernetes, TPM DevIDs, X509 certs, SSH, etc.), which can be combined in many different configurations based on your needs. This flexibility might seem complicated at first, but it allows you to craft granular access policies that give you fine-grained control over access to your services and data. With SPIRE set up for your environment, you can have policies like:
- Only production code can access other production services
- Only accounting applications running in on-prem data centers can access a particular database instance
- Only HR applications operating in-country can access PII on personnel in that country
- Only code signed by our institution is allowed access to sensitive research data
Leaked passwords or API keys are no longer a concern because identity is not approximated with a shared secret. This allows for more explicit and precise access control.
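As an illustration of what such a policy could look like on the receiving side, here is a small sketch of an authorization check based on the caller’s verified identity. The rule type, the allowed function, and the /prod path convention are all hypothetical; SPIRE does not prescribe how a service makes this decision.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical policy entry: callers from TrustDomain whose
// SPIFFE ID path sits under PathPrefix are allowed.
type rule struct {
	TrustDomain string
	PathPrefix  string
}

// allowed reports whether the caller identity matches any rule.
// Prefixes are matched on whole path segments so that "/prod" does not
// accidentally match "/production-test".
func allowed(trustDomain, path string, rules []rule) bool {
	for _, r := range rules {
		if trustDomain != r.TrustDomain {
			continue
		}
		if path == r.PathPrefix || strings.HasPrefix(path, r.PathPrefix+"/") {
			return true
		}
	}
	return false
}

func main() {
	// "Only production code can access other production services",
	// assuming production workloads are issued IDs under /prod.
	rules := []rule{{TrustDomain: "example.org", PathPrefix: "/prod"}}

	fmt.Println(allowed("example.org", "/prod/billing", rules))
	fmt.Println(allowed("example.org", "/staging/billing", rules))
}
```

Because the decision is keyed on an identity that was cryptographically verified, changing who may connect means changing a rule, not rotating a secret.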
Keylime and Remote Attestation
Hopefully you are now convinced that using real workload identities is superior to shared secrets, so let’s take it even further. As mentioned previously, SPIRE uses the host identity as a part of the cryptographic identity for the workload. This host identity can include static attributes such as the cloud provider node type or Kubernetes node label. But what if we want that identity to be more dynamic and based on security posture? If a node has been compromised, is it really still the same node that we trusted? This is even more of a concern in IoT and edge scenarios where we don’t physically control the node. How can we trust the workloads on the node without tying its identity to its security?
Keylime is a CNCF sandbox project that provides remote attestation with a hardware root-of-trust for target machines. This allows for the creation of security policy about what software is allowed to exist on a Linux host, and Keylime leverages that machine’s TPM (Trusted Platform Module) device and the Linux IMA (Integrity Measurement Architecture) subsystem to cryptographically verify that the machine has not been tampered with. Keylime lets device owners craft a policy that fits their needs while being difficult to bypass. Keylime policies can verify things like:
- The boot loader
- The kernel, its modules, and boot parameters
- Every binary and library installed on the host
Every piece of software accessed on that host leaves a cryptographic trace in the machine’s TPM device (think of it as a hardware-based root node of a Merkle Tree). These traces can’t be erased or hidden, even by something malicious and low-level like a kernel rootkit. These traces can also be remotely and independently verified for a whole fleet of devices and monitored in real time.
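The reason these traces can’t be quietly rewritten is that a TPM register (a PCR) can only be extended, never set: each new measurement is hashed together with the previous register value. The sketch below simulates that operation in software for a SHA-256 bank. It is a simplification for illustration only; the function names are made up, the event strings are placeholders, and real IMA log entries use a richer template format than a plain hash of a name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// extend mimics a TPM PCR extend for a SHA-256 bank:
// new value = SHA256(old value || measurement).
func extend(pcr [32]byte, measurement [32]byte) [32]byte {
	h := sha256.New()
	h.Write(pcr[:])
	h.Write(measurement[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// replay recomputes the final register value from an ordered event log,
// which is how a remote verifier can check a log against a signed quote.
func replay(events [][]byte) [32]byte {
	var pcr [32]byte // simulated register, starting at all zeros
	for _, e := range events {
		pcr = extend(pcr, sha256.Sum256(e))
	}
	return pcr
}

func main() {
	// Placeholder events standing in for measured files.
	log := [][]byte{[]byte("bootloader"), []byte("kernel"), []byte("/usr/bin/app")}
	fmt.Printf("%x\n", replay(log))
}
```

Changing, dropping, or reordering any event yields a different final value, so a log that has been tampered with no longer matches what the TPM signs.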
If it were possible to enable SPIRE to use Keylime for the host identity aspect, our policies could be even more dynamic, based on the real-time attested state of our nodes. Taking the example policies above, we can enhance them in the following ways:
- Only production code on servers that pass measured boot policies can access other production services
- Only accounting applications running in on-prem data centers that are running company signed software can access a particular database instance
- Only HR applications operating in-country on machines passing specific Keylime policies can access PII on personnel in that country
- Only code signed by our institution on build servers that were not compromised is allowed access to sensitive research data
This isn’t to say that crafting such policies is easy (we glossed over many details of how you define “non-compromised”). However, these new capabilities enable you to achieve a more robust security enforcement strategy than was previously possible.
Keylime Plugin for SPIRE
Recent API enhancements to Keylime (starting in version 7.11.0) now allow third-party systems to query Keylime and independently verify the attestation state of the targets being monitored. This has allowed the creation of a new experimental SPIRE plugin for Keylime that acts as a node attestor. When configured, the SPIRE agent and server can communicate with the Keylime agent and server to verify the identity of the node and its security status in Keylime. The process for this verification is as follows:
- The SPIRE agent queries a local /info API on the Keylime agent to retrieve identity information, like the Keylime UUID
- The SPIRE agent sends a node attestation request to the SPIRE server with this information
- The SPIRE server verifies the node is registered with the Keylime Registrar
- The SPIRE server verifies the node is currently passing its attestation policy with the Keylime Verifier
- The SPIRE server sends a challenge request with a nonce to the SPIRE agent
- The SPIRE agent requests a signed TPM identity quote with the nonce from the Keylime agent
- The Keylime agent creates a signed quote with the TPM’s Attestation Key and sends it back to the SPIRE agent
- The SPIRE agent sends the signed quote (with the nonce) back to the SPIRE server
- The SPIRE server validates the signed quote with the Keylime Verifier
- The SPIRE server gives the SPIRE agent the cryptographic certs needed for it to issue workload identities for this node
This might seem like a lot of back-and-forth just to verify the state of a node in Keylime. However, this workflow prevents a potentially malicious node from spoofing the identity and status of another node and gaining access to credentials it isn’t authorized for.
All of this communication and verification happens quickly behind the scenes when the node comes online and the SPIRE agent attests its node identity with the SPIRE server. Then whenever a software workload requests its own access certificate, the SPIRE agent can use the cached node identity documents. This means it doesn’t have to go through with this handshake every time access is required as it has already verified the node’s state and identity during credential issuance.
Setup
In order to test this out for yourself you will need the following setup:
- A SPIRE server (configured for your environment and trust domain)
- A Keylime server with a Verifier and Registrar setup for your environment
- Target servers running SPIRE agents and Keylime agents
- Keylime server components set up to monitor those target servers based on your policies
- A Go build environment, version 1.21 or higher
We’re glossing over a lot of details here because how you set SPIRE and Keylime up will depend greatly on your environment. Are you on-premise or in the cloud? Bare metal or virtualized? A single cloud provider or a hybrid setup? What about operating systems? What custom software is installed? Are you validating measured boot or file integrity or both? Are you using IMA signatures or file hashes or both for integrity? Answers to these questions will influence your specific installation and set up.
Once you have this set up and have answered all of the above questions regarding your environment (and likely more), you will need to build the SPIRE Keylime plugin and then configure the SPIRE server and agents to use it.
The first step is to build the SPIRE Keylime plugin. Since SPIRE plugins are typically implemented as Golang binaries, a Golang SDK is available. As this is a brand new, experimental plugin, no pre-compiled releases are available. As a result, we need to clone the spire-keylime-plugin git repository and build the keylime-attestor-server and keylime-attestor-agent binaries using the following steps:
git clone git@github.com:keylime/spire-keylime-plugin.git
cd spire-keylime-plugin
make build
Confirm that both the server and agent binaries have been created:
ls keylime-attestor-*
Now install the keylime-attestor-agent binary on each of the machines running a SPIRE agent. For our example we will place the binary at /usr/local/bin/keylime-attestor-agent. Then install the keylime-attestor-server binary on the SPIRE server host in the same directory (/usr/local/bin/keylime-attestor-server).
Next, configure the SPIRE server to access the Keylime server components. In the SPIRE server configuration file, add a NodeAttestor section for keylime to the plugins section:
NodeAttestor "keylime" {
    plugin_cmd = "/usr/local/bin/keylime-attestor-server"
    plugin_checksum = "a8a9adf8785888ce3d32267d96174d8e1fcf2a7a7b6c15692b2551f9910c5883"
    plugin_data {
        keylime_verifier_host = "127.0.0.1"
        keylime_verifier_port = "8881"
        keylime_mtls_cert_file = "/var/lib/keylime/cv_ca/server-cert.crt"
        keylime_mtls_key_file = "/var/lib/keylime/cv_ca/server-private.pem"
    }
}
In the example above, the SPIRE server and the Keylime server components are running on the same host for convenience, but this configuration may not match your environment; so, be sure to specify the appropriate values. The plugin_cmd will need to point to the binary that was built and installed earlier. And the plugin_checksum will need to be adjusted to the actual SHA256 value of this binary. The keylime_verifier_host will obviously be different if the Keylime Verifier is on another system. The keylime_mtls_cert_file and keylime_mtls_key_file are used to specify mutual TLS (mTLS) authentication with the Keylime Verifier. These values can be copied from the above paths for a test Keylime setup. However, for a real production setup, a certificate management system would typically be used to provide these values which are deployed in both locations securely.
Next, to configure the SPIRE agent, add a block similar to the following to the plugins section of its configuration file:
NodeAttestor "keylime" {
    plugin_cmd = "/usr/local/bin/keylime-attestor-agent"
    plugin_checksum = "e4413a38ae8bb2ce8e19f55a22b05532b2da431bbf7f6364f4709896409331bf"
    plugin_data {
        keylime_agent_host = "127.0.0.1"
        keylime_agent_port = "9002"
    }
}
The plugin_cmd and plugin_checksum options are similar to the server configuration above, but point to a different binary (keylime-attestor-agent rather than keylime-attestor-server). Since the SPIRE agent will be communicating with the Keylime agent on the same host, the keylime_agent_host and keylime_agent_port options are not required. They are shown here to illustrate the default values, in case your Keylime agent is bound to a different IP address or port.
With the configuration in place, you can restart the SPIRE server and when ready, the SPIRE agents on each node. You should see a message similar to the following appear in your SPIRE agent logs when a node has successfully crafted and sent its identity challenge to the SPIRE server:
Keylime Attestation response sent
And then a corresponding entry in the SPIRE server logs when a node passes attestation using Keylime:
Keylime Attestation Successful
Congratulations! You successfully integrated SPIRE and Keylime!
The Future
The new SPIRE Keylime plugin is still in its infancy and needs some real-world tire kicking to become more robust, configurable and future proof. Some of the features and fixes that are already in the works include:
- A full CI/CD pipeline with multiple versions of SPIRE and Keylime to verify each change before release
- Greater resilience to network timeouts and connection issues between Keylime and SPIRE
- Add Keylime metadata, including the names of Keylime policies, as SPIRE selectors, so they can be used in access control policies
If you find this project interesting, please join us in the Keylime community on the CNCF Slack as well as on GitHub.
Conclusion
Putting all of this together gives a result that’s referred to as “Zero Trust Ambient Credentials”: no shared secrets and no implicit trust in any connection between services, but rather an end-to-end verified, cryptographically protected authentication system with robust policy controls based on what the software is and where it’s running. Once we have it working, the main question becomes how detailed we want our policy choices to be, not how we make sure these secrets don’t leak. We’ve removed that problem entirely.