If you run software on someone else's servers, you have a problem: you can't be sure your data and code aren't being observed, or worse, tampered with. Trust is your only assurance. But there is hope, in the form of Trusted Execution Environments (TEEs) and a new open source project, Enarx, that will use TEEs to minimize the trust you need to place in other people's hardware. This article delves into the problem, explains how TEEs work and what their limitations are (a TEE primer of sorts), and describes how Enarx aims to work around those limitations. It is the next in a series that started with Trust No One, Run Everywhere–Introducing Enarx.
The problem Trusted Execution Environments solve
Until recently, a material reality of running software was that any lower layer of the computing stack on the same machine had control over, and could inspect, the software running above it. This applied to the operating system, the Virtual Machine Manager (VMM, or hypervisor), the container management stack (if any), and any other middleware. Consequently, anyone with root access to a machine, legitimate or unauthorized, could see, modify, terminate, and otherwise manipulate whatever code and data were running on it.
For anyone running a program on someone else’s machine, it was about as close to Game Over as you can get in terms of security and privacy. In a cloud environment, where both the control and safeguarding of thousands of physical machines hosting thousands more VMs are delegated to a service provider, this lack of basic security and privacy guarantees is seen as problematic by some organizations.
Trusted Execution Environments (TEEs) are an answer to this need to maintain data confidentiality and integrity “in use,” that is, during runtime (program execution), regardless of who might own or have access to the machine on which the software is running.
What do TEEs bring that we couldn’t do before?
Trusted Execution Environments (TEEs) are a fairly new technological approach to addressing some of these problems. They allow you to run applications within a set of memory pages that are encrypted by the host CPU in such a way that even the owner of the host system is supposed to be unable to peer into or modify the running processes in the TEE instance.
All TEEs provide confidentiality guarantees for code and data running within them, meaning that the running workload can’t be seen from outside the TEE. Some TEEs offer memory integrity protection (4, 5), which prevents the data loaded into the TEE from being modified from the outside (we will come back to this below). As expected, none provide guaranteed availability, since lower stack levels must still be able to control scheduling and TEE launch, and can block system calls.
There are also various attacks (including but not limited to replay, TOCTOU, and Foreshadow) that have been successful against previous or current implementations of TEEs (3, 7). However, TEEs offer the novel capability of running userspace applications that are not visible to the operating system, VMM, or middleware. They have the potential to enable security and privacy features for sensitive workloads in environments where these features were previously unavailable, such as the cloud.
The difference between TEEs and TPMs, HSMs
Other classes of hardware for specialized cryptographic purposes already exist, specifically Trusted Platform Modules (TPMs) and Hardware Security Modules (HSMs). However, TEEs serve a fundamentally different purpose than these other classes of cryptographic hardware.
A TPM is a chip designed to provide a "hardware root of trust" by holding secrets (keys) in such a way that physically opening the chip, or removing it from the motherboard to which it is soldered in order to access its secrets, is both difficult and immediately evident. TPMs are not designed to provide general computational capacity, but they do offer some basic (read: slow) computation capabilities: they can generate random keys, encrypt small amounts of data with a secret they hold, and measure components of a system, maintaining a log of these measurements in Platform Configuration Registers (PCRs).
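The measurement log works through a one-way "extend" operation: each new measurement is hashed together with the current PCR value, so the final value can only be reproduced by replaying the same measurements in the same order. A minimal sketch (the component names here are purely illustrative):

```python
import hashlib

def extend_pcr(pcr_value: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into a PCR: new = SHA-256(old || measurement)."""
    return hashlib.sha256(pcr_value + measurement).digest()

# A PCR starts zeroed; each boot component is measured (hashed) and
# folded in. Because the hash chain is one-way, the final PCR value
# summarizes the entire boot sequence and any change breaks the chain.
pcr = bytes(32)
for component in [b"firmware", b"bootloader", b"kernel"]:
    pcr = extend_pcr(pcr, hashlib.sha256(component).digest())
```

A verifier holding the event log can replay it and compare the result against the PCR value the TPM reports; a tampered or reordered boot sequence produces a different value.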
You could implement many of the capabilities of a TPM within a TEE, but it doesn’t make sense to create a “full” TPM implementation within a TEE: one of the key use cases for a TPM is measuring a boot sequence using the PCRs, whereas TEEs provide a general processing environment. A TEE doesn’t make a good physical root of trust, unlike a TPM. The capabilities of a TPM are also carefully scoped to meet the requirements of the TCG (Trusted Computing Group, the standards body for TPMs), which is more restrictive than requirements for a TEE.
A Hardware Security Module (HSM), on the other hand, is an external physical device that specializes in providing cryptographic operations, typically receiving clear text, encrypting it with a key it holds, and returning the cipher text (encrypted text), so that the operating system never handles the encryption keys. Like TPMs, they are designed to frustrate, detect, and/or make evident physical tampering, which makes them a useful tool for keeping secrets in a safe place. They generally provide higher levels of protection than TEEs, but are separate modules from the main CPU and motherboard, accessed via PCI bus, network, or similar.
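The defining property of that interface is that callers pass data in and get results back, but the key never crosses the boundary. A toy model of the idea (the `ToyHSM` class and its SHA-256 counter-mode keystream are stand-ins for illustration only; real HSMs use vetted ciphers in dedicated, tamper-resistant hardware):

```python
import hashlib
import secrets

class ToyHSM:
    """Toy model of an HSM interface: plaintext in, ciphertext out,
    and the key itself never leaves the module."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # held internally, never exported

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # Toy keystream: SHA-256 over (key || nonce || counter).
        out = b""
        counter = 0
        while len(out) < length:
            block = self._key + nonce + counter.to_bytes(8, "big")
            out += hashlib.sha256(block).digest()
            counter += 1
        return out[:length]

    def encrypt(self, plaintext: bytes):
        nonce = secrets.token_bytes(16)
        ks = self._keystream(nonce, len(plaintext))
        return nonce, bytes(p ^ k for p, k in zip(plaintext, ks))

    def decrypt(self, nonce: bytes, ciphertext: bytes) -> bytes:
        ks = self._keystream(nonce, len(ciphertext))
        return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

The operating system only ever sees nonces and ciphertext; compromising the host does not reveal the key, which is the property that makes HSMs a safe place for secrets.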
All TEE instances, and some HSMs (depending on the model), can be used as general-purpose processing units or programmed for particular uses (e.g. PKCS#11 modules). In contrast to TEEs, HSMs are expensive (typically thousands of dollars), whereas TEEs are integral to normally priced chipsets. The work to program an HSM for a specific task (beyond its modular uses) is typically very difficult and requires specialized skills.
Summing the three up, one could say:
- TEEs provide a general processing environment. They are built into a chipset.
- TPMs provide a physical root of trust, measurement of other components and the boot sequence, and have limited processing capacities. They are an inexpensive chip built into many computers.
- HSMs provide a safe environment to store secrets, process data, and can offer a general processing environment. They are expensive external devices that often require specialized knowledge to use properly.
Lastly, we should mention earlier approaches to TEEs that don’t fully fit our definition of TEEs. For instance, recent iPhones have a “Secure Enclave,” a fully separate CPU running alongside the main CPU, and Android phones using ARM chips include a system called TrustZone. TEEs must provide a trusted environment in which one can load software from a normal operating system, but these earlier models instead rely on a second operating environment running in parallel to the normal OS. This approach provides some of the functionality we want from a TEE, but also creates several problems and limitations, such as limiting the capacity for normal users to run software in trusted environments from userland.
The different types of TEEs
While some consensus exists regarding their goal, there are multiple approaches to the architecture and implementation of TEEs.
Different approaches, but no standards
As mentioned previously, TEEs provide confidentiality for user space software by encrypting a range of memory with a secret key (or keys) held in hardware and not available to the operating system or any other software, even running at the highest privilege level. Beyond this, however, there currently exists no industry consensus about the most secure or efficient way to create a TEE, and various hardware manufacturers have created fundamentally different implementations.
What each of these implementations shares is reliance on the CPU to create and enforce access to the TEE, and the ability for the end user to specify which processes should run in encrypted memory regions. From here, the industry has currently divided into two divergent models of TEEs: the process-based model (e.g. Intel’s SGX (9)) and the VM-based model (e.g. AMD’s SEV (10)). It is worth noting that CPUs must be specifically designed to support TEEs and provided with accompanying firmware, and most CPUs in 2019 do not have support for any type of TEE.
In the process-based TEE model, a process that needs to run securely is divided into two components: trusted (assumed to be secure) and untrusted (assumed to be insecure). The trusted component resides in encrypted memory and handles confidential computing, while the untrusted component interfaces with the operating system and propagates I/O from encrypted memory to the rest of the system. Data can only enter and exit this encrypted region through predefined channels with strict checks on the size and type of data passing through. Ideally, all data entering or exiting the encrypted memory area is also encrypted in transit, and only decrypted once it reaches the TEE, at which point it is visible only to the software running in the TEE.
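The "predefined channels with strict checks" idea can be sketched as follows. The names here (`ecall_process`, `MAX_INPUT`) are hypothetical, not any vendor's API; in SGX, for example, the boundary functions are generated by the SDK from an interface definition file rather than written by hand like this:

```python
# Each channel into the trusted component declares, up front, exactly
# what it accepts: a fixed maximum size and a concrete type.
MAX_INPUT = 64

def ecall_process(payload: bytes) -> bytes:
    """Untrusted -> trusted entry point: validate size and type before
    the data is allowed into the encrypted memory region."""
    if not isinstance(payload, bytes):
        raise TypeError("this channel only accepts bytes")
    if len(payload) > MAX_INPUT:
        raise ValueError("payload exceeds the declared channel size")
    # ...confidential computation inside the TEE would happen here...
    return payload[::-1]  # placeholder for the trusted computation
```

Rejecting oversized or mistyped inputs at the boundary is what keeps the untrusted side from smuggling unexpected data into (or corrupting state inside) the encrypted region.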
An advantage of this model includes a smaller Trusted Computing Base (TCB) compared to the VM-based model, as only the CPU and a component of a specific process are trusted (1). A smaller TCB generally means less room for error, as there are fewer components involved in trusted work. This also allows all inputs and outputs to the TEE to be monitored, arguably increasing security. Additionally, current implementations, such as Intel’s SGX, offer memory integrity protection.
A frequently cited disadvantage of this model is the lack of bidirectional isolation: while the TEE’s process enjoys hardware protection from other processes and lower stack layers, the opposite is not the case. There are no hardware protections preventing software in the TEE from accessing or interfering with other processes or the operating system, which are only protected by standard access permissions. This one-sided protection raises a serious concern for misuse of a TEE to house malware: an OS would find it all the harder to eradicate malware in a TEE because of these hardware protections. Another major disadvantage is the need to develop applications specifically for this type of TEE, for example by developing software for Intel’s SDK for SGX to divide a program into trusted and untrusted components.
While there are many years of academic research and practical experience in using VM boundaries for process isolation, the same cannot yet be said for process-based models. There is some debate as to whether this is an advantage or a disadvantage, as disrupting traditional hierarchical trust models and imposing novel security boundaries creates uncertainty.
Current implementations of the process-based approach include Intel’s SGX (Software Guard eXtensions). The only other process-based TEE currently known to the authors is MIT’s Sanctum, designed for the RISC-V architecture, which has yet to reach the market at the time of writing.
In the VM-based TEE model, memory is encrypted along a traditional VM boundary running on top of a VMM. While traditional VMs (as well as containers) provide some measure of isolation, the VMs in this TEE model are protected by hardware-based encryption keys that prevent interference by a malicious VMM (2). Current implementations, such as AMD’s SEV, provide separate ephemeral encryption keys for each VM, therefore also protecting the VMs from each other.
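The per-VM key scheme can be sketched conceptually as follows. Everything here is illustrative: the `MemoryController` class is hypothetical, and the XOR "cipher" is a stand-in for the hardware AES engine that actually encrypts pages in SEV-class designs:

```python
import secrets

class MemoryController:
    """Conceptual sketch: each VM gets its own ephemeral key, held in
    hardware, so neither the VMM nor another VM can read its pages."""

    def __init__(self):
        self._vm_keys = {}  # invisible to the VMM and the OS

    def launch_vm(self, vm_id: str) -> None:
        # Ephemeral key: generated at launch, discarded at teardown.
        self._vm_keys[vm_id] = secrets.token_bytes(4096)

    def write_page(self, vm_id: str, page: bytes) -> bytes:
        """Pages land in DRAM encrypted under the owning VM's key."""
        key = self._vm_keys[vm_id]
        return bytes(b ^ k for b, k in zip(page, key))

    def read_page(self, vm_id: str, stored: bytes) -> bytes:
        """Only reads attributed to the owning VM decrypt correctly."""
        key = self._vm_keys[vm_id]
        return bytes(b ^ k for b, k in zip(stored, key))
```

A page written by one VM and read back under another VM's key decrypts to garbage, which is the property that isolates co-tenant VMs from each other as well as from the host.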
A significant advantage of this model is that it can provide bidirectional isolation between the VM and the system, so there is less concern about this type of TEE housing malware that is able to interfere with the rest of the system. AMD’s implementation of this model also does not impose requirements regarding software development, meaning that developers do not need to write to a specific API to get code running in this type of TEE. However, this latter advantage is eclipsed by the fact that the VMM running the software must be written to a custom API (8).
Several disadvantages of this model include a relatively large TCB that includes the OS running inside the VM (1), which theoretically increases attack surface. Current implementations, such as AMD’s SEV, allow the VMM to control data inputs to the trusted VM (3), which means that the host machine could still potentially alter workloads that were thought to be secure. It also requires both a kernel and hardware emulation within the VM, and is relatively heavyweight, especially for microservices.
AMD’s SEV is the most fully developed implementation of this model, though others, such as Intel’s MKTME (Multi-Key Total Memory Encryption, 12), exist. A third implementation, which has been announced but is not yet available in the market, is IBM’s Protected Execution Facility or “PEF,” which will be open source (6).
There has been some discussion of TEEs on other hardware platforms including, for instance, the MIPS architecture. The authors would be interested to hear more information about any similar implementations.
Current approaches are highly dependent on specific technologies
As we have seen, there are two broad models for Trusted Execution Environments. But beyond that, how does one actually get code running in these?
The situation here is anything but simple.
Writing an application for a TEE
Given the current lack of standardization regarding TEEs, two different implementations of TEEs will not necessarily provide the same security or performance outcomes. Worse, applications that need to run in a TEE (or the applications’ custom VMMs) must be developed specifically for each of these hardware technologies. This is inconvenient for development, can lead to a lack of compatibility between software versions (those able to take advantage of TEEs versus those that cannot), and makes it difficult to move between implementations at a time when TEE technology is still very much in flux.
For example, developing an application for Intel’s SGX requires defining all channels of inputs and outputs to the TEE, as well as trusted and untrusted components. However, these definitions would be nonsensical for a version of the application running on a CPU without TEE capabilities, so the TEE-compatible and non-TEE-compatible versions of the software would need to diverge. Recently there have been efforts to reduce the friction for developers wanting to write code for some TEE implementations, most notably the Open Enclave project (11).
It is highly likely that the developer effort required to write an application for a currently offered TEE technology will have to be repeated all over again in order to take advantage of future TEE technologies that may offer preferable security or performance benefits.
A crucial aspect of deploying software to a TEE is the “Trusted” part: ensuring that you are, indeed, deploying to an actual Trusted Execution Environment, and not something masquerading as one. Essentially, the TEE needs to prove that it is genuine before it can be trusted: this process is called attestation.
Only genuine TEEs running on a real TEE-capable CPU should be able to create a valid attestation, and ideally this should be easy to check from the verifier side. The verifier in the cloud computing example would be an individual or organization who wants to use a cloud environment to run a confidential workload on machines they do not own.
Though attestation is critical to making use of any of a TEE’s security features, there are currently no standards surrounding attestation, and the burden of creating and enforcing attestation methods is on those who develop and deploy applications. This makes using TEEs in practice considerably harder and impedes their widespread adoption. Though both TEE models currently rely on certificate chains from the manufacturer to prove that a CPU is genuine, and report measurements of a TEE after launch (allowing verification of the contents of the TEE), they differ on the kind and number of keys that must be validated by the certificate chain, as well as on the order of operations of the attestation process.
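Stripped of vendor specifics, the flow both models share looks something like the sketch below. It is a deliberately simplified model: HMAC with a shared secret stands in for the manufacturer's asymmetric certificate chain, and the `attest`/`verify` functions are hypothetical, not any platform's API.

```python
import hashlib
import hmac

# Stand-in for the hardware root of trust anchored in the vendor's
# certificate chain (real attestation uses asymmetric signatures).
MANUFACTURER_KEY = b"root-of-trust"

def attest(workload: bytes) -> dict:
    """What a genuine TEE would report after launch: a measurement of
    its contents, signed by the hardware."""
    measurement = hashlib.sha256(workload).hexdigest()
    signature = hmac.new(MANUFACTURER_KEY, measurement.encode(),
                         hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}

def verify(report: dict, expected_workload: bytes) -> bool:
    """Verifier side: check the signature chains to genuine hardware,
    then check the measurement matches the workload we deployed."""
    expected_sig = hmac.new(MANUFACTURER_KEY, report["measurement"].encode(),
                            hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, report["signature"]):
        return False  # not signed by genuine hardware
    return report["measurement"] == hashlib.sha256(expected_workload).hexdigest()
```

Both checks matter: a valid signature over the wrong measurement means the TEE is genuine but running something other than what you deployed, while a correct measurement without a valid signature could come from software merely masquerading as a TEE.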
This lack of standardization in both development APIs and attestation processes means that once code has been written for a TEE implementation associated with a specific platform, the developers and users of the software are locked in. Rewriting the software or the custom VMM that runs it, or re-creating the attestation validation process for a different platform with a different TEE implementation, would require a significant time investment. This lock-in also negatively affects users of cloud platforms, as well as cloud service providers (CSPs) themselves, as users would be unable to easily take advantage of new TEEs offered by the CSP while their software remains tied to a different physical implementation.
With these multiple issues in mind, Enarx, a new open source project, is being developed to make it simpler to deploy workloads to a variety of Trusted Execution Environments in the public cloud, on your premises or elsewhere. Enarx is a framework for running applications in TEE instances – which we refer to as Keeps within the project – without the need to implement attestation separately, without the need to trust lots of dependencies, and without the need to rewrite your application. You can read more about Enarx in the previous article in this series.
Awareness has been growing regarding the importance of encrypting data at rest (using full disk encryption) or in transit (TLS and HTTPS), but we have only recently developed the technical capacity to encrypt data during runtime as well. Trusted Execution Environments are an exciting advance in terms of confidentiality. The ability to encrypt data at runtime offers previously unavailable security and privacy features for developers and users of software. Though this is an exciting time for security, there are currently some formidable gaps in the standardization of this new technology. In the next post, we will look at a characteristic that is currently lacking in the TEE space: runtime portability, that is, the capability to write your software once and run it on various platforms.
Further Resources and Citations
1. A Comparison Study of Intel SGX and AMD Memory Encryption Technology
2. Intel Follows AMD’s Lead on Full Memory Encryption
3. AMD SEV attack surface: a tale of too much trust
4. Exploiting Unprotected I/O Operations in AMD’s Secure Encrypted Virtualization
5. Security, Performance and Energy Trade-offs of Hardware-assisted Memory Protection Mechanisms
6. IBM Cognitive Systems Continues to Enhance Overall Security (PEF)
7. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution
8. Secure Encrypted Virtualization API, Version 0.22
9. Intel SGX
10. AMD SEV
11. Open Enclave SDK
12. Intel MKTME