Malleable Metal – Integrating SAN-booting with Foreman

by Ian Ballou | Aug 16, 2018 | Developer Productivity

The world of multi-tenant bare-metal cloud computing in the data center is increasingly important.  When tenants are offered their own servers rather than locked-down VMs or compute services, the potential for innovation is much higher.  The Mass Open Cloud aims to offer a multi-tenant cloud where hardware is shared between organizations, such as universities, and tenants can access bare-metal instances directly.  Here’s how we propose to create a standardized architecture that provides a seamless, elastic bare-metal experience for the Mass Open Cloud and similar environments.

Our solution to the bare-metal-as-a-service problem combines two projects: the Mass Open Cloud’s Malleable Metal as a Service (M2) and the Red Hat-stewarded Foreman project.  Where M2 provides the means for provisioning servers, Foreman provides the orchestration and the user interface.

M2 is a service for quickly provisioning nodes over iSCSI.  Booting bare metal with M2 is fast because the process never writes to the server’s local disk.  Users select disk images through the CLI or API, and M2 copies the chosen image and serves the copy as an iSCSI target.  New disks are cloned from “golden images” that only an administrator can change.

M2’s original high-level features are rapid provisioning, rapid snapshotting, rapid cloning, and support for multi-tenancy.  In a study conducted by Mass Open Cloud researchers comparing M2 against vanilla Foreman and OpenStack Ironic, M2 provisioned a server, from power-on to boot, in about a third of the time.  In the same study’s reprovisioning case, M2 reprovisioned a node, whether from a snapshot or for node recovery, up to five times faster.

Standalone M2 also orchestrates DNS, DHCP, and TFTP to chain-boot servers into iPXE.  However, the Mass Open Cloud plans to restructure M2 so that it can also run with minimal orchestration, leaving those services to an external orchestrator.  For example, an M2 `create_disk` call would take a golden image name, copy the image, register the copy to the user’s project, and then present the copy as an iSCSI target.  This call is extremely powerful for an orchestration service such as Foreman: given nothing more than an image name, the program can fully prepare a node to boot via iPXE.
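
To make the flow concrete, here is a minimal sketch of how an orchestrator might drive such a call.  The endpoint path, payload fields, response shape, and the `m2.example.org` host are illustrative assumptions, not the real M2 API.

```python
import requests

M2_API = "http://m2.example.org:8000"  # hypothetical M2 API endpoint

def provision_from_golden_image(image_name, node, project):
    """Clone a golden image and return an iPXE script that boots the clone.

    The route and field names are illustrative; the real create_disk
    call may differ.
    """
    # One call: copy the golden image, register the copy to the
    # user's project, and expose the copy as an iSCSI target.
    resp = requests.put(
        f"{M2_API}/disk/{node}",
        json={"golden_image": image_name, "project": project},
    )
    resp.raise_for_status()
    # Assumed response, e.g. {"server": "10.0.0.5", "iqn": "iqn.2018-08.org.example:node1"}
    target = resp.json()

    # The orchestrator can now chain-boot the node into iPXE and hand
    # it this script; nothing ever touches the local disk.
    return ("#!ipxe\n"
            f"sanboot iscsi:{target['server']}::::{target['iqn']}\n")
```

The `sanboot` line attaches the iSCSI target and boots from it, which is why the node’s own disk is never written.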

The integration of M2 into Foreman consists of two parts.  First, there needs to be a path of communication from Foreman to the M2 API server.  Foreman has a service well suited for this called a smart-proxy.  A smart-proxy is a simple Ruby Sinatra application that follows standards from a Foreman GitHub repository.  An M2 smart-proxy plugin exposes parts of the M2 API, such as image and iSCSI target management.  Because the communication flows through the smart-proxy, the Foreman core server does not need M2-specific firewall rules.  The Foreman user only needs to register the M2 smart-proxy through the Foreman UI and open the proxy’s one port in the firewall rules.
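
As a rough illustration of this division of labor, the snippet below sketches what talking to such a plugin could look like from the Foreman side.  The route names, port, and certificate path are assumptions; the actual plugin defines its own REST surface.

```python
import requests

# Hypothetical values: a smart-proxy conventionally listens on a single
# HTTPS port, the only one that must be opened in the firewall.
PROXY = "https://smart-proxy.example.org:8443"
CA_BUNDLE = "/etc/foreman-proxy/ssl_ca.pem"  # illustrative CA path

def list_m2_images():
    """Ask the proxy's M2 plugin which images are available to clone."""
    resp = requests.get(f"{PROXY}/m2/images", verify=CA_BUNDLE)
    resp.raise_for_status()
    return resp.json()

def remove_disk(node):
    """Tear down a node's provisioned disk when its host is deleted."""
    resp = requests.delete(f"{PROXY}/m2/disk/{node}", verify=CA_BUNDLE)
    resp.raise_for_status()
```

Because Foreman only ever talks to the proxy, the M2 API server itself can sit on a management network that the Foreman core never reaches.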

Second, there needs to be a core Foreman plugin for M2.  This plugin introduces a new M2 compute resource to Foreman that makes host creation easier.  A user can associate M2 images with the compute resource from a drop-down menu so that they are available during host creation.

To create a host, however, changes to Foreman core need to be made.  Foreman compute resources are currently very VM-centric, and Foreman host orchestration either network-boots a host and installs from a media source or creates a host VM directly from a compute resource image.  In the M2 case, the host must be created from an image but also be provisioned over the network.  A new hybrid provisioning method may be introduced to solve this problem.
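
One possible shape for that hybrid path, reusing the hypothetical `provision_from_golden_image` sketch from earlier; the remaining helpers are likewise illustrative stand-ins, not real Foreman core methods:

```python
def provision_host_hybrid(host, image_name, project):
    """A hypothetical hybrid flow: image-based like a VM, yet
    network-booted like bare metal."""
    # Image-based half: M2 clones the golden image and exposes it as
    # an iSCSI target (the create_disk call sketched earlier).
    ipxe_script = provision_from_golden_image(image_name, host.name, project)

    # Network-boot half: publish the iPXE script where the node's
    # DHCP/TFTP chain-boot will fetch it, then restart the node.
    publish_ipxe_script(host.mac, ipxe_script)  # hypothetical helper
    set_next_boot_to_network(host)              # hypothetical helper
    power_cycle(host)                           # hypothetical helper
```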

Why Foreman?

Foreman is a great fit for M2 in the bare metal data center because it provides a friendly experience for managing many machines at once, and it already supports multi-tenancy natively.  However, one requirement of M2 is new to Foreman: elastic reprovisioning.  In a multi-tenant, elastic environment, a server could belong to different people running different software at almost any time.  Foreman struggles with this swapping of hardware because host information is relatively inflexible.

One approach to solving the reprovisioning case is for each tenant in the data center to have their own Foreman instance.  Since Foreman is a lightweight Ruby on Rails application, the benefits of running multiple instances should outweigh the performance cost for most users.  Tenants would populate Foreman with the hardware they expect to use and configure a smart-proxy to communicate with the appropriate M2 server.  That way, when a server is unavailable, it simply appears to be powered off.  The host information remains the same, and the server can switch owners as needed.

However, this switching of owners in a shared data center introduces security issues.  Who is to say that hardware returning to a user can be trusted? The previous owner could have bugged the firmware with something malicious.  Introducing continuous attestation to our bare metal data center would provide greater confidence that the hardware can be trusted.

Keylime, a project based out of MIT and supported by Red Hat, could provide this functionality.  Using a dedicated piece of hardware on the motherboard called a Trusted Platform Module (TPM), a server records measurements (hashes) of its firmware and boot software, and Keylime continuously verifies those measurements against known-good values.  If anything changes, the user is alerted that the machine’s configuration differs from what it should be.  Collaboration between Keylime, M2, and Foreman is an upcoming discussion in the future of this project.

Ian Ballou is an undergraduate computer engineering student in the BU class of 2019.  He is an intern at Red Hat and a member of the Mass Open Cloud HIL and M2 teams.

GitHub: https://github.com/ianballou