At the time Ceph was originally designed, the storage landscape was quite different from what we see now. Initially, Ceph was generally deployed on conventional spinning disks capable of a few hundred IOPS of random IO. Since then, storage technology has progressed rapidly: through solid-state drives (SSDs) capable of tens of thousands of IOPS, to modern NVMe devices capable of hundreds of thousands of IOPS and beyond a million. To enable Ceph to better exploit these new technologies, the Ceph community has begun work on a new implementation of the core ceph-osd component: Crimson. The goal of this new implementation is to create a replacement for ceph-osd that minimizes latency and CPU overhead by using high-performance asynchronous IO and a new threading architecture designed to minimize context switches and inter-thread communication when handling an operation.
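For context on that threading architecture: Crimson is built on the Seastar framework, which runs one reactor thread per core (a "shard") and routes each operation to the shard that owns its data, so an operation runs to completion on a single core. The sketch below is a minimal illustration of that model, assuming a recent Seastar; the PG-ownership comment is illustrative, not Crimson's actual code.

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/smp.hh>
#include <iostream>

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // One reactor thread per core ("shard"), each owning its own data.
        // An operation is routed to the shard that owns its target and runs
        // to completion there: no locks, no cross-core handoffs.
        return seastar::smp::invoke_on_all([] {
            std::cout << "shard " << seastar::this_shard_id()
                      << " services its own subset of PGs\n";
        });
    });
}
```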
Crimson's focus is on minimizing CPU overhead and latency: while storage throughput has scaled rapidly, single-threaded CPU throughput has not kept pace. Dividing clock rate by device IOPS gives the per-IO cycle budget: with a CPU at around 3 GHz, that's about 20M cycles/IO for a conventional HDD, 300K cycles/IO for an older SSD, but only about 6K cycles/IO for a modern NVMe device. Crimson enables us to rethink elements of Ceph's core implementation to properly exploit these high-performance devices.
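As a quick sanity check of those budgets, here is a small standalone C++ sketch that reproduces the arithmetic; the IOPS figures are illustrative round numbers consistent with the text, not measurements of any particular device.

```cpp
#include <cstdio>

// Per-IO CPU cycle budget: cycles_per_second / device_iops.
int main() {
    const double cpu_hz = 3.0e9;  // ~3 GHz core
    struct Device { const char* name; double iops; };
    const Device devices[] = {
        {"HDD  (~150 IOPS)",  150.0},  // conventional spinning disk
        {"SSD  (~10K IOPS)",  1.0e4},  // older SATA-era SSD
        {"NVMe (~500K IOPS)", 5.0e5},  // modern NVMe device
    };
    for (const auto& d : devices) {
        std::printf("%-18s ~%.0f cycles/IO\n", d.name, cpu_hz / d.iops);
    }
    return 0;
}
```

The takeaway is that the per-IO cycle budget shrinks by more than three orders of magnitude from HDD to NVMe, which is why per-operation CPU work, context switches, and cross-thread handoffs go from background noise to the dominant cost.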