CXL is going to eat OMI's lunch

Date: 2022-07-12T22:10:00-07:00

Location: www.talospace.com

The question is whether that's a bad thing. And as it stands right now, maybe it's not.

High I/O throughput has historically been the shiny IBM dangled to keep people in the Power fold, and it was a featured part of the POWER9 roadmap even though the parts that would have delivered it never emerged. IBM's solution to the memory throughput problem was the Centaur buffer used in POWER8 and scale-up Cumulus POWER9 systems (as opposed to our scale-out Nimbus POWER9s, which use conventional DDR4 RAM and an on-chip controller), and then, for Power10, the OpenCAPI Memory Interface, or OMI. In these systems the memory controller-buffer accepts high-level commands from the CPU(s), abstracting away the details of where the underlying physical memory actually is and reordering, fusing or splitting those requests as required. Notoriously, OMI has an on-board controller, and its firmware isn't open-source.

But why should the interconnect be special-purpose? Compute Express Link (CXL) defines three classes of protocol: CXL.io, a CPU-to-device interconnect based on PCIe 5.0 with protocol enhancements; CXL.cache, allowing peripheral devices to coherently access CPU memory; and CXL.mem, an interface for low-latency access to both volatile and non-volatile memory. CXL.cache and CXL.mem are closely related, and both transmit over a standard PCIe 5.0 PHY. Memory would be an instance of a CXL Type 3 device, implementing both the CXL.io and CXL.mem specifications (Type 1 devices implement CXL.io and CXL.cache, and rely on access to CPU memory; Type 2 devices, such as GPUs or other accelerators, implement all three protocols). The memory topology is highly flexible. If this sounds familiar, you might be thinking of Gen-Z, which aimed for an open, royalty-free "memory semantic" protocol; Gen-Z began merging into the Intel-led CXL Consortium in January.
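The type/protocol matrix above is easy to lose track of, so here it is as a small sketch. The names and structure below are purely illustrative of the spec's taxonomy, not any real driver or API:

```python
# CXL device types and the protocol classes each must implement,
# per the CXL specification's device taxonomy (illustrative only).
CXL_DEVICE_TYPES = {
    # Type 1: caching devices with no host-managed local memory;
    # they coherently cache CPU memory over CXL.cache.
    1: {"CXL.io", "CXL.cache"},
    # Type 2: accelerators with their own memory (GPUs and the like);
    # these implement all three protocol classes.
    2: {"CXL.io", "CXL.cache", "CXL.mem"},
    # Type 3: memory expanders such as Samsung's DDR5 module; they
    # serve memory rather than cache it, so no CXL.cache.
    3: {"CXL.io", "CXL.mem"},
}

def protocols_for(device_type: int) -> set:
    """Return the protocol classes a given CXL device type implements."""
    return CXL_DEVICE_TYPES[device_type]

# A memory expander speaks CXL.io and CXL.mem, but never CXL.cache:
assert protocols_for(3) == {"CXL.io", "CXL.mem"}
assert "CXL.cache" not in protocols_for(3)
```

Note that every type includes CXL.io, which carries the PCIe-style discovery and configuration traffic the other two classes ride alongside.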

IBM was part of Gen-Z, but eventually let it dangle in favour of OpenCAPI and OMI, and while it is a contributing member of CXL, that seems to be a consequence of its earlier involvement with Gen-Z. But really, what's OMI's practical future anyway? So far we've seen exactly one implementation from one vendor, and that implementation has directly harmed Power10's wider adoption apart from IBM's own hardware. OMI promises 25Gbps per lane at a 5ns latency, but Samsung's new CXL memory module puts 512GB of DDR5 RAM on the bus at nearly 32Gbps. It's a cinch that Power11, whenever it gets on the roadmap, would support at least PCIe 5.0 or its successor, and CXL would appear to be a better overlay on that baseline. Devices of all sorts could share a huge memory pool, even GPUs. Plus, a lot more companies are on board, which would mean more choices and greater staying power, and the more devices that emerge, the more likely open driver support becomes.
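For a back-of-the-envelope sense of those per-lane figures, here is a quick sketch using the article's 25Gbps OMI number and PCIe 5.0's 32 GT/s with 128b/130b encoding. The lane counts are illustrative assumptions, and real throughput depends on protocol overhead beyond line encoding:

```python
# Rough per-direction raw bandwidth comparison (illustrative only;
# ignores protocol framing overhead above the line encoding).

def pcie5_lane_gbps() -> float:
    # PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding,
    # so usable line rate is slightly under 32 Gbps.
    return 32.0 * 128 / 130

def link_gbytes_per_sec(lane_gbps: float, lanes: int) -> float:
    return lane_gbps * lanes / 8  # bits -> bytes

# One x8 OMI channel at the article's 25 Gbps/lane figure:
omi = link_gbytes_per_sec(25.0, 8)
# A hypothetical x16 PCIe 5.0 link carrying CXL.mem:
cxl = link_gbytes_per_sec(pcie5_lane_gbps(), 16)

print(f"OMI x8:  ~{omi:.0f} GB/s")   # ~25 GB/s
print(f"CXL x16: ~{cxl:.0f} GB/s")   # ~63 GB/s
```

The per-lane rates are close; the difference is that the PCIe 5.0 PHY is a commodity baseline a future Power core would likely need anyway, whereas OMI lanes buy nothing beyond memory.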

There are still some aspects of CXL that aren't clear. Although it's advertised as an open industry standard, there's nothing saying it's royalty- or patent-free (Gen-Z explicitly was, at least the former), and the download for the specification has an access agreement. The open aspect may not be much better in practice either: Samsung's memory device has an ASIC controller, but it may still need a blob to drive it, either internally or as part of CPU firmware (earlier prototypes used an FPGA), and nothing says another manufacturer's controller won't require one too.

Still, OMI has the growing stench of death around it, and it never got the ecosystem support IBM was hoping for; CXL currently looks like everything technologically OMI was to be and more, and at least so far not substantially worse from a policy perspective. Other than as a sop to their legacy customers, one may easily conclude there's no technological or practical reason to keep OMI in future IBM processors. With nothing likely changing on the horizon for Power10's firmware, that may be cautiously good news for us for a future Power11 option.