Intel’s recent announcements regarding infrastructure processing units (IPUs) have prompted us to revisit the topic of how to partition functionality in a computer system.
As we noted in a previous article, The Accidental SmartNIC, there is at least a thirty-year history of trying to decide how much to offload from a general-purpose processor to a more specialized network card, and an equally long tension between offloading to more specialized engines versus more flexible ones.
The IPU is just the latest entry in a long line of general-purpose offload engines, and we are now seeing a fairly diverse set of options, not just from Intel but also from others such as Nvidia and Pensando. The latter companies use the term DPU (Data Processing Unit), but the consensus seems to be that these devices address the same class of problems.
There are several interesting things going on here. The first is an emerging consensus that the general-purpose x86 (or Arm) server is no longer the best place to run the infrastructure functions of a cloud. By “infrastructure functions” we mean everything needed to run a multi-tenant cloud that isn’t actually a guest workload: the hypervisor, network virtualization, storage services, and so on.
Since the server housed both guest workloads and infrastructure services, these functions came to be seen as “overhead” that takes cycles away from guests. An oft-cited example is the Facebook Accelerometer study, which measured overheads of up to 80 percent in Facebook’s data centers, although that figure is not necessarily generalizable to other cloud providers. More plausibly, Google reported [PDF] in 2015:
Amazon Web Services presumably ran into the same problem of overheads eating into revenue-generating workloads, and began moving infrastructure services onto specialized hardware after it acquired Annapurna Labs in 2015, laying the foundation for its Nitro architecture. The effect was to move almost all infrastructure services off the servers, leaving them free to run guest workloads and nothing else.
Once you’ve decided to move a function from the general-purpose CPU complex to some kind of offload engine, the question becomes how to maintain the appropriate level of flexibility. Offloaded functions are not static, so putting them in fixed-function hardware would be shortsighted.
This is why we have seen NICs evolve in recent years from fixed-function offloads, such as TCP segmentation, to the more flexible architecture of SmartNICs. The goal is to create an offload engine that is better optimized for the offloaded services than a general-purpose processor, while remaining programmable enough to support the continued innovation and evolution of those services.
Intel’s IPU family includes multiple products that take different approaches to providing this flexibility, including FPGA-based and ASIC-based versions. The Mount Evans ASIC is particularly interesting because it includes both Arm processor cores and programmable network hardware (from the Barefoot Networks team) that can be programmed in P4.
It’s a topic close to our hearts here at Systems Approach, as the P4 toolchain is at the heart of much of the technology that we talked about in our SDN book.
Putting a P4-programmable switch in an IPU/DPU makes a lot of sense, since the networking functions that might be offloaded include those of a virtual switch. And one thing we learned at Nicira, and later in the VMware NSX team, is that if you want to move the vswitch to an offload engine, that engine has to be fully programmable.
If the offload engine is not general enough to implement the entire vswitch, you can only move a subset of the vswitch functionality onto it. Even if you could move 90 percent of the functionality, the remaining 10 percent that still runs on the CPU is likely to become the bottleneck.
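A back-of-the-envelope, Amdahl’s-law-style model makes the point concrete. The numbers below (a 10x faster offload engine, 90 percent of traffic offloaded) are illustrative assumptions, not measurements from any real system:

```python
# Toy model of partial vswitch offload (all numbers are illustrative).
# Assume the offload engine handles a packet 10x faster than the CPU
# slow path, but some fraction of packets must still hit the CPU.

def effective_speedup(offload_fraction, engine_speedup=10.0):
    """Overall speedup when only a fraction of packets is fully offloaded.

    Time per packet (normalized to CPU cost of 1.0):
      - CPU-handled fraction costs 1.0 each
      - offloaded fraction costs 1/engine_speedup each
    """
    cpu_fraction = 1.0 - offload_fraction
    return 1.0 / (cpu_fraction + offload_fraction / engine_speedup)

# Offloading 90% of packets captures barely half of the engine's 10x:
print(round(effective_speedup(0.9), 2))   # 5.26
print(round(effective_speedup(1.0), 2))   # 10.0
```

The slow path dominates: at 90 percent offload, the 10 percent of packets that stay on the CPU consume roughly two thirds of the total processing time, which is why a fully general offload engine matters.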
Thus, a P4-programmable offload engine based on the Protocol Independent Switch Architecture (PISA) provides the level of flexibility and programmability required to offload the entire vswitch. Combine this with other programmable hardware (such as Arm cores) and you can see how all of the infrastructure functions, including the hypervisor, storage virtualization, and so on, can be offloaded to the IPU.
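To give a flavor of the match-action abstraction that P4 programs target, here is a minimal software sketch of a two-stage PISA-style pipeline. The table names, key fields, and actions are invented for illustration and do not come from any real P4 program or switch:

```python
# Minimal sketch of a PISA-style match-action pipeline in software.
# All table names, keys, and actions are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Packet:
    dst_mac: str
    vlan: int = 0
    out_port: int = -1          # -1 means "drop" in this toy model
    meta: dict = field(default_factory=dict)

class MatchActionTable:
    """An exact-match table mapping key fields to an action callable."""
    def __init__(self, key_fields):
        self.key_fields = key_fields
        self.entries = {}

    def add_entry(self, key, action):
        self.entries[key] = action

    def apply(self, pkt):
        key = tuple(getattr(pkt, f) for f in self.key_fields)
        action = self.entries.get(key)
        if action:
            action(pkt)         # hit: run the bound action
        # miss: fall through (a real pipeline would run a default action)

# Two pipeline stages: VLAN assignment, then L2 forwarding.
vlan_table = MatchActionTable(["dst_mac"])
l2_table = MatchActionTable(["dst_mac", "vlan"])

vlan_table.add_entry(("aa:bb:cc:dd:ee:ff",),
                     lambda p: setattr(p, "vlan", 100))
l2_table.add_entry(("aa:bb:cc:dd:ee:ff", 100),
                   lambda p: setattr(p, "out_port", 7))

pkt = Packet(dst_mac="aa:bb:cc:dd:ee:ff")
for stage in (vlan_table, l2_table):
    stage.apply(pkt)
print(pkt.out_port)   # 7
```

The key property this models is protocol independence: the pipeline hardware knows nothing about VLANs or MAC addresses per se; the tables, keys, and actions are all supplied by the (P4) program, which is what lets the same silicon implement a vswitch today and something else tomorrow.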
The virtual switch of a network virtualization system is a candidate for offloading
One way to look at the latest generation of DPUs and IPUs is that the SDN movement’s efforts to make switches more programmable have enabled innovation in an entirely new space.
SDN initially promised to spur innovation in the control plane by decoupling switching hardware from the software that controls it. Network virtualization was one of the first SDN applications to take off, with the separation of control and data planes and highly flexible software switches allowing networks to be created entirely in software (on top of an underlying hardware layer).
PISA and P4 led to a more flexible form of switching hardware and a new way of defining the hardware-software interface (building on earlier OpenFlow efforts). All of these threads (control-plane innovation, network virtualization, and flexible, programmable switching hardware) now come together in the creation of IPUs and DPUs.
The development of IPUs and DPUs can also be seen as a continuation of the trend toward processors that are quite flexible yet specialized for certain tasks. GPUs and TPUs are genuinely flexible, being used for everything from crypto-mining and machine learning to graphics processing, but they are nonetheless quite specialized compared to CPUs. (GPUs were even used for packet processing in the era before PISA and P4.)
DPUs and IPUs now appear to be well established as a new category of highly programmable devices that are optimized for a specific set of tasks that must be performed in a modern cloud data center. With this greater specialization comes greater efficiency, while flexibility remains high enough to support future innovation. ®