Why Is InfiniBand (and other interconnects) So Fast?


The above title is misleading because “fast” can mean many different things. In the case of HPC, “fast” means whatever it takes to keep the cores busy! In a previous post, I mentioned four parameters that are used to define an interconnect (throughput, latency, N/2, and messaging rate). Of course, applications are the best way to evaluate an interconnect.

The most popular interconnects for HPC are Ethernet (GigE and 10-GigE), InfiniBand, and Myrinet. (At this point, many people lump Myrinet into the 10-GigE category because it supports the standard protocol as well as the Myricom protocols.) Each of these interconnects is used in both mainstream and HPC applications, but one usage mode sets HPC applications apart from almost all others.

When interconnects are used in HPC, the best performance comes from a “user space” mode. Communication over a network normally takes place through the kernel (i.e., the kernel manages, and in a sense guarantees, that data will get to where it is supposed to go). This communication path, however, requires memory to be copied from the user’s program space to a kernel buffer. The kernel then manages the communication. On the receiving node, the kernel accepts the data and places it in a kernel buffer, which is then copied into the user’s program space. This extra copying adds to the latency of the network. In addition, the kernel must process the TCP/IP stack for each communication. For applications that require low latency, the extra copy from user program space to kernel buffer on the sending node, and then from kernel buffer to user program space on the receiving node, can be very inefficient.
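To make the kernel-mediated path concrete, here is a minimal sketch of an ordinary TCP client in C: the send() call copies the message from the user’s buffer into a kernel socket buffer, and the kernel’s TCP/IP stack takes over from there. The peer address and port are placeholders chosen for illustration, not anything from the original article.

```c
/* Minimal sketch of the kernel-mediated path: a plain TCP client.
 * send() copies the buffer from user space into a kernel socket
 * buffer; the kernel TCP/IP stack then transmits it, and the
 * receiver's kernel copies it back out into that process's buffer
 * on recv(). Host and port below are placeholders. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                        /* placeholder port */
    inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);   /* placeholder peer */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    char msg[] = "hello from user space";
    /* This call crosses the user/kernel boundary: one extra copy
     * on the sender, and another on the receiver. */
    if (send(fd, msg, sizeof(msg), 0) < 0)
        perror("send");

    close(fd);
    return 0;
}
```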

To improve latency, many vendors of high-performance interconnects use a “user space” protocol instead of going through the kernel. Figure One illustrates the difference. The solid lines indicate a standard Ethernet MPI connection; note that the communication passes through the kernel on both send and receive. Interconnects like Myrinet and InfiniBand provide a low-latency user space protocol that does not involve the kernel or incur any TCP/IP overhead. Instead, data is moved directly from the memory of one process to the memory of the other process (dashed lines). These fast interconnects also provide TCP and UDP layers so that they can be used with regular through-the-kernel network services as well (e.g., to run NFS).

Figure One: Kernel Space vs User Space Transfer
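From the application’s point of view, the user space path is what an MPI library uses on InfiniBand or Myrinet. The short sketch below is only meant to show that the program itself does not change: the same MPI_Send/MPI_Recv calls can run over kernel TCP sockets on Ethernet or over a user space transport on a fast interconnect, and the choice of path is made by the MPI implementation at run time, not by this code.

```c
/* Minimal MPI ping sketch. Compile with mpicc and launch with
 * mpirun -np 2. Whether the message bypasses the kernel depends on
 * the MPI library and the interconnect it was built for. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double payload[1024] = {0};

    if (rank == 0) {
        /* On a user space interconnect the library can hand this
         * buffer to the NIC directly, with no copy into a kernel
         * socket buffer. */
        MPI_Send(payload, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(payload, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received 1024 doubles\n");
    }

    MPI_Finalize();
    return 0;
}
```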

You can read the rest at: http://www.clusterconnection.com/2009/08/why-is-infiniband-and-other-interconnects-so-fast/