High Performance Computing at the IOL
High Performance Computing (HPC) is no longer exclusively the realm of extremely expensive supercomputers with proprietary transport technologies and software. High performance, as the name implies, requires significant compute power together with high-speed memory, storage, and network interconnects. To achieve it, each processor must communicate with other processors rapidly and with little CPU or kernel overhead. This Inter-Processor Communication (IPC) technology continues to evolve, driven by the ever-increasing need for faster data transfers at the lowest possible latency. One way to meet the HPC industry’s need for speed is Remote Direct Memory Access (RDMA), a high-throughput, low-latency data transport solution. HPC clusters are often needed to solve advanced computational problems in science, engineering, financial modeling, virtual reality and more.
The OpenFabrics Alliance (OFA), in conjunction with the UNH-IOL, strives to promote interoperability between devices of the same technology that utilize RDMA. The term OpenFabrics refers to the OFA’s main product, the open-source OpenFabrics Enterprise Distribution (OFED) software stack. This software promotes the development of RDMA-capable applications without tying the application to the underlying RDMA transport technology. Two such technologies currently supported by the OFA are InfiniBand and iWARP.
To promote awareness and understanding of the OFED software stack, the OFA, in conjunction with UNH-IOL, Software Forge, and System Fabric Works, has developed a training program for HPC software application developers. The Writing Application Programs for RDMA using OFA Software course, hosted at UNH-IOL, is designed for industry professionals who want to learn how to program using RDMA. The course uses the high performance computing cluster at UNH-IOL for demonstrations and hands-on learning.
The OFA has also engaged UNH-IOL to host and run the OpenFabrics Interoperability Logo Group (OFILG). This group’s mission is to publish the OFA Logo List, which validates the latest OFED software stack with the latest InfiniBand and iWARP RDMA hardware.
Thanks to generous donations by multiple OFA member companies, the OFA Logo Group at the UNH-IOL has an HPC cluster of 48 centrally managed compute nodes (a total of 544 processing cores). Using this cluster, the UNH-IOL tests InfiniBand Host Channel Adapters (HCAs), iWARP RDMA Network Interface Cards (RNICs), InfiniBand switches and storage targets.
The OFA’s OpenFabrics Enterprise Distribution (OFED™) provides all the tools and drivers needed to use RDMA “out of the box.” With the OFED™ package, an application that uses transport-independent methods should not see any difference regardless of which RDMA technology it is running on. One such transport-independent method is the Message Passing Interface (MPI), a standardized message-passing IPC system designed for performance, scalability and portability. Another commonly used method is the User Direct Access Programming Library (uDAPL), a set of high-performance RDMA user APIs. HPC application developers who wish to maximize performance may benefit from bypassing upper-layer protocols (ULPs) such as MPI and uDAPL and programming directly to the RDMA verbs (similar to an API), as addressed in the OFA-offered RDMA programming course.
The OFILG publishes the OpenFabrics Interoperability Logo List twice a year on the UNH-IOL’s main website: http://www.iol.unh.edu/. The list includes RDMA devices that pass all mandatory interoperability testing requirements as defined by the OFA’s Test Plan. These tests cover both the interoperability of the underlying RDMA technology and the functionality of the ULPs with the latest OFED software. The logo list benefits the RDMA device vendor as well as the end user by providing confidence that their investment in both hardware and software will have long-term value. Having your product granted a logo shows your customers that the device has passed extensive testing, functions correctly with the latest version of OFED, and operates correctly when used with other logo-certified products. The interoperability testing is done in an all-to-all, cluster-wide manner. Prior to the official logo testing, vendors assemble at the UNH-IOL for an extensive interoperability validation of OFED and their hardware, providing feedback to the open-source OFED development teams as well as feeding results back into their own products and to related standards bodies. As OFED adoption accelerates, the Logo Program is anticipated to expand coverage to standard Linux distributions, Microsoft Windows, and emerging RDMA technologies such as RDMA over Converged Ethernet (RoCE).
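To give a rough sense of what all-to-all testing implies for the size of a test run, the short Python sketch below enumerates every pairing of devices that share an RDMA technology. It is an illustration only and not the OFA Test Plan procedure: the device labels and the `interop_pairs` helper are hypothetical, and it assumes that each pair is exercised once and that only devices of the same technology are paired with each other.

```python
from itertools import combinations


def interop_pairs(devices):
    """Pair every device with every other device of the same technology.

    `devices` maps a (hypothetical) device label to its RDMA technology name.
    Returns a sorted list of (label_a, label_b) tuples, each pair listed once.
    """
    # Group device labels by technology so that pairs never cross technologies.
    by_tech = {}
    for label, tech in devices.items():
        by_tech.setdefault(tech, []).append(label)

    pairs = []
    for labels in by_tech.values():
        # combinations() yields each unordered pair exactly once.
        pairs.extend(combinations(sorted(labels), 2))
    return sorted(pairs)


# Hypothetical device labels, not taken from the OFA Logo List.
devices = {
    "ib_hca_1": "InfiniBand",
    "ib_hca_2": "InfiniBand",
    "ib_hca_3": "InfiniBand",
    "iwarp_rnic_1": "iWARP",
    "iwarp_rnic_2": "iWARP",
}

for a, b in interop_pairs(devices):
    print(f"{a} <-> {b}")
```

Under these assumptions, n devices of one technology produce n(n-1)/2 pairings, so the number of pairings grows quadratically as devices are added to the cluster.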
If you want to learn more about the OpenFabrics Alliance, please visit www.openfabrics.org. To access the current test plan documents that outline the testing procedure, please see http://www.iol.unh.edu/services/testing/ofa/testsuites/.
Christopher Hutchins, Research and Development


















