1. Introduction
What is InfiniBand?
An open standard interconnect protocol defined by the IBTA (InfiniBand Trade Association).
InfiniBand software is developed under the OpenFabrics Alliance, which includes multiple vendors.
Three features of the protocol
1. High bandwidth 2. Low latency 3. High RAS (reliability, availability, serviceability)
As a high-speed I/O interconnect between servers and storage, it is mainly used in HPC, enterprise data centers, and medical and inspection equipment that require high image processing capability.
2. InfiniBand Architecture Layers
[Upper layer]
・Interface between application and hardware
・Provision of management functions
[Transport layer]
・Packet transfer to appropriate QP (Queue Pair)
・Message segmentation and reassembly, access rights, etc...
[Network layer]
・Packet routing between subnets
[Link layer]
・Symbols and framing, flow control (Credit based)
・Packet forwarding from source to destination within a subnet
[Physical layer]
・Signal levels and frequencies, physical media, connectors, etc...
3. Physical Layer
Lanes can be bundled into link widths from 1x to 12x. With EDR (25Gb/s per lane), the fastest speed currently available, a 12x link achieves a maximum bandwidth of 300Gb/s.
An HCA links at 4x per port (up to 100Gb/s), while links between switches can also run at 12x (up to 300Gb/s).
Industry standard copper and optical cables are supported. (For Mellanox / Emcore cables, please refer to our product introduction page.)
| Link width | 1x: 1 differential signal each for transmit (Tx) and receive (Rx); 4x: 4 differential signals each for Tx and Rx; 12x: 12 differential signals each for Tx and Rx |
| Link speed | Single Data Rate (SDR): 2.5 GHz signal (2.5 Gb/s for 1x); Double Data Rate (DDR): 5.0 GHz signal (5 Gb/s for 1x); Quad Data Rate (QDR): 10 GHz signal (10 Gb/s for 1x) |
| Link bandwidth | Link width multiplied by link speed: 4x SDR (10Gb/s), 4x DDR (20Gb/s), 4x QDR (40Gb/s), 4x FDR (56Gb/s), 4x EDR (100Gb/s) |
| Encoding | 8b/10b encoding (SDR, DDR, QDR); 64b/66b encoding (FDR, EDR) |
| Media / cable | PCB, passive copper cable, active copper cable, optical cable; industry standard connectors and cables |
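The "link width times link speed" rule in the table can be checked with a few lines of Python. This is only a sketch: the per-lane figures for FDR (14 Gb/s) and EDR (25 Gb/s) are the nominal values implied by the 4x figures in this article, and the function names are made up for illustration. The 8b/10b helper applies only to SDR, DDR and QDR, where 8 of every 10 bits on the wire carry data; no such adjustment is attempted for FDR and EDR.

```python
# Nominal per-lane rates in Gb/s. SDR/DDR/QDR come from the table;
# FDR and EDR are derived from the 4x figures quoted in this article.
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "FDR": 14.0, "EDR": 25.0}

# Generations that the table lists as using 8b/10b encoding.
ENCODING_8B10B = {"SDR", "DDR", "QDR"}


def link_bandwidth(width: int, speed: str) -> float:
    """Nominal link bandwidth in Gb/s: link width times per-lane speed."""
    if width not in (1, 4, 12):
        raise ValueError("link width must be 1, 4 or 12")
    return width * LANE_RATE_GBPS[speed]


def data_rate_8b10b(width: int, speed: str) -> float:
    """Data rate of an 8b/10b link: 8 data bits for every 10 bits on the wire."""
    if speed not in ENCODING_8B10B:
        raise ValueError("only SDR, DDR and QDR use 8b/10b encoding")
    return link_bandwidth(width, speed) * 8 / 10


if __name__ == "__main__":
    for speed in LANE_RATE_GBPS:
        print(f"4x {speed}: {link_bandwidth(4, speed):g} Gb/s nominal")
    print(f"4x QDR after 8b/10b: {data_rate_8b10b(4, 'QDR'):g} Gb/s")
```

For example, a 4x QDR link is quoted at 40 Gb/s, of which 32 Gb/s is available for data after 8b/10b encoding.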
4. Link Layer
InfiniBand packet
InfiniBand splits messages of up to 2GB into packets, the unit of transmission (MTU payload: 256 bytes to 4 KB), and transfers them over the InfiniBand link.
Links are controlled by credit-based flow control to prevent packet loss.
One link can also carry multiple virtual lanes (VL: Virtual Lane), with independent flow control for each virtual lane. Multiple service levels (SL: Service Level) can be set for packets, and combining them with virtual lanes allows flexible QoS.
In addition, each packet carries two CRCs, ICRC and VCRC, to protect data with high reliability.
・End-to-end unit of transfer
・Hardware segmentation and reassembly of messages
・Type: Data packet, Acknowledgment (Ack), Link
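The segmentation described above can be sketched as follows. In real InfiniBand this is done by the HCA hardware; the Python below only illustrates the arithmetic. The function name is hypothetical, and the MTU list assumes the power-of-two sizes within the 256-byte to 4-KB range given above.

```python
# MTU payload sizes in bytes, covering the 256 B to 4 KB range from the text.
VALID_MTUS = (256, 512, 1024, 2048, 4096)

# Upper bound on a single message (2 GB), as stated in the text.
MAX_MESSAGE_BYTES = 2 * 1024**3


def segment(message_len: int, mtu: int) -> list[int]:
    """Return the payload length of each packet needed to carry one message."""
    if mtu not in VALID_MTUS:
        raise ValueError(f"MTU must be one of {VALID_MTUS}")
    if not 1 <= message_len <= MAX_MESSAGE_BYTES:
        raise ValueError("message length must be between 1 byte and 2 GB")
    full_packets, remainder = divmod(message_len, mtu)
    sizes = [mtu] * full_packets
    if remainder:
        sizes.append(remainder)  # the last packet carries whatever is left
    return sizes


if __name__ == "__main__":
    print(segment(10_000, 4096))
```

The receiving side reassembles the message by concatenating the payloads in order, so the sum of the returned sizes always equals the message length.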
IBA packet format
| LRH | Local Route Header | (required for all packets): addressed by LID |
| GRH | Global Route Header | (required when destination subnet is different) : addressed by GID |
| BTH | Base Transport Header | (IB transport): addressed by QP number |
| ExTH | Extended Transport Header | Different headers for RDMA/Atomic/Ack |
| Immediate Data Header | Immediate Data | (if any was sent) |
| Msg Payload | message payload | |
| ICRC | Invariant CRC (32 bit) | End-to-end error detection |
| VCRC | Variant CRC (16 bit) | Hop-by-hop error detection |
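As a reading aid for the table above, the following sketch lists which fields a packet contains, in the order the table gives them. It is a simplification based only on that table: the operation labels ("SEND", "RDMA", "ATOMIC", "ACK"), the "ImmDt" and "Payload" labels, and the function itself are illustrative assumptions, not part of any InfiniBand API.

```python
# Operations for which the table lists an Extended Transport Header.
NEEDS_EXTH = {"RDMA", "ATOMIC", "ACK"}


def packet_fields(operation: str,
                  crosses_subnet: bool = False,
                  immediate: bool = False,
                  has_payload: bool = True) -> list[str]:
    """Return the fields of one packet in table order (simplified model)."""
    fields = ["LRH"]               # required for all packets, addressed by LID
    if crosses_subnet:
        fields.append("GRH")       # only when the destination subnet differs
    fields.append("BTH")           # IB transport, addressed by QP number
    if operation in NEEDS_EXTH:
        fields.append("ExTH")      # header contents differ per operation
    if immediate:
        fields.append("ImmDt")     # immediate data, if any was sent
    if has_payload:
        fields.append("Payload")
    fields += ["ICRC", "VCRC"]     # end-to-end CRC, then hop-by-hop CRC
    return fields


if __name__ == "__main__":
    print(packet_fields("SEND"))
    print(packet_fields("RDMA", crosses_subnet=True, immediate=True))
```

A packet that stays inside one subnet therefore needs no GRH, which is why the LRH alone is marked as required for all packets.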
Flow control
・Credit-based link-level flow control
・Flow control for each virtual lane
・Prevents packet loss within the fabric and realizes QoS for each virtual lane
Virtual lanes and service levels
・Virtual lanes (VLs): up to 16 VLs on one physical link (IB spec)
・Independent flow control for each VL (independent Tx/Rx buffers)
・16 service levels (SLs), mapped to a VL per packet
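Credit-based flow control per virtual lane can be illustrated with a toy model. This is a sketch under simplifying assumptions: one credit stands for one packet's worth of receive buffer (real hardware accounts for buffer space at a finer granularity), the SL-to-VL table is supplied by the caller, and the class and method names are made up for illustration.

```python
from collections import deque


class VirtualLaneLink:
    """Toy model of one link direction with independent credits per VL."""

    def __init__(self, sl_to_vl: dict[int, int], credits_per_vl: int):
        self.sl_to_vl = sl_to_vl
        vls = set(sl_to_vl.values())
        # Sender-side view of how much receive buffer is free on each VL.
        self.credits = {vl: credits_per_vl for vl in vls}
        # Receiver-side buffers, one per VL.
        self.rx_buffers = {vl: deque() for vl in vls}

    def send(self, sl: int, packet: str) -> bool:
        """Send a packet on the VL its service level maps to, if credit allows."""
        vl = self.sl_to_vl[sl]
        if self.credits[vl] == 0:
            return False  # the sender waits; the packet is held, not dropped
        self.credits[vl] -= 1
        self.rx_buffers[vl].append(packet)
        return True

    def receiver_drain(self, vl: int) -> str:
        """Receiver consumes one packet and returns the credit to the sender."""
        packet = self.rx_buffers[vl].popleft()
        self.credits[vl] += 1
        return packet


if __name__ == "__main__":
    link = VirtualLaneLink({0: 0, 1: 1}, credits_per_vl=2)
    print(link.send(0, "a"), link.send(0, "b"), link.send(0, "c"))
    print(link.send(1, "x"))  # VL 1 still has credits of its own
```

Because each VL has its own credit counter, a VL that has run out of credits does not stall traffic on the other VLs.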
5. Link layer example
6. Transport Layer
There are three types of transfer: "SEND", "RDMA (Remote DMA) Write" and "RDMA Read". RDMA reads and writes data directly in the host memory of a different computer with almost no CPU processing overhead, so CPU resources can be used efficiently for the actual application.
In addition, there are two types of transfer services, Reliable / Unreliable, and you can choose the service according to your purpose.
| RC (Reliable Connection) | Ack/Nak capable QP |
| UC (Unreliable Connection) | QP without Ack/Nak support |
| RD (Reliable Datagram) | EEC (End-to-End Context) with Ack/Nak support |
| UD (Unreliable Datagram) | QP without Ack/Nak support (datagram, comparable to UDP) |
| RAW | Non-IB protocols ( IPv6 or Ethernet packets ) |
Transfer type
・SEND
・RDMA Write
・RDMA Read
RDMA (Remote Direct Memory Access) is the ability to move information directly into memory between different computers, requiring minimal memory bus bandwidth and CPU processing overhead.
RDMA Write flow
1. The application allocates a buffer for receiving and sends the address and key to the remote side.
2. The sending side places the transfer data in host memory, places a WQE containing the virtual address of the remote (receiving) side in the Send Queue, and rings the DoorBell.
3. The HCA consumes this WQE, fetches the data, transfers it to the remote side, and creates a completion in the Completion Queue.
4. When the packet is received at the remote HCA, its address and memory key are verified and the data is written directly to memory.
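The RDMA Write steps listed above can be mimicked with a small in-memory model. This is not the verbs API: `ToyHCA`, `MemoryRegion` and their methods are hypothetical names, the key is a simple counter rather than a real memory key, and the "transfer" is a direct method call. The point is only to show the address and key check on the receiving side and the completion on the sending side.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRegion:
    base: int         # virtual address where the registered buffer starts
    data: bytearray   # the buffer itself
    rkey: int         # key the remote side must present to access it


@dataclass
class ToyHCA:
    regions: dict = field(default_factory=dict)      # rkey -> MemoryRegion
    completions: list = field(default_factory=list)  # stand-in for a Completion Queue

    def register(self, base: int, length: int) -> MemoryRegion:
        """Allocate a receive buffer and hand back its address and key."""
        rkey = 0x1000 + len(self.regions)  # illustrative key, not a real format
        region = MemoryRegion(base, bytearray(length), rkey)
        self.regions[rkey] = region
        return region

    def receive_write(self, addr: int, rkey: int, payload: bytes) -> bool:
        """Verify key and address range, then write directly into memory."""
        region = self.regions.get(rkey)
        if region is None:
            return False
        offset = addr - region.base
        if offset < 0 or offset + len(payload) > len(region.data):
            return False
        region.data[offset:offset + len(payload)] = payload
        return True

    def post_rdma_write(self, remote: "ToyHCA", addr: int, rkey: int,
                        payload: bytes) -> None:
        """Consume one write request and record a completion for it."""
        ok = remote.receive_write(addr, rkey, payload)
        self.completions.append("success" if ok else "remote access error")


if __name__ == "__main__":
    receiver, sender = ToyHCA(), ToyHCA()
    region = receiver.register(base=0x7000, length=16)
    sender.post_rdma_write(receiver, region.base, region.rkey, b"hello")
    print(bytes(region.data[:5]), sender.completions)
```

Note that no code on the receiving host runs per transfer in this model: the receiver only registers the buffer once, which mirrors why RDMA places so little load on the remote CPU.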
7. InfiniBand Fabric Management (Subnet Manager)
InfiniBand requires that there always be one subnet manager in the subnet. This manager can run on either a host (HCA) or a switch, as selected by administrator configuration. It handles device discovery within the subnet, FDB initialization, LID allocation, and so on, and periodically inspects (sweeps) the fabric to detect any change within the subnet. InfiniBand links are successfully established only after this manager has started.
There are two types of InfiniBand switches: Managed type with built-in subnet manager and Unmanaged type.
8. Summary of InfiniBand Features
[1] High bandwidth - Eliminates I/O bottlenecks and processes data at high speed -
・40Gb/s, 56Gb/s, 100Gb/s HCA link bandwidth
・120Gb/s, 300Gb/s inter-switch link bandwidth
[2] Ultra-low latency - Significant reduction in application processing time -
・End-to-end latency of 1 µs or less
[3] Hardware-based transport - Reduced CPU load -
・Reliable data transfer with Ack/Nak support
・CPU-friendly data transfer using RDMA
[4] Network with no packet loss and flexible quality of service (QoS) - high quality -
・Link level flow control
・Congestion Management
・QoS per virtual lane and multiple service levels (SL)
