Executive Summary
The Mechanism: Legacy sockets copy data twice (User ↔ Kernel). This kills 100Gbps links.
The Solution: io_uring shares ring buffers. AF_XDP bypasses the kernel stack entirely.
The Result: 200 cycles/packet instead of 1200. Wire-speed processing in TypeScript/Rust.
The trajectory of modern Linux networking has been fundamentally altered by the physical limitations of memory bandwidth. As network speeds scale to 100Gbps+, the legacy Berkeley Sockets API—reliant on synchronous system calls and data copying—has become a prohibitive bottleneck.
This analysis dissects the three primary technologies developed to address these bottlenecks: io_uring, XDP, and AF_XDP.
To appreciate the solution, one must understand the cost of the legacy path. Moving data between User Space and Kernel Space requires context switches (polluting CPU caches) and physical memory copies (saturating the bus).
- Context switches: the CPU must load kernel state, polluting instruction and data caches.
- Double copying: the memory bus is saturated moving every byte twice.
- The zero-copy alternative: map kernel memory into user space so each byte is touched once.
For a 100 Gbps link, the payload rate alone is 12.5 GB/s. In a copy-based stack each byte crosses the memory bus roughly three times (the NIC's DMA write, then the read and write of the kernel-to-user copy), tripling the memory bandwidth requirement to ~37.5 GB/s and stalling the CPU while it waits for data fetches.
io_uring addresses the "Control Cost" by using shared ring buffers. Instead of a syscall for every operation, the application places requests in a Submission Queue (SQ) and reads results from a Completion Queue (CQ).
Both rings are mapped into the process with mmap() on the io_uring file descriptor, using the IORING_OFF_SQ_RING and IORING_OFF_CQ_RING offsets; the kernel and the application then share read/write access to these memory regions. No syscall is needed to queue work, only io_uring_enter() to notify the kernel of readiness (and with SQPOLL, a kernel thread polls the SQ so even that disappears).
Newer opcodes like IORING_OP_SEND_ZC (Linux 6.0+) and the more recent IORING_OP_RECV_ZC let the NIC DMA directly against pinned user pages, eliminating the final data copy while retaining the robust kernel TCP stack.
While io_uring optimizes the _interface_, XDP (eXpress Data Path) optimizes the _path_. It allows running eBPF programs inside the driver, before the kernel allocates heavy per-packet metadata structures (the sk_buff).
Without XDP, every packet traverses the full kernel stack, paying the sk_buff allocation and copying overhead before any application logic can run.
AF_XDP is a socket that leverages XDP to redirect packets into a shared memory region (UMEM) accessible by userspace. This provides near-raw hardware performance (like DPDK) while remaining compatible with standard Linux tools.
The selection between these technologies depends on your specific constraint: Interface overhead (use io_uring) vs. Stack overhead (use AF_XDP).
- Standard kernel stack: ~1,200 cycles/packet
- AF_XDP: ~200 cycles/packet
The Linux networking landscape has evolved. io_uring is the emerging standard for web servers and databases that need a compliant TCP/IP stack. AF_XDP is the modern replacement for DPDK in high-performance packet processing. Together, they allow Linux to scale to 200Gbps+ without abandoning the safety of the kernel.