Executive Summary
The Mechanism: Legacy sockets copy data twice (User ↔ Kernel). This kills 100Gbps links.
The Solution: io_uring shares ring buffers. AF_XDP bypasses the kernel stack entirely.
The Result: 200 cycles/packet instead of 1200. Wire-speed processing in TypeScript/Rust.
The trajectory of modern Linux networking has been fundamentally altered by the physical limitations of memory bandwidth. As network speeds scale to 100Gbps+, the legacy Berkeley Sockets API—reliant on synchronous system calls and data copying—has become a prohibitive bottleneck.
This analysis dissects the three primary technologies developed to address these bottlenecks: io_uring, XDP, and AF_XDP.
To appreciate the solution, one must understand the cost of the legacy path. Moving data between User Space and Kernel Space requires context switches (polluting CPU caches) and physical memory copies (saturating the bus).
- Context switches: the CPU must load kernel state, polluting instruction and data caches.
- Double copying: the memory bus is saturated moving every byte twice.
- The zero-copy alternative: map kernel memory into user space so each byte is touched once.
For a 100 Gbps link, the payload rate alone is 12.5 GB/s. In a copy-based stack each byte crosses the memory bus roughly three times (the NIC's DMA write, then the read and write of the kernel-to-user copy), tripling the memory bandwidth requirement to ~37.5 GB/s and stalling the CPU while it waits for data fetches.
io_uring addresses the "Control Cost" by using shared ring buffers. Instead of a syscall for every operation, the application places requests in a Submission Queue (SQ) and reads results from a Completion Queue (CQ).
Both rings are mapped into the process with mmap() on the io_uring file descriptor, using the IORING_OFF_SQ_RING and IORING_OFF_CQ_RING offsets; the kernel and the application then share read/write access to these memory regions. No syscall is needed to queue work, only io_uring_enter() to notify the kernel of readiness (and with SQPOLL, a kernel thread polls the SQ so even that disappears).
Newer opcodes like IORING_OP_SEND_ZC (Linux 6.0+) and the more recent IORING_OP_RECV_ZC let the NIC DMA directly against pinned user pages, eliminating the final data copy while retaining the robust kernel TCP stack.
While io_uring optimizes the _interface_, XDP (eXpress Data Path) optimizes the _path_. It allows running eBPF programs inside the driver, before the kernel allocates heavy per-packet metadata structures (the sk_buff).
Without XDP, every packet traverses the full kernel stack, paying the sk_buff allocation and copying overhead before any application logic can run.
AF_XDP is a socket that leverages XDP to redirect packets into a shared memory region (UMEM) accessible by userspace. This provides near-raw hardware performance (like DPDK) while remaining compatible with standard Linux tools.
The selection between these technologies depends on your specific constraint: Interface overhead (use io_uring) vs. Stack overhead (use AF_XDP).
- Standard kernel stack: ~1,200 cycles/packet
- AF_XDP: ~200 cycles/packet
The Linux networking landscape has evolved. io_uring is the emerging standard for web servers and databases that need a compliant TCP/IP stack. AF_XDP is the modern replacement for DPDK in high-performance packet processing. Together, they allow Linux to scale to 200Gbps+ without abandoning the safety of the kernel.