Pseudo-Headers: A Source of Controversy
UNH-IOL Staff - July 3, 2012
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) carry the vast majority of packets the average Internet user sends and receives. Inevitably, errors will at some point work their way into the data being transferred from one node to another. They might originate in one’s own network interface, in some intermediate device (a router, for instance), in noise on the line, or what have you. To guard against this, error detection measures are built into the protocols.
As one element of their error detection schemes, UDP and TCP both contain a sixteen-bit checksum field. The sender derives this checksum by adding up the packet’s contents as 16-bit words using ones’-complement arithmetic (any carry out of the top bit is added back in at the bottom), then taking the ones’ complement of that sum [1]. The result is inserted into the packet’s checksum field, which was treated as zero during the calculation, and the packet is sent out. The receiver then sums the same 16-bit words, this time including the checksum, and should arrive at an answer of all binary ones (0xFFFF). If it does not, something has gone wrong during transmission and the packet is discarded. TCP will eventually retransmit the missing data, while UDP leaves any recovery to the application.
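The summation described above can be sketched in Python as follows. This is a minimal illustration in the spirit of RFC 1071, not production code. The function names (`ones_complement_sum16`, `internet_checksum`, `verify`) are hypothetical. The six-byte "segment" is made-up data whose middle two bytes stand in for the checksum field.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Sum big-endian 16-bit words using ones'-complement (end-around carry) addition."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is padded with zero for the calculation
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low bits
    return total


def internet_checksum(data: bytes) -> int:
    """Sender side: the checksum is the ones' complement of the sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verify(data: bytes) -> bool:
    """Receiver side: summing everything, checksum included, should give all ones."""
    return ones_complement_sum16(data) == 0xFFFF


# Hypothetical 6-byte segment; bytes 2-3 are the (zeroed) checksum field.
segment = bytearray(b"\x12\x34\x00\x00\xab\xcd")
segment[2:4] = internet_checksum(bytes(segment)).to_bytes(2, "big")
print(segment.hex())           # 123441feabcd
print(verify(bytes(segment)))  # True
segment[5] ^= 0x01             # a single bit flipped in transit
print(verify(bytes(segment)))  # False
```

Folding the carry after every addition keeps the running total within 16 bits. An implementation could equally accumulate in a wider integer and fold once at the end.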
However, the checksum is not calculated over the UDP/TCP packet alone. It is calculated over the packet plus something known as a ‘pseudo-header’. First, some background: each UDP/TCP packet can be broken down into a header and a payload. The header contains information important for transmission as well as for interpreting the protocol and payload. The pseudo-header is redundant data borrowed from the IP (Internet Protocol) layer beneath TCP and UDP. For IPv4 it consists of the source and destination IP addresses, a zero byte, the protocol number and the TCP/UDP length [1]. The pseudo-header is never transmitted. Instead, the sender and the receiver each construct it from IP-layer information and include all of it in the checksum summation.
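Here is a sketch of how a sender might build the IPv4 pseudo-header and fold it into a UDP checksum. It assumes IPv4 and UDP (protocol number 17). The helper names, addresses (taken from the documentation ranges) and ports are hypothetical. Note that the pseudo-header bytes only ever enter the sum; they are never placed in the outgoing datagram.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Ones'-complement sum of big-endian 16-bit words, carries folded at the end."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_pseudo_header(src: str, dst: str, protocol: int, length: int) -> bytes:
    """Source address, destination address, zero byte, protocol, TCP/UDP length (12 bytes)."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src), socket.inet_aton(dst),
                       0, protocol, length)


def udp_checksum(src: str, dst: str, src_port: int, dst_port: int, payload: bytes) -> int:
    length = 8 + len(payload)  # the UDP header is 8 bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field zeroed
    pseudo = ipv4_pseudo_header(src, dst, 17, length)             # summed, never sent
    checksum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF  # UDP sends a computed zero as 0xFFFF (zero means "no checksum")


# Hypothetical addresses, ports and payload.
csum = udp_checksum("192.0.2.1", "198.51.100.2", 40000, 53, b"ping")
datagram = struct.pack("!HHHH", 40000, 53, 12, csum) + b"ping"  # what actually goes on the wire
```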
The pseudo-header is useful chiefly in the circumstance that one’s packets are inadvertently delivered to the wrong destination address. In theory, only the host with the correct IP address will be able to verify the checksum.
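To illustrate the "in theory", the sketch below has a receiver rebuild the pseudo-header with its own address before verifying. It assumes IPv4/UDP; the helper names and addresses are hypothetical. A host one address away rejects the datagram. However, a ones’-complement sum ignores word order, so a host whose address merely swaps the two 16-bit halves of the intended address would accept it.

```python
import socket
import struct


def ocsum(data: bytes) -> int:
    """Ones'-complement sum of big-endian 16-bit words (carries folded at the end)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int, proto: int = 17) -> bytes:
    return struct.pack("!4s4sBBH", socket.inet_aton(src), socket.inet_aton(dst), 0, proto, length)


def receiver_accepts(my_ip: str, sender_ip: str, segment: bytes) -> bool:
    """The receiver plugs its *own* address into the pseudo-header before checking."""
    return ocsum(pseudo_header(sender_ip, my_ip, len(segment)) + segment) == 0xFFFF


# Hypothetical sender and intended destination.
src, dst, payload = "192.0.2.1", "198.51.100.2", b"hello"
length = 8 + len(payload)
unchecked = struct.pack("!HHHH", 40000, 9999, length, 0) + payload
csum = ~ocsum(pseudo_header(src, dst, length) + unchecked) & 0xFFFF
segment = unchecked[:6] + struct.pack("!H", csum) + unchecked[8:]

print(receiver_accepts("198.51.100.2", src, segment))  # True: intended host
print(receiver_accepts("198.51.100.3", src, segment))  # False: misdelivered
print(receiver_accepts("100.2.198.51", src, segment))  # True: 16-bit halves swapped
```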
Today, such routing issues are easily mitigated. Even setting aside that IP routing errors are infrequent, devices communicating over TCP must perform a multi-step ‘handshake’ before sending and receiving actual data [2]. UDP packets often travel in short bursts of identical packets, and a routing error that systematically misdirects many packets in a row is generally unlikely. Each TCP packet also carries a sequence number and an acknowledgment number, which track the exact progression of transmitted data [2]. If a packet were to go astray, its loss would not go unnoticed. Meanwhile, the unintended recipient, which has no matching connection, would not accept the stray packet.
It is the IP layer’s job to deliver packets to the correct IP addresses, and the pseudo-header is a rare breach of the Internet Protocol Suite’s layered system. With NAT (Network Address Translation), this breach complicates things significantly. NAT was originally devised to cope with the impending shortage of IPv4 addresses, and routers using NAT actually rewrite the address information in the IP header [3]. After an address translation, the checksum in the IPv4 header must be recalculated. So must the check sequence of the link layer beneath the Internet layer (such as Ethernet’s powerful 32-bit cyclic redundancy check). Routers already redo both of these on every hop anyway. Because of the pseudo-header’s layer breach, however, every change to the IP address information also forces a recalculation of the UDP/TCP checksum.
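A NAT router need not re-sum the whole packet when it rewrites an address. RFC 1624 describes an incremental update, HC' = ~(~HC + ~m + m'), applied once per changed 16-bit word. The sketch below rewrites a source address that way and checks the result against a full recomputation. It assumes IPv4/TCP; the helper names, addresses and segment bytes are hypothetical. The work is small, but it is still transport-layer work the router would not otherwise need to do for an address-only rewrite.

```python
import socket
import struct


def ocsum(data: bytes) -> int:
    """Ones'-complement sum of big-endian 16-bit words (carries folded at the end)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(src: str, dst: str, segment: bytes) -> int:
    """Recompute a TCP checksum (protocol 6) from scratch over pseudo-header + segment."""
    pseudo = struct.pack("!4s4sBBH", socket.inet_aton(src), socket.inet_aton(dst),
                         0, 6, len(segment))
    return ~ocsum(pseudo + segment) & 0xFFFF


def add16(a: int, b: int) -> int:
    """Ones'-complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def nat_update(checksum: int, old_ip: str, new_ip: str) -> int:
    """Incrementally patch a checksum after an address rewrite (RFC 1624, eqn. 3)."""
    old_words = struct.unpack("!HH", socket.inet_aton(old_ip))
    new_words = struct.unpack("!HH", socket.inet_aton(new_ip))
    for m, m_new in zip(old_words, new_words):
        s = add16(~checksum & 0xFFFF, ~m & 0xFFFF)  # ~HC + ~m
        checksum = ~add16(s, m_new) & 0xFFFF        # ~(... + m')
    return checksum


# Hypothetical stand-in bytes; the arithmetic is the same for a real segment.
segment = b"hypothetical TCP segment bytes"
before = full_checksum("10.0.0.5", "198.51.100.2", segment)
after = nat_update(before, "10.0.0.5", "203.0.113.7")  # NAT rewrites the source address
print(after == full_checksum("203.0.113.7", "198.51.100.2", segment))  # True
```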
Of course, checksums are not perfect error detection schemes. Cyclic redundancy checks are much more accurate, but require significantly more computation. By lengthening the data the checksum must cover, the pseudo-header actually increases the likelihood of an undetected error (albeit by a very, very small amount; still, a step in the wrong direction) [4][5].
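The sketch below shows one class of two-bit error that the ones’-complement checksum cannot detect. When the same bit position flips in opposite directions in two different 16-bit words, the sum does not change. For contrast, the sketch also runs Python’s `zlib.crc32`, which uses the CRC-32 polynomial also used by Ethernet. The data and helper names are hypothetical.

```python
import struct
import zlib


def ocsum(data: bytes) -> int:
    """Ones'-complement sum of big-endian 16-bit words (carries folded at the end)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def with_checksum(data: bytes) -> bytes:
    """Append the ones'-complement checksum of (even-length) data."""
    return data + struct.pack("!H", ~ocsum(data) & 0xFFFF)


def checksum_ok(frame: bytes) -> bool:
    return ocsum(frame) == 0xFFFF


data = b"checksum"            # hypothetical data; words 0x6368 ('ch'), 0x6563 ('ec'), ...
frame = with_checksum(data)

corrupted = bytearray(frame)
corrupted[1] ^= 0x01          # 'h' -> 'i': lowest bit of word 0 goes 0 -> 1
corrupted[3] ^= 0x01          # 'c' -> 'b': lowest bit of word 1 goes 1 -> 0
corrupted = bytes(corrupted)

print(corrupted[:8])                                  # b'ciebksum'
print(checksum_ok(corrupted))                         # True: the error slips through
print(zlib.crc32(corrupted[:8]) == zlib.crc32(data))  # False: CRC-32 catches it
```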
One intention behind the creation of the pseudo-header was to prevent man-in-the-middle attacks [6], in which an IP route is corrupted and traffic is redirected to an attacker. Unfortunately, this intent is also rather moot, since the attacker can easily spoof his IP address or even his MAC address (in response to an ARP request, for instance) [7]. TCP and UDP packets are not encrypted at the transport layer; otherwise NAT, itself essentially a man-in-the-middle scheme, would not work. The checksum also uses no secret key, so anyone on the path can modify a packet and simply compute a new, valid checksum.
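Because the checksum involves no secret, it offers no protection against deliberate tampering. In the sketch below, an on-path attacker reads the addresses and ports in the clear, substitutes a new payload, and recomputes a checksum that the receiver accepts. It assumes IPv4/UDP; the addresses, ports, payloads and helper names are hypothetical.

```python
import socket
import struct


def ocsum(data: bytes) -> int:
    """Ones'-complement sum of big-endian 16-bit words (carries folded at the end)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int) -> bytes:
    """IPv4 pseudo-header for UDP (protocol 17)."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src), socket.inet_aton(dst), 0, 17, length)


def build_udp(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    length = 8 + len(payload)
    body = struct.pack("!HHHH", sport, dport, length, 0) + payload
    csum = ~ocsum(pseudo_header(src, dst, length) + body) & 0xFFFF
    return body[:6] + struct.pack("!H", csum) + body[8:]


def receiver_accepts(src: str, dst: str, segment: bytes) -> bool:
    return ocsum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


SRC, DST = "192.0.2.1", "198.51.100.2"  # hypothetical endpoints
original = build_udp(SRC, DST, 40000, 9999, b"transfer 10")

# The attacker needs nothing secret: ports come from the UDP header, addresses from IP.
sport, dport = struct.unpack("!HH", original[:4])
forged = build_udp(SRC, DST, sport, dport, b"transfer 9999")

print(receiver_accepts(SRC, DST, original))  # True
print(receiver_accepts(SRC, DST, forged))    # True: the tampering goes unnoticed
```

The checksum only catches the careless forger. Altering the payload without recomputing the checksum would still be rejected, but recomputing it takes no special knowledge.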
All said and done: is the pseudo-header completely useless? Not exactly. Does it have a clear purpose? Is it worth the overhead? Perhaps IPv6’s elimination of the IP header checksum is a telling change. My personal opinion: no.
Annotations
[1] Pages 3-4 contain a guide to deriving the TCP pseudo-header and calculating the checksum (using ones’-complement addition): http://www.personal.uni-jena.de/~p6lost2/DC/software/tutorials/TCP_IP_checksum.pdf
[2] Reference material on the TCP handshake and the sequence/acknowledgment fields: http://www.firewall.cx/networking-topics/protocols/tcp/134-tcp-seq-ack-numbers.html
[3] A short introduction to Network Address Translation: http://www.hasenstein.com/linux-ip-nat/diplom/node4.html
[4] Pages 10-12 list IEEE 802.3 bit error rates for various Ethernet speeds: http://www.national.com/AU/design/courses/120/120_basics_of_ethernet_networking.pdf
[5] A comprehensive analysis of how accurately checksums detect bit errors. See page 14 for the ones’-complement method used by TCP; page 18 may also be of interest (Ethernet frames use a cyclic redundancy check). The TCP checksum catches all 1-bit errors and around 97% of all 2-bit errors: http://www.ece.cmu.edu/~koopman/thesis/maxino_ms.pdf
[6] On preventing man-in-the-middle attacks as a motivation for the pseudo-header: http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html
[7] On how easily a man-in-the-middle attack can be mounted in practice (including IP and MAC address spoofing): http://it.toolbox.com/wiki/index.php/Man-in-the-Middle_Attack
Ansel Renner, Research and Development
