Chapter 3 – Review of TCP/IPv4
This chapter is a brief review of TCP/IPv4, the foundation of the First Internet. Its purpose is to help you understand what is new and different in TCP/IPv6. It is not intended to be comprehensive. There are many great books listed in the bibliography if you wish to understand TCP/IPv4 at a deeper level. IPv4 is relevant here because the design of IPv6 is based heavily on it. First, IPv4 can be considered one of the great achievements in IT history, based on its worldwide success, so it was a good model to copy from. Second, there were several attempts to create a new design "from the ground up" (a "complete rewrite") for IPv6, and these would have involved really painful migration and interoperability issues. You need to understand what the strengths and weaknesses of IPv4 are to see why IPv6 evolved the way it did. You can think of IPv6 as "IPv4 on steroids": it takes into account the radical differences in the way we do networking today, and fixes problems that were encountered in the first 27 years of the Internet, as network bandwidth and the number of nodes increased exponentially. We are doing things over networks today that no one could have foreseen a quarter of a century ago, no matter how visionary they were.
3.1 – Network Hardware
There are many types of hardware devices used to construct an Ethernet network running TCP/IP protocols. These include nodes, NICs, cables, hubs, switches, routers and firewalls.
A node (sometimes called a host) is a device (usually a computer) that can do processing and has some kind of physical connection (wired or wireless) to a network. Examples of nodes are: desktop computers, notebook computers, netbooks, smart phones, smart switches, routers, network printers, network aware appliances, and so on. A node could be as simple as a temperature sensor, with no display and no keyboard, just a connection to a network. It could possibly have a management interface accessed via the network (e.g. with Telnet, SSH or web). All nodes on a network must have at least one IP address (per interface). If a node has multiple interfaces connected to different networks, and the ability to forward packets between them, then it is called a gateway. Routers and firewalls are special types of gateways that can forward packets across networks and/or control traffic in various ways. Gateways make it possible to build internetworks. They are described in more detail under IPv4 Routing in this chapter.
A NIC (or Network Interface Card) is the physical interface that connects a node to a network. It may also be called an Ethernet adapter. It should have a female RJ-45 connector on it (or possibly a coax or fiber optic connector). It could be an actual add-in PCI card. It could be integrated on the device’s motherboard. It could also be something that makes a wireless connection to a network, using Wi-Fi, WiMAX or another standard. Typically all NICs have a globally unique, hard-wired MAC address (48 bits long, assigned by the manufacturer). A node can have one or more NICs (also called interfaces). Each interface can be assigned one or more IP addresses, and various other relevant network configuration items, such as the address of the default gateway and the addresses of the DNS servers.
Network cables today are typically unshielded twisted pair (UTP) cables that actually have four pairs of plastic coated wires, with each pair forming a twisted coil. They have RJ-45 male connectors on each end. They could also be fiber optic cables for very high speed or long run connections. Often today, professional contractors install UTP cables through the walls, and bring them together at a central location (sometimes called the wiring closet) where they are connected together to form a star network. Cables typically are limited to 100 meters or less in length, but the maximum acceptable length is a factor of several things, such as network speed and cable design. Cables rated as “CAT5” are good up to 100 Mbit/sec, while cables rated as “CAT5E” or “CAT6” are good up to a gigabit (CAT6 provides more headroom). Above that speed, you should be using fiber optic cables. It is also possible for twisted pair cables to be shielded if required to prevent interference from (or with) other devices.
A network hub is a device that connects multiple cables together so that any packet transmitted by a node connected to that hub is replicated to all the other nodes connected to the hub. It typically has a bunch of female RJ-45 connectors in parallel. In effect it ties together the network cables plugged into it into a star network. Hubs have a speed rating, based on what speed Ethernet they support. Older hubs might be only 10 Mbit/sec. More recent ones might be 10/100, which means they support both 10 Mbit/sec and 100 Mbit/sec (or “Fast Ethernet”). If you have 5 nodes (A, B, C, D and E) connected together with a hub, and node B sends a packet to node D, all nodes, including A, C and E, will see the traffic. The nodes not involved in the transaction will typically just ignore it. This dropping of packets not addressed to a node is often done by the hardware in the NIC, so that it never interrupts the software driver. Many NICs have the ability to be configured in promiscuous mode. When in this mode, they will accept packets (and make them available to any network application) whether those packets are addressed to this node or not. If this mode is selected, the dropping of packets not addressed to you must be done in software. However, sometimes you want to see all traffic on the subnet. For instance, this would be useful with Intrusion Detection, for diagnostic troubleshooting, or for collecting network statistics. Hubs come in various sizes, from 4 ports up to 48 ports, and can even be coupled with other hubs to make large network “backbones”. You can also have a hierarchy of hubs, where several hubs distributed around a company actually connect in to a larger (and typically faster) central hub. Actually, hubs are quite rare today; most such devices today are actually switches.
A network switch is similar to a network hub, but has some control logic in it that limits traffic to go out only ports that have nodes that are involved in a transmission. Again, say you have a switch with cables going to nodes A,B,C,D and E. If B sends a packet to D, that packet will not be seen on the ports to which A, C and E are connected. This holds down excessive traffic that would normally just be dropped anyway, and can help prevent broadcast storms. It also provides a small degree of privacy, even if someone enables their NIC in promiscuous mode. In order to do this, switches must snoop on the MAC addresses of packets and maintain tables of MAC addresses. Most switches are oblivious to IP addresses – they work only with MAC addresses. Because of this, they are IP version agnostic. This means they will carry IPv4 or IPv6 traffic (or even other kinds of Ethernet traffic), so long as it uses MAC addresses.
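The MAC learning behavior described above can be sketched in a few lines. This is a toy model, not how real switches are implemented (they do this in hardware); the class and names here are purely illustrative.

```python
# Toy model of a learning switch: remember which port each source MAC was
# seen on, and forward frames only out the port where the destination MAC
# lives. Unknown destinations are flooded to every port except the ingress.
class LearningSwitch:
    def __init__(self, num_ports):
        self.mac_table = {}          # MAC address -> port number
        self.num_ports = num_ports

    def handle_frame(self, in_port, src_mac, dst_mac):
        # Learn: the source MAC is reachable via the port it arrived on.
        self.mac_table[src_mac] = in_port
        # Forward: known destination goes out one port; unknown is flooded.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in range(self.num_ports) if p != in_port]

sw = LearningSwitch(5)
# B (on port 1) sends to D before D is known: flooded to ports 0, 2, 3, 4.
print(sw.handle_frame(1, "mac-B", "mac-D"))
# D (on port 3) replies to B: B was learned, so only port 1 sees it.
print(sw.handle_frame(3, "mac-D", "mac-B"))
```

Note that once D replies, its MAC is learned too, so further traffic between B and D never appears on the ports of A, C and E.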
If you are using a switch, but one of your connected nodes really does want to see all traffic on a network segment, some switches have a mirror port function that will still allow all traffic from the entire switch to be copied to one port, to which you connect the node that wants to monitor all traffic. Typically this must be configured, which requires a management interface of some kind. If you configured all ports on a switch to act as “mirror ports”, you would effectively have a hub. Like hubs, switches come in various speeds, from 10 Mbit/sec up to 1000 Mbit/sec (Gigabit). They also come in sizes from 4 ports up to 48 ports, and better ones can be “stacked” (linked together) to build larger network backbones. Lower end (cheaper) switches may have few if any configuration options, and may not even have a user interface. Smart (or managed) switches typically have a sophisticated GUI management interface (accessible via the network, usually over HTTP) that allows you to configure various things and/or monitor traffic. They also typically include support for monitoring or control using SNMP (Simple Network Management Protocol). Very advanced switches have the ability to configure VLANs (Virtual LANs), which allow you to effectively create multiple sub-switches that are not connected together.
3.2 – RFCs: The Internet Standards Process
Anyone studying the Internet, or developing applications for it, must understand the RFC system. RFC stands for Request For Comments. These are the documents that define the Internet Protocol Suite (the official name for TCP/IP) and many related topics. Anyone can submit an RFC. Ones that are part of the Standards Track are usually produced by IETF (Internet Engineering Task Force) working groups. Anyone can start or participate in a working group. Submitted RFCs begin life as an Internet Draft, each of which has a lifespan of six months or less. Most drafts go through considerable peer review, and possibly several revisions, before they are approved, are issued an official number (e.g. 793) and become part of the official RFC collection. There are other kinds of documents in addition to the Standards Track, including information memos (FYI), humor (primarily ones issued on April 1) and even one obituary, for Jon Postel, the first RFC Editor and initial allocator of IP addresses, RFC 2468, “I Remember IANA”, October 1998. There is even an RFC about RFCs, RFC 2026, “The Internet Standards Process, Revision 3”, October 1996. That is a good place to start if you really want to learn how to read them.
The Internet standards process is quite different from the standards process of ISO (the International Organization for Standardization) that created the OSI network specification. ISO typically develops large, complex standards over multiple four-year cycles, with hundreds of engineers and much political wrangling. This was adequate for creating the standards for the worldwide telephony system, but is far too slow and hidebound for something as freewheeling and rapidly evolving as the Internet. The unique standards process of the IETF is one of the main reasons that TCP/IP is now the dominant networking standard worldwide. By the time OSI was specified, TCP/IP was already created, deployed, and being revised and expanded. OSI never knew what hit it.
Learning to read RFCs is an acquired skill, one that anyone serious about understanding the Internet, and most developers creating things for it, should master. There are certain “terms of art”, like the usage of MUST, SHOULD, MAY and NOT RECOMMENDED, that are precisely defined (in RFC 2119, “Key words for use in RFCs to Indicate Requirement Levels”, March 1997) and used consistently. As an example, the IPv6 Ready Silver (Phase 1) tests examine only the MUST items from relevant RFCs, but the IPv6 Ready Gold (Phase 2) tests also examine all of the SHOULD items.
RFCs are readily available to anyone for free. Compare this to the ISO standards, which can cost over $1000 for a complete set of “fascicles” for something like X.500. Today you can obtain RFCs easily in various formats by use of a search engine such as Google or Yahoo. The “official” source is the URL:
http://www.rfc-editor.org/rfc/rfcXXXX.txt (where XXXX is the RFC number)
There is also an official RFC search page, where you can search for phrases (like “TCP”) in different tracks, such as RFC, STD, BCP or FYI, or all tracks. You can retrieve the ASCII or PDF versions. It is at:
https://www.rfc-editor.org/rfcsearch.html
There are over 5500 RFCs today. I have included many references to the relevant RFCs in this book. If you want to see all the gory details on any subject, go right to the source and read it. You may find it somewhat tough going until you learn to read “RFC-ese”. A number of books on Internet technology are either just a collection of RFCs, or RFCs make up a large part of the content. There is no reason today to do that – anyone can download all the RFCs you want, and have them in soft (searchable) form.
Most of the topics covered in this book also have considerable coverage on the Internet outside of the RFCs, such as in Wikipedia. Again, if you want to drill deeper in any of these topics, crank up your favorite search engine and have at it. The information is out there. What I’ve done is to try to collect together the essential information in a logical sequence, with a lot of explanations and examples, plus all the references you need to drill as deep as you like. I taught cryptography and Public Key Infrastructure for VeriSign for two years, so I have a lot of experience trying to explain complex technical concepts in ways that reasonably intelligent people can easily follow. Hopefully you will find my efforts worthwhile.
3.3 – TCP/IPv4
The software that made the First Internet (and virtually all Local Area Networks) possible has actually been around for quite some time. It is actually a suite (family) of protocols. The core protocols of this suite are TCP (the Transmission Control Protocol) and IP (Internet Protocol), which gave it its common name, TCP/IP. Its official name is The Internet Protocol Suite.
TCP was first defined officially in RFC 675, “Specification of Internet Transmission Control Program”, December 1974. The protocol described in this document does not look much like the current TCP, and in fact, the Internet Protocol did not even exist at the time. Jon Postel was responsible for splitting the functionality described in RFC 675 into two separate protocols, (the new) TCP and IP. RFC 675 is largely of historical interest now. The modern version of TCP was defined in RFC 793, “Transmission Control Protocol – DARPA Internet Program Protocol Specification”, September 1981 (seven years later). It was later updated by RFC 1122, “Requirements for Internet Hosts – Communication Layers”, October 1989, which covers the link layer, IP layer and transport layer. It was also updated by RFC 3168, “The Addition of Explicit Congestion Notification (ECN) to IP”, September 2001, which adds ECN to TCP and IP.
Both of these core protocols, and many others, will be covered in considerable detail in the rest of this chapter.
3.3.1 – Four Layer TCP/IPv4 Architectural Model
Unlike the OSI network stack, which really does have seven layers, the DoD network model has four layers, as shown below:
Figure 3.3-a: Four Layer TCP/IPv4 Model
It just confuses the issue to try to figure out which of the seven OSI layers the layers of TCP/IP fit into. It is simply not applicable. It’s like trying to figure out what color “sweet” is. The OSI seven layer model did not even exist when TCP/IP was defined. Unfortunately, many people use terms like “layer 2” switches versus “layer 3” switches. These refer to the OSI model. Books from Cisco Press are particularly adamant about using OSI terminology. Unfortunately hardly anyone is using actual OSI networks today. In this book we will try to consistently use the four layer model terminology, while referring to the OSI terminology when necessary for you to relate the topic to actual products or other books.
The Application Layer implements the protocols most people are familiar with (e.g. HTTP). The software routines for these are typically contained in application programs such as browsers or web servers that make “system calls” to subroutines (or “functions” in C terminology) in the “socket API” (an API is an Application Program Interface, or a collection of related subroutines, typically supplied with the operating system or C programming language compiler). The application code creates outgoing data streams, and then calls routines in the API to actually send the data via TCP (Transmission Control Protocol) or UDP (User Datagram Protocol). Output to Transport Layer: [DATA] using IP addresses.
The Transport Layer implements TCP (the Transmission Control Protocol) and UDP (the User Datagram Protocol). These routines are internal to the Socket API. They add a TCP or UDP packet header to the data passed down from the Application Layer, and then pass the data down to the Internet Layer for further processing. Output to Internet Layer: [TCP HDR[DATA]], using IP addresses.
The Internet Layer implements IP (the Internet Protocol) and various other related protocols such as ICMP (which includes the “ping” function among other things). The IP routine takes the data passed down from the Transport Layer routines, adds an IP packet header onto it, then passes the now complete IPv4 packet down to routines in the Link Layer. Output to Link layer: [IP HDR[TCP HDR[DATA]]] using IP addresses.
The Link Layer implements protocols such as ARP that convert IP addresses to MAC addresses. It also contains routines that actually read and write data (as fed down to it by routines in the Internet Layer) onto the network wire, in compliance with Ethernet or other standards. Output to wire: Ethernet packet using MAC addresses (or the equivalent if other network hardware is used, such as Wi-Fi).
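The nesting shown in the "Output to..." notes above can be sketched as a chain of functions, each prepending its own header to whatever the layer above handed it. The header strings here are fake placeholders, not real wire formats; this just illustrates the encapsulation order.

```python
# Each layer wraps the payload from the layer above with its own header.
def transport_layer(data: bytes) -> bytes:
    return b"[TCP HDR]" + data          # Transport Layer adds the TCP header

def internet_layer(segment: bytes) -> bytes:
    return b"[IP HDR]" + segment        # Internet Layer adds the IP header

def link_layer(packet: bytes) -> bytes:
    return b"[ETH HDR]" + packet        # Link Layer adds Ethernet framing

frame = link_layer(internet_layer(transport_layer(b"DATA")))
print(frame)   # b'[ETH HDR][IP HDR][TCP HDR]DATA'
```

On the receiving side the process runs in reverse: each layer strips its own header and passes the remainder up.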
Each layer “hides” the details (and/or hardware dependencies) from the higher layers. This is called “levels of abstraction”. An architect thinks in terms of abstractions such as roofs, walls, windows, etc. The next layer down (the builder) thinks in terms of abstractions such as bricks, glass, mortar, etc. Below the level of the builder, an industrial chemist thinks in terms of formulations of clay or silicon dioxide to create bricks and glass. If the architect tried to think at the chemical or atomic level, it would be very difficult to design a house. His job is made possible by using levels of abstraction. Network programming is analogous. If application programmers had to think in terms of writing bits to the actual hardware, things like web browsers would be almost impossible. Each network layer is created by specialists that understand things at their level, and lower layers can be treated as “black boxes” by people working at higher layers.
Another important thing about network layers is that you can make major changes to one layer, without impacting the other layers much at all. The connections between layers are well defined, and don’t change (much). This provides a great deal of separation between the layers. In the case of IPv6, the Internet layer is almost completely redesigned internally, while the Link Layer and Transport Layer are not affected much at all (other than providing more bytes to store the larger IPv6 addresses). If your product is “IPv6 only”, that’s about the only change you would need to make to your application software (unless you display or allow entry of IP addresses). If your application is “dual stack” (can send and receive data over IPv4 or IPv6), then a few more changes are required in the application layer (e.g.
to accept multiple IPv4 and IPv6 addresses from DNS and try connecting to one or more of them based on various factors, or to accept incoming connections over both IPv4 and IPv6). This makes it possible to migrate (or “port”) network software (created for IPv4) to IPv6 or even dual stack with a fairly minor effort. In comparison, changing network code written for TCP/IP to use OSI instead would probably involve a complete redesign and major recoding effort.
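The dual stack change just described (accept whatever address families DNS returns, and try them in turn) can be sketched with the standard socket API. This is a minimal illustration, not production code; the host name and port in any real call are up to you.

```python
import socket

# Resolve a name to all of its addresses (IPv4 and/or IPv6) and try
# connecting to each in turn, as a dual stack client would.
def connect_dual_stack(host, port):
    last_error = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.connect(addr)
            return s                      # first address that works wins
        except OSError as e:
            last_error = e                # remember failure, try next address
    raise last_error or OSError("no addresses returned")
```

An IPv4-only application would have hard-coded `AF_INET`; the switch to `AF_UNSPEC` plus a loop over the results is essentially the whole porting effort for a simple client.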
3.3.2 – IPv4: The Internet Protocol, Version 4
IPv4 is the foundation of TCP/IPv4 and accounts for many of its distinguishing characteristics, such as its 32-bit address size, its addressing model, its packet header structure and routing. IPv4 was first defined in RFC 791 “Internet Protocol”, September 1981.
3.3.2.1 – IPv4 Packet Header Structure
So what are these packet headers mentioned above? In TCP/IPv4 packets, there is a TCP (or UDP) packet header, then an IPv4 packet header, then the packet data. Each header is a structured collection of data, including things such as the IPv4 address of the sending node, and the IPv4 address of the destination node. Why are we getting down to this level of detail? Because some of the big changes from IPv4 to IPv6 have to do with the new and improved IP packet header architecture in IPv6. In this chapter, we’ll cover the IPv4 packet header. Here it is:
Figure 3.3-b: IPv4 Packet Header
The IP Version field (4 bits) contains the value 4, which in binary is “0100” (you’ll never guess what goes in the first 4 bits of an IPv6 packet header!).
The Header Length field (4 bits) indicates how long the header is, in 32 bit “words”. The minimum value is “5” which would be 160 bits, or 20 bytes. The maximum length is 15, which would be 480 bits, or 60 bytes. If you skip that number of words from the start of the packet, that is where the data starts (this is called the “offset” to the data). This will only ever be greater than 5 if there are options before the data part (which is not common).
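Since the Version and Header Length fields share the first byte of the header, extracting them is a matter of bit masking. A quick sketch (the sample byte 0x45 is the common case of version 4 with no options):

```python
# Pull Version and Header Length (IHL) out of the first header byte.
first_byte = 0x45
version = first_byte >> 4          # high 4 bits: IP version
ihl = first_byte & 0x0F            # low 4 bits: header length in 32-bit words
header_bytes = ihl * 4             # the "offset" to the data, in bytes

print(version, ihl, header_bytes)  # 4 5 20
```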
The Type of Service field (8 bits) is defined in RFC 2474, “Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers”, December 1998. This is used to implement a fairly simple QoS (Quality of Service). QoS involves management of bandwidth by protocol, by sender, or by recipient. For example, you might want to give your VoIP connections a higher priority than your video downloads, or the traffic from your boss higher priority than your co-worker’s traffic. Without QoS, bandwidth is on a first-come, first-served basis. 8 bits is not really enough to do a good job on QoS, and DiffServ is not widely implemented in current IPv4 networks. QoS is greatly improved in IPv6.
The Total Length field (16 bits) contains the total length of the packet, including the packet header, in bytes. The minimum length is 20 (20 bytes of header plus 0 bytes of data), and the maximum is 65,535 bytes (since only 16 bits are available to specify this). All network systems must handle packets of at least 576 bytes, but a more typical packet size is 1500 bytes (the Ethernet MTU). With IPv4, it is possible for some devices (like routers) to fragment packets (break them apart into multiple smaller packets) if required to get them through a part of the network that can’t handle packets that big. Packets that are fragmented must be reassembled at the other end. Fragmentation and reassembly is one of the messy parts of IPv4 that got cleaned up a lot in IPv6.
The Identification (Fragment ID) field (16 bits) identifies which fragment of a once larger packet this one is, to help in reassembling the fragmented packet later. In IPv6 packet fragmentation is not done by intermediate nodes, so all the header fields related to fragmentation are no longer needed.
The next three bits are flags related to fragmentation. The first is reserved and must be zero (an April Fool’s RFC once defined this as the “evil” bit). The next bit is the DF (Don’t Fragment) flag. If DF is set, the packet cannot be fragmented (so if such a packet reaches a part of the network that can’t handle one that big, that packet is dropped). The third bit is the MF (More Fragments) flag. If MF is set, there are more fragments to come. Unfragmented packets of course have the MF flag set to zero.
The Fragment Offset field (13 bits) is used in reassembly of fragmented packets. It is measured in 8 byte blocks. The first fragment of a set has an offset of 0. If you had a 2500 byte packet (20 byte header plus 2480 bytes of data), and were fragmenting it into chunks of 1020 bytes (20 byte header plus 1000 bytes of data each, 1000 being a multiple of 8), you would have three fragments as follows:

Fragment 1: offset 0 (data bytes 0 to 999), total length 1020, MF = 1
Fragment 2: offset 125 (data bytes 1000 to 1999), total length 1020, MF = 1
Fragment 3: offset 250 (data bytes 2000 to 2479), total length 500, MF = 0
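The arithmetic behind a fragment set like this is simple enough to sketch. The function below is illustrative only (real IP stacks carry the actual header and data around, not just lengths):

```python
# Compute the fragments of an IPv4 packet: each fragment repeats the 20 byte
# header and carries at most max_data data bytes (which must be a multiple
# of 8, since Fragment Offset is measured in 8 byte blocks).
def fragment(total_len, header_len=20, max_data=1000):
    data_len = total_len - header_len
    frags = []
    offset = 0                                 # data offset, in bytes
    while data_len > 0:
        chunk = min(max_data, data_len)
        more = data_len > chunk                # MF flag: more fragments follow
        frags.append((offset // 8, chunk + header_len, more))
        offset += chunk
        data_len -= chunk
    return frags   # list of (Fragment Offset, Total Length, MF)

print(fragment(2500))
# [(0, 1020, True), (125, 1020, True), (250, 500, False)]
```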
The Time To Live (TTL) field (8 bits) is to prevent packets from being shuttled around indefinitely on a network. It was originally intended to be lifetime in seconds, but it has come to be implemented as “hop count”. This means that every time a packet crosses a router, the hop count is decremented by one. If it reaches zero, the packet is dropped. Typically if this happens, an ICMPv4 message (“time exceeded”) is returned to the packet sender. This mechanism is how the traceroute command works. Its primary purpose is to prevent looping (packets running around in circles).
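Here is a toy simulation of that hop count mechanism, purely for illustration (no real packets or routers involved). Traceroute exploits exactly this behavior by sending probes with TTL = 1, 2, 3, ... and recording which router returns each “time exceeded” message.

```python
# Simulate a packet traversing a chain of routers. Each router decrements
# the TTL; if it hits zero before the destination, the packet is dropped
# and a "time exceeded" message goes back to the sender.
def forward(ttl, hops_to_destination):
    for hop in range(1, hops_to_destination + 1):
        ttl -= 1
        if ttl == 0 and hop < hops_to_destination:
            return f"time exceeded at hop {hop}"
    return "delivered"

print(forward(ttl=64, hops_to_destination=5))   # delivered
print(forward(ttl=3, hops_to_destination=5))    # time exceeded at hop 3
```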
The Protocol field (8 bits) defines the type of data found in the data portion of the packet. Protocol numbers are not to be confused with ports. Some common protocol numbers are:
1 ICMP Internet Control Message Protocol (RFC 792)
6 TCP Transmission Control Protocol (RFC 793)
17 UDP User Datagram Protocol (RFC 768)
41 IPv6 IPv6 tunneled over IPv4 (RFC 4213)
83 VINES Banyan Vines IP
89 OSPF Open Shortest Path First (RFC 1583)
132 SCTP Stream Control Transmission Protocol (RFC 4960)
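Several of these protocol numbers are available as constants in the standard Python socket module, which is handy when building raw packets or packet filters:

```python
import socket

# The IPPROTO_* constants are the Protocol field values from the table above.
print(socket.IPPROTO_ICMP)   # 1
print(socket.IPPROTO_TCP)    # 6
print(socket.IPPROTO_UDP)    # 17
```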
The Header Checksum field (16 bits) contains the 16-bit one’s complement of the one’s complement sum of all 16 bit words in the header. When computing it, the checksum field itself is taken as zero. To validate the checksum, add all 16 bit words in the header together (using one’s complement addition), including the transmitted checksum; the result should be all ones (0xFFFF), whose complement is 0. If you get any other value, then at least one bit in the packet was corrupted. There are certain multiple bit errors that can cancel out, and hence bad packets can go undetected. Note that since the hop count (TTL) is decremented by one on each hop, the IP Header Checksum must be recalculated at each hop. The IP Header Checksum was eliminated in IPv6.
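The one’s complement arithmetic is easier to see in code. Here is a sketch of the checksum computation; the sample header bytes below are made up for illustration (version 4, IHL 5, TTL 64, protocol 1, private source and destination addresses).

```python
import struct

# One's complement of the one's complement sum of all 16-bit words.
# Carries out of the top bit are "folded" back into the low 16 bits.
def ipv4_checksum(header: bytes) -> int:
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A 20-byte sample header with the checksum field (bytes 10-11) zeroed:
hdr = bytes.fromhex("450000540000400040010000c0a80101c0a80102")
cksum = ipv4_checksum(hdr)

# Validation: recompute over the header with the checksum filled in.
# The one's complement sum is 0xFFFF, so the function returns 0.
patched = hdr[:10] + struct.pack("!H", cksum) + hdr[12:]
print(ipv4_checksum(patched))   # 0
```

This is also why routers can update the checksum cheaply after decrementing the TTL: with one’s complement arithmetic, the checksum can be adjusted incrementally rather than recomputed from scratch.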
The Source Address field (32 bits) contains the IPv4 address of the sender (may be modified by NAT). The Destination Address field (32 bits) contains the IPv4 address of the recipient (may be modified by NAT in a reply packet).
Options (0 to 40 bytes) Not often used. These are not relevant to this book. If you want the details, read the RFCs.
Data – (variable number of bytes) The data part of the packet – not really part of the header. Not included in the IP Header Checksum. The number of bytes in the data field is the value of ‘Total Length’ minus four times the value of ‘Header Length’ (since Header Length is measured in 32 bit words).
3.3.2.2 – IPv4 Addressing Model
In IPv4, addresses are 32 bits in length. They are simply numbers from 0 to 4,294,967,295. For the convenience of humans, these numbers are typically represented in dotted decimal notation. This splits the 32 bit addresses into four 8-bit fields, and then represents each 8-bit field with a decimal number from 0 to 255. These decimal numbers cover all possible 8 bit binary patterns from 0000 0000 to 1111 1111. The decimal numbers are separated by “dots” (“.”). Leading zeros can be eliminated. The following are all valid IPv4 addresses in dotted decimal:
123.45.67.89 A globally routable address
10.3.1.51 A private address
255.255.255.255 The broadcast address
127.0.0.1 The loopback address for IPv4
IPv6 addresses use a simpler scheme I call coloned hex notation. If something similar were used with IPv4, the address 192.168.1.2 would look like c0:a8:1:2, and the subnet mask 255.255.255.240 would look like ff:ff:ff:f0. Hexadecimal is also called base 16. Hexadecimal is just like decimal, if you have 6 extra fingers! Instead of the ten decimal digits 0,1,2,3,4,5,6,7,8 and 9, hexadecimal has 16 “digits”, which are 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E and F. Both systems use “place value notation”, but decimal is based on powers of 10 (1, 10, 100, 1000, etc), while hexadecimal is based on powers of 16 (1, 16, 256, 4096, etc). For example, 123 decimal is 1 x 100 + 2 x 10 + 3. 123 hexadecimal is 1 x 256 + 2 x 16 + 3. One of the advantages of hexadecimal is extremely simple conversion to and from binary. Each hexadecimal digit converts to (or from) exactly 4 binary digits (bits), from 0000, 0001, 0010, up to 1111.
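The conversions in this section are easy to sketch in code. The helper names below are illustrative, and “coloned hex” is this book’s notation, not a standard one:

```python
# Convert a dotted decimal IPv4 address to its underlying 32-bit number,
# and to the illustrative "coloned hex" form (one hex group per octet,
# leading zeros dropped, as in the text).
def dotted_to_int(addr: str) -> int:
    n = 0
    for octet in addr.split("."):
        n = (n << 8) | int(octet)      # shift in each 8-bit field
    return n

def to_coloned_hex(addr: str) -> str:
    return ":".join(format(int(octet), "x") for octet in addr.split("."))

print(dotted_to_int("255.255.255.255"))   # 4294967295
print(to_coloned_hex("192.168.1.2"))      # c0:a8:1:2
print(to_coloned_hex("255.255.255.240"))  # ff:ff:ff:f0
```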