Network stack

Our definition of a distributed system mentions a network as a medium for communication. This medium is a very interesting and complex subject in its own right, but here we review only the core concepts relevant to our study.

Physically, nodes (or hosts; think computers) in the network are connected by means of network interface controllers (NIC, also called network adapters), special hardware[1] that allows a computer to connect to the network via a cable (Ethernet) or radio (WiFi)[2]. This physical layer is called the link layer in the TCP/IP stack[3]. Nodes at the link layer are identified by a Media Access Control (MAC) address, a unique hardware identifier of every network interface of the host[4].

We can use the ifconfig utility (or the more modern ip addr and ip link show) to inspect and manage network interfaces. Here we describe only the most common fields of its output:

  • <UP,BROADCAST,RUNNING,MULTICAST> means that the interface supports broadcasting and multicasting, is enabled (UP as opposed to DOWN), and is operational and ready to transfer data (RUNNING). Other common flags are LOOPBACK (the interface is a loopback interface) and PROMISC (the interface is in promiscuous mode, in which it captures all frames it sees on the network rather than only those addressed to it).
  • mtu 1500 - the Maximum Transmission Unit is 1500 bytes, so an IP packet larger than 1500 bytes has to be fragmented before it can be sent over this interface.
  • ether bc:d0:74:ec:76:17 means the MAC address is bc:d0:74:ec:76:17.
  • inet 192.168.43.174 netmask 0xffffff00 broadcast 192.168.43.255 means the IPv4 address is 192.168.43.174 with network mask 255.255.255.0, so the network can hold hosts 192.168.43.1-254; 192.168.43.255 is the broadcast address and 192.168.43.0 denotes the network itself.
  • inet6 fe80::88f:5ab4:1ad8:eed5 prefixlen 64 secured scopeid 0xe means the IPv6 address is fe80::88f:5ab4:1ad8:eed5 (a link-local address) with a 64-bit network prefix.
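The arithmetic behind the inet line can be checked with a short sketch using Python's standard ipaddress module. The address and the hexadecimal netmask are the ones from the list above; the variable names are only illustrative.

```python
import ipaddress

# Values taken from the inet line above; ifconfig prints the netmask in hex.
address = "192.168.43.174"
netmask = ipaddress.IPv4Address(int("0xffffff00", 16))  # same as 255.255.255.0

# Combine address and mask, then derive the network they belong to.
iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
network = iface.network

# hosts() skips the network address and the broadcast address.
hosts = list(network.hosts())

print("netmask:  ", netmask)
print("network:  ", network)
print("broadcast:", network.broadcast_address)
print("hosts:    ", hosts[0], "-", hosts[-1], f"({len(hosts)} addresses)")
```

The hosts() call leaves out exactly the two special addresses mentioned in the list, which is why a /24 network offers 254 usable host addresses rather than 256.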

So we have a network of nodes (also called a LAN, local area network[5]). How do we interconnect networks (to get a WAN, wide area network) so that we can send a packet of data (called a datagram) between two hosts in different networks? Here the routing problem arises. Computers called routers[6] relay datagrams from source to destination across network borders[7]. But how do we distinguish a host in one network from a host in another? We need addressing at the internet layer. IPv4 assigns each host an IP address of the form xx.xx.xx.xx (4 octets of 8 bits each). But 2^32 (about 4.3 billion) unique addresses are not enough to label all hosts, so IPv6 with 128-bit addresses was developed. IPv4 address exhaustion was partially mitigated by so-called Network Address Translation (NAT) and private IPs[8], at the cost of violating the end-to-end principle[9]. You can check the current adoption of IPv6 with Google data. The internet layer is independent of the network topology and of the physical way hosts are connected.

Within a LAN we need to establish a mapping between the IP addresses of hosts and their corresponding hardware (MAC) addresses in order to send a packet from one host to another. Such a mapping is called an ARP table (or ARP cache) and is filled in at each host from the replies to Address Resolution Protocol (ARP) broadcast requests. On Linux/Unix the ARP table can be inspected and manipulated with the arp utility. For IPv6 the Neighbor Discovery Protocol handles that.
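The idea of an ARP cache can be illustrated with a toy model. This is only a sketch: no real packets are sent, the lan dictionary stands in for the hosts that would answer a broadcast, and the second host (192.168.43.1 with its MAC) is invented for the example.

```python
# Toy model of ARP resolution on one LAN; nothing is sent on a real network.
# The second entry (192.168.43.1 and its MAC) is made up for illustration.
lan = {
    "192.168.43.174": "bc:d0:74:ec:76:17",
    "192.168.43.1": "02:00:00:00:00:01",
}

arp_cache = {}              # IP address -> MAC address, filled on demand
stats = {"broadcasts": 0}   # how many "who has <ip>?" requests were sent


def resolve(ip):
    """Return the MAC for ip, asking the whole LAN only on a cache miss."""
    if ip not in arp_cache:
        stats["broadcasts"] += 1   # the request goes to every host on the LAN
        mac = lan.get(ip)          # only the owner of the address replies
        if mac is None:
            return None            # nobody answered
        arp_cache[ip] = mac
    return arp_cache[ip]


first = resolve("192.168.43.1")    # cache miss: one broadcast
second = resolve("192.168.43.1")   # cache hit: no broadcast
print(first, second, stats["broadcasts"])
```

The second lookup is served from the cache, so the broadcast counter stays at one. Real ARP caches also expire entries after a while, which this sketch leaves out.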

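Routing maps destination IP addresses to the interfaces (and gateways) through which they can be reached. The routing table can be displayed with netstat -rn (or ip route show on modern Linux). You can trace the path of a packet with tracepath.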
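A routing table lookup can be sketched as a longest-prefix match: among all routes whose prefix contains the destination, the most specific one wins. The table below is hypothetical; the interface names en0 and lo0 and the gateway 192.168.43.1 are assumptions made for the example, not output of a real netstat -rn.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop, interface).
# Interface names and the gateway address are assumptions for illustration.
ROUTES = [
    ("0.0.0.0/0", "192.168.43.1", "en0"),   # default route via the gateway
    ("192.168.43.0/24", None, "en0"),       # directly connected LAN
    ("127.0.0.0/8", None, "lo0"),           # loopback
]


def lookup(destination):
    """Return (network, next_hop, interface) of the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop, iface)
        for prefix, next_hop, iface in ROUTES
        if dest in ipaddress.ip_network(prefix)
    ]
    # The default route matches every IPv4 address, so matches is never empty here.
    return max(matches, key=lambda route: route[0].prefixlen)


for target in ("192.168.43.20", "8.8.8.8", "127.0.0.1"):
    network, next_hop, iface = lookup(target)
    print(target, "->", network, "via", next_hop or "direct", "on", iface)
```

A host on the local /24 is reached directly, while any address outside it falls through to the default route and is handed to the gateway.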

After we have identified a route between two hosts, we can use a transport protocol to, at the very least, specify ports on the hosts and thereby connect specific application processes running on them. Because the internet layer is responsible only for routing and not for the reliability of communication, transport-layer protocols can also offer reliability mechanisms such as congestion control, preservation of data ordering, recovery from packet loss, and packet deduplication.
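How ports tie a connection to specific processes can be seen in a minimal sketch with Python's standard socket module. It assumes a working loopback interface and plays both the server and the client in a single process, with TCP as the transport protocol.

```python
import socket

# Server side: bind to loopback and let the OS pick a free port (port 0).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_ip, server_port = server.getsockname()

# Client side: the (IP, port) pair identifies the listening process.
client = socket.create_connection((server_ip, server_port))
connection, client_address = server.accept()

# TCP delivers a byte stream, so read until all 5 bytes have arrived.
client.sendall(b"hello")
received = b""
while len(received) < 5:
    chunk = connection.recv(5 - len(received))
    if not chunk:
        break
    received += chunk

print("server listens on port", server_port)
print("client connected from port", client_address[1])
print("received:", received)

for sock in (client, connection, server):
    sock.close()
```

Both ends share the IP address 127.0.0.1 here, so the two port numbers alone distinguish the endpoints of the connection.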

So we have a connection between hosts and we can exchange application data over the transport protocol.

1

A network interface can also be virtual, allowing userspace applications to intercept raw packets (see Universal TUN/TAP device driver). The famous loopback interface (127.0.0.1 for IPv4 and ::1 for IPv6) is also virtual.

2

We consider only packet-switched networks, where the same communication channel can be shared between several communication sessions by packetizing data. In contrast, in circuit-switched networks (for example, the public switched telephone network, PSTN) the channel capacity is dedicated to a single communication session between two nodes.

3

Note that the common term TCP/IP stack (also called the Internet protocol suite) covers not only the Internet Protocol (IP) and the Transmission Control Protocol (TCP) but also other protocols, such as the User Datagram Protocol (UDP) and QUIC at the transport layer and Internet Protocol Security (IPsec) at the internet layer. The TCP/IP stack, as a practical network stack, predates the theoretical OSI model for general networks. After the adoption of TCP/IP in ARPANET in 1983, several proprietary implementations of the stack emerged for different operating systems. But the popularity of TCP/IP increased when the University of California at Berkeley open sourced its implementation for BSD Unix, which became the famous BSD sockets (see Wikipedia Internet protocol suite). Among the alternatives to TCP/IP were IPX/SPX and AppleTalk.

4

Many network interface controllers allow their MAC to be overwritten. In this case the MAC must still be unique within the network where the NIC lives. Moreover, software-defined MACs are used in VMs managed by hypervisors such as QEMU, and MAC randomization is gaining popularity as a way to avoid mobile device tracking (see Wikipedia MAC address).

5

The main criterion here is that the nodes are physically located near each other. See also Computer Networking Introduction: Ethernet and IP (Heavily Illustrated) by Ivan Velichko.

6

Do not confuse routers with network bridges. Routers allow separate networks to communicate, while bridges join networks into a single network. While routers interconnect networks at the internet layer, gateway is a more general term: gateways can interconnect the PSTN (public switched telephone network) and a VoIP network, acting as a VoIP gateway, or even the Internet and satellites orbiting Earth, acting as an Internet-to-orbit gateway. A default gateway is where a host sends IP packets whose destination IP does not belong to the host's own network (as determined by the network mask).

7

Each crossing of a border between networks is called a hop. The time-to-live (TTL) field in IPv4 (hop limit in IPv6) prevents an IP packet from cycling between networks forever: it is decreased by 1 at every router the packet visits, and the packet is discarded once it reaches zero.
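The effect of the TTL on a routing loop can be shown with a toy simulation. The router names r1, r2, r3 and the forward helper are hypothetical, and the model ignores everything about a router except the TTL decrement.

```python
from itertools import cycle


def forward(ttl, routers):
    """Pass a packet through routers; return (delivered, number of router visits)."""
    visits = 0
    for _ in routers:
        visits += 1
        ttl -= 1                   # every router decrements the TTL by one
        if ttl <= 0:
            return False, visits   # this router discards the packet
    return True, visits            # the path ended before the TTL ran out


# A normal three-router path is crossed well within a TTL of 64.
print(forward(64, ["r1", "r2", "r3"]))

# A misconfigured loop between two routers: the packet dies after 64 visits
# instead of circulating forever.
print(forward(64, cycle(["r1", "r2"])))
```

Without the decrement, the second call would never return, which is the infinite cycling the TTL is meant to prevent.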

8

Private IPs are defined to belong to the following subnets: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. The number after the slash is the network prefix length (CIDR notation) and denotes the number of leading bits that are common to all hosts in the network. For example, 192.168.0.0/16 means that the first 16 bits (192.168 in decimal) are the same for all hosts, and the remaining 16 of the 32 bits identify the host (2^16 possible addresses in the network). To be precise, the number of usable host addresses is two fewer, because the all-zeroes host part denotes the network itself and the all-ones host part is the broadcast address. So when you encounter an IP such as 192.168.12.134, you already know that it is not reachable publicly (from the Internet); it is some internal private host. For IPv6, private addresses are less desirable because its main goal is to provide globally unique addressing. For a LAN, use a unique local unicast address instead: it starts with the prefix fd (8 bits), followed by 40 bits of global ID (chosen randomly by the operator of the network), then 16 bits for a subnet, and the remaining 64 bits identify the host. So with IPv6, local private IPs are essentially globally unique as long as the 40 bits of global ID are indeed random, and this global uniqueness of local addresses makes it possible to merge two networks without reassigning addresses or using NAT. IPv6 has no broadcast, only multicast, where packets are received only by the hosts that have declared an interest in them.
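The CIDR arithmetic and the layout of a unique local address can be sketched with Python's standard ipaddress and secrets modules. The helper random_ula_prefix is hypothetical, and drawing the 40-bit global ID directly from secrets.randbits is a simplification chosen for this example, not a prescribed procedure.

```python
import ipaddress
import secrets

# IPv4: a /16 prefix leaves 16 host bits.
net = ipaddress.ip_network("192.168.0.0/16")
total = net.num_addresses    # 2**16 addresses in the network
usable = total - 2           # minus the network and broadcast addresses

addr = ipaddress.ip_address("192.168.12.134")
print(total, usable, addr in net, addr.is_private)


def random_ula_prefix():
    """Build a /48 unique local prefix: 8 bits 0xfd followed by 40 random bits."""
    global_id = secrets.randbits(40)
    # Place the 48 prefix bits at the top of the 128-bit address.
    prefix_int = ((0xFD << 40) | global_id) << 80
    return ipaddress.IPv6Network((prefix_int, 48))


ula = random_ula_prefix()
# The next 16 bits select a subnet; take the first of the 2**16 possible /64s.
subnet = next(ula.subnets(new_prefix=64))
print(ula, subnet)
```

Two operators who each generate a prefix this way are very unlikely to collide, which is what makes merging their networks possible without renumbering.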

9

The end-to-end principle means that the end hosts participating in the communication (and not other parties) are themselves responsible for its reliability and security. The lower routing layer is connectionless, i.e. each packet takes its own route rather than a prearranged one. Interesting fact: the principle was pioneered by CYCLADES, an INRIA network started around the same time as the well-known ARPANET.