1-2: The TCP/IP family of Internet protocols – Bioinformatics Web Development

The Internet is populated by a great variety of different hardware. Dozens of different PC brands are available. More importantly, many different operating systems are installed on these computers. Among the best known are Windows (in different flavors and versions), MacOS, Linux, Unix, Solaris, the BSD family (FreeBSD, OpenBSD, NetBSD, etc.) and others.

Clients and Servers

Some of these computers are “just” clients, used as terminals to access resources and data provided by other computers, the servers (Figure 1-2-1).

Figure 1-2-1: The client-server paradigm – Source: cellbiol.com

Operating Systems

Clients operating systems

Some operating systems are natively more suitable for clients (Windows, MacOS) while others are typically in use on servers (Linux). Still, we can find all these operating systems both on clients and on servers, although with different shares.

It is interesting to note that according to Netmarketshare reports, on the client side Linux is still lagging behind (around 2% as of April 2017), while Windows is dominating with a share larger than 90% of total clients. MacOS has a fair “niche” share at 6%.

Desktop operating systems market share in April 2017. Source: Netmarketshare

The Netmarketshare data are in good overall agreement with more detailed statistics on the usage of various OS versions from StatCounter, featured on a Wikipedia page dedicated to usage shares of operating systems, which shows that as of April 2017 Windows 7 is still the most widely used OS on client computers with a 38.89% share. Windows 10, introduced in 2015, is catching up with a current share of 29.96%.

Figure 1-2-2: Web clients’ OS family market share according to StatCounter for April 2017, as assessed by user agent information supplied by clients to web servers. Source: Wikipedia

See how (compare the previous figure to the next) Windows XP, which scored 21% in 2012, is slowly disappearing (now at 4%) but still in use, despite the fact that it is discontinued and no longer receives any updates, not even security updates, making it a highly insecure and vulnerable OS at this time.

Figure 1-2-2: Usage share of operating system on client computers, as estimated by analysis on wikimedia traffic – 2012 – Source: Wikipedia

It is fascinating to witness the moment, around August 2016, in which operating systems for mobile phones (Android and iOS) overtake Windows 7, used on desktop computers, as assessed by the user agents of devices visiting the Wikipedia website.

Operating systems of devices visiting the Wikipedia website from April 2016 to April 2017. Source: Wikimedia

Server operating systems

A typical server rack. Image licence CC BY-SA 3.0. Source: Wikipedia. Author: Wikimedia user Jfreyre, likely Jerome Freyre, @j_freyre

Not surprisingly, in the world of web servers the operating system shares differ dramatically from those of desktops/clients. According to a recent survey by W3Techs, Linux jumps from the 1.6-2% mark for desktops to a whopping 66% for servers. There are solid reasons for this (for one, Linux is free and open source) that will be discussed in the Linux pages of this tutorial. Windows server shares are at around 33%. MacOS, which is not in the server business anymore, is estimated to be at less than 0.1% (ref).

Web server OS market shares in May 2017, according to W3Techs. MacOS, at less than 0.1%, does not even make it into the figure, while Linux systems dominate.

With respect to the web server software itself (rather than the OS) used on web server computers, W3Techs provides the following data.

Web server software, May 2017. Source: W3Techs

This somewhat long overview underscores the variability of operating systems used on clients and server computers. How can such deeply diverse systems communicate and smoothly exchange data on the Internet? After all, it is common knowledge that a Windows .exe program will not work on a Mac, and that Mac software cannot be run on a PC, just to highlight one of the several differences between these OS.

The answer to this question is that both Macs and PCs (and Linux and all other OS used on computers and the wealth of other devices connected to the Internet nowadays) use a common language, a common set of rigidly defined rules, to communicate over the Internet. This common language is TCP/IP.

TCP/IP

TCP/IP, the Transmission Control Protocol/Internet Protocol, can be defined as a set of rules, or protocols, used to exchange data between hardware devices connected to the Internet, including (but not limited to) client and server computers. These rules include the whole packet exchange mechanism described above, and a set of specific protocols such as SMTP, FTP, HTTP, DHCP and many more, each designed to allow the exchange of a particular kind of data (e-mail, files, web pages, connection information, etc.).

TCP/IP is subdivided into “layers”. The above-mentioned data exchange protocols “live” in the so-called application layer, the upper layer of TCP/IP (Figure 1-2-5).

Figure 1-2-5: TCP/IP layers – Source: cellbiol.com

The files reassembled from the individual packets are passed to the application layer by the transport layer, just underneath (Figure 1-2-5). The transport layer manages, among other things, the fragmentation of the files it receives from the application layer into packets, which are then passed to the Internet layer, and the reassembly of packets received from the Internet layer (see Figures 1-2-5 and 1-2-6) into files that are passed to the application layer.

Figure 1-2-6: Data transmission over the Internet through TCP/IP – Source: cellbiol.com

Let us follow the path of a hypothetical file (an e-mail message, just as an example) from computer A to computer B (Figure 1-2-6).

The e-mail file in computer A is processed at the application layer by one of the TCP/IP protocols, the Simple Mail Transfer Protocol (SMTP).

The file is then passed down to the transport layer to be fragmented into packets. Packets then travel down to the Internet layer and then to the link/physical layer. The physical layer is where the packets physically move, using ethernet cables, satellites, fibers or wireless systems, depending on what is available between computers A and B. If the computers are in the same room, they might connect through a couple of ethernet cables with a router in between, or maybe wirelessly, if the computers are connected via wireless to the same router. If A and B are far apart, the physical transmission mode might well be mixed (ethernet + fiber + wireless, for example).

The packets, thanks to the routers, will find the correct route from A to B. It is worth remembering that not all packets will take the same route, as at any given time a particular route among the various ones available (connections on the Internet are redundant, remember?) might be better than another. A moment later, maybe when the next packet is sent, this might change and the best route will be different. At the end of the transmission, all the packets will have been received by the TCP/IP software on computer B.

In computer B, packets will travel up from the physical layer, to the Internet layer and then to the transport layer that re-assembles the packets into the original file, the e-mail that was sent from computer A. The e-mail file can then be passed to the application layer of computer B where it can be handled and processed (most likely with the Post Office Protocol, POP, or the Internet Message Access Protocol, IMAP), so that the user of computer B will be able to read it.
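The fragmentation and reassembly steps described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names `fragment` and `reassemble` are made up for illustration, each packet is numbered with a simple counter (real TCP numbers bytes rather than packets and adds much more header information), and the shuffle stands in for packets arriving out of order after taking different routes.

```python
import random

def fragment(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original data by ordering the chunks on their sequence number."""
    return b"".join(chunk for _, chunk in sorted(packets))

# Computer A: the transport layer fragments the message into small packets.
message = b"Hello from computer A to computer B"
packets = fragment(message, size=8)

# In transit: packets may take different routes and arrive in any order.
random.shuffle(packets)

# Computer B: the transport layer puts the packets back in order.
received = reassemble(packets)
```

Whatever order the packets arrive in, the sequence numbers are enough for computer B to rebuild the message exactly as it was sent.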

This is of course a simplified scheme where quite a few details were intentionally omitted (did we mention there must be a mail server in between A and B, for instance?), with the purpose of concentrating on the generic flow of TCP/IP packets between two computers. The same process description applies to any kind of data that travels from A to B through TCP/IP protocols.

TCP/IP Application Protocols

Here is a non-exhaustive list of Internet application protocols; many names will sound familiar:

SMTP: Simple Mail Transfer Protocol
SSH: Secure Shell
FTP: File Transfer Protocol
SFTP: SSH File Transfer Protocol (also called Secure File Transfer Protocol)
SCP: Secure Copy Protocol
DHCP: Dynamic Host Configuration Protocol
HTTP: Hyper Text Transfer Protocol
POP: Post Office Protocol (the version currently in use is 3, POP3)
IMAP: Internet Message Access Protocol

You can find an exhaustive list and detailed information here

Domain Name System: DNS, matching network IP addresses to domain names

Would you visit:

http://130.14.29.110

or would you rather visit:

http://www.ncbi.nlm.nih.gov/ ?

The Domain Name System (DNS) allows the translation of a domain name into the corresponding IP address. For the purpose of this tutorial, you can think of it as a big table with two columns and a great many rows. Each row contains a domain name in the first column and an IP address in the second column. Domain Name Servers can, on request, read this table and translate a domain name into the corresponding IP, which is then used for processing network requests. Domain names are a semantic layer that sits on top of the IP address system. DNS links the names layer with the IP layer.
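The “big table” picture of DNS can be sketched with a Python dictionary. This is a deliberately naive model: the names `DNS_TABLE` and `resolve` are invented for this example, the first row is the NCBI example from the text, and the second row is a hypothetical entry that uses an address reserved for documentation. The real DNS is hierarchical and distributed rather than a single table.

```python
# A toy "DNS table": each key is a domain name, each value an IP address.
DNS_TABLE = {
    "www.ncbi.nlm.nih.gov": "130.14.29.110",  # example from the text
    "www.example.org": "192.0.2.10",          # hypothetical entry
}

def resolve(domain_name: str) -> str:
    """Translate a domain name into an IP address, as a name server does on request."""
    key = domain_name.lower().rstrip(".")  # domain names are case-insensitive
    try:
        return DNS_TABLE[key]
    except KeyError:
        raise LookupError(f"no record for {domain_name!r}") from None

ip_address = resolve("www.ncbi.nlm.nih.gov")
```

On a machine connected to the Internet, Python's standard `socket.gethostbyname("www.ncbi.nlm.nih.gov")` asks the system's resolver to perform the real lookup; the address returned today may differ from the one shown in the text.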

DNS is used every time you (and the other billions of Internet users) visit a website by using a domain name, and every time those same people send an e-mail to an address like bill@domain-name.net (as opposed to bill@123.23.112.245). It is therefore one of the foundations of the Internet as we use it every day.

__________

“The Domain Name System (DNS) is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. A Domain Name Service resolves queries for these names into IP addresses for the purpose of locating computer services and devices worldwide. By providing a worldwide, distributed keyword-based redirection service, the Domain Name System is an essential component of the functionality of the Internet.”

Source: Wikipedia

_________

You may have noticed, when configuring Internet access for a device, that DNS is one of the parameters you have to deal with. Sometimes the DNS servers can be obtained automatically from your router or ISP through DHCP; other times they have to be configured manually. DNS is so important that you generally need to enter a primary DNS server and a secondary DNS server that will be used if the primary one fails.
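The primary/secondary arrangement boils down to “ask the first server, and if it does not answer, ask the next one”. The sketch below illustrates that logic only; the two server functions are hypothetical stand-ins (one simulating an unreachable server, one answering from a one-row table), not real network calls.

```python
def resolve_with_fallback(name, servers):
    """Ask each DNS server in turn and return the first answer obtained."""
    for ask in servers:
        try:
            return ask(name)
        except ConnectionError:
            continue  # this server failed, try the next one
    raise ConnectionError("no DNS server could be reached")

# Two hypothetical servers: the primary is down, the secondary answers.
def primary_dns(name):
    raise ConnectionError("primary DNS server unreachable")

def secondary_dns(name):
    return {"www.ncbi.nlm.nih.gov": "130.14.29.110"}[name]

ip_address = resolve_with_fallback(
    "www.ncbi.nlm.nih.gov", [primary_dns, secondary_dns]
)
```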

Internet addresses or Uniform Resource Locators (URLs) will be discussed in detail in a subsequent chapter of this book.

The OSI model, MAC addresses and packets structure

The TCP/IP layers model is often described alongside another layered model, the OSI (Open Systems Interconnection) model, which is divided into 7 layers instead of 4 and is still considered a reference for the TCP/IP model. In the image below we show how the layers of the two models relate to each other.

Comparison between the OSI and the TCP/IP models, showing how the OSI layers relate to the TCP/IP layers. The formal parts of TCP/IP packets (segment, datagram and frame) are mapped to the respective layers. For the OSI model, layer numbers are indicated.

The OSI model provides an important framework to better understand the structure of TCP/IP packets and how data are exchanged between connected devices within a network and across networks through physical networking devices, namely switches and routers, which will be discussed in more detail in the next section of this chapter. For now, let us lay out some foundations by discussing MAC addresses and the structure of TCP/IP packets.

MAC addresses

A MAC address is “a unique identifier assigned to network interfaces for communications at the data link layer of a network segment” (Wikipedia). Therefore the MAC address is a property of a network interface rather than of a device itself (a computer, a router, your connected fridge or toaster).

If a computer has, for instance, two ethernet cards and a WiFi card, each one of those will have its own MAC address. Therefore three different MAC addresses can potentially be associated with this computer. The MAC address “of the computer” during an Internet connection will actually depend on which one of these three network interfaces is used for the specific connection.

For example, the computer will use a different MAC address if connected through an ethernet cable rather than through WiFi.

A computer with three network interfaces: a WiFi card (1) and two ethernet cards (2 and 3). The actual MAC address “of the computer” will depend on which one of these three interfaces is used for the connection. The photo is ©cellbiol.com

There are different formats for MAC addresses. The most commonly used is referred to as MAC-48 and is composed of 48 bits, or 6 octets (bytes). It is generally written in hexadecimal format, such as:

00:0a:95:9d:68:16

00-0a-95-9d-68-16

where the hexadecimal representation of each octet is separated from the next by either : or -.
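To make the “48 bits, 6 octets” idea concrete, here is a small Python sketch that converts both notations shown above into the same 6 raw bytes and back. The function names `parse_mac` and `format_mac` are made up for this example, and only the colon and dash notations from the text are handled.

```python
import string

def parse_mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 raw bytes."""
    parts = text.replace("-", ":").split(":")
    valid = len(parts) == 6 and all(
        len(p) == 2 and all(c in string.hexdigits for c in p) for p in parts
    )
    if not valid:
        raise ValueError(f"not a MAC-48 address: {text!r}")
    return bytes(int(p, 16) for p in parts)

def format_mac(raw: bytes, sep: str = ":") -> str:
    """Write 6 raw bytes as two hexadecimal digits per octet, joined by `sep`."""
    return sep.join(f"{octet:02x}" for octet in raw)

# Both notations describe the same 48-bit value.
colon_form = parse_mac("00:0a:95:9d:68:16")
dash_form = parse_mac("00-0a-95-9d-68-16")
```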

It is very common that the MAC address of a network card is displayed on the corresponding device, or the card itself, through a dedicated label or tag. If the device has more than one NIC, several MAC addresses could be displayed, such as in the case of the SOHO router in the image below where both the MAC address for the WAN (Wide Area Network, the Internet) and the LAN (Local Area Network) network cards are indicated.

A label on the bottom side of a SOHO router indicates the MAC addresses of the WAN and LAN NICs

The MAC address is a physical address (as opposed to a “logical” address such as an IP address) that is assigned to the NIC (Network Interface Controller, a network card) by the manufacturer. Indeed, the first 3 octets (24 bits) of a MAC address generally identify the manufacturer (the brand of the device: Cisco, Dell, HP, etc.), while the remaining 24 bits are a unique identifier for the specific NIC.

In a MAC address the first 3 octets (the Organizationally Unique Identifier, OUI) define the manufacturer (brand) of the network card, while the last 3 octets are a unique identifier for the NIC
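Splitting a MAC address into its two halves is a one-liner once the octets are separated. The sketch below uses the example address from earlier in this section; `split_mac` is a hypothetical helper name, and no attempt is made to look up which manufacturer the OUI actually belongs to.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Return (OUI, NIC-specific part) of a colon- or dash-separated MAC-48 address."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    # First 3 octets: manufacturer (OUI). Last 3 octets: identifier of this NIC.
    return ":".join(octets[:3]), ":".join(octets[3:])

oui, nic_id = split_mac("00:0a:95:9d:68:16")
```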

In a home or small office environment, when a device connects to the SOHO (Small Office Home Office) router either by WiFi or through an Ethernet cable, it is typically automatically assigned a dynamic IP address (one that may be different at the next connection) by the router through DHCP (Dynamic Host Configuration Protocol). Within the network, this IP address will become associated with the MAC address of the network card of the device. Therefore, each device connected to a network will have both a physical address, the MAC address, and a logical (virtual) address, the IP address, that identify it inside the local network.
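The association between MAC addresses and dynamically assigned IP addresses can be pictured as a table kept by the router. The class below is a toy model, not the DHCP protocol itself: `ToyDhcpServer`, the 192.168.1.0/24 network, the starting host number and the second MAC address are all assumptions chosen for illustration, and real leases also expire after a while, which is why a device may get a different address at its next connection.

```python
import ipaddress

class ToyDhcpServer:
    """Hands out addresses from a pool and remembers which MAC got which IP."""

    def __init__(self, network: str, first_host: int = 100):
        hosts = list(ipaddress.ip_network(network).hosts())
        # hosts[0] is the .1 address, so host number N sits at index N - 1.
        self._free = hosts[first_host - 1:]
        self.leases = {}  # MAC address -> IP address

    def request(self, mac: str) -> str:
        """Return the IP leased to this MAC, assigning a free one if needed."""
        mac = mac.lower()
        if mac not in self.leases:
            self.leases[mac] = str(self._free.pop(0))
        return self.leases[mac]

router = ToyDhcpServer("192.168.1.0/24")
laptop_ip = router.request("00:0a:95:9d:68:16")
phone_ip = router.request("00:0a:95:9d:68:17")  # hypothetical second device
```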

Comparison between MAC addresses and IP addresses

TCP/IP packets

In the last part of this section, let’s add a few details on the structure of TCP/IP packets. The following is far from being a comprehensive analysis of how packets are structured and of all the information they contain; please see the Wikipedia page on TCP for more details on this topic.

Full data (web pages, e-mails, etc.) are managed at the top layers of the OSI or TCP/IP models; in TCP/IP terms, the application layer. When data need to be sent from the application layer to another device on the network or the Internet, they are passed down to the transport layer (OSI layer 4). It is on this layer that data are split into several packets, and the packets' journey begins. The packets produced by TCP in the transport layer are technically called “segments”. Each segment comprises a data part and a TCP header, with information about this data.

Segments are then passed down to the internet/network layer (OSI layer 3). This is where they are encapsulated in a so-called datagram. The datagram adds information about the source and destination IP addresses for the data.

Datagrams are then passed down to the data link layer (OSI layer 2), where they are further encapsulated into a frame that contains the MAC addresses of the sender and of the destination, before going to the physical layer for transmission over the network.

This encapsulated structure of TCP/IP packets is summarized in the next figure.

The encapsulated structure of TCP/IP packets. In OSI layer 4 data is fragmented into segments. In OSI layer 3 each segment is encapsulated in a datagram with information on source and destination IP. In OSI layer 2 the packet is further encapsulated in a frame that contains the source and destination MAC addresses. If the destination of the packet resides outside of the local network, the destination MAC address is the one of the gateway (the router).
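The nesting of segment, datagram and frame can be modelled with three small Python classes, each wrapping the one from the layer above. This is a simplified sketch: real headers carry many more fields, the class and field names are chosen for this example, and all port numbers, IP addresses and the gateway MAC address are hypothetical values (only port 25, conventionally used by SMTP, and the sender MAC from earlier in this section are not invented).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Transport layer (OSI layer 4): a chunk of data plus TCP header fields."""
    src_port: int
    dst_port: int
    seq: int
    data: bytes

@dataclass
class Datagram:
    """Internet/network layer (OSI layer 3): adds source and destination IP."""
    src_ip: str
    dst_ip: str
    payload: Segment

@dataclass
class Frame:
    """Data link layer (OSI layer 2): adds source and destination MAC."""
    src_mac: str
    dst_mac: str
    payload: Datagram

# Going down the layers on the sending computer.
segment = Segment(src_port=49152, dst_port=25, seq=0, data=b"Hello B")
datagram = Datagram(src_ip="192.168.1.100", dst_ip="198.51.100.7", payload=segment)
# The destination is outside the local network, so the frame is addressed
# to the MAC of the gateway (the router), not to the final recipient.
frame = Frame(src_mac="00:0a:95:9d:68:16", dst_mac="00:0a:95:9d:68:01",
              payload=datagram)

# Going up the layers on the receiving side: unwrap one envelope at a time.
original_data = frame.payload.payload.data
```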

In the next section we will explore network hardware (switches and routers) and follow the journey of a TCP/IP packet across those devices, from one network to another.
