A note on comparative understanding of Throughput and Latency
For computer networking devices two parameters rank very high in importance, throughput and latency. Their definitions create some confusion if not looked at more carefully.
Throughput: The rate of completing a specific action. For computer networks it’s typically measured in bits per second, bytes per second or packets per second. For instance, a layer 2 switch can boast a throughput of 40 Gbps. This means that if you pump 40 Gbps of traffic into the switch, you’ll see 40 gigabits flowing out of it per second.
Latency: It’s defined as the time taken to complete an action. In the computer networking context, it’s the time taken to move a packet or PDU from one point to another – spatially or logically. For instance, when I send a ping packet from my PC to another host on my LAN I see that it reaches there in 1 millisecond. That means the latency of the network is 1 ms.
Now, how are the two related? Does more throughput always need less latency? Does increased latency always result in reduced throughput? Many such questions! The reciprocal relationship of their units doesn’t help clear up the confusion.
Let’s take the help of a bank ATM. There is a single Automatic Teller Machine housed in my neighborhood bank. It typically takes about a minute to complete the cash delivery for a typical ATM user. That means I can expect that from the time I get hold of the ATM (after standing in a long queue if I happen to be withdrawing cash on an evening in the first week of the month) I’ll spend a minute before I get the cash and leave the premises. In other words, the latency of the ATM is 60 seconds (or 60000 ms for the networking-inclined folks). What is the throughput? It’s 1/60 person per second, i.e. if people enter the ATM at a rate of 1/60 person per second, they leave it at the same rate of 1/60 person per second. Or, if 1 person is entering every minute, 1 person will be exiting every minute. Simple math.
So, throughput = 1/latency? Not so fast.
Let’s say that the ATM got better network connectivity (having better throughput and better latency, if you will). Now the speed of transactions has improved dramatically. It can finish delivering cash within 30 seconds now. What is the latency? 30 seconds. What is the throughput? 1/30 person per second or 2 persons per minute. Latency halved, throughput doubled.
Hmm. Looks like the reciprocal relation is right, after all.
Keep in mind that the bank users are really concerned about the service of cash delivery and not about the means. So, for the bank users, the above numbers are the latency and throughput of the cash withdrawal service afforded by the bank. Let’s say that the bank decides to improve its services even further. How can it do that? It can make the ATM faster again. Or, it can install another ATM on the same premises. Let’s say the ATM is already as fast as the available technology of the time allows. The bank installs a new ATM on the premises. Now if I go to the bank, I know that I’ll have to stand in front of an ATM for 30 seconds. But I can see that the queue is halved. Why? Because cash withdrawal is happening at double the earlier rate. At any given time, there can be two users using the ATMs. That means in a minute, there’ll be 4 complete transactions instead of 2. That is, the throughput has changed from 2 persons per minute to 4 persons per minute. The latency is unchanged at 30 seconds. That means throughput can be increased without the need to improve latency. Wow!
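The three ATM scenarios so far can be put into a few lines of Python. This is only a sketch of the simplified model: it assumes the ATMs are always busy (there is always someone waiting), and the function name and parameters are my own for illustration.

```python
def stage_metrics(service_time_s, servers=1):
    """Latency (seconds) and throughput (persons per minute) of a saturated stage.

    Assumption: every server is always busy, so each one completes
    60 / service_time_s transactions per minute.
    """
    latency_s = service_time_s                       # time one person spends at a machine
    throughput_per_min = servers * 60 / service_time_s
    return latency_s, throughput_per_min


print(stage_metrics(60))             # one slow ATM:   (60, 1.0)
print(stage_metrics(30))             # one faster ATM: (30, 2.0)
print(stage_metrics(30, servers=2))  # two fast ATMs:  (30, 4.0)
```

Note that the number of machines only shows up in the throughput, never in the latency.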
The bank decided to conduct a survey about its ATM services, hosted on kiosk terminals. Each ATM was followed by a kiosk terminal. Each user typically needs 30 seconds to complete the survey and exit. Every customer now spends 60 seconds on the premises, i.e. the latency has gone up to 60 seconds. What about throughput? Nothing changed here. At any given time there will be 4 people inside the ATM premises (one in front of each ATM and one in front of each survey terminal) against 2 earlier (one in front of each ATM). Still, 4 people will be entering and exiting the premises every minute. Throughput remained unchanged even though latency increased twofold.
(Here is a small exercise. What will happen if there is a single kiosk terminal instead of 2? What if the kiosk takes 5 seconds instead of 30?)
What is happening here? Adding more stages to the pipeline increases the latency. If the stages are connected to each other via independent queues, the throughput becomes a function of the stage that has the smallest throughput, i.e. the bottleneck. Obviously, the details of actual values are much more complex than the simplified model I put here. For instance, the probability distribution of inter-arrival times would play a significant role in the latency faced by an individual due to the build-up of queues. However, the averages should converge to the values described here.
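To get a feel for the effect of inter-arrival times, here is a rough simulation of the two 30-second ATMs fed by a single first-come-first-served queue. Everything in it is an assumption made for illustration: the function name, the evenly spaced arrivals (one every 15 seconds), and the "bursty" arrivals drawn from an exponential distribution with a 20-second mean gap. It measures the time from joining the queue to leaving the ATM.

```python
import heapq
import random


def mean_time_in_system(arrivals, service_time_s, servers):
    """Average time from arrival to departure (queue wait + service)."""
    free_at = [0.0] * servers              # when each machine next becomes free (min-heap)
    total = 0.0
    for t in arrivals:                     # arrivals must be in increasing order
        start = max(t, heapq.heappop(free_at))   # wait if no machine is free yet
        finish = start + service_time_s
        heapq.heappush(free_at, finish)
        total += finish - t
    return total / len(arrivals)


n = 10_000

# Perfectly regular arrivals: one person every 15 s (4 per minute).
even = [15.0 * i for i in range(n)]

# Irregular arrivals: exponential gaps with a mean of 20 s (3 per minute on average).
rng = random.Random(1)
bursty, t = [], 0.0
for _ in range(n):
    t += rng.expovariate(1 / 20.0)
    bursty.append(t)

print(mean_time_in_system(even, 30.0, 2))    # 30.0: nobody ever waits
print(mean_time_in_system(bursty, 30.0, 2))  # above 30: some people queue
```

Even though the irregular stream brings fewer people per minute on average, the time an individual spends ends up above the bare 30 seconds of service, because people who arrive close together have to queue.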
In short, the throughput is a function of how many stages are in parallel while latency is a function of how many are in series when there are multiple stages in the processing. The stage with the lowest throughput determines the overall throughput.
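The same rule of thumb can be written as a small Python sketch. It models only the simplified, saturated case with no time lost in queues, and the function name and the (service time, number of machines) representation of a stage are hypothetical choices of mine.

```python
def pipeline_metrics(stages):
    """stages: list of (service_time_s, servers) tuples, in processing order.

    Returns (latency in seconds, throughput in persons per minute),
    assuming every stage is kept busy and queueing delay is ignored.
    """
    latency_s = sum(t for t, _ in stages)                     # stages in series add up
    throughput_per_min = min(n * 60 / t for t, n in stages)   # the slowest stage wins
    return latency_s, throughput_per_min


atm_only = [(30, 2)]                  # two ATMs, 30 s each
atm_then_survey = [(30, 2), (30, 2)]  # each ATM followed by a 30 s survey kiosk

print(pipeline_metrics(atm_only))         # (30, 4.0)
print(pipeline_metrics(atm_then_survey))  # (60, 4.0)
```

Changing the second tuple in `atm_then_survey` is a quick way to check your answers to the exercise above.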