In the storage world, we frequently encounter the words “latency” and “response time.” Most of the time people treat the two terms as synonymous. When analyzing I/O performance, however, it is important to distinguish between them so we can precisely describe, and properly attribute, each one’s contribution to observed I/O performance.
The word “latency” has the more precise and narrow definition: it is the fixed amount of time a command takes to complete, determined mostly by physics. “Response time,” on the other hand, is the time a command actually experiences once all other factors are taken into consideration.
In a storage system, latency is determined by the following:
- Finite signal speed for an electromagnetic wave (e.g., light) through various media, such as optical fiber, transmission line, twisted pair, or microwave link. For a given distance and type of medium between the client and the storage system, the minimum round-trip time required for an I/O command is fixed.
- Electronic delay due to capacitive components in buses, HBAs, switches, and other hardware in the path.
- Logical operations, from the client’s OS and driver to the storage controller software, and anything in between that requires CPU cycles to perform the logic involved.
- Mechanical movement, which is the time it takes for the actuator arm to move the magnetic head to the area of the disk surface where the desired data resides.
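To get a feel for the relative magnitudes of these components, here is a rough back-of-the-envelope sketch in Python. All figures (fiber distance, seek time, spindle speed) are illustrative assumptions, not measurements:

```python
# Order-of-magnitude comparison of two latency components.
# All numbers are illustrative assumptions, not measurements.

C_VACUUM = 3.0e8         # speed of light in vacuum, m/s
FIBER_FACTOR = 0.67      # signal travels at roughly 2/3 c in optical fiber

def propagation_rtt(distance_m: float) -> float:
    """Round-trip propagation time through fiber, in seconds."""
    return 2 * distance_m / (C_VACUUM * FIBER_FACTOR)

# Assumed typical figures for a spinning disk:
avg_seek_s = 8e-3              # ~8 ms average seek
rotational_latency_s = 4.2e-3  # ~half a revolution at 7200 RPM

rtt_100km = propagation_rtt(100_000)
mechanical = avg_seek_s + rotational_latency_s

print(f"100 km fiber round trip: {rtt_100km * 1e3:.2f} ms")
print(f"Seek + rotation:         {mechanical * 1e3:.2f} ms")
# Even at metro-scale distances, mechanical movement dominates.
```

Even with 100 km of fiber between client and array, the round trip is about 1 ms, an order of magnitude less than a single random seek.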
People very often ascribe the term “latency” to the distance the signal has to travel. However, for most storage operations it is actually the mechanical movement of the magnetic head that is the predominant contributor to latency. For random access, the seek time from one location to another can be in the tens of milliseconds, and the process is essentially serial, meaning multiple commands can only be executed one after another in sequence. RAID systems were largely created to use multiple disk spindles to increase parallelism, so that multiple commands or a large data set can be processed at the same time. To measure the latency of a system, a small data packet is typically used. If a 1 KB read is performed, the time it takes to complete the read is the I/O response time. In this case, the response time is exactly the latency.
Now what happens if we increase the number of I/Os, say by doubling it to two? Most likely both I/Os will complete in the same amount of time, meaning the response time still equals the latency. In most systems this doubling can be repeated a few more times, and the response time will stay more or less the same until another limit, the bandwidth of the system, is reached. At that point the response time of an I/O is no longer the latency of the system. “Bandwidth” here is used loosely to describe the capacity of a complex system to process data, not in its more precise meaning, which is the range of signal frequencies carried in a transmission medium.
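This behavior can be captured with a toy model: n concurrent I/Os overlap and all complete in roughly the single-command latency, until their combined transfer demand exceeds the system’s bandwidth. The latency, bandwidth, and I/O size below are assumptions chosen for illustration:

```python
# Toy model: n concurrent small reads over a channel with a fixed
# one-command latency L and an aggregate bandwidth B.
# All numbers are illustrative assumptions.

def response_time(n_ios: int, io_size: float,
                  latency: float, bandwidth: float) -> float:
    """Time for n concurrent I/Os to all complete, in seconds.
    Below saturation the commands overlap and finish in ~latency;
    past saturation the channel's bandwidth sets the pace."""
    transfer_time = n_ios * io_size / bandwidth
    return max(latency, transfer_time)

L = 1e-3      # 1 ms single-command latency (assumed)
B = 100e6     # 100 MB/s effective bandwidth (assumed)
SIZE = 1024   # 1 KB per read

for n in (1, 2, 4, 64, 1024):
    print(f"{n:5d} concurrent I/Os -> {response_time(n, SIZE, L, B)*1e3:.2f} ms")
# 1, 2, 4, even 64 I/Os all complete in ~1 ms (response time == latency);
# only once n*SIZE/B exceeds L does the response time begin to grow.
```

With these numbers the crossover sits near n ≈ 98 outstanding 1 KB reads; below that, doubling the I/O count leaves the response time unchanged.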
The “system” here is the entire channel involved in the I/O operation: the client OS and driver, the storage controller, the physical links in between, and any other components inserted along the path. While every element adds a bit of latency, it is the effective bandwidth of the combined elements that mostly determines the response time of I/Os, since very few systems perform single small I/Os. Latency itself is almost never the significant factor. What’s more, response time is not necessarily a good indicator of system performance, because it is directly affected by the queue depth of the system.
From the client application’s perspective, the apparent system queue depth determines how many simultaneous I/O commands can be sent. However, the effective queue depth experienced by the client is a complicated combination of the various queues in each component involved.
For the client, the response time is measured from the moment a command is sent to the moment it returns. Again, for a single small I/O, that time will be the latency of the path. However, when many asynchronous I/Os are sent continuously, commands queued along the path may not be processed immediately, because one of the components reaches its bandwidth and becomes a saturation point or bottleneck. Therefore, the more commands the client sends, the longer the average response time it will experience, while the overall IOPS and throughput remain pinned at the limit of the system. In this case, a long response time displayed on the client side may simply be caused by the excessive number of I/Os being sent, and may not be a true reflection of the performance bottleneck.
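The relationship described above is essentially Little’s Law (mean queue depth = throughput × mean response time): once the system is pinned at its IOPS limit, a deeper client queue buys no throughput and only inflates the observed response time. A minimal sketch, with the IOPS ceiling and base latency as assumed numbers:

```python
# Little's Law: queue_depth = IOPS * response_time.
# Past the system's IOPS ceiling, a deeper client queue adds no
# throughput -- it only lengthens the observed response time.
# Both constants below are assumptions for illustration.

MAX_IOPS = 50_000       # assumed saturation point of the system
BASE_LATENCY = 0.2e-3   # 0.2 ms single-command latency (assumed)

def observed_response_time(queue_depth: int) -> float:
    """Average response time seen by the client, in seconds."""
    # Below saturation: latency-bound. At/above: Little's Law governs.
    return max(BASE_LATENCY, queue_depth / MAX_IOPS)

for qd in (1, 8, 32, 128):
    rt = observed_response_time(qd)
    iops = min(qd / rt, MAX_IOPS)
    print(f"QD={qd:4d}  response={rt*1e3:6.2f} ms  IOPS={iops:,.0f}")
# Past QD ~10, IOPS stays pinned at 50,000 while the response time
# grows linearly with queue depth.
```

Note how a client reporting a 2.5 ms average response time at QD 128 is seeing exactly the same 50,000 IOPS as one reporting 0.64 ms at QD 32.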
Additionally, if another component in the path can report I/O response time (such as the CDS appliances), those reports originate from that component’s perspective, which may be entirely different from the perspective of any other component in the system. Two factors are at play. First, the measured response time is only part of the total response time experienced. Second, a component with a shorter queue will also calculate a different average response time.
CDS appliances report the number of pending I/Os in the queue. If the pending I/O count is close to zero, the response time is essentially the latency experienced by the appliance. If many I/Os are pending, the reported response time will be longer, even though the system’s performance has not changed. Of course, if the pending count is frequently zero, the response time is at its shortest, but performance may actually decrease, since the system may be idling while waiting for I/Os to arrive. All of this can easily be demonstrated with benchmark tools such as FIO or IOMeter.
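The pending-queue tradeoff can also be shown with a toy FIFO queue simulation (a much-simplified stand-in for a tool like FIO); the service time and arrival rates below are assumptions for illustration:

```python
# Toy discrete simulation of a single FIFO I/O service queue.
# Illustrates the tradeoff the pending-I/O counter exposes: a queue
# that is often empty means short response times but an idle,
# under-driven system. All numbers are illustrative assumptions.

def simulate(inter_arrival: float, service: float, n_ios: int):
    """Return (mean response time, achieved IOPS) for n_ios commands
    arriving every inter_arrival seconds at a server that takes
    service seconds per command, processed one at a time in order."""
    t_free = 0.0          # time the server next becomes free
    total_response = 0.0
    for i in range(n_ios):
        arrival = i * inter_arrival
        start = max(arrival, t_free)   # wait if the server is busy
        t_free = start + service
        total_response += t_free - arrival
    makespan = t_free                  # completion time of the last I/O
    return total_response / n_ios, n_ios / makespan

SERVICE = 1e-3   # 1 ms per I/O -> capacity of 1000 IOPS (assumed)

# Arrivals slower than service: queue usually empty, system idles.
rt_idle, iops_idle = simulate(2e-3, SERVICE, 10_000)
# Arrivals faster than service: queue builds, response time balloons.
rt_busy, iops_busy = simulate(0.5e-3, SERVICE, 10_000)

print(f"idle queue: response={rt_idle*1e3:8.2f} ms, IOPS={iops_idle:.0f}")
print(f"full queue: response={rt_busy*1e3:8.2f} ms, IOPS={iops_busy:.0f}")
```

In the first run the response time equals the service latency but the system delivers only half its possible IOPS; in the second, IOPS hits the 1000 ceiling while the average response time grows far beyond the latency.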
Since the CDS appliances provide a complete and granular set of metrics for storage performance analysis, many of our customers eagerly use this information to identify and troubleshoot system bottlenecks. It is therefore important to interpret the displayed information properly and accurately.
This brings us to the question that is often asked of CDS when our appliances are inserted into the storage path: How will your appliances affect our I/O performance?
The short answer is that the appliances will not affect I/O performance, since the latency they add is minuscule compared to the other factors explained above. If you’re looking for the long answer, keep checking back here on this blog for an even more detailed and in-depth explanation.
For more information on this topic, check out the second part of this article, “Does Latency Affect Performance? Yes, but No.”