Merits of Data Reuse Hints
White Paper: Merits of Data Reuse Hints
Traditionally, much of I/O system architecture and design assumed that I/O is slow, with low data rates and high latencies, compared to the throughput and latencies observed inside a CPU core or the memory subsystem. This has changed with the advent of higher data rates: at 10 GbE, for example, packet timing on the wire approaches cache miss latencies. Current device-to-CPU communication does not take full advantage of system capabilities such as the system cache hierarchy, and does not use system resources such as memory bandwidth effectively. I/O features can be improved to deliver better performance and power in future systems. Relevant system metrics include CPU utilization, memory bandwidth, latency, and power. This paper discusses how providing explicit cache management hints can significantly reduce I/O-to-memory bandwidth utilization, system interconnect bandwidth, and the associated power consumption.
The CPU accesses several kinds of data that an I/O device DMAs in and out of memory:
• Control data, for example, descriptors
• Headers for protocol processing
• Payload for data copies
The White Paper describes the problem statement and the power/performance impact of I/O access on system architecture. For many applications there is temporal reuse of these data structures, for example recycling of data buffers and updates to descriptors. The rate at which control data is exchanged between the I/O device and the CPU depends on the application and other system variables. At 10 GbE, for example, the rate of control exchange is such that there is a very high likelihood that the accessed data is resident in the system cache hierarchy when it is accessed by the I/O device or the CPU, if the cache system implementation allows this.
However, keeping I/O data in the system cache is not always desirable. Many existing system architectures therefore apply an “invalidate and write to memory” policy to device-initiated accesses: for example, I/O-initiated reads and writes cause cache evictions if the addressed data is present in the cache hierarchy. Where keeping the data in cache is desirable, the penalty caused by this behavior in current systems is on the order of ~50-80 ns. For reference, the packet inter-arrival time for 64-byte packets at 10 GbE is ~67 ns.
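The ~67 ns reference figure can be checked with a short back-of-the-envelope calculation. The sketch below is one plausible way to arrive at it, assuming the figure counts standard Ethernet per-frame overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap) on top of a 64-byte frame. The function names `interarrival_ns` and `penalty_fraction` are illustrative and do not come from the white paper.

```python
# Back-of-the-envelope check of the 10 GbE packet timing quoted above.
# Assumption: per-frame wire overhead is the standard Ethernet preamble/SFD
# (8 bytes) plus the minimum inter-frame gap (12 bytes).

LINK_RATE_BPS = 10e9        # 10 GbE line rate in bits per second
FRAME_BYTES = 64            # minimum Ethernet frame size
PREAMBLE_BYTES = 8          # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum inter-frame gap


def interarrival_ns(frame_bytes, link_rate_bps=LINK_RATE_BPS):
    """Time between back-to-back frames of the given size, in nanoseconds."""
    wire_bytes = frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES
    return wire_bytes * 8 / link_rate_bps * 1e9


def penalty_fraction(penalty_ns, frame_bytes=FRAME_BYTES):
    """Cache-eviction penalty expressed as a fraction of the packet budget."""
    return penalty_ns / interarrival_ns(frame_bytes)


if __name__ == "__main__":
    print(f"64-byte inter-arrival time: {interarrival_ns(FRAME_BYTES):.1f} ns")
    for penalty in (50, 80):  # penalty range quoted in the text
        share = penalty_fraction(penalty)
        print(f"{penalty} ns penalty = {share:.0%} of the per-packet budget")
```

Under these assumptions, a 50-80 ns penalty consumes roughly three quarters to more than all of the time available per minimum-size packet, which is why the text treats the penalty as comparable to the packet arrival interval.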
Read the full Merits of Data Reuse White Paper.