DDR3 Memory Systems
The following animations show a Begin-of-Life (BOL) DDR3 system with an external memory controller.
Data exchange between the CPU and the cache is very fast. On a cache miss, however, the data is sent on a long journey. First, a memory request is sent to the memory controller, which has to forward the request to the memory. In the worst case, several commands need to be sent to the DRAM. If the wrong page (row) is open, it first has to be closed with a precharge command. After the row precharge time (tRP), the new row can be opened with an activate command. The controller then has to wait again, this time for the row-to-column delay (tRCD), before it can send the column command, a Read in this case. The animation shows all of these steps as a single command.
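As a rough illustration of the worst case described above, the sketch below adds up the DRAM-side waits for a page hit, an empty bank, and a page miss. The timing values are an assumption (a DDR3-1600 part with 11-11-11 timings, so tCK = 1.25 ns), the function name and scenario labels are made up for this example, and a real controller adds queueing and bus delays on top.

```python
# Assumed example timings: DDR3-1600, CL-tRCD-tRP = 11-11-11 (not from the text).
TCK_NS = 1.25   # clock period in ns
CL = 11         # CAS latency, in clock cycles
TRCD = 11       # activate-to-read delay, in clock cycles
TRP = 11        # precharge time, in clock cycles


def read_latency_ns(page_state: str) -> float:
    """DRAM-side delay from the first command to the first read data, in ns.

    page_state is one of:
      "hit"   - the right row is already open: Read only
      "empty" - no row is open: Activate, then Read
      "miss"  - a wrong row is open: Precharge, Activate, then Read
    """
    cycles = {
        "hit": CL,
        "empty": TRCD + CL,
        "miss": TRP + TRCD + CL,
    }[page_state]
    return cycles * TCK_NS


for state in ("hit", "empty", "miss"):
    print(f"{state:5s}: {read_latency_ns(state):6.2f} ns")
```

With these assumed values, a page miss costs three times as much DRAM-side waiting as a page hit (41.25 ns versus 13.75 ns).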
Because of the DDR3 fly-by CA (command/address) bus, each DRAM receives the Read at a slightly different point in time (about 150 ps between two adjacent DRAMs). After receiving the Read command, the DRAM needs the CAS latency before it delivers the read data. The read data is received by the controller and sent back to the CPU. While the first core waits for the data to come back from the DRAM, the second core could perform thousands of memory accesses to the cache.
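The fly-by skew can be put into numbers with a small sketch. It assumes a rank of eight DRAMs and a constant 150 ps step between neighbours (the figure quoted above), plus an assumed DDR3-1600 timing with CAS latency 11 (tCK = 1250 ps). All names are hypothetical, and times are measured from the moment the first DRAM sees the command.

```python
# Assumptions for illustration: 8 DRAMs on the fly-by bus, 150 ps per hop,
# DDR3-1600 with CAS latency 11 (tCK = 1250 ps).
NUM_DRAMS = 8
HOP_DELAY_PS = 150
TCK_PS = 1250
CL_CYCLES = 11


def read_arrival_ps(dram_index: int) -> int:
    """Time at which DRAM dram_index (0 = first on the bus) sees the Read."""
    return dram_index * HOP_DELAY_PS


def data_start_ps(dram_index: int) -> int:
    """Time at which that DRAM starts driving data: arrival plus CAS latency."""
    return read_arrival_ps(dram_index) + CL_CYCLES * TCK_PS


for i in range(NUM_DRAMS):
    print(f"DRAM {i}: Read at {read_arrival_ps(i):4d} ps, "
          f"data from {data_start_ps(i):5d} ps")
```

Under these assumptions the last DRAM sees the Read 1050 ps after the first one, which is most of a 1250 ps clock cycle, so the controller has to account for the skew separately for each DRAM.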
To limit the overall latency, new CPU designs integrate the memory controller. This integration already removes the front-side bus as a bottleneck. However, the latency added by the DRAM itself remains a real problem. Simply doubling the frequency with each new generation does not compensate for the lack of performance on the memory bus.
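Why a higher clock does little for latency can be seen by converting the CAS latency from clock cycles into nanoseconds. The speed grades and CL values below are assumed typical examples chosen for illustration, not taken from the text, and the peak bandwidth assumes a 64-bit (8-byte) channel.

```python
# Illustrative speed grades: name -> (clock period tCK in ns, CAS latency in cycles).
# These values are assumed typical examples.
SPEED_GRADES = {
    "DDR3-800": (2.5, 6),
    "DDR3-1066": (1.875, 7),
    "DDR3-1333": (1.5, 9),
    "DDR3-1600": (1.25, 11),
}


def cas_latency_ns(grade: str) -> float:
    """CAS latency converted from clock cycles to nanoseconds."""
    tck_ns, cl_cycles = SPEED_GRADES[grade]
    return tck_ns * cl_cycles


def peak_bandwidth_gb_s(grade: str, bus_bytes: int = 8) -> float:
    """Peak bandwidth: two transfers per clock times the bus width in bytes."""
    tck_ns, _ = SPEED_GRADES[grade]
    return 2 / tck_ns * bus_bytes  # (transfers per ns) * bytes = GB/s


for grade in SPEED_GRADES:
    print(f"{grade:9s}: {peak_bandwidth_gb_s(grade):5.1f} GB/s peak, "
          f"CAS latency {cas_latency_ns(grade):6.3f} ns")
```

In this sketch the peak bandwidth doubles from DDR3-800 to DDR3-1600 (6.4 to 12.8 GB/s), while the CAS latency only moves from 15 ns to 13.75 ns.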