Technical Report: 18 November 2002, 11 pages


The RAMpage hierarchy moves main memory up a level, replacing the lowest-level cache with an equivalently sized SRAM main memory and using the TLB to cache page translations for that main memory. Earlier RAMpage evaluations used a relatively small L1 cache and TLB. Since TLB misses can account for a significant fraction of run time, better TLB management is worth pursuing in general; in the RAMpage hierarchy the effect is clearer than in a conventional hierarchy, because it is more feasible to build a TLB that maps a high fraction of main memory pages. This paper illustrates how more aggressive components higher in the memory hierarchy make time spent waiting for DRAM a larger fraction of total execution time, so approaches to hiding DRAM latency become more important. At an instruction issue rate of 1 GHz, the simulated standard hierarchy waited for DRAM 10% of the time; at 8 GHz, the fraction of time spent waiting for DRAM rose to 40%, and was higher still with a larger L1 cache. The RAMpage hierarchy with context switches on misses was able to hide almost all DRAM latency. In a standard hierarchy, increasing processor speed by a factor of 8 and L1 cache size by a factor of 16, with DRAM speed unchanged, yielded a speedup of 6.12. The RAMpage model with context switches on misses, given similar processor speed and L1 improvements, achieved a speedup of 10.7 over the slowest conventional hierarchy. A larger TLB was also shown to widen the viable range of SRAM page sizes in the RAMpage hierarchy.
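The trend described above can be sketched with a simple back-of-envelope model (not the paper's simulator): if the time spent waiting for DRAM stays fixed while compute time shrinks with the clock rate, the DRAM-wait fraction of total execution time must grow. Only the 10% wait fraction at the base clock is taken from the abstract; the model itself, and the assumption that compute scales perfectly with clock speed, are illustrative simplifications, which is why it overshoots the simulated 40% figure at 8x.

```python
def dram_wait_fraction(base_fraction, speedup):
    """Fraction of time waiting for DRAM after compute is sped up by
    `speedup`, assuming total DRAM wait time is unchanged (a simplifying
    assumption; real workloads overlap compute and memory access)."""
    compute = 1.0 - base_fraction   # compute time at the base clock, normalized
    wait = base_fraction            # DRAM wait time, taken as clock-independent
    return wait / (compute / speedup + wait)

print(dram_wait_fraction(0.10, 1))  # 0.10 at the base clock
print(dram_wait_fraction(0.10, 8))  # roughly 0.47 at 8x clock under this naive model
```

The model makes the qualitative point of the paper: with DRAM speed unchanged, an 8x faster processor turns a modest DRAM wait into the dominant cost, which is why latency-hiding techniques such as RAMpage's context switches on misses pay off.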

