Introduction
LIL is a general purpose virtual machine, created and maintained for the purpose of executing PHIL generated code. This page describes LIL and is the reference point for the virtual machine used in the predictable caching simulations. The LIL machine is RISC based, with modified load and store instructions for partitioning, and extra instructions to manage partitions in the cache.
LIL Architecture
From an architectural point of view, the LIL machine is very similar to the HEP supercomputer. A general block diagram of the machine is shown below :
The Multiprogramming Kernel
The LIL machine is able to run a number of processes at once, swapping between them in order to simulate a multiprogramming environment. Although the kernel software which implements this multiprogramming behaviour could be written in LIL assembly code, the current implementation views the kernel as part of the physical machine, as it would be in the HEP or a transputer. This model allows the kernel to be implemented in the high level language used to write the simulator, which makes for more efficient, easily maintainable code.
The current implementation borrows heavily from the HEP in providing each process with a context which describes the process state. The processor executes instructions based on the state of the currently active process context, so swapping to a different process is simply a matter of changing the active context. At present, the machine is assumed to have as many process context modules as there are processes, but future implementations will cope with fewer modules by swapping information to and from reserved space in main memory. A scheduler oversees the swapping of processes and makes a swap based on the scheduling strategy being employed. The scheduler can be changed on a per-processor basis and is potentially hot-swappable.
The scheduler sleeps for a predefined length of time between performing the actual scheduling operations. During this time, the processor executes the current process. However, the scheduler is interruptible and can be restarted prematurely in the event of the process halting, a semaphore operation being performed, or I/O taking place.
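The context-swapping behaviour described above can be sketched as follows. This is a minimal illustration, not the simulator's actual kernel code; the class and field names are assumptions.

```python
# Sketch of a round-robin scheduler swapping process contexts, in the
# spirit of the LIL kernel described above. Names are illustrative.

class Context:
    """Holds the state of one process: program counter and registers."""
    def __init__(self, pid):
        self.pid = pid
        self.pc = 0
        self.registers = [0] * 16
        self.halted = False

class RoundRobinScheduler:
    """Swaps the active context in round-robin order. A real scheduler
    would also wake early on halts, semaphore operations, or I/O."""
    def __init__(self, contexts):
        self.contexts = contexts
        self.current = 0

    def schedule(self):
        # Pick the next runnable context; the processor then executes
        # instructions from whichever context is active.
        for _ in range(len(self.contexts)):
            self.current = (self.current + 1) % len(self.contexts)
            if not self.contexts[self.current].halted:
                return self.contexts[self.current]
        return None  # all processes have halted

contexts = [Context(pid) for pid in range(3)]
sched = RoundRobinScheduler(contexts)
contexts[1].halted = True  # halted processes are skipped over
```

Because process state lives entirely in the context object, a swap is nothing more than choosing a different element of the context list, which mirrors the HEP-style design described above.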
The Cache System
The LIL machine includes an abstract cache interface, which means that a number of cache options are available and can be easily configured from the command line. The LIL machine has a primary goal of studying cache behaviour, with a focus on partitioned caches, and so the freedom to easily alter the cache configuration is very valuable. The range of supported caches includes traditional caches, from simple direct mapped caches through set associative caches to fully associative caches. The traditional caches are used for comparison with the partitioned caches. The currently available caches are :
- Traditional Caches
- Partitioned Caches
Traditional Caches
These caches are those you will find on most of today's microprocessors. For example, the Intel Pentium II uses a segregated pair of two and four way set associative caches for the instruction and data streams respectively.
Direct Mapped Cache
The simplest (and fastest in terms of latency) is the direct mapped cache. Each memory address can be cached in only one place in the cache. This location is computed from portions of the address bits. The following options are an example of how to use a direct mapped cache as the data cache in the LIL machine :
-dCache machine.cache.NWaySetAssociativeCache -dCacheNWay 1
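The way a direct mapped cache derives its line number and tag from the address bits can be sketched as below. The line count and words-per-line figures are assumptions for illustration, not the simulator's defaults.

```python
# Direct-mapped lookup: each address maps to exactly one cache line.
# Parameters here are illustrative, not the LIL machine's actual ones.

NUM_LINES = 8        # assumed; must be a power of two
WORDS_PER_LINE = 4   # assumed; likewise

def split_address(addr):
    """Split a word address into (tag, line, word) fields."""
    word = addr % WORDS_PER_LINE
    line = (addr // WORDS_PER_LINE) % NUM_LINES
    tag = addr // (WORDS_PER_LINE * NUM_LINES)
    return tag, line, word

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # one stored tag per line

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line."""
        tag, line, _ = split_address(addr)
        if self.tags[line] == tag:
            return True
        self.tags[line] = tag  # compulsory or conflict miss: replace
        return False

cache = DirectMappedCache()
```

Note that two addresses whose line fields coincide always conflict, however empty the rest of the cache is; this is the weakness the associative designs below address.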
Set Associative Cache
Notionally, a set associative cache is a collection of direct mapped caches, where each memory address may be cached in any of the constituent caches. That is, a memory address may be cached by a number of cache lines. In an s-way set associative cache, each memory address may appear in any of s different positions in the whole cache. The following options are an example of how to use an s-way set associative cache as the data cache in the LIL machine :
-dCache machine.cache.NWaySetAssociativeCache -dCacheNWay s
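A minimal sketch of the s-way lookup follows: the address selects one set, and the tag may reside in any of that set's s ways. The parameters and the LRU replacement policy are assumptions for illustration.

```python
# s-way set-associative lookup sketch (s = 2 here, with LRU
# replacement). Parameters are illustrative, not the simulator's.

NUM_SETS = 4
WAYS = 2             # s = 2
WORDS_PER_LINE = 4

class SetAssociativeCache:
    def __init__(self):
        # Each set is an ordered list of tags, most recently used last.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, addr):
        index = (addr // WORDS_PER_LINE) % NUM_SETS
        tag = addr // (WORDS_PER_LINE * NUM_SETS)
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)   # hit: mark most recently used
            return True
        if len(ways) == WAYS:
            ways.pop(0)        # set full: evict least recently used
        ways.append(tag)
        return False

cache = SetAssociativeCache()
```

Two conflicting addresses can now coexist in the same set, so the conflict misses of the direct mapped case only appear once more than s addresses compete for one set.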
Fully Associative Cache
A fully associative cache is the end of the spectrum in these basic cache implementations. This is a cache in which a memory address may appear in any line of the cache (you can think of it as an s-way set associative cache with s lines). The following options are an example of how to use a fully associative cache with n lines as the data cache in the LIL machine :
-dCache machine.cache.NWaySetAssociativeCache -dCacheNWay n -dCacheLines n
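Since there are no index bits at all in the fully associative case, a lookup must compare against every stored tag. A short sketch, with an assumed line count and LRU replacement:

```python
# Fully associative lookup sketch: a tag may sit in any line, so a hit
# requires searching all stored tags. Parameters are illustrative.

NUM_LINES = 4
WORDS_PER_LINE = 4

class FullyAssociativeCache:
    def __init__(self):
        self.tags = []  # most recently used last

    def access(self, addr):
        tag = addr // WORDS_PER_LINE  # no index bits: all high bits are tag
        if tag in self.tags:
            self.tags.remove(tag)
            self.tags.append(tag)     # hit: mark most recently used
            return True
        if len(self.tags) == NUM_LINES:
            self.tags.pop(0)          # cache full: evict LRU tag
        self.tags.append(tag)
        return False

cache = FullyAssociativeCache()
```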
Victim Cache
A victim cache extends a direct mapped cache by adding a small, fully associative cache which stores addresses that are ejected from the main cache on a capacity/conflict miss. The thinking is that these ejected addresses are going to be needed again, so by storing them in the secondary cache an increase in performance can be made over simply discarding them. The following options are an example of how to use a victim cache, with n lines in the main cache and v lines in the victim cache, as the data cache in the LIL machine :
-dCache machine.cache.VictimCache -dCacheLines n -dCacheVictimLines v
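The interaction between the main cache and the victim buffer can be sketched as below. Sizes, and the FIFO policy in the victim buffer, are assumptions for illustration.

```python
# Victim cache sketch: a direct-mapped main cache backed by a small
# fully associative buffer holding recently ejected entries.

NUM_LINES = 4        # main cache lines (assumed)
VICTIM_LINES = 2     # victim buffer entries (assumed)
WORDS_PER_LINE = 4

class VictimCache:
    def __init__(self):
        self.main = [None] * NUM_LINES  # one tag per main-cache line
        self.victims = []               # FIFO of (line, tag) pairs

    def access(self, addr):
        line = (addr // WORDS_PER_LINE) % NUM_LINES
        tag = addr // (WORDS_PER_LINE * NUM_LINES)
        if self.main[line] == tag:
            return True                 # main-cache hit
        evicted = self.main[line]
        hit = (line, tag) in self.victims
        if hit:
            # Victim hit: the ejected entry is recovered, cheaper
            # than refetching it from main memory.
            self.victims.remove((line, tag))
        self.main[line] = tag
        if evicted is not None:
            self.victims.append((line, evicted))
            if len(self.victims) > VICTIM_LINES:
                self.victims.pop(0)     # FIFO eviction from the buffer
        return hit

cache = VictimCache()
```

Two addresses ping-ponging on one main-cache line, which would miss every time in a plain direct mapped cache, now hit in the victim buffer after the first round.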
Column Associative Cache
A column associative cache is a single set cache with multiple hash functions for the line number. Each address a can be located in a number of places in the cache; the lines that a may reside in are determined by the set of hash functions. The following options are an example of how to use a column associative cache, with n lines in the main cache and f functions, as the data cache in the LIL machine :
-dCache machine.cache.NFunctionColumnAssociativeCache -dCacheLines n -dCacheNFunctions f
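The probing scheme can be sketched as below with f = 2. The particular hash functions and fill policy here are assumptions; the simulator's own functions may differ.

```python
# Column-associative lookup sketch: one bank of lines probed through
# several hash functions (f = 2 here). Hash functions are illustrative.

NUM_LINES = 8
WORDS_PER_LINE = 4

def hashes(addr):
    """Two candidate line numbers for an address."""
    base = addr // WORDS_PER_LINE
    h1 = base % NUM_LINES
    h2 = (base ^ (base // NUM_LINES)) % NUM_LINES  # simple rehash
    return [h1, h2]

class ColumnAssociativeCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        tag = addr // WORDS_PER_LINE        # block address doubles as tag
        candidates = hashes(addr)
        for line in candidates:             # probe each candidate line
            if self.tags[line] == tag:
                return True
        for line in candidates:             # miss: fill an empty candidate
            if self.tags[line] is None:
                self.tags[line] = tag
                return False
        self.tags[candidates[0]] = tag      # all full: replace the first
        return False

cache = ColumnAssociativeCache()
```

An address whose first candidate line is occupied can still be cached under its second hash function, giving some of the benefit of set associativity without splitting the cache into ways.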
Partitioned Caches
Partitioned caches expose another level of the memory hierarchy to the programmer. These caches offer explicit control of the pathways from the registers to memory. Partitioned caches can be thought of as protective caches, where the partition for an object (an area of memory) is no longer involved in competition (interference) with other objects in the cache.
We currently have two implementations of partitioned caches: the OPC and the HPC. With further investigation, the ideas and projected benefits of each may be combined into one unified partitioned cache strategy. This is an area of ongoing research and will be reflected in the cache implementations available.
Both partitioned caches create, delete, and invalidate partitions in exactly the same way. They differ only in their bit selection mechanisms for the tag, line and word numbers in the cache.
The OPC Partitioned Cache
The OPC partitioned cache uses a bit selection mechanism for the tag, line and word numbers that is considerably more complex than the HPC's. Consider a partition p of an OPC partitioned cache, created with l lines and stride s, and let the least significant 1 bit of s be at position si.
- The word number is given by bits si and si+1.
- The line number is the concatenation of bits si+1+floor(log(l)/2)...si+2 and bits 0...ceil(log(l)/2). If these regions overlap each other or the word number bits, the sections are extended accordingly.
- The tag is the remaining address bits concatenated.
The following options are an example of how an OPC cache can be used as the data cache in the LIL machine :
-dCache machine.cache.PartitionedCache
The HPC Partitioned Cache
The HPC partitioned cache uses a very simple bit selection mechanism for the tag, line and word numbers, which implies a feasible hardware implementation. Consider a partition p of an HPC partitioned cache, created with l lines and stride s, and let the least significant 1 bit of s be at position si.
- The word number is given by bits si and si+1.
- The line number is given by the bits si+2...si+2+log(l).
- The tag is the remaining address bits concatenated.
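The HPC field extraction above can be sketched directly in code. This sketch assumes l and s are powers of two and takes log2(l) line-number bits starting at bit si+2, per the description; the function name is illustrative.

```python
# Sketch of HPC-style bit selection for a partition created with
# l lines and stride s. Assumes l and s are powers of two.

def hpc_fields(addr, lines, stride):
    # Position of the least significant 1 bit of the stride.
    si = (stride & -stride).bit_length() - 1
    word = (addr >> si) & 0b11                       # bits si and si+1
    line = (addr >> (si + 2)) % lines                # bits from si+2 up
    tag = addr >> (si + 2 + lines.bit_length() - 1)  # remaining high bits
    return tag, line, word
```

For example, with stride 8 the word field starts at bit 3, so a partition of 4 lines uses bits 5 and 6 for its line number and everything from bit 7 upward as its tag.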
The following options are an example of how an HPC cache can be used as the data cache in the LIL machine :
-dCache machine.cache.HenksPartitionedCache

