Introduction

LIL is a general purpose virtual machine, created and maintained for the purpose of executing PHIL generated code. This page describes LIL and is the reference point for the virtual machine used in the predictable caching simulations. The LIL machine is RISC based, with modified load and store instructions for partitioning; extra instructions are defined to manage partitions in the cache.

LIL Architecture

From an architectural point of view, the LIL machine is very similar to the HEP supercomputer. A general block diagram of the machine is shown below:

[Figure: LIL Machine Architecture]

The Multiprogramming Kernel

The LIL machine is able to run a number of processes at once, swapping between them in order to simulate a multiprogramming environment. Although the kernel software which implements this multiprogramming behaviour could be written in LIL assembly code, the current implementation views the kernel as part of the physical machine, as it would be in the HEP or a transputer. This model allows the kernel to be implemented in the high level language used to write the simulator, which makes for more efficient, easily maintainable code.

The current implementation borrows heavily from the HEP in providing each process with a context which describes the process state. The processor executes instructions based on the state of the currently active process context, and hence swapping to a different process is simply a matter of changing the active context. At present, the machine is assumed to have as many process context modules as there are processes, but future implementations will cope with fewer modules by swapping information to and from reserved space in main memory. A scheduler oversees the swapping of processes and makes a swap based on the scheduling strategy being employed. The scheduler can be changed on a per-processor basis and is potentially hot-swappable.

The scheduler sleeps for a predefined length of time between performing the actual scheduling operations. In this time, the processor executes the current process. However, the scheduler is interruptible and can be restarted prematurely in the event of the process halting, a semaphore operation being performed, or some sort of I/O taking place.
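As an illustration of this model, here is a minimal sketch in Java (the machine.cache.* class names used later on this page suggest a Java implementation). The ProcessContext, Processor and Scheduler types and the round-robin strategy are assumptions made for illustration, not the simulator's actual source:

    class ProcessContext {
        int pc;                 // program counter of the process
        int[] registers;        // register file snapshot
        boolean halted;         // set when the process halts
    }

    class Processor {
        ProcessContext active;      // the context instructions execute from
        ProcessContext[] contexts;  // one context module per process

        // Swapping processes is simply a matter of changing the active context.
        void swapTo(int pid) { active = contexts[pid]; }
    }

    class Scheduler {
        // Sleeps between scheduling operations; a notify() triggered by a
        // halt, a semaphore operation or I/O restarts it prematurely.
        synchronized void run(Processor p, long quantum) throws InterruptedException {
            int next = 0;
            while (true) {
                wait(quantum);                          // sleep for the quantum
                next = (next + 1) % p.contexts.length;  // round-robin strategy (assumed)
                p.swapTo(next);
            }
        }
    }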

The Cache System

The LIL machine includes an abstract cache interface, which means that a number of cache options are available and can be easily configured from the command line. A primary goal of the LIL machine is the study of cache behaviour, with a focus on partitioned caches, so the freedom to alter the cache configuration easily is very valuable. The supported caches range from traditional designs, from simple direct mapped caches through set associative caches to fully associative caches, which are used for comparison with the partitioned caches. The currently available caches are described below.
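As an illustration, the abstract interface might look something like the following sketch. The -dCache option shown in the examples below names a concrete class such as machine.cache.NWaySetAssociativeCache, presumably instantiated by name; the method signatures here are assumptions, not the simulator's actual API:

    public interface Cache {
        boolean read(int address);   // true on a hit, false on a miss
        void write(int address);     // bring/update the address in the cache
        void invalidate();           // flush the entire contents
    }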

Traditional Caches

These caches are the ones you will find on most of today's microprocessors. For example, the Intel Pentium II uses a segregated pair of two and four way set associative caches for the instruction and data streams respectively.

Direct Mapped Cache

The simplest (and the fastest in terms of latency) is the direct mapped cache. Each memory address can be cached in only one place in the cache; this location is computed from a portion of the address bits. The following options are an example of how to use a direct mapped cache as the data cache in the LIL machine (a sketch of the bit selection follows the example):
    -dCache machine.cache.NWaySetAssociativeCache -dCacheNWay 1 
  
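The bit selection might be sketched as follows; the field widths and names are illustrative assumptions, not the machine's actual parameters:

    // Direct mapped bit selection: the address splits into tag | line | word.
    class DirectMappedCache {
        static final int WORD_BITS = 2;   // 4 words per line (assumed)
        static final int LINE_BITS = 8;   // 256 lines (assumed)
        final int[] tags = new int[1 << LINE_BITS];
        final boolean[] valid = new boolean[1 << LINE_BITS];

        boolean read(int address) {
            int line = (address >>> WORD_BITS) & ((1 << LINE_BITS) - 1);
            int tag  =  address >>> (WORD_BITS + LINE_BITS);
            boolean hit = valid[line] && tags[line] == tag;
            if (!hit) { valid[line] = true; tags[line] = tag; }  // fill on miss
            return hit;
        }
    }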

Set Associative Cache

Notionally, a set associative cache is a collection of direct mapped caches in which each memory address may appear in any of the constituent caches; that is, a memory address may be cached by a number of cache lines. In an s-way set associative cache, each memory address may appear in any of s different positions in the whole cache. The following options are an example of how to use an s-way set associative cache as the data cache in the LIL machine (a sketch of the lookup follows the example):
    -dCache machine.cache.NWaySetAssociativeCache -dCacheNWay s 
  
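A sketch of the lookup, under the same illustrative assumptions (the replacement policy on a miss is omitted):

    // s-way set associative lookup: the line field now selects a set, and
    // the tag is compared against all s ways in that set.
    class SetAssociativeCache {
        final int ways, sets, setBits, wordBits = 2;
        final int[][] tags;
        final boolean[][] valid;

        SetAssociativeCache(int s, int numSets) {
            ways = s; sets = numSets;
            setBits = Integer.numberOfTrailingZeros(numSets); // power of two assumed
            tags = new int[numSets][s];
            valid = new boolean[numSets][s];
        }

        boolean read(int address) {
            int set = (address >>> wordBits) & (sets - 1);
            int tag =  address >>> (wordBits + setBits);
            for (int way = 0; way < ways; way++)
                if (valid[set][way] && tags[set][way] == tag)
                    return true;              // hit in one of the s ways
            return false;                     // miss: a victim way would be filled
        }
    }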

Fully Associative Cache

A fully associative cache is at the far end of the spectrum of these basic cache implementations: a memory address may appear in any line of the cache (you can think of it as an s-way set associative cache with s lines). The following options are an example of how to use a fully associative cache with n lines as the data cache in the LIL machine:
    -dCache machine.cache.NWaySetAssociativeCache -dCacheNWay n -dCacheLines n
  

Victim Cache

A victim cache extends a direct mapped cache by adding a small, fully associative cache which stores addresses that are ejected from the main cache on a capacity or conflict miss. The thinking is that these ejected addresses are likely to be needed again, so storing them in the secondary cache gives a performance gain over simply discarding them. The following options are an example of how to use a victim cache, with n lines in the main cache and v lines in the victim cache, as the data cache in the LIL machine (a sketch of the lookup path follows the example):
    -dCache machine.cache.VictimCache -dCacheLines n -dCacheVictimLines v
  
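A runnable sketch of this lookup path, assuming for illustration a FIFO policy in the victim store:

    import java.util.ArrayDeque;

    // A direct mapped main cache backed by a small fully associative store
    // of ejected addresses. Names and policies are illustrative assumptions.
    class VictimCache {
        final int lines, victimLines;
        final int[] mainTags;
        final boolean[] valid;
        final ArrayDeque<Integer> victim = new ArrayDeque<>();

        VictimCache(int n, int v) {
            lines = n; victimLines = v;
            mainTags = new int[n]; valid = new boolean[n];
        }

        boolean read(int address) {
            int line = address % lines, tag = address / lines;
            if (valid[line] && mainTags[line] == tag) return true;  // main hit
            boolean rescued = victim.remove(address);               // victim hit?
            if (valid[line])                                        // eject the old line
                insertVictim(mainTags[line] * lines + line);
            mainTags[line] = tag; valid[line] = true;               // fill the main cache
            return rescued;
        }

        void insertVictim(int address) {
            if (victim.size() == victimLines) victim.removeLast();  // evict the oldest
            victim.addFirst(address);
        }
    }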

Column Associative Cache

A column associative cache is a single-set cache with multiple hash functions for the line number. Each address a can be located in a number of places in the cache; the lines that a may reside in are determined by the set of hash functions. The following options are an example of how to use a column associative cache, with n lines in the main cache and f hash functions, as the data cache in the LIL machine (a probe sketch follows the example):
    -dCache machine.cache.NFunctionColumnAssociativeCache -dCacheLines n -dCacheNFunctions f
  
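A sketch of the probe sequence; the hash family used here is an invented placeholder, not the cache's actual functions:

    // Column associative probe: one array of lines, tried under each of the
    // f hash functions in turn.
    class ColumnAssociativeCache {
        final int lines, functions;
        final int[] stored;           // stored[line] holds the cached address
        final boolean[] valid;

        ColumnAssociativeCache(int n, int f) {
            lines = n; functions = f;
            stored = new int[n]; valid = new boolean[n];
        }

        int hash(int i, int address) {
            // assumed family: rotate the address by i bits, then bit-select
            return (Integer.rotateRight(address, i) & 0x7fffffff) % lines;
        }

        boolean read(int address) {
            for (int i = 0; i < functions; i++) {
                int line = hash(i, address);
                if (valid[line] && stored[line] == address) return true;
            }
            int line = hash(0, address);   // miss: fill the first candidate line
            stored[line] = address; valid[line] = true;
            return false;
        }
    }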

Partitioned Caches

Partitioned caches expose another level of the memory hierarchy to the programmer by offering explicit control of the pathways from the registers to memory. A partitioned cache can be thought of as a protective cache: the partition for an object (an area of memory) is no longer in competition (interference) with other objects in the cache.

We currently have two implementations of partitioned caches: the OPC and the HPC. With further investigation, the ideas and projected benefits of each may be combined into one unified partitioned cache strategy. This is an area of ongoing research, and the available cache implementations will reflect it.

Both partitioned caches create, delete, and invalidate partitions in exactly the same way; they differ only in their bit selection mechanisms for the tag, line and word numbers in the cache.
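A sketch of these shared operations as an interface; the names and signatures are assumptions based on the description on this page, not the machine's actual partition-management instructions:

    public interface PartitionedCacheOps {
        // Create a partition of l lines for an object accessed with stride s;
        // returns a handle used by the modified load/store instructions.
        int createPartition(int l, int s);

        void deletePartition(int p);       // release the partition's lines
        void invalidatePartition(int p);   // discard its cached contents
    }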

The OPC Partitioned Cache

The OPC partitioned cache uses a bit selection mechanism for the tag, line and word numbers that is considerably more complex than the HPC's. Consider a partition p of an OPC partitioned cache, created with l lines and stride s, and let the least significant 1 bit of s be at position si; the line number bits are then distributed across the address rather than taken as a single contiguous field. This sometimes makes for complicated bit selection procedures, and implementation in actual silicon may be difficult. The purpose of distributing the line number bit selection in this way is to address the problem of caching combinations of references such as a[ni] and a[ni+1]: the stride n is captured efficiently by the HPC cache, but the window of 2 may be cached poorly.

The following options are an example of how an OPC cache can be used as the data cache in the LIL machine:

    -dCache machine.cache.PartitionedCache
  

The HPC Partitioned Cache

The HPC partitioned cache uses a very simple bit selection mechanism for the tag, line and word numbers, which implies a feasible hardware implementation. Consider a partition p of an HPC partitioned cache, created with l lines and stride s, and let the least significant 1 bit of s be at position si: the line number is taken as a contiguous field of the address starting at bit si, so the tag is one or two segments of the address and can be implemented easily (a hedged sketch follows the example below). Given its simplicity, this cache works well for most cases. An important feature of the HPC partitioned cache is that its performance is more simply predictable than that of the OPC cache.

The following options are an example of how an HPC cache can be used as the data cache in the LIL machine:

    -dCache machine.cache.HenksPartitionedCache
  
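Reading the description above literally, the bit selection might be sketched as follows; this is one interpretation of the text, not the verified HPC scheme:

    class HpcBitSelection {
        // Returns {line, tag} for an address in a partition of l lines
        // (assumed a power of two) created with stride s.
        static int[] select(int address, int l, int s) {
            int si = Integer.numberOfTrailingZeros(s);  // lowest set bit of s
            int line = (address >>> si) & (l - 1);      // contiguous line field
            int tag  = address >>> (si + Integer.numberOfTrailingZeros(l));
            return new int[] { line, tag };
        }
    }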