An Optoelectronic Cache Memory System Architecture

Authors

  • Donald M. Chiarulli
  • Steven P. Levitan
Abstract

We present an investigation of the architecture of an optoelectronic cache which can integrate terabit optical memories with the electronic caches associated with high performance uni- and multiprocessors. The use of optoelectronic cache memories will enable these terabit technologies to transparently provide low latency secondary memory with frame sizes comparable to disk pages but with latencies approaching those of electronic secondary cache memories. This will enable the implementation of terabit memories with effective access times comparable to the cycle times of current microprocessors. The cache design is based on the use of a smart-pixel array and combines parallel free space optical I/O to-and-from optical memory with conventional electronic communication to the processor caches. This cache, and the optical memory system to which it will interface, provides for a large random access memory space which has lower overall latency than that of magnetic disks and disk arrays. In addition, as a consequence of the high bandwidth parallel I/O capabilities of optical memories, fault service times for the optoelectronic cache are substantially less than currently achievable with any rotational media.

Introduction

Hierarchical memory architectures for computing systems are based on two fundamental paradigms of computer architecture: a hardware paradigm, that smaller is faster, and a software paradigm, that programs access memory in patterns that exhibit spatial and temporal locality. Thus the latency inherent in the access to a large memory can be hidden in a pyramid with small and fast memory modules at the top, closest to the processor, and larger, slower memories at the bottom. Optical and optoelectronic memory devices offer the potential for building very large memories at the lowest level of the hierarchy.
Unlike magnetic disks, optical memory provides random access throughout the address space as well as high bandwidth and highly parallel data transfers. Recent developments in the integration of silicon and optoelectronic technology, such as FET-SEEDs or VCSELs [1, 2, 3, 4, 5], have provided the devices necessary to integrate optical memories into a hierarchical optoelectronic memory system. Key to the successful design of such a system is the resolution of architectural issues such as the address translation mechanism, frame size at each level, write policy, replacement algorithms, and coherency support mechanism [6]. By utilizing an optoelectronic cache memory we can resolve these architectural issues, provide the necessary technology interface, and thus provide a seamless optoelectronic memory hierarchy which is compatible with modern uni- and multiprocessor computing systems. As shown in Figure 1, in the conventional description of a memory hierarchy, a distinction is made between secondary memory, primary memory, and each level of cache memory. This distinction was originally based on the visibility of the memory relative to a machine language instruction. In this historical context, shown in Figure 1(a), primary, or main, memory was defined by the program address space (e.g., 16 bit addresses) and secondary memory, or backing store, was associated with input and output. Cache memories, to the extent they existed, were invisible, and were first implemented as a buffer between the processor and primary memory. In modern systems, shown in Figure 1(b), caches are implemented routinely and typically exist in multiple levels, with the first level cache integrated into the processor itself. The distinction between primary and secondary memory has been significantly blurred by address segmentation and virtual memory systems.
Typically, secondary memory now supports a much larger program address space, parts of which are swapped on demand into a semiconductor RAM primary memory level. In the following discussion, we dispense with the notion of distinct primary and secondary memories. As shown in Figure 1(c), we merge these levels into a single optoelectronic memory at the lowest level of the hierarchy. The processor address space is directly supported in the optical memory. All levels between this optical memory and the processor are transparent to the processor and therefore are referred to as cache levels. Figure 2 shows a block diagram of a physical realization of an optoelectronic memory system for a uniprocessor, and Figure 3 shows a realization for a multiprocessor application. Reflected in these designs is the fact that most state-of-the-art processors use a two level cache at the top of the memory hierarchy, with a cache controller for these levels integrated on the processor chip. The top level, or primary, cache is a small on-chip memory. The secondary cache is somewhat larger and is off-chip. These caches typically have access times on the order of the processor clock period and data transfers between them are in the range of ten to one hundred words. At the lowest level, the optical memory provides high capacity data storage. Between these levels, the optoelectronic cache level links the secondary cache with the optical memory level. The optoelectronic cache is a dual ported electronic memory with an optical port which connects to the optical memory and an electronic port which connects to the levels above. In the shared memory multiprocessor application shown in Figure 3, the optoelectronic cache serves the same functions. However, in this design, it is also necessary that the cache be either multiported or banked to provide multiple access points on the electronic interface.
Further, an interconnection network is necessary between the optoelectronic cache and the processor secondary caches. Alternative designs may eliminate this interconnection network by duplicating the optoelectronic cache at each processor and providing interconnect at the optical memory level. Similarly, both local and distributed memory models might be supported, since it is possible to implement the optical memory such that both local and networked banks of memory exist. Thus, the optoelectronic cache is also an enabling technology for future multiprocessors that use large shared optical memory systems. In this paper we present an investigation into the design of a new optoelectronic cache level that will interface a terabit optical memory to the electronic caches associated with one or more processors. The cache level is based on the use of smart-pixel array technology and will combine parallel free-space optical I/O to an optical memory with conventional electronic communication to the processor caches. The optoelectronic cache, and the optical memory system to which it will interface, will provide for a large random access memory space that will have a lower overall latency than magnetic disks or disk arrays. In the next section we briefly present the context of current or proposed optical memory systems and present our model for the optoelectronic interface to these memories. Next we outline a specific design for an optoelectronic cache and cache controller. We then present a preliminary performance analysis based on analytical results and simulation data. We conclude with a discussion of the ramifications of our results.

Background

There are a number of competing optical memory technologies currently being investigated. We focus on non-rotational read/write media.
This is because the access times of all rotational-media-based systems preclude their use as operating-system-transparent memories. The latency of these devices would necessitate a process context switch in the case of a fault. That is, in a multiprocessing environment, a different program would be run while waiting for the disk operation to complete. “3-D” optical memory systems, on the other hand, have the potential for both fast access times and large capacities [7]. Typical examples of these systems are:

  • Spectral hole burning for memories at low temperatures [8, 9], and the possibility of room temperature devices [10].
  • Photorefractive materials for holographic storage [11].
  • Two photon systems [12].

All have the common characteristic of high access bandwidth, supported largely by parallel access. Specifically, each reference returns a frame of data, where the term frame refers to a large collection of bits typically related by membership in a specific data structure such as an image bitmap. In this discussion we select a less restrictive and technology independent model for the optical memory. The model assumes only that it is a high capacity memory with access parallelism modelled as a long word length. As with a conventional memory hierarchy, the access time is assumed to be significantly longer (two to three orders of magnitude) than the clock period of the processors. Input and output ports for the optical memory are assumed to be a free space optical interconnect with the number of channels corresponding to the number of bits in the memory word. However, given current or near term technology limits, it may be necessary to multiplex the optical system in order to accommodate limitations on the density of optoelectronic device integration.

Optoelectronic Cache Architecture

In this section we present a realization of the optoelectronic (OE) cache level in the OE memory hierarchy.
As shown in Figure 1(c), the cache is in the same position as the primary memory in a conventional hierarchy. However, unlike primary memory it is transparent to both the processor and the operating system. This level is the interface between the optical memory backing store and the secondary cache associated with the processor. Another distinguishing feature of the OE cache is its significantly larger line size than is typical for primary memory. A memory line (also commonly known as a cache line) is the amount of data transferred between levels of the hierarchy when a memory fault (or equivalently, a cache miss) occurs. Thus, the size of a line at a particular level is a trade-off between the locality supported within the memory traffic and the efficiency with which the cache is utilized. A large cache line more loosely constrains memory access locality. However, large cache lines will also bring into the cache fragments of unused memory. This effect is called internal fragmentation. In a conventional memory the cost associated with internal fragmentation can be significant, since the fault service time is typically linearly related to the line size. However, in the OE cache, the (much larger) line size is determined by the width of the optical memory word. The parallel access characteristics of an optical memory make it possible to transfer cache lines to and from the optical memory in a single access time. This is substantially faster than the equivalent transfer from a magnetic disk, which must allow for both rotational latency and serial transfers. This is a significant advantage. However, it has an effect on the organization of the cache itself, and also impacts the mechanism for address translation and, in multiprocessor systems, coherency issues. Figure 4 shows a block diagram of a design for the OE cache.
In this diagram, optical I/O is transmitted and received by an array of SEED devices shown on the right. The electronic bus, drawn vertically on the left, connects the OE cache to the electronic secondary cache level. The cache itself is modelled as a two dimensional array of bits. Each column holds one cache line, which corresponds to one word (frame) from the optical memory. Each column is subdivided into words, each the width of the processor memory bus. Each of these words is in turn connected to the electronic I/O bus. When a fault occurs in the secondary cache, the optoelectronic cache controller processes the address to determine if there is a cache hit in the OE cache, that is, whether the requested location is present in the OE cache. If a hit occurs, the controller translates the address of the requested location from its location in the processor address space to an address within the OE cache. This address is partitioned as shown in Figure 5. Once translated, a pair of decoders handle the cache address. One decoder reads the high order address bits and selects all of the bits in a single column. Another decodes the low order bits and selects one electronic memory word within the selected column. Thus, when the memory is accessed from the optical memory side, an entire cache line is read or written simultaneously. On the electronic side, a single word is selected by enabling both the corresponding cache line (column) and the corresponding word offset onto the electronic bus. Although Figure 4 shows the optoelectronic cache as a monolithic implementation, an actual implementation will likely partition the memory both along the word width and the memory depth. Also, given the relative bandwidth of the optical interconnect to the two memories, it is possible that the optical system may be multiplexed in order to reduce the device count.
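The two-level decode described above (column select from the high order bits, word select from the low order bits) can be sketched in software. This is a minimal sketch only; the line width and bus width are illustrative assumptions, not values fixed by this design:

```python
# Hypothetical geometry for illustration: a 1 Mbit optical memory word
# (one cache line per column) carved into 32-bit electronic bus words.
LINE_BITS = 2 ** 20
WORD_BITS = 32
WORDS_PER_LINE = LINE_BITS // WORD_BITS


def split_cache_address(cache_addr: int) -> tuple[int, int]:
    """Partition an OE-cache word address into (column, word offset).

    The high order bits drive the decoder that selects every bit of one
    column (cache line); the low order bits drive the decoder that gates
    a single bus-width word of that column onto the electronic bus.
    """
    column, offset = divmod(cache_addr, WORDS_PER_LINE)
    return column, offset
```

From the optical side the offset field is ignored and the whole column is transferred at once; from the electronic side both fields are needed to select one word.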
Error detection and/or correction mechanisms can also be built into the optoelectronic cache by connecting this hardware to the horizontal bus carrying the optical memory word.

Controller Architecture

The discussion of address decoding in the previous section assumed a cache hit. In other words, the cache controller shown at the top of Figure 4 interpreted an incoming memory address, determined that the desired location was in the cache, and translated the memory address into a cache address. In this section we will briefly describe how such a translation might take place in the optoelectronic cache controller. Consider an n-bit memory address corresponding to a location in the data memory space of a processor. Assuming that the word size in the processor memory space and the word size of the optical memory are powers of two, this location is at some specific offset within a larger optical memory word. Thus the n-bit address is partitioned into fields corresponding to a location in optical memory and the offset of the word. This organization is identical to the one shown in Figure 5, with the exception that the high order bits now identify an optical memory word within the address space of the processor. In fact, it is identical to the organization of addresses at any level of the memory hierarchy, where the high order bits select a specific memory line and the low order bits select the offset within the line. The number of bits in the high order partition of these addresses is determined by the relative sizes of the memory address space and each of the cache levels. Address translation is the operation of mapping from a value for the high order bits in the memory address space to the high order bits (cache line number) of a cache address. There are a number of methods for accomplishing this translation which are well documented in the literature on memory systems [13].
They include low latency solutions which use direct and fixed mappings. More complex methods use associative memory lookups, and others use hierarchical tables. Each has different characteristics for latency, implementation efficiency, and cache utilization. In general, address translation mechanisms with higher latencies tend to use the cache more efficiently and tend to lower fault rates. Thus, if the cost of a fault is high, as is the case for swapping to and from disk, then a designer is willing to tolerate a higher latency in address translation (for either a hit or a miss) in order to minimize the frequency of faults. For example, in a conventional memory hierarchy between primary memory and a swapping disk, fault costs can typically be on the order of milliseconds. Hence, the dynamic address translation algorithms used in a virtual memory system may add latency of two or three times the normal memory latency as overhead, in order to implement nearly optimal replacement strategies. With an optical memory at the lowest level of the hierarchy, these fault costs are reduced to microseconds. Thus a significantly faster (but less optimal) address translation mechanism can be utilized. Throughout this discussion we have assumed that the optical memory replaces both the primary memory and the disk backing store of a conventional memory system. Thus the traditional notion of a “virtual memory” as a process level address space is replaced by a single, large, processor level address space. This design is consistent with current trends in processor design, in which 64-bit address spaces are quite common. When an optical memory technology is used to populate these huge address spaces, conventional mechanisms for memory management in operating systems will be obsolete.
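As a concrete illustration of the fast end of this latency spectrum, a direct-mapped (fixed) translation can be sketched as below. The cache capacity, the tag array, and the function names are assumptions for illustration only, not the controller design proposed here:

```python
# Hypothetical direct-mapped translation from optical-memory line numbers
# (the high order bits of a memory address) to OE-cache line numbers.
NUM_CACHE_LINES = 4096                 # assumed cache capacity in lines

tags = [None] * NUM_CACHE_LINES        # which optical line each cache line holds


def translate(optical_line: int):
    """Return the cache line holding this optical line, or None on a miss."""
    cache_line = optical_line % NUM_CACHE_LINES   # fixed, single-step mapping
    if tags[cache_line] == optical_line:          # tag check: is it really here?
        return cache_line
    return None


def fill(optical_line: int) -> int:
    """Service a miss: the fixed mapping also dictates which line to replace."""
    cache_line = optical_line % NUM_CACHE_LINES
    tags[cache_line] = optical_line
    return cache_line
```

The mapping costs one modulo and one comparison, but it forfeits placement freedom: two optical lines that collide in their low order bits evict each other, which is exactly the higher fault rate the text trades against translation latency.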
Both virtual memory mechanisms and file system organizations will be replaced by common name space, object-oriented operating systems [14, 15]. In the near term, however, it is still possible to integrate the proposed optical memory system into conventional virtual memory operating systems, which assume a unique address space for each process, by simply making an association between the upper bits of an optical memory address and the process-id of a running process.

Performance Analysis

In this section, we present an analysis of the relative performance of the OE memory system architecture in comparison to traditional memory hierarchies. The average memory latency L̄x at any level, x, of a memory hierarchy can be calculated as

    L̄x = (1 − px) · Lx + px · L̄(x−1),  with L̄0 = LBackingStore

where px is the fault probability, (1 − px) is the hit probability, and Lx is the memory access time at level x. L̄0 is the latency associated with the memory at the lowest level of the hierarchy, commonly known as the backing store. In this expression we approximate the miss penalty, at all but the lowest level, by the average latency of the next lower level. This approximation is accurate if we assume that memory banking or other prefetching techniques have been implemented between these levels. At the L0 level, specifically when disk drives are used as the backing store, it is necessary to consider the transfer time of a memory line as part of the latency.
In this case, if Ts is the average seek time, Tr is the average rotational latency, and Tx is the per-word transfer time of a disk-based backing store, the miss latency of a memory line of size nm is

    L̄0 = Ldisk = Ts + Tr + nm · Tx

Alternatively, when an optical memory is used as the backing store and the entire cache line is transferred in parallel, only To, the access time of the optical memory, needs to be considered:

    L̄0 = To

Taking only this difference into account, Figure 6 is a plot of the average latency versus the hit rate for two single-level memory systems. One uses disk technology as the backing store; the other uses an optical memory as the backing store. For this plot, L1 is set to 10 ns, To is set to 1 µs, and the average disk latency is assumed to sum to 1 ms. Latency on the y axis is plotted on a log scale and hit rates are varied only in the range of 90 to 100 percent. Clearly, the large region between the lines represents the potential latency advantage of the optical memory backing store.
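The gap plotted in Figure 6 can be reproduced numerically from the single-level model above, using the stated parameters (L1 = 10 ns, To = 1 µs, 1 ms average disk latency):

```python
# Average latency of a single-level hierarchy with each backing store,
# using L_avg = (1 - p) * L1 + p * L0, where p is the fault probability.
L1 = 10e-9          # first-level (cache) access time: 10 ns
L0_DISK = 1e-3      # average disk latency (seek + rotation + transfer): 1 ms
L0_OPTICAL = 1e-6   # optical memory access time To: 1 us


def avg_latency(hit_rate: float, l0: float) -> float:
    p = 1.0 - hit_rate                 # fault (miss) probability
    return (1.0 - p) * L1 + p * l0


# At a 99% hit rate the disk system averages about 10 us per access,
# while the optical system averages about 20 ns, roughly three orders
# of magnitude apart, matching the gap between the curves in Figure 6.
for hit in (0.90, 0.99, 0.999):
    print(f"{hit:.3f}  disk: {avg_latency(hit, L0_DISK):.3e} s"
          f"  optical: {avg_latency(hit, L0_OPTICAL):.3e} s")
```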


Publication date: 1996