LUCAS is a 256 processor SIMD (Single Instruction stream Multiple Data stream) computer, which was designed and built around 1981-83. It is based on bit-serial processors, i.e. each processor has a one-bit arithmetic-logic unit, and processors are often referred to as processing elements (PEs). All processing of data items larger than one bit (i.e. 99.99999% of data items of practical use - Boolean flags excluded) is done bit-serially under the control of a microprogrammed control unit. The usual approach to achieve high processing speed consists in using wide data elements (for example 32 or 64 bits). As the observant reader has already noticed, LUCAS does not achieve speed by processing wide data elements. Instead processing speed is achieved by processing many data elements in parallel at the same time. The SIMD model of processing uses a single control unit that executes a single operation at one point in time, but applies that operation to a large number of data items. LUCAS being a 256 PE computer means that (at best) 256 data items can be processed in parallel. At the time LUCAS was built, affordable computers rarely exceeded 32 bits, which meant that LUCAS had the potential to run much faster - which it did, to the fraction of the cost of a "mini computer" of that time. Applications suitable for SIMD processing can be found in different areas, the most obvious ones being matrix processing in scientific computation, or processing of elements in relational databases. LUCAS falls into the category of "associative" - or "content addressable" - computers. Each PEs is not only equipped with an arithmetic-logic unit (ALU), but also with it's own memory - and in fact all the main memory of the entire machine resides within the PEs. The terms "associative" and "content addressable" refer to the primary working mode of the machine, where operations often are preceeded by (parallel) searches over the PE array, resulting in only PEs where matching contents was found being turned on for the subsequent operation. The LUCAS project studied and implemented parallel processing of relational database operations, using this principle. A particularly interesting and powerful aspect of LUCAS, is the way the different PEs are interconnected in order to allow operations with operands residing in different PEs. Different interconnection patterns are directly implemented in hardware. For simple image processing applications, a "nearest neighbour" interconnection allows adjacent pixels stored over the PE array to be processed in parallel. A more complex interconnection pattern is the "perfect-shuffle/exchange" network. Perfect shuffle divides the PE array into two chunks and performs a perfect merge: think of a deck of playing cards that you separate into two parts and then merge. The "exchange" bit means that you can interchange two items on the fly after doing the shuffling. It turns out that this kind of interconnection network has interesting properties: it is immediately useful to perfrom FFT (Fast Fourier Transform) - a very common task in signal processing applications, but it also makes it possible to perform any possible permutation of data in log(n) steps (where n is the number of data containers: 256 in the case of LUCAS). Using different interconnection patterns, different signal processing algorithms, many centred around FFT, were studied. Some none-obvious algorithms were also studied involving prime number calculations, used in cryptography. Different programming languages were developed around LUCAS: assemblers and microcode assemblers, a high-level language for microprogramming the control unit for developing fast parallel algorithms, and a Pascal-based high-level language for application programming. LUCAS used a two-level host computer system with a Digital VAX used for compilation and bulk data storage and a Zilog Z80 for running the compiled application code. All parts are seamlessly integrated and appear (more or less) as a unified system to the users. |
||
|
Christer Fernström: |