Discussion 10: Execution Unit
The execution unit contains the data registers and the ALU. It selects the registers to operate on, feeds their contents to the ALU, selects the appropriate result, and sends it back to the registers. Below is a diagram showing a typical three-address datapath with two sources and a destination. The sources are used when reading the registers (e.g., Src 0 specifies the register to connect to the blue ALU bus and Src 1 specifies the register for the green bus). The ALU result returns on the purple bus to the register selected by Dest. The Operation Code tells the ALU which operation to perform on the operands.
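The three-address datapath described above can be sketched in software. This is only an illustrative model, not a hardware description: the operation names, the 32-bit word size, and the function name `execute` are assumptions that do not come from the original notes.

```python
import operator

# Hypothetical operation codes; the notes do not define an encoding.
ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
}

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit registers


def execute(regs, opcode, src0, src1, dest):
    """One pass through the datapath: read two sources, compute, write back."""
    a = regs[src0]  # Src 0 selects the register driving the first ALU bus
    b = regs[src1]  # Src 1 selects the register driving the second ALU bus
    result = ALU_OPS[opcode](a, b) & WORD_MASK  # Operation Code selects the function
    regs[dest] = result  # the result bus writes the register selected by Dest
    return result


regs = [0] * 8
regs[1], regs[2] = 5, 7
execute(regs, "ADD", 1, 2, 3)  # regs[3] now holds 12
```

Masking the result to the word size stands in for the fixed width of the result bus; a subtraction that goes negative wraps around just as it would in a two's-complement register.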
It is possible to build an execution datapath that operates in a single cycle if the registers output to the source buses on the rising edge of the clock, and then capture the ALU result from the purple bus on the falling edge of the clock. The ALU thus has the time while the clock is high to perform its computation. If an operation takes more time than this, then multiple clock cycles are required, and the ALU may need internal registers to hold data across this longer period.
In addition to computing on data, the ALU typically performs computations for branch decisions. Some systems, however, do this in a separate branch pipeline. There are two approaches to supporting branches. In one case, the ALU computes a set of condition flags as a side effect of each computation. A branch then merely tests these flags and makes its decision. The advantage is that branch instructions are simplified. The other approach is to have the branch compute its own condition (e.g., branch on Src 0 > Src 1). This has the advantage of avoiding both a separate flag register and the need to compute the conditions for every operation, regardless of whether they will be used.
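The two branch approaches can be contrasted in a short sketch. The flag names Z (zero) and N (negative), the 32-bit width, and all function names are assumptions made for illustration; real flag sets usually also include carry and overflow.

```python
def alu_sub_with_flags(a, b, bits=32):
    """Approach 1: every ALU operation sets condition flags as a side effect."""
    mask = (1 << bits) - 1
    result = (a - b) & mask
    flags = {
        "Z": result == 0,                 # zero flag
        "N": bool(result >> (bits - 1)),  # negative flag: top bit of the result
    }
    return result, flags


def branch_if_zero(flags):
    """The branch itself only tests a flag computed earlier."""
    return flags["Z"]


def branch_if_greater(regs, src0, src1):
    """Approach 2: the branch compares its own operands; no flag register."""
    return regs[src0] > regs[src1]
```

In the first style the comparison cost is paid on every ALU operation, while in the second it is paid only when a branch actually executes.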
In addition to the execute instructions, the register file is referenced by load and store operations. Thus, we need to add paths to support these transfers. We can simply select one of the data path buses to output to memory when we execute a store instruction. But the destination bus must be shared between the ALU and memory read operations. When we share wires between two sources, we use a multiplexer between the sources and the destination:
Depending on the control input signal, the multiplexer selects one of its inputs to transmit on its output. So, when the Memory/ALU signal indicates Memory, the Memory input is connected to the Destination Bus output. The multiplexer is just a simple AND-OR circuit. The control signal is ANDed with one of the inputs, its inverse is ANDed with the other input, and the OR of the two AND gates is the output.
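The AND-OR structure of a two-input multiplexer can be written out with bitwise operators. This is a minimal sketch: the name `mux2`, the argument order, and the 32-bit default width are assumptions for illustration.

```python
def mux2(select_memory, alu_input, memory_input, bits=32):
    """Two-input multiplexer built as AND-OR logic over a whole bus."""
    mask = (1 << bits) - 1
    sel = mask if select_memory else 0  # replicate the control signal on every bit
    not_sel = ~sel & mask               # inverse of the control signal
    # One AND gate per input, then an OR of the two AND outputs.
    return (memory_input & sel) | (alu_input & not_sel)


mux2(1, 0xAAAA, 0x5555)  # selects the memory input
mux2(0, 0xAAAA, 0x5555)  # selects the ALU input
```

Because exactly one of `sel` and `not_sel` is all ones, one AND gate passes its input through unchanged and the other outputs zero, so the OR simply forwards the selected input.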
Here we see how the multiplexer is inserted into the datapath:
In a load or store, the ALU may be bypassed so that the data moves from the registers to the memory unchanged (and vice versa). Some designs allow the ALU to operate on data as it is loaded or stored. However, this ties the fast ALU and slow memory together for an operation, which requires additional control complexity. Given how seldom this ability is used in practice, most modern designs keep the ALU and load/store data paths separate so that they can each be streamlined for their specific tasks. The ALU may also serve to compute a branch address, or a separate, simpler ALU may be provided for this purpose.
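A load and a store that bypass the ALU might be modeled as follows. This is a sketch under simplifying assumptions: memory is a Python dictionary keyed by address, the address is supplied directly rather than computed, and the names `write_back`, `load`, and `store` are hypothetical.

```python
def write_back(regs, dest, alu_result, memory_data, from_memory):
    """Destination-bus multiplexer: memory data or ALU result goes to Dest."""
    regs[dest] = memory_data if from_memory else alu_result


def load(regs, memory, dest, address):
    """Load: data moves from memory to a register unchanged (ALU bypassed)."""
    write_back(regs, dest, alu_result=0, memory_data=memory[address],
               from_memory=True)


def store(regs, memory, src, address):
    """Store: one source bus is routed out to memory unchanged."""
    memory[address] = regs[src]


regs = [0] * 8
memory = {0x100: 42}
load(regs, memory, dest=2, address=0x100)  # regs[2] now holds 42
store(regs, memory, src=2, address=0x104)  # memory[0x104] now holds 42
```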
With a RISC instruction set, the selection of the source and destination registers is straightforward because they are fixed fields in the instruction register (IR), and it is simply a matter of connecting them to the register file's address inputs. In a CISC instruction set, the source and destination registers may be specified in different parts of the IR, and even in different words. Thus, the IR fields must pass through a steering selector that is controlled by the control unit.
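With fixed fields, register selection reduces to shifting and masking the instruction register. The layout below (6-bit opcode, three 5-bit register fields in a 32-bit word) is a hypothetical format chosen for illustration and is not taken from the notes or from any particular ISA.

```python
def decode_fields(ir):
    """Extract fixed-position fields from a 32-bit instruction word.

    Assumed layout: opcode in bits 31-26, dest in 25-21,
    src0 in 20-16, src1 in 15-11.
    """
    return {
        "opcode": (ir >> 26) & 0x3F,
        "dest": (ir >> 21) & 0x1F,
        "src0": (ir >> 16) & 0x1F,
        "src1": (ir >> 11) & 0x1F,
    }


# Build an example word: opcode 1, dest r3, src0 r1, src1 r2.
ir = (1 << 26) | (3 << 21) | (1 << 16) | (2 << 11)
fields = decode_fields(ir)
```

In hardware there is no decoding step at all for these fields: the corresponding IR bits are wired straight to the register file's address inputs, which is what makes the fixed format attractive.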
In a RISC ISA, there are just a few types of instructions, and they are either pure register-to-register or load/store in their treatment of data. With a CISC architecture, the source inputs to the ALU can include memory (actually a hidden memory data register) as well as a register, so there would be an extra input to the multiplexer, and a second multiplexer on the Src 1 input, to handle memory data. The result data can also go directly to memory, so a demultiplexer would be needed to steer the ALU result either back to the registers or out to memory.
A CISC ISA has more addressing modes than a RISC ISA, which would result in greater complexity in the address calculation step. The RISC processor can basically use a register together with the address in an instruction. A CISC address may be generated from an indirect reference (requiring another memory fetch to the memory data register), or multiple registers (e.g., base + offset), or an extra instruction word (the extended IR), and possibly with a pre- or post-increment (or decrement) of a register (requiring a source value to pass through an increment/decrement unit either before or after the ALU, and be passed back to that register). Given this specialized use of the ALU, it is common in CISC architectures to have a separate addressing ALU.
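The difference in address-calculation complexity can be seen by listing the cases side by side. The mode names, the parameter names, and the dictionary model of memory are assumptions for illustration; they do not correspond to a specific instruction set.

```python
def effective_address(mode, regs, memory, base=0, index=0, displacement=0, step=1):
    """Compute an effective address for several hypothetical addressing modes."""
    if mode == "register+displacement":  # the basic RISC-style case
        return regs[base] + displacement
    if mode == "base+index":             # address formed from two registers
        return regs[base] + regs[index]
    if mode == "indirect":               # needs an extra memory fetch
        return memory[regs[base] + displacement]
    if mode == "post-increment":         # use the register, then update it
        address = regs[base]
        regs[base] += step
        return address
    if mode == "pre-decrement":          # update the register, then use it
        regs[base] -= step
        return regs[base]
    raise ValueError(f"unknown addressing mode: {mode}")
```

Only the first case is a single addition. The indirect case adds a memory access, and the increment and decrement cases write a register as a side effect, which is the kind of extra work that motivates a separate addressing ALU.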
© Copyright 1995, 1996, 2001 Charles C. Weems Jr. All rights reserved.