Notes from Lecture 1
CmpSci 535: Computer Architecture

Instructor: Chip Weems

Office: CS-342

Phone: 545-3163

E-mail: weems@cs.umass.edu

Syllabus


Course Calendar


Computer Generations

0 - Mechanical / Electromechanical

1 - Vacuum tube

2 - Transistor

3 - Integrated circuit

4 - Very Large Scale Integration (VLSI) / Microprocessor

5 - Homogeneous parallel processors


Mechanical

Mechanical computers were built with trains of gears, much like clocks. Typically, they used decimal arithmetic, and each gear or wheel had ten positions. The hardest part of designing such a machine was to get the carry to propagate cleanly from one digit to the next (so that there wouldn't be any ambiguous, half-visible numbers showing in the display windows). The other difficulty was that the sheer complexity of a large calculator, together with the friction of all of its gears, made construction very difficult prior to the advent of modern machining technology.
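
To make the carry problem concrete, here is a minimal sketch in C (mine, not from the lecture) of the ripple carry that a decimal adding machine performs in gears: adding 1 to 99999 forces the carry to travel through every wheel in turn, which is exactly the motion that was hard to make clean and unambiguous mechanically.

    #include <stdio.h>

    /* Each "wheel" holds one decimal digit 0-9, least significant first. */
    #define WHEELS 6

    static void add_one(int wheel[WHEELS]) {
        int carry = 1;                      /* the increment itself */
        for (int i = 0; i < WHEELS && carry; i++) {
            wheel[i] += carry;
            carry = wheel[i] / 10;          /* carry out of this wheel */
            wheel[i] %= 10;                 /* wheel settles on a digit */
        }
    }

    int main(void) {
        int wheel[WHEELS] = {9, 9, 9, 9, 9, 0};   /* 099999 */
        add_one(wheel);
        for (int i = WHEELS - 1; i >= 0; i--)
            printf("%d", wheel[i]);         /* prints 100000 */
        printf("\n");
        return 0;
    }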

Storage in a mechanical computer was by the position of the gears. In the later electromechanical machines, relays were able to store some of the machine's state. The program, however, was always stored in a separate medium, typically a punched paper card or tape. Some analog mechanical computers could be programmed by changing the gear train, but this was really just equivalent to changing parameters to the program, since they generally just computed one type of function (e.g. differential equations).

Relays work on the principle that a voltage is applied to a coil, driving a magnetic rod (solenoid) outward so that a hinged or flexible electrical contact is forced to touch a fixed contact, thus closing a circuit. We thus have an electrically controlled switch.

The significance of the electrically controlled switch is that information (the state of switch in one place) can be transmitted over significant distances without loss, and without interference. In a mechanical system, carrying the state of one wheel to another at a distance involves long shafts and often extra gears to allow the shafts to bypass other shafts.

Also of major significance is that the relay is more naturally used with a binary number system (rather than decimal), because of the on/off nature of circuits.


Mechanical


Gunter's scale, based on Napier's bones, was the first slide rule. Multiplication and division could be done using sliding sticks inscribed with a logarithmic scale.

Schickard's calculator was destroyed in a fire and never rebuilt -- we know of it only through a letter written to Johannes Kepler. If Schickard's claims are true, it was much more sophisticated than Pascal's box.

Pascal's box could add and subtract amounts of money. He invented it to aid his father, a tax collector. Note that even today, taxes are still a major application area for computers. Pascal's box was such a sensation that people even made and sold non-working replicas as showpieces. Several copies still exist in museums.

Leibniz's calculator was much like Pascal's but could also multiply. Division required a long sequence of steps. Interestingly, Leibniz's goal was to reduce thought to a logical abstraction that could be performed automatically -- although he didn't use the term, he was actually seeking to create artificial intelligence.

Lepine (1725), Hillerin (1730), Pereire (1751), Earl Stanhope (1775), and others built calculators similar to those of Pascal and Leibniz, with minor improvements.

Jacquard's loom is programmable by feeding it a chain of cards with holes punched in them. The loom can weave any pattern, including a portrait of the inventor. Although it doesn't calculate, it is the first programmable machine.

The Thomas Arithmometer, the first commercially manufactured mechanical calculator, remained in production until 1926.


Mechanical


Babbage's difference engine was an automated calculator for numerical tables.

The analytical engine was the first programmable computer, using punched cards for storing instructions. Babbage got the idea from the Jacquard automatic loom. Neither of Babbage's engines was ever completed. But, in 1853, a difference engine based on Babbage's design was built by George and Edward Scheutz in Sweden, and was sold to the Dudley Observatory in Albany, NY, for calculating astronomical tables. Babbage is also associated with other important figures in the history of computing -- see: Boole, DeMorgan, Lovelace.

Later on Herman Hollerith would use the same sort of cards as Babbage, but for entering data into a tabulating machine he built for the 1890 census; his company would eventually become IBM. The tabulator was also novel in its use of electricity to carry the information from the cards to the calculator.

Bush's Differential Analyzer was an electromechanical analog computer for computing differential equations. Programming was limited, and was accomplished by replacing gears in the drive mechanism.

Zuse's electromechanical calculators used relays and were very similar in concept to Babbage's analytical engine. They were the first working programmable computers. Zuse also had a plan for an electronic computer using 1500 vacuum tubes. Unfortunately, most of Zuse's early work was destroyed in World War II, although he continued to build computers after the war. Later in life he became a painter.

Howard Aiken's Mark I was a programmable calculator measuring 52 feet by 8 feet. It used decimal arithmetic, and was built largely from parts used in commercial tabulating machines. Addition took 0.3 seconds; multiplication took 6 seconds.



Vacuum Tube

A vacuum tube is, reasonably enough, a sealed glass tube containing a vacuum in which several electronic elements are present: the cathode, anode (or plate), grid, and filament. When the cathode is heated by the filament and a voltage is applied between the cathode and anode, current flows from cathode to anode. If a grid is inserted between them, the flow can be controlled by switching the grid between a positive and a negative voltage.

The grid voltage can be quite small, and the plate voltages can be quite high, thus providing an amplifying capability. More importantly for computers, switching the grid voltage causes the tube to act as a switch with respect to the plates. Thus, we have an electronically controlled switch that is much faster than a relay.

A type of vacuum tube also served as a popular storage mechanism, the Cathode Ray Tube (CRT). Other memory devices used during the period include mercury or glass delay lines, and magnetic core memory.

Vacuum tubes, however, are large, require a lot of power, and produce a lot of waste heat. In fact, for one rather large vacuum tube machine, it was once estimated that if its four turbine-powered air conditioners were to fail, the heat buildup in 15 minutes would be sufficient to melt the concrete and steel building containing it (of course, it would simply catch fire and stop working long before that). It has also been estimated that if a modern computer were built with vacuum tubes, it would be the size of the Empire State Building.


Vacuum Tube


John Atanasoff developed an electronic switch based on vacuum tubes, and used this in a special purpose computer that had capacitors as memories (essentially the same principle as modern dynamic RAM).

The COLOSSUS machines were developed by British Intelligence during WW II to crack coded messages. They also used vacuum tubes as logic elements. Much of their design remains secret.

ENIAC (Electronic Numerical Integrator and Calculator) was developed at the Moore School of Engineering as a specialized programmable computer for computing ballistics tables for the Army. It was programmed by changing wires in patch panels, and flipping switches. John von Neumann became involved with ENIAC and saw the need for storing the program in the machine itself, resulting in the EDVAC (Electronic Discrete Variable Calculator) design.

The Manchester Automatic Digital Machine (MADM) was the first machine built with a stored program, but it was really just for testing a new memory device.

EDSAC (Electronic Delay Storage Automatic Calculator) was based closely on the EDVAC design and was the first true stored program computer to become operational. It used ultrasonic mercury delay lines for memory.


Vacuum Tube


UNIVAC, built by Presper Eckert and John Mauchly, from the Moore School, became the first commercially produced programmable digital computer. It used mercury delay lines as its memory. It was based on the EDVAC design.

Von Neumann left the Moore School for the Institute for Advanced Study at Princeton, where he became involved in another computer design (known as the IAS machine), which was also based on EDVAC. The JOHNNIAC, named in honor of von Neumann, was an IAS-like machine built by the Rand Corporation in Santa Monica, CA. There were several other machines built along this line, almost all a result of a summer course given by Eckert and Mauchly at the end of the war about their work with ENIAC and the EDVAC design. These included the ILLIAC (not to be confused with the ILLIAC IV parallel processor), MANIAC, WEIZAC, AVIDAC, ORACLE, and ORDVAC.

The Whirlwind, built at MIT, is notable mostly for the development of magnetic core memory, which would eventually replace CRT and delay line storage during the 1960s.

The IBM 701 was the company's first commercial computer and grew out of its work with Harvard on the last successor to the Mark I (the Mark IV). The 701 was developed to compete directly with the UNIVAC.

The IBM 709 was the last of the major vacuum tube computers. It was a faster version of the 704, which had 4K 36-bit words of core memory. During this period, IBM also sold a model 650, which had a magnetic drum memory but was low enough in cost that many were sold to universities -- it was the basis for the first users' community.


Transistor

The transistor, invented in 1948, performed the same basic function as the vacuum tube, but with much lower voltage and current, and very little waste heat. It is interesting to note that many engineers at that time pronounced it a useless device precisely because it couldn't handle the power of a tube!

By using a material called a semiconductor, which conducts electricity when a charge is applied to it and acts as an insulator when the charge is removed, an electronic switch can be built. Current flows between the collector and emitter when a charge is applied to the base.

Transistors are also much smaller than tubes. Rather than being an inch in diameter and three inches long (the size of a typical tube), transistors are about 1/4 inch in diameter and 3/16 inch long. They also require fewer wires, since there are no filaments.

Computers of this era mostly used magnetic core memory, although registers were built from transistor circuits -- eventually leading to modern solid-state memories.

Still, a computer equivalent to a modern-day microprocessor, built with transistors, would have occupied several floors of the Empire State Building. A typical machine of the period had 16K 32-bit words of core, and filled a large room.


Transistor


The TX-0 was the first computer built with transistors. It was an experimental system developed at MIT in 1955. It was developed to test the concepts of the TX-2, a major new computer that followed it.

One of the people who worked on the TX-0 was Ken Olsen, who went on to found Digital Equipment Corp. (DEC). Initially, DEC built transistorized logic modules called Flip Chips (printed circuit boards), which could be used to build specialized control logic. They also formed the basis of the PDP-1, the first computer built by DEC, which was very similar to the TX-0. The PDP-1 was an 18-bit machine that resulted in a family of similar machines culminating in the PDP-15. You can see a PDP-1, still playing Space War (the world's first video game), at the Computer Museum in Boston.

DEC had two other lines of machines, one with 12-bit words and the other with 36-bit words (in those days, the 7-bit ASCII code had not been standardized, nor had numerical representations, so there wasn't as strong an incentive to build machines with a word size that is a multiple of 8 bits).

The 12-bit machines resulted in the PDP-8, which in 1967 was sold for $8900 (plus teletype). It became the world's first minicomputer, and was affordable even by high schools. The PDP-12 was a variant combining the PDP-8 with a Laboratory INstrument Computer (LINC) to form an affordable system for controlling experiments and gathering data in the laboratory.

The 36-bit machines resulted in the PDP-10, a large time-sharing mainframe.


Transistor


IBM switched to using transistors with the 7090, which had the same architecture as the 709. In fact, it was sometimes referred to as the 709T (for transistor). Like the PDP-6, it was a 36-bit machine. The 7094 came out a little later, and was especially oriented towards scientific computing using floating point. The 7094 formed the basis of a venture to build a huge new computer, called the Stretch. Before the Stretch project was completed, it nearly bankrupted IBM.

IBM was also building a small machine for business (what the B stands for, after all), the 1401, which used BCD arithmetic (base 10, with each digit represented by a 4-bit code) and carried out operations digit-serially on values of arbitrary length. This same sort of scheme would later become the basis of most pocket calculators.
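
As a hedged illustration (my sketch, not the 1401's actual logic), digit-serial BCD addition can be expressed in C as follows: each decimal digit occupies its own 4-bit code, and the machine steps through the operands one digit at a time, propagating a carry, so operands can be of arbitrary length.

    #include <stdio.h>

    /* Digits are stored least significant first, one BCD digit per byte. */
    static void bcd_add(const unsigned char *a, const unsigned char *b,
                        unsigned char *sum, int ndigits) {
        int carry = 0;
        for (int i = 0; i < ndigits; i++) {   /* one digit per "cycle" */
            int d = a[i] + b[i] + carry;
            carry = d >= 10;
            sum[i] = (unsigned char)(d % 10); /* result digit, still BCD */
        }
    }

    int main(void) {
        /* 0275 + 0948 = 1223, least significant digit first */
        unsigned char a[4] = {5, 7, 2, 0}, b[4] = {8, 4, 9, 0}, s[4];
        bcd_add(a, b, s, 4);
        for (int i = 3; i >= 0; i--)
            printf("%d", s[i]);               /* prints 1223 */
        printf("\n");
        return 0;
    }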

Another machine that IBM built in this period was the 1620, which was even simpler than the 1401. It earned the nickname "CADET" from its users, which stood for Can't Add, Doesn't Even Try. The 1620 did not have a normal ALU -- instead, it used table lookup, and the table was user-programmable. Thus, you could program it to compute many different functions in place of addition. Like the 1401, it used a BCD representation for numbers.
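
The lookup scheme is easy to mimic in C. The sketch below (an assumption-laden model, not the 1620's real memory layout) treats "addition" as two table references; rewriting the tables would change what the add instruction computes, just as on the real machine.

    #include <stdio.h>

    static unsigned char sum_table[10][10];    /* low digit of a+b */
    static unsigned char carry_table[10][10];  /* carry out, 0 or 1 */

    /* On the 1620 these tables lived in user-writable memory. */
    static void load_standard_tables(void) {
        for (int a = 0; a < 10; a++)
            for (int b = 0; b < 10; b++) {
                sum_table[a][b]   = (a + b) % 10;
                carry_table[a][b] = (a + b) / 10;
            }
    }

    int main(void) {
        load_standard_tables();
        int a = 7, b = 8;
        /* No adder circuit: "adding" is just two memory lookups. */
        printf("%d + %d -> digit %d, carry %d\n",
               a, b, sum_table[a][b], carry_table[a][b]);
        return 0;
    }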

Control Data Corporation got its start by taking a design for a machine built for the military at another company, and turning it into a commercial product. The 1604, built in 1958, was Seymour Cray's first large design, and the earliest large transistorized computer on the market. After the 1604, there were two major product lines, the 3000 series of 24-bit or 48-bit machines (depending on model) and the 6000 series of 60-bit machines. The 6600 (1964) was Seymour Cray's first supercomputer design, and is considered to be one of the first modern supercomputers.

The Burroughs B5000 was the first in a series of unusual machines with many features designed to simplify programming. It had many novel features such as a hardware stack and tagged memory. Burroughs also pioneered virtual memory, although they didn't coin the term.


Integrated Circuit

The concept behind the integrated circuit is that transistors can be formed by crossing two semiconducting materials on a silicon substrate. Wherever a "wire" or "line" of polysilicon crosses a line of silicon with ions diffused into it, a transistor is formed. If a charge is applied to the polysilicon line, current flows through the junction. If the charge is taken away, the junction becomes non-conductive. Thus, we have another example of an electronically controlled switch.

The important point about the integrated circuit is that multiple transistors can be formed on a single substrate. Thus, a logic circuit that occupied a whole PC board can be reduced to fit on a single chip of silicon. Also, because the transistors can be connected directly on the chip, they can be smaller and need less power to communicate. Thus, ICs require less power and generate less waste heat.

Early ICs contained just a few transistors. These were later called SSI, for Small Scale Integration. Later on, there were chips with a hundred or so transistors that were called Medium Scale Integration (MSI). These might contain a whole register, or part of an arithmetic unit. Then came Large Scale Integration (LSI) in which as many as a thousand transistors could be placed on a chip, so that fairly complicated building blocks fit into one IC.

The other novel use of the IC was pioneered by Texas Instruments as part of the ILLIAC IV parallel processor project: solid state memory. Just as John Atanasoff had used small capacitors as storage devices, it was found that lengths of IC "wire" also store charge, and could be used as memory. Thus, the IC revolutionized computer design in two ways: by shrinking the size of computers, and by making the memory technology compatible with the processing technology.


Integrated Circuit


IBM replaced the 1401 and 7090 with the System 360, an architectural family with machines at many different levels of cost and performance. These ranged from the 360/20, with an 8K-word core memory and no secondary storage, to the 360/91, with a large memory, high-speed floating point, and a vast array of peripherals. The machines ran the same instruction set, and from the 360/40 up, the same operating system. The 360 series was succeeded by the 370 series, 4300, 3080, 3090, etc., all having basically the same architecture. Another novelty of these machines is their virtualizability -- it is possible for the operating system to emulate an empty machine for each user. Thus, for example, one can even run the operating system as a task under another copy of the operating system.

DEC moved into integrated circuits with the PDP-8/I. (The PDP-15 and PDP-10 also used integrated circuits.) The PDP-8 was the most successful of DEC's machines, but it was very limited in its capabilities. For example, special operations were required to access more than 4K words, and even then the limit was 32K words. There were several other versions of the 8, but all had roughly the same performance, differing only in features or technology.

The "successor" to the PDP-8 and PDP-15 was the PDP-11, a 16-bit machine that could directly access 64KB and indirectly up to 256KB. Like the IBM 360, the PDP-11 was an architectural family ranging from the 11/05 (intended for embedded control applications) to the 11/70 (a super minicomputer). Actually, the PDP-8 lived on long after the PDP-11 went into production, eventually being sold as a single chip, the Intersil 6100 microprocessor.

The successor to the PDP-11 and PDP-10 (by then called the DECsystem 20) was the VAX (Virtual Address eXtension), a 32-bit machine drawing on the experience of both families. The first version, the VAX-11/780, could even emulate the PDP-11. The VAX was designed as a mini-mainframe, and became DEC's main product line for several years. It appeared in versions ranging from workstations to multiprocessors.


Integrated Circuit


Texas Instruments' Advanced Scientific Computer (ASC) was a complex machine with many interesting features. It was intended to be a scientific supercomputer, and was used for applications such as analysis of seismic data for oil exploration. The ASC used pipelining of operations and interleaving of memory to achieve high processing speeds on long vectors of data. Another interesting feature was its complete lack of interrupts, permitting a shorter basic instruction cycle.
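
A quick sketch of the interleaving idea (a generic model, not the ASC's actual memory organization): consecutive words are spread across banks, so while one bank is busy completing an access, the next words can already be fetched from the other banks, keeping a pipelined vector unit supplied with roughly one word per cycle.

    #include <stdio.h>

    #define NBANKS 4   /* assumed bank count, for illustration only */

    int main(void) {
        /* A vector of consecutive addresses maps round-robin onto banks. */
        for (int addr = 0; addr < 8; addr++) {
            int bank   = addr % NBANKS;   /* which bank holds the word */
            int offset = addr / NBANKS;   /* position within that bank */
            printf("word %d -> bank %d, offset %d\n", addr, bank, offset);
        }
        return 0;
    }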

The CDC Cyber was the integrated circuit version of the 6600 (and 7600) supercomputers. CDC tried to make these scientific machines into business data processors by adding instructions to support character string and BCD processing, but the result was not very successful. CDC also developed another supercomputer, the Star-100, but sold very few of these. It was about this time that Seymour Cray left CDC to form his own supercomputer company.

The University of Illinois had continued to develop machines after the initial ILLIAC. ILLIAC II was a successor to ILLIAC, but ILLIAC III was an entirely different machine -- a parallel processor for analyzing bubble chamber photographs for nuclear physics. It was destroyed in a fire before being completed, but in 1963 was well ahead of its time. The last and most notable of the ILLIAC series was the IV, a large parallel processor that broke a great deal of new ground (although it was obsolete by the time it was completed). Perhaps the greatest contribution was the development of solid-state memory for the ILLIAC IV. The ability to use the same technology for memory as for the processor resulted in the explosion of cheap, large memory machines that we see today.

The STARAN was the first commercially built massively parallel processor. In 1967 you could buy a system with up to 8192 processors and a sophisticated communication network. Its programming model was based on the notion of associative or content addressable processing. A modern version is still used in military aircraft radars.


VLSI / Microprocessor

Very Large Scale Integration (VLSI) was simply the next step beyond LSI, to tens of thousands of transistors and more on a chip. For a while, a few people tried calling new levels of integration ULSI (Ultra), but the name never caught on.

Basically, the division between LSI and VLSI is the difference in design approach between the two: With LSI, you think in terms of standard modules that are wired together on a circuit board to build a computer or custom logic system. With VLSI, an entire system can be placed on a chip, and the design of chips is standard practice.

A test run of a small VLSI chip costs about $10,000, and something the size of a microprocessor costs about $300,000, so it's no longer the case that a prototype circuit can be hacked together and tweaked until it works, then sent out for production. The current generation of designers works extensively with CAD tools and many levels of simulation before a design is cast in silicon. Test chips are expected to be fully functional (and actually are more than half of the time), and are typically used to analyze performance and yield before some final circuit tweaking and full production. Imagine how differently you would program if it cost you $300,000 each time you compiled your code, and it took a month or more to get the object code back from the compiler for testing.


VLSI / Microprocessor


Intel developed the first microprocessor, the 4-bit 4004, in 1971, as a basis for a desktop calculator. In the next year, they produced the 8008, an 8-bit microprocessor to control a terminal. Fortunately for Intel, the terminal manufacturer chose not to use the 8008 -- thus forcing Intel to look for other uses for the device. It was not easy to sell a microprocessor in 1972 -- most engineers designed in MSI and LSI, and were leery of this stuff called software. However, computer hobbyists caught on to the potential and a new industry sprang up overnight.

Intel followed the 8008 with the 8080, a more powerful 8-bit system, and the 8085, which required fewer support chips.

Shortly after that, Zilog introduced the Z80, which was compatible with the 8085 but had more registers and a more symmetrical instruction set. Zilog tried to follow up with the Z8000, a much more powerful processor, which was not commercially successful. They also marketed the Z8, which in one version had a BASIC interpreter in on-chip ROM, but that version was likewise unsuccessful. A different configuration of the Z8 architecture was produced as a microcontroller, and in that form it has had a long life in use within disk drive controllers, network interfaces, and other embedded applications.

Intel was not the first to jump to 16 bits (National Semiconductor's IMP-16 was actually contemporary with the 8085), but they were the first to market one successfully. It is widely agreed that the Intel architectures are poorly conceived, but that their success was always due to the fact that Intel has been the first to bring a usable product to market. The 80186 and 80286 are extensions of the 8086 16-bit processor. The 8088 is an 8086 with an 8-bit external data path. The 80386 is a 32-bit extension of the family that added paged virtual memory support, and the 80486 brought floating point and cache memory onto the chip. The Pentium (IA-32) family added superscalar, deeply pipelined execution. Subsequent generations of the Pentium have continued to increase on-chip cache and pipeline depth (the P4 has a 20-stage branch pipe) as well as adding special features such as multimedia instructions. It is interesting to note that Intel's "family" of processors is not architecturally consistent; there is upward compatibility from the 8086, but not downward compatibility.

Intel recognized that the legacy of the x86/IA-32 would make it difficult to extend the architecture to 64 bits, and so they set out to design an entirely new processor family, called Itanium. The IA-64 draws heavily on two prior architectures, the Cydrome Cydra series and the Multiflow series of Very Long Instruction Word (VLIW) machines. In a VLIW architecture, each instruction word carries multiple instructions that can be executed in parallel. This is in some ways an extension of the Cray designs of the 1960s, which could pack as many as four instructions into a word, although the Cray dispatched those instructions in a pipelined fashion. Intel calls their variation of VLIW Explicitly Parallel Instruction Computing (because of the commercial failure of earlier VLIW architectures, due to a variety of market factors, use of the VLIW acronym carried negative connotations that Intel did not want to associate with its new product line).

The Itanium packs three instructions into a 128-bit instruction word, and operates on 64-bit data. It also provides far more registers (128 integer, 128 floating point) than previous designs. Although the instruction set is more RISC-like, the architecture is very complex. It features techniques called predication (to allow both outcomes of a branch to execute in parallel, picking the correct one when the branch direction becomes known) and speculation (to allow operations to proceed under the assumption that data is available, and correct the result in cases where the data was delayed). These enhance the throughput of the processor pipelines. As a result of the complexity, the first generation of Itanium (Merced) was actually slower than the corresponding IA-32 generation. The second generation is comparable in performance to the contemporary P4, and by the third generation Intel expects Itanium to pull ahead on computationally intensive applications, such as scientific computing and servers, although probably not on desktop applications. Considering that the Pentium line has generally delivered less performance than competing RISC designs such as MIPS and PowerPC, the Itanium has a way to go before it is sold on the basis of speed. However, Intel has the advantage that it sells its processors to other companies to package and market, while its competitors keep their designs in house.
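
Predication is easier to see in code. The following C fragment is only an illustration of the idea (real IA-64 code attaches predicate registers to individual instructions, which C cannot express): both arms of a conditional are computed, and a predicate selects the result, so the pipeline never has to guess a branch direction.

    #include <stdio.h>

    /* Branching version: a mispredicted branch stalls the pipeline. */
    static int max_branching(int a, int b) {
        if (a > b)
            return a;
        else
            return b;
    }

    /* Predicated version: both "arms" execute, the predicate p picks one. */
    static int max_predicated(int a, int b) {
        int p = (a > b);              /* the predicate, 0 or 1 */
        return p * a + (1 - p) * b;   /* no branch at all */
    }

    int main(void) {
        printf("%d %d\n", max_branching(3, 7), max_predicated(3, 7));
        return 0;
    }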


VLSI / Microprocessor


About a year after Intel introduced the 8080, Motorola came out with a direct competitor, the 8-bit 6800. It was generally acknowledged that the 6800 was a better architecture, but it was late getting into the marketplace and had some flaws that made it difficult to use. Some of the 6800 team left Motorola for MOS Technology to build an improved 6800, which they called the 6502. That chip was moderately popular because it was quickly picked up and sold as a single board computer (SBC) -- the KIM and SYM were the most popular versions. This allowed many engineers (and students) to learn and try out microprocessors for their applications. In particular, Steve Jobs and Steve Wozniak started a company out of their garage by building a 6502-based personal computer called the Apple II.

Motorola decided to leapfrog the competition by producing a 32-bit processor, the 68000. Although it was internally a 32-bit design, it could access only 16 bits at a time. Motorola was also beaten to the punch by Intel, which delivered its 16-bit 8086 slightly earlier, and the 8086 could run most 8080 code whereas the 68000 was a complete break from the 6800. The 68010 fixed a couple of minor problems in the 68000 that had made it impossible to write a virtual operating system for it. The 68020 added a 32-bit external data path, and virtual memory support through a coprocessor. The 68030 added on-chip virtual memory support and cache. The 68040 increased the performance of the 68030.

DEC was slow to jump into the microprocessor business, having been caught somewhat off guard by it. They licensed the PDP-8 architecture to Intersil to produce the 6100, and the PDP-11 architecture to Western Digital to produce a 3-chip implementation which DEC sold as the PDP-11/03.

Finally, DEC developed its own microprocessor, an implementation of the VAX which became the MicroVAX and eventually the VAXstation product line.

There were several other microprocessors developed, most of which were rather unremarkable. Texas Instruments, however, produced a novel design called the 9900. In that machine, there was only one register -- the register pointer. All other registers were actually just memory locations. This made the machine somewhat slow at processing data, but made context switching (for subroutine calls, interrupts, etc.) very fast because saving the machine state required only that the register pointer be saved.
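
The sketch below models the 9900's scheme in C (an assumed layout for illustration; the real machine also kept a program counter and status register). The "registers" are just sixteen consecutive memory words at wherever the workspace pointer points, so a context switch is a single pointer update rather than sixteen register copies.

    #include <stdio.h>

    typedef unsigned short word;

    static word memory[1024];    /* stand-in for main memory */
    static word wp = 0;          /* workspace pointer: base of R0-R15 */

    #define REG(n) memory[wp + (n)]   /* register n of current workspace */

    static void context_switch(word new_wp) {
        wp = new_wp;             /* the whole "register file" just moved */
    }

    int main(void) {
        REG(0) = 42;             /* task A's R0 */
        context_switch(16);      /* interrupt: task B gets its own workspace */
        REG(0) = 7;              /* task B's R0 is a different memory word */
        context_switch(0);       /* return to task A... */
        printf("%u\n", (unsigned)REG(0));   /* ...whose R0 still holds 42 */
        return 0;
    }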


VLSI / Microprocessor


All of the microprocessors to this point had been Complex Instruction Set Computers (CISC). There was a movement in architecture toward simplifying instruction sets and the corresponding control logic, so that the basic machine cycle would be faster. Code would be larger, and more instructions would be required for infrequently used operations, but overall speed would increase. The first of these new microprocessors to be sold commercially was the SPARC, based on RISC designs developed at UC Berkeley. Shortly after that came the MIPS R2000, developed at Stanford. IBM also had an entry, the RISC System/6000, which was operating in a laboratory well before the other two. There are a lot of claims and counterclaims about who first developed the concept of Reduced Instruction Set Computers (RISC). In reality, the first machines were mostly RISC designs, and it is really a question of who rediscovered this and repopularized the notion. The folks at Berkeley coined the popular terms RISC and CISC in the 1980s to distinguish their approach from other machines of the day (such as the VAX). But a close look at the Cray designs of the 1960s reveals them to have many features in common with RISC architectures. At that time, however, there was nothing else sufficiently complex for the Cray to be considered a "reduced" architecture.

IBM's RS/6000 became the Power family of architectures, which today are used in high-end workstations, parallel processors, and servers. In the early 1990s, IBM, Motorola, and Apple joined forces to revise the Power architecture into a 32-bit design which they called the PowerPC (first produced in 1993). It is interesting to note that in its first quarter of selling Power Macintosh computers, Apple sold more RISC-based desktop machines than had previously been sold by all other manufacturers combined. At the time, Apple accounted for 10% of the PC market, and other RISC systems were being sold as high-performance workstations. The PowerPC continues to outsell all other RISC designs. When the low volumes of the other designs are considered in comparison to the cost of designing and producing a new microprocessor, one must conclude that either the business isn't viable for the smaller producers, or that Intel is making huge profits, or both. This is a major factor that leads analysts to believe that diversity in processor architectures is going to continue to decline.

The MIPS and SPARC architectures continue to be produced. However, MIPS has said that it will switch to using the Itanium design once that processor reaches performance levels comparable to its own products. Sun Microsystems continues to use SPARC processors in its servers, where throughput is more significant than raw computational power. A 32-bit version of the MIPS continues to be popular in embedded applications, as are some versions of the SPARC.

DEC introduced yet another RISC machine, their 64-bit Alpha, which executed up to 200 million instructions per second, with a 200 MHz clock rate at a time when competitors were struggling to reach 33 and 50 MHz. The Alpha philosophy was to exploit the simplicity of RISC to push the clock rate higher, and only later to add sophistication to the processor to improve performance in other ways. Increasing clock rate improves overall performance as long as memory and I/O are able to keep up with the increase in CPU speed. The Alpha went through three generations before DEC was sold to Compaq Computer Corp., which produced a fourth generation of the Alpha before cancelling further development in 2001. The cost of maintaining a cutting-edge fabrication facility was one of the factors that contributed to DEC going out of business, and their fabrication plants and many of their patents were sold to Intel when Compaq bought the company.

Intel's entry in the RISC market was the i860, which was originally a back-door project; it obtained corporate support for a while before falling out of favor again. It was especially popular for high-performance embedded applications and graphics processing. At one point SGI had a graphics engine with 16 i860s running in parallel. Intel and Hewlett-Packard signed an agreement to merge some of the HP Precision Architecture into the Intel Itanium processor family.

Other notable designs are the Intel 432, a very CISC-oriented machine that had many OS kernel operations built into hardware (and nearly bankrupted the company), and the many DSP (Digital Signal Processor) chips, such as the Texas Instruments TMS320Cxx series, which are widely used in embedded applications (such as modems, digital phones, etc.).


Parallel Processors


STARAN was the first commercially successful massively parallel processor, and is still being sold as the ASPRO.

MPP was a one-of-a-kind processor built for NASA to process the vast amount of data being gathered by earth-resources satellites. It had 16,384 1-bit processors arranged in a grid. Another similar design is the AMT/CPP DAP.

The Transputer was a parallel processor cell with communication channels and the ability to be configured into different network topologies. The first version was marginally successful. A second generation was repeatedly set back by delays and never went into volume production.

The Connection Machine CM-2 was like the MPP with the addition of a second communication network for routing data. A significant contribution of the system was the software support for virtual processors, which simplified programming of problems with large data sets. The CM-5 used up to 64K SPARC processors (none was ever built with this many) arranged in a fat-binary-tree topology with separate networks for data, control, and diagnostics. It operated in a mode known as Single Program, Multiple Data (SPMD) in which a single program is replicated across the nodes, but unlike SIMD, the processors can take branches independently and communicate with each other asynchronously. Neither machine is in production any longer.
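
A minimal SPMD sketch in C (illustrative only, not CM-5 code): one program text is replicated across all nodes, but unlike SIMD each node branches independently according to its own id and data. Here a loop stands in for the parallel launch; on a real machine each call would run on its own processor.

    #include <stdio.h>

    #define NODES 4   /* assumed machine size, for illustration */

    /* The single program every node runs. */
    static void node_program(int my_id, int nnodes) {
        if (my_id == 0) {
            printf("node 0: coordinating\n");        /* one branch... */
        } else {
            printf("node %d: working on slice %d of %d\n",
                   my_id, my_id, nnodes - 1);        /* ...another branch */
        }
    }

    int main(void) {
        for (int id = 0; id < NODES; id++)   /* stand-in for parallel launch */
            node_program(id, NODES);
        return 0;
    }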

Based on work at Caltech, Intel developed a multiprocessor consisting of up to 128 of their microprocessors (80286, 80386, i860) connected via a network. Based on work at Carnegie Mellon, they also developed a systolic array processor, the iWarp. They then built a machine (the Paragon) that combined the communication facilities of the iWarp with the processing power of the i860. Subsequently, they built a one-of-a-kind parallel processor for Sandia National Laboratory containing 9632 Pentium Pro processors. This was the first general-purpose machine to exceed 1 trillion floating point operations per second of peak performance. It has since been exceeded by production machines from IBM (the SP series).

Since the early 1990s, parallel processing has largely turned to inexpensive clusters of PCs operating with standard or slightly enhanced networking hardware. Although clusters suffer severe performance penalties for some applications (sometimes running slower than a single processor), they are highly cost effective for applications in which computations are largely independent of one another. This phenomenon has undercut the supercomputing industry, and contributed to Cray Research (which had shifted to parallel processing with its T3D and T3E systems, based on the DEC Alpha and a high-performance custom interconnection network) being sold to SGI and then later to Tera Computer Systems (which had worked for many years to deliver a high-performance multithreaded parallel supercomputer).

A small segment of the user community still depends on the kind of processing that can only be obtained with vector supercomputers like the Cray. Even high-performance parallel processors like the SP and Origin are not suited to their applications, which require an extremely high rate of communication among the processors. But there is not enough of a market to justify the huge investment of resources required to build a new supercomputer from scratch. Remaining manufacturers of such machines besides Tera/Cray include Hitachi, NEC, and Fujitsu.

Other parallel systems formerly in production include the nCube, Kendall Square Research, WaveTracer, MasPar, BBN Butterfly, etc.


© Copyright 1995, 1996 Charles C. Weems Jr. All rights reserved.
