Much of this course is about using micro-controllers to control other devices. At this point in your computer science student career, you have undoubtedly had a fair bit of experience programming. It is less likely that you have had a chance to use a computer to control other devices (like the LEDs in lab 0). This course will teach you how to do just that. We will also be writing simple device drivers which expand the kernel's ability to communicate with devices you build.
Embedded computing tailors the environment for a specific problem. In this course, we will be dealing extensively with AVR micro-controllers, produced by Atmel, and programming them on their STK500 development boards. Though the development boards are quite small, the AVR programs will still need to be developed on a host computer and downloaded to the AVR. Note that the computers you are using for the first three labs will be well suited for this task. However, if you happen to have a smaller computer (like a laptop or wearable computer) which is running Debian Linux, you may prefer to use this computer. The combination of a laptop with the STK500 would be a very portable and practical working environment. Just be sure that the computer has at least one serial port, as the STK500 uses a serial port for the programming of the AVR.
The 4004 had 46 instructions, using only 2,300 transistors in a 16-pin DIP. It ran at a clock rate of 740kHz (eight clock cycles per CPU cycle of 10.8 microseconds) - the original goal was 1MHz, to allow it to compute BCD arithmetic as fast (per digit) as a 1960's era IBM 1620.
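The cycle time quoted above follows directly from the clock rate and the number of clock periods per CPU cycle. As a quick check of that arithmetic (the function and constant names here are only illustrative):

```python
CLOCK_HZ = 740_000          # 4004 clock rate quoted in the text
CLOCKS_PER_CPU_CYCLE = 8    # clock periods per CPU cycle, as stated in the text


def cpu_cycle_us(clock_hz, clocks_per_cycle):
    """Return the duration of one CPU cycle in microseconds."""
    return clocks_per_cycle / clock_hz * 1e6


cycle_us = cpu_cycle_us(CLOCK_HZ, CLOCKS_PER_CPU_CYCLE)
print(f"{cycle_us:.1f} microseconds per CPU cycle")  # prints 10.8
```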
The 4040 (1972) was an enhanced version of the 4004, adding 14 instructions, larger (8 level) stack, 8K program space, and interrupt abilities (including shadows of the first 8 registers). Should Pioneer 10 and Pioneer 11 ever be found by an extraterrestrial species, the 4004 will represent an example of Earth's technology.
[for additional information, see Appendix E]
It was bit serial to reduce connections between chips, with highly parallel design and high clock rate to compensate. Words were 20 bits (required by the precision of the sensor and control values) and ALU units could perform operations on input bits as they were read in, while bits of the previous result were read out. "Steering Logic" (SL) units switched input signals to output lines (and added or subtracted, if two inputs went to one output), which could be directed to multiplication, division, and special logic units (which acted a little like a Transfer Triggered Architecture). Bits read serially from the ROMs (eight banks with 128 20-bit words, each with its own program counter) directed the data movement and unit operations, but had to be synchronized with data movement, making programming difficult (basically microcode). RAM (called Random Access Storage) consisted of units with sixteen 20-bit words. Programming consisted of using the SLs to direct instruction and data words to the function units, which could be hooked to other function units in a pipeline, along with other pipelines in parallel. A separate set of eight ROMs could be used for data.
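To make the idea of bit-serial arithmetic concrete, here is a minimal sketch of a serial adder working on 20-bit words. It assumes the bits arrive least significant first and that values are unsigned; neither detail is stated above, and the function names are invented for illustration.

```python
WORD_BITS = 20  # word length given in the text


def to_serial(value, bits=WORD_BITS):
    """Split a non-negative integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(bits)]


def from_serial(bits):
    """Reassemble an integer from a least-significant-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits, b_bits):
    """Add two bit streams one bit per step, keeping only a 1-bit carry."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)   # this result bit can leave while later bits arrive
        carry = total >> 1
    return out                  # final carry is dropped: result wraps modulo 2**20


result = from_serial(serial_add(to_serial(123456), to_serial(654321)))
print(result)
```

Only one bit of each operand is needed per step, which is why a serial design needs so few connections between chips.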
It took until 1998 to declassify a paper on the 1970 design. Although impressively elegant, it probably didn't warrant that length of secrecy.
It included a 4-bit accumulator, 4-bit Y register and 2 or 3-bit X register, which combined to create a 6 or 7 bit index register for the 64 or 128 nibbles of on chip RAM. A 1-bit status register was used for various purposes in different contexts. The 6-bit PC combined with a 4 bit page register and an optional 1 bit bank ('chapter') register to produce 10 or 11 address bits to 1KB or 2KB of on-chip program ROM. There was also a 6-bit subroutine return register and 4-bit page buffer, used as the destination on a branch, or exchanged with the PC and page registers for a subroutine (amounting to a 1-element stack, branches could not be performed within a subroutine).
An interesting feature of the PC is it was incremented using a feedback shift register, not a counter, so instructions were not consecutive in memory, but since all memory was internal, this was not a problem. Instructions were 8 bits with twelve hardwired, and with a 31X16 element PLA allowing 31 custom microprogrammed instructions. All hardwired instructions were single cycle, and no interrupts were allowed.
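The effect of stepping a program counter with a feedback shift register can be sketched as follows. The feedback taps used here (XOR of the two top bits) are an assumption chosen because they give a maximal-length 6-bit sequence; they are not claimed to be the taps used in the actual chip.

```python
def lfsr_next(pc):
    """Advance a 6-bit shift-register program counter by one step."""
    feedback = ((pc >> 5) ^ (pc >> 4)) & 1   # XOR of the two top bits (assumed taps)
    return ((pc << 1) | feedback) & 0x3F     # shift left, insert feedback, keep 6 bits


def address_sequence(start=1):
    """Return the addresses visited from `start` until the sequence repeats."""
    seq = [start]
    pc = lfsr_next(start)
    while pc != start:
        seq.append(pc)
        pc = lfsr_next(pc)
    return seq


seq = address_sequence()
print(len(seq), seq[:8])
```

The addresses are visited in a scrambled but fixed order, so the assembler only has to place each instruction at the address the shift register will produce next.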
It gained fame in the movie "ET: The Extraterrestrial" as the brains in the Texas Instruments "Speak and Spell" educational toy.
The 8080 was used in the Altair 8800, the first widely-known personal computer (though the definition of 'first PC' is fuzzy). Some claim that the 12-bit LINC (Laboratory INstruments Computer) was the first 'personal computer'. Developed at MIT (Lincoln Labs) in 1963 using DEC components, it inspired DEC to design its own PDP-8 in 1965, also considered an early 'personal computer'. 'Home computer' would probably be a better term for the Altair, though.
Intel updated the design with the 8085 (1976), which added two instructions to enable/disable three added interrupt pins (and the serial I/O pins), and simplified hardware by only using +5V power, and adding clock generator and bus controller circuits on-chip.
Clock speeds ranged from the original Z-80 2.5MHz to the Z80-H (later called Z80-C) at 8MHz, and later a CMOS version at 10MHz.
Like many processors (including the 8085), the Z-80 featured many undocumented instructions. In some cases, they were a by-product of early designs (which did not trap invalid op codes, but tried to interpret them as best they could), and in other cases chip area near the edge was used for added instructions, but fabrication made the failure rate high. Instructions that often failed were just not documented, increasing chip yield. Later fabrication made these more reliable.
But the thing that really made the Z-80 popular in designs was the memory interface - the CPU generated its own RAM refresh signals, which meant easier design and lower system cost, the deciding factor in its selection for the TRS-80 Model 1. That and its 8080 compatibility, and CP/M, the first standard microprocessor operating system, made it the first choice of many systems.
Embedded variants of the Z-80 were also produced. Hitachi produced the 64180 (1984) with added components (two 16 bit timers, two DMA controllers, three serial ports, and a segmented MMU mapping a 20 bit (1M) address space to any three variable sized segments in the 16 bit (64K) Z-80 memory map), a design Zilog and Hitachi later refined to produce the Z-180 and HD64180Z (1987?) which were compatible with Z-80 peripheral chips, plus variants (Z-181, Z-182). The Z-280 was a 16 bit version introduced about July, 1987 (loosely based on the ill-fated Z-800), with a paged (like Z-180) 24 bit (16M) MMU (8 or 16 bit bus resizing), user/supervisor modes and features for multitasking, a 256 byte (4-way) cache, 4 channel DMA, and a huge number of new op codes tacked on (total of almost 3,500, including previously undocumented Z-80 instructions), though the size made some very slow. The internal clock could be run at twice the external clock (e.g. a 16MHz CPU with an 8MHz bus), and additional on-chip components were available. A 16/32 bit Z-380 version also exists (1994) with added 32-bit linear addressing mode (16-bit mode is Z-80 and Z-180 binary compatible, but not Z-280 compatible).
Rabbit Semiconductor's Rabbit 2000 (1999/2000?) has a Z-80 derived instruction set which drops some instructions (mostly I/O, plus some less useful instructions) and adds others (16-bit data, computed address). It also drops dynamic RAM support, because embedded systems more often use static RAM, and adds serial, parallel, and inter-processor communication units. Program space is extended to 20 bits using an 8-bit page register, rather than the Z-180's MMU.
The Z-8 (1979) was an embedded processor with on-chip RAM (actually a set of 124 general and 20 special purpose registers) and ROM (often a BASIC interpreter), and is available in a variety of custom configurations up to 20MHz. It is not actually related to the Z-80.
Unlike the 8080 and its kind, the 6502 (and 6800) had very few registers. It was an 8 bit processor, with 16 bit address bus. Inside was one 8 bit data register, two 8 bit index registers, and an 8 bit stack pointer (stack was preset from address 256 ($100 hex) to 511 ($1FF)). It used these index and stack registers effectively, with more addressing modes, including a fast zero-page mode that accessed memory addresses from address 0 to 255 ($FF) with an 8-bit address, which sped up operations (the CPU didn't have to fetch a second byte for the address).
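The saving from zero-page addressing shows up in the instruction encoding. The sketch below encodes a 6502 LDA (load accumulator) for a given address, picking the shorter zero-page form when the address fits in one byte. The helper name `encode_lda` is hypothetical; the two opcodes are the standard 6502 values for LDA zero page and LDA absolute.

```python
LDA_ZERO_PAGE = 0xA5  # LDA followed by a one-byte zero-page address
LDA_ABSOLUTE = 0xAD   # LDA followed by a two-byte address, low byte first


def encode_lda(address):
    """Return the machine-code bytes for LDA from the given memory address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if address <= 0xFF:
        # Zero page: the high address byte is implicitly $00, so it is omitted.
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])


print(encode_lda(0x0080).hex(), encode_lda(0x1234).hex())
```

The zero-page form is one byte shorter, so the CPU performs one fewer memory fetch per access.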
Back when the 6502 was introduced, RAM was actually faster than microprocessors, so it made sense to optimize for RAM access rather than increase the number of registers on a chip. It also had a lower gate count (and cost) than its competitors.
The 650x also had undocumented instructions, including JAM, which simply causes the CPU to freeze, requiring a hardware reset or power cycle to restart.
The CMOS 65C02/65C02S fixed some original 6502 design flaws, and the 65816 (officially W65C816S, both designed by Bill Mensch of Western Design Center Inc.) extended the 650x to 16 bits internally, including index and stack registers, with a 16-bit direct page register (similar to the 6809), and 24-bit address bus (16 bit registers plus 8 bit data/program bank registers). It included an 8-bit emulation mode. Microcontroller versions of both exist, and a 32-bit version (the 65832) is planned. Various licensed versions are supplied by GTE (16 bit G65SC802 (pin compatible with 6502), and G65SC816 (support for VM, I/D cache, and multiprocessing)) and Rockwell (R65C40), and Mitsubishi has a redesigned compatible version. The 6502 remains surprisingly popular largely because of the variety of sources and support for it.
The 6502-based Apple II line (not backwards compatible with the Apple I) was among the first microcomputers introduced and became the longest running PC line, eventually including the 65816-based Apple IIgs. The 6502 was also used in the Nintendo Entertainment System (NES), and the 65816 was in the 16-bit successor, the Super NES, before Nintendo switched to MIPS embedded processors.
Other features were one of the first multiplication instructions of the time, 16 bit arithmetic, and a special fast interrupt. But it was also highly optimized, gaining up to five times the speed of the 6800 series CPU. Like the 6800, it included the undocumented HCF (Halt Catch Fire) instruction to incrementally strobe the address lines for bus testing ("jump to accumulator (A or B)" in the 6800, implemented and documented as $00 in the 68HC11 which is described below).
The 6800 and 6809, like the 6502 series, used a single clock cycle to generate the timing for four internal execution stages by using the rising and falling edges of the base cycle (not just rising edges), and another clock 90 degrees out of phase (giving two rising and two falling edges per cycle). This allowed instructions to execute in one external 'cycle' rather than the four needed by most CPUs, such as the 8080, which used the external clock directly. An equivalent instruction there would take four cycles, meaning a 2MHz 6809 would be roughly equivalent to an 8MHz 8080. This is different from clock-doubling, which uses a phase-locked-loop to generate a faster internal clock (for the CPU) which is synchronized with an external clock (for the bus). Motorola later produced CPUs in this line with a standard four-cycle clock. The 680x and 650x only accessed memory every other cycle, allowing a peripheral (such as video, or even a second CPU) to access the same memory without conflict.
The 6800 lived on as well, becoming the 6801/3, which included ROM, some RAM, a serial I/O port, and other goodies on the chip (as an embedded controller, minimizing part counts - but expensive at 35,000 transistors. The 6805 was a cheaper 6801/3, dropping seldom used instructions and features). Later the 68HC11 version (two 8 bit/one 16 bit data register, two 16 bit index, and one 16 bit stack register, and an expanded instruction set with 16 bit multiply operations) was extended to 16 bits as the 68HC16 (additional 16-bit accumulator E, three index registers IX, IY, IZ, plus extension registers to add 4 bits to addresses and accumulator E for a 1M address space, plus 16-bit multiply registers HR and IR and 36-bit AM accumulator), and a lower cost 16 bit 68HC12 (May 1996). It remains a popular embedded processor (with over 2 billion 6800 variants sold), and radiation hardened versions of the 68HC11 have been used in communications satellites. But the 6809 was a very fast and flexible chip for its time, particularly with the addition of the OS-9 operating system.
The Am2901, from Advanced Micro Devices, was a popular 4-bit-slice processor. It featured sixteen 4-bit registers and a 4-bit ALU, and operation signals to allow carry/borrow or shift operations and such to operate across any number of other 2901s. An address sequencer (such as the 2910) could provide control signals with the use of custom microcode in ROM.
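The way carry signals let several 4-bit slices act as one wider ALU can be sketched in a few lines. This models only addition with a ripple carry between slices; the function names are invented, and a real 2901 design would also involve the microcode sequencer and the other ALU operations.

```python
def slice_add(a4, b4, carry_in):
    """One 4-bit slice: add two nibbles plus carry-in, return (nibble, carry_out)."""
    total = (a4 & 0xF) + (b4 & 0xF) + carry_in
    return total & 0xF, total >> 4


def cascaded_add(a, b, slices=4):
    """Chain `slices` 4-bit slices, passing each carry-out to the next carry-in."""
    result, carry = 0, 0
    for i in range(slices):
        shift = 4 * i
        nibble, carry = slice_add(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry   # carry is the carry-out of the most significant slice


print(cascaded_add(0x0FFF, 0x0001))            # four slices form a 16-bit adder
print(cascaded_add(0x12345678, 0x11111111, 8)) # eight slices form a 32-bit adder
```

Changing the `slices` argument is the software analogue of wiring more chips side by side to widen the data path.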
The Am2903 featured hardware multiply.
Legend holds that some Soviet clones of the PDP-11 were assembled from Soviet clones of the Am2901.
AMD also produced what is probably the first floating point "coprocessor" for microprocessors, the AMD 9511 "arithmetic circuit" (1979), which performed 32 bit (23 + 7 bit floating point) RPN-style operations (4 element stack) under CPU control - the 64-bit 9512 (1980) lacked the transcendental functions. It was based on a 16-bit ALU, performed add, subtract, multiply, and divide (plus sine and cosine), and while faster than software on microprocessors of the time (about 4X speedup over a 4MHz Z-80), it was much slower (at 200+ cycles for 32*32->32 bit multiply) than more modern math coprocessors are.
It was used in some CP/M (Z-80) systems (I heard it was used on an S-100 bus math card for NorthStar systems, but that card in fact used a 74181 BCD (Binary Coded Decimal) ALU and ten PROM chips for microcode). Calculator circuits (such as the National Semiconductor MM57109 (1980), actually a 4-bit NS COP400 processor with floating point routines in ROM) were also sometimes used, with emulated keypresses sent to them and results read back, to simplify programming rather than for speed.
While the 8048 used 1-byte instructions, the 8051 has a more flexible 2-byte instruction set. It has eight 8-bit registers, plus an accumulator A. Data space is 128 bytes accessed directly or indirectly by a register, plus another 128 above that in the 8052 which can only be accessed indirectly (usually for a stack). External memory occupies the same address space, and can be accessed directly (in a 256 byte page via I/O ports) or through the 16 bit DPTR address register much like in the RCA 1802. Direct data above location 32 is bit-addressable. Data and program memory share the address space (and address lines, when using external memory). Although complicated, these memory models allow flexibility in embedded designs, making the 8051 very popular (over 1 billion sold since 1988).
The Siemens 80C517 adds a math coprocessor to the CPU which provides 16 and 32 bit integer support plus basic floating point assistance (32 bit normalize and shift), reminiscent of the old AMD 9511. The Texas Instruments TMS370 is similar to the 8051, adding a B accumulator and some 16 bit support.
The PIC has a large register set (from 25 to 192 8-bit registers, compared to the Z-8's 144). There are up to 31 direct registers, plus an accumulator W, though R1 to R8 also have special functions - R2 is the PC (with implicit stack (2 to 16 level)), and R5 to R8 control I/O ports. R0 is mapped to the register R4 (FSR) points to (similar to the ISAR in the F8, it's the only way to access R32 or above).
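The indirect access through R0 and R4 described above can be modelled with a small register file. This is a simplified sketch: the class and constant names are hypothetical, the file size of 64 is arbitrary, and the special behaviour of the other low-numbered registers is ignored.

```python
R0_INDIRECT = 0   # reads and writes of R0 are redirected
R4_FSR = 4        # R4 holds the number of the register that R0 stands for


class RegisterFile:
    """Toy PIC-style register file with indirect access through R0."""

    def __init__(self, size=64):
        self.regs = [0] * size

    def _resolve(self, number):
        # An access to R0 really targets the register that R4 points to.
        if number == R0_INDIRECT:
            return self.regs[R4_FSR] % len(self.regs)
        return number

    def read(self, number):
        return self.regs[self._resolve(number)]

    def write(self, number, value):
        self.regs[self._resolve(number)] = value & 0xFF


rf = RegisterFile()
rf.write(R4_FSR, 40)          # point R4 at register 40
rf.write(R0_INDIRECT, 0x5A)   # lands in register 40, not register 0
print(rf.regs[40], rf.read(R0_INDIRECT))
```

Because the instruction encoding only has room for a small direct register number, going through the pointer register is how registers at R32 and above are reached.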
The 16x is very simple and RISC-like (but less so than the RCA 1802 or the more recent Atmel AVR microcontroller). It has only 33 fixed length 12-bit instructions, including several with a skip-on-condition flag to skip the next instruction (for loops and conditional branches), producing tight code important in embedded applications. It's marginally pipelined (2 stages - fetch and execute) - combined with single cycle execution (except for branches - 2 cycles), performance is very good for its processor category.
The 17x has more addressing modes (direct, indirect, and relative - indirect mode instructions take 2 execution cycles), more instructions (58 16-bit), more registers (232 to 454), plus up to 64K-word program space (2K to 8K on chip). The high end versions also have single cycle 8-bit unsigned multiply instructions.
The PIC 16x is an interesting look at an 8 bit design made with slightly newer design techniques than other 8 bit CPUs in this list - around 1978 by General Instruments (the 1650, a successor to the more general 1600). It lost out to more popular CPUs and was later sold to Microchip Technology, which still sells it for small embedded applications. An example of this microprocessor is a small PC board called the BASIC Stamp, consisting of 2 ICs - an 18-pin PIC 16C56 CPU (with a BASIC interpreter in 512 word ROM (yes, 512)) and 8-pin 256 byte serial EEPROM (also made by Microchip) on an I/O port where user programs (about 80 tokenized lines of BASIC) are stored.
The PDP-8 continued for a while in certain applications, while the PDP-10 (1967) was a higher capacity 36-bit mainframe-like system (sixteen general registers and floating point operations), much adored and rumored to have souls.
The PDP-11 had eight general purpose 16-bit registers (R0 to R7 - R6 was also the SP and R7 was the PC). It featured powerful register oriented (little-endian, byte addressable) addressing modes. Since the PC was treated as a general purpose register, constants were loaded using an indirect mode on R7 which had the effect of loading the 16 bit word following the current instruction, then incrementing the PC to the next instruction before fetching. The SP could be accessed the same way (and any register could be used for a user stack (useful for FORTH)). A CC (or PSW) register held results from every instruction that executed.
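Loading a constant through R7 can be traced with a tiny simulation. The sketch below models only one instruction, MOV with an autoincrement source on R7 and a register destination (octal 01270n); memory is a dictionary keyed by byte address, and all function names are invented for illustration.

```python
PC = 7  # R7 doubles as the program counter


def fetch_autoincrement(regs, mem, reg):
    """Read the word addressed by regs[reg], then advance that register by 2."""
    word = mem[regs[reg]]
    regs[reg] = (regs[reg] + 2) & 0xFFFF
    return word


def step(regs, mem):
    """Execute one MOV #constant, Rn instruction (the only one modelled here)."""
    instr = fetch_autoincrement(regs, mem, PC)        # instruction fetch moves PC on
    if instr & 0o177770 != 0o012700:
        raise ValueError("only MOV (R7)+, Rn is modelled in this sketch")
    dest = instr & 0o7
    # The source operand uses the same fetch on R7, so it reads the word that
    # follows the instruction and leaves PC pointing at the next instruction.
    regs[dest] = fetch_autoincrement(regs, mem, PC)


memory = {0o1000: 0o012700, 0o1002: 0o000052}   # MOV #52, R0
registers = [0] * 8
registers[PC] = 0o1000
step(registers, memory)
print(oct(registers[0]), oct(registers[PC]))
```

No special "immediate" hardware is needed: the constant is picked up by an ordinary addressing mode that happens to be applied to the program counter.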
Adjacent registers could be implicitly grouped into a 32 bit register for multiply and divide results (Multiply result stored in two registers if destination is an even register, not if it's odd. Divide source must be grouped - quotient is stored in high order (low number) register, remainder in low order).
A floating point unit could be added which contains six 64 bit accumulators (AC0 to AC5, can also be used as six 32-bit registers - values can only be loaded or stored using the first four registers).
PDP-11 addresses were 16 bits, limiting program space to 64K, though an MMU could be used to expand total address space (18-bits and 22-bits in different PDP-11 versions).
The LSI-11 (1975-ish) was a popular microprocessor implementation of the PDP-11 using the Western Digital MCP1600 microprogrammable CPU, and the architecture influenced the Motorola 68000, NS 320xx, and Zilog Z-8000 microprocessors in particular. There was also a 32-bit PDP-11 plan as far back as its 1969 introduction. The PDP-11 was finally replaced by the VAX architecture (early versions included a PDP-11 emulation mode, and were called VAX-11).
Looking back it was a logical design decision, since most 8 bit processors featured direct 16 bit addressing without segments.
The 68000 had sixteen 32-bit registers, split into eight data and address registers. One address register was reserved for the Stack Pointer. Data registers could be used for any operation, including offset from an address register, but not as the source of an address itself. Operations on address registers were limited to move, add/subtract, or load effective address.
Like the Z-8000, the 68000 featured a supervisor and user mode (each with its own Stack Pointer). The Z-8000 and 68000 were similar in capabilities, but the 68000 was 32 bits internally (with 16 bit ALUs, making some 32-bit operations slower than 16-bit ones - two in parallel for 32-bit data, one for addresses), making it faster and eliminating forced segments. It was designed for expansion, including specifications for floating point and string operations (floating point was added in the 68040 (1991), with eight 80 bit floating point registers compatible with the 68881/2 coprocessor). Like many other CPUs of the time, the 68000 could fetch the next instruction during execution (a 2 stage pipeline). An instruction prefix (0xF) indicated coprocessor instructions (similar to the 80x86), so the coprocessor could "listen" to the instruction stream, and execute instructions it recognized, without a coprocessor bus.
The 68010 (1982) added virtual memory support (the 68000 couldn't restart interrupted instructions) and a special loop mode - small decrement-and-branch loops could be executed from the instruction fetch buffer. The 68020 (1984) expanded the external data and address buses to 32 bits, used a simple 3-stage pipeline, and added a 256 byte cache (loop buffer), with either a segmented (68451?) or paged (68851, it supported two level pages (logical, physical) rather than the segment/page mapping of the Intel 80386 and IBM S/360 mainframe) memory management unit. The 68020 also added a coprocessor interface. The 68030 (1987) integrated the paged MMU onto the chip. The 68040 (January 1991) added fully cached Harvard busses (4K each for data and instructions, with new MMU), 6 stage pipeline, and on chip FPU (subset of the 68882, with some operations emulated).
Someone told me a Motorola techie indicated the 68000 was originally planned to use the IBM S/360 instruction set, but the MMU and architectural differences make this unlikely. The 68000 design was later involved in microprocessor versions of the IBM S/370.
The 68060 (April 1994) expanded the design to a superscalar version, like the Intel Pentium and NS320xx (Swordfish) series before it. Like the National Semiconductor Swordfish, and later the Nx586, AMD K5, and Intel's "Pentium Pro", the third stage of the 10-stage 68060 pipeline translates the 680x0 instructions to a decoded RISC-like form (stored in a 16 entry buffer in stage four). There is also a branch cache, and branches are folded into the decoded instruction stream like the AT&T Hobbit and other more recent processors, then dispatched to two pipelines (three stages: decode, address generation, operand fetch) and finally to two of three execution units (2 integer, 1 floating point) before reaching two 'writeback' stages. Cache sizes are doubled over the 68040.
The 68060 also includes many innovative power-saving features (3.3V operation; execution unit pipelines could actually be shut down, reducing power consumption at the expense of slower execution; and the clock could be reduced to zero) so power use is lower than the 68040's (3.9-4.9 watts vs. 4-6 watts for the 68040). Another innovation is that simple register-register instructions which don't generate addresses may use the address stage ALU to execute 2 cycles early.
The embedded market became the main market for the 680x0 series after workstation vendors (and the Apple Macintosh) turned to faster load-store processors, so a variety of embedded versions were introduced. Later, Motorola designed a successor called Coldfire (early 1995), in which complex instructions and addressing modes (added to the 68020) were removed and the instruction set was recoded, simplifying it at the expense of compatibility (source only, not binary) with the 680x0 line.
The Coldfire 52xx (version 2 - the 51xx version 1 was a 68040-based/compatible core) architecture resembles a stripped (single pipeline) 68060. The 5 stage pipeline is literally folded over itself - after two fetch stages and a 12-byte buffer, instructions pass through the decode and address generate stages, then loop back so the decode becomes the operand fetch stage, and the address generate becomes the execute stage (so only one ALU is required for address and execution calculations). Simple (non-memory) instructions don't need to loop back. There is no translator stage as in the 68060 because Coldfire instructions are already in RISC-like form. The 53xx added a multiply-accumulate (MAC) unit and internal clock doubling. The 54xx adds branch and assignment folding with other instructions for a cheap form of superscalar execution with little added complexity, and uses a Harvard architecture for faster memory access, plus enhancements to the instruction set to improve code density, performance, and to add flexibility to the MAC unit.
At a quarter the physical size and a fraction of the power consumption, Coldfire is about as fast as a 68040 at the same clock rate, but the smaller design allows a faster clock rate to be achieved.
The Macintosh was to include the best features of the Lisa, but at an affordable price - in fact the original Macintosh came with only 128K of RAM and no expansion slots. Cost was such a factor that the 8 bit Motorola 6809 was the original design choice, and some prototypes were built, but they quickly realized that it didn't have the power for a GUI based OS, and they used the Lisa's 68000, borrowing some of the Lisa low level functions (such as graphics toolkit routines) for the Macintosh.
Competing personal computers such as the Amiga and Atari ST, and early workstations by Sun, Apollo, NeXT and most others also used 680x0 CPUs, including one of the earliest workstations, the Tandy TRS-80 Model 16, which used a 68000 CPU and Z-80 for I/O and VM support - the 68000 could not restart an instruction stopped by a memory exception, so it was suspended while the Z-80 loaded the page. Early Apollo workstations used a similar solution with a second 68000 handling paging.
It featured four 16 bit general registers, which could also be accessed as eight 8 bit registers, and four 16 bit index registers (including the stack pointer). The data registers were often used implicitly by instructions, complicating register allocation for temporary values. It featured 64K 8-bit I/O (or 32K 16-bit) ports and fixed vectored interrupts. There were also four segment registers that could be set from index registers.
The segment registers allowed the CPU to access 1 MB of memory through an odd process. Rather than just supplying missing bytes, as most segmented processors do, the 8086 actually added the segment registers (multiplied by 16, or shifted left 4 bits) to the address. As a strange result of this unsuccessful attempt at extending the address space without adding address bits, it was possible to have two pointers with the same value point to two different memory locations, or two pointers with different values pointing to the same location, and typical data structures were limited to less than 64K. Most people consider this a brain damaged design (a better method might have been that developed for the MIL-STD-1750 MMU).
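The address calculation and the aliasing it causes are easy to demonstrate. The function name below is hypothetical; the arithmetic (segment shifted left 4 bits, plus offset, truncated to 20 bits) is the scheme described above.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit physical address."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same memory location.
a = physical_address(0x1234, 0x0005)
b = physical_address(0x1000, 0x2345)
print(hex(a), hex(b))

# The same 16-bit offset under two different segments names different locations.
c = physical_address(0x1000, 0x0010)
d = physical_address(0x2000, 0x0010)
print(hex(c), hex(d))
```

Since segments start every 16 bytes but span 64K, a given physical address can be reached through thousands of different segment:offset combinations, which is what makes pointer comparison awkward in high level languages.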
Although this was largely acceptable for assembly language, where control of the segments was complete (it could even be useful then), in higher level languages it caused constant confusion (ex. near/far pointers). Even worse, this made expanding the address space to more than 1 MB difficult. The 80286 (1982?) expanded the design to 32 bits only by adding a new mode (switching from 'Real' to 'Protected' mode was supported, but switching back required using a bug in the original 80286, which then had to be preserved) which greatly increased the number of segments by using a 16 bit selector for a 'segment descriptor', which contained the location within a 24 bit address space, size (still less than 64K), and attributes (for Virtual Memory support) of a segment.
But all memory access was still restricted to 64K segments until the 80386 (1985), which included much improved addressing: base reg + index reg * scale (1, 2, 4 or 8) + displacement (8 or 32 bit constant), giving a 32 bit address, in the form of paged segments (using six 16-bit segment registers), like the IBM S/360 series and unlike the Motorola 68030. It also had several processor modes (including separate paged and segmented modes) for compatibility with the previous awkward design. In fact, with the right assembler, code written for the 8008 can still be run on the most recent Pentium Pro. The 80386 also added an MMU, security modes (called "rings" of privilege - kernel, system services, application services, applications) and new op codes in a fashion similar to the Z-80 (and Z-280).
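The 80386 address formula can be written out directly. This sketch covers only the effective address arithmetic (not segmentation or paging), and the function and parameter names are illustrative.

```python
def effective_address(base, index, scale, displacement):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Element 3 of an array of 4-byte items that starts 8 bytes into a structure
# located at 0x1000.
print(hex(effective_address(0x1000, 3, 4, 8)))
```

The scale factors match common element sizes, so indexing an array of bytes, words, or larger items takes a single instruction with no separate multiply.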
The 8087 was a floating point coprocessor which helped define the IEEE-754 floating point format and standard operations (the main competition was the VAX floating point format), and was based on an eight element stack of 80-bit values. An instruction prefix (0xE0) indicated coprocessor instructions (similar to the 68000), so the coprocessor could "listen" to the instruction stream, and execute instructions it recognized, without a coprocessor bus.
The 80486 (1989) added full pipelines, single on chip 8K cache, FPU on-chip, and clock doubling versions (like the Z-280). Later, FPU-less 80486SX versions plus 80487 FPUs were introduced - initially these were normal 80486es where one unit or the other had failed testing, but versions with only one unit were produced later (smaller dies and reduced testing reduced costs).
The Pentium (late 1993) was superscalar (up to two instructions at once in dual integer units and single FPU) with separate 8K I/D caches. "Pentium" was the name Intel gave the 80586 version because it could not legally protect the name "586" to prevent other companies from using it - and in fact, the Pentium compatible CPU from NexGen is called the Nx586 (early 1995). Due to its popularity, the 80x86 line has been the most widely cloned processors, from the NEC V20/V30 (slightly faster clones of the 8088/8086 (could also run 8085 code)), AMD and Cyrix clones of the 80386 and 80486, to versions of the Pentium within less than two years of its introduction.
MMX (initially reported as MultiMedia eXtension, but later said by Intel to mean Matrix Math eXtension) is very similar to the earlier SPARC VIS or HP-PA MAX, or later MIPS MDMX instructions - they perform integer operations on vectors of 8, 16, or 32 bit words, using the 80 bit FPU stack elements as eight 64 bit registers (switching between FPU and MMX modes as needed - it's very difficult to use them as a stack and as MMX registers at the same time). The P55C Pentium version (January 1997) is the first Intel CPU to include MMX instructions, followed by the AMD K6, and Pentium II. Cyrix also added these instructions in its M2 CPU (6x86MX, June 1997), as well as IDT with its C6.
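Operating on a vector of small integers packed into one 64-bit register can be sketched as follows. This models only a wraparound packed add (MMX also has saturating forms, not shown), and the function name is invented for illustration.

```python
def packed_add(a, b, lane_bits):
    """Add two 64-bit values lane by lane; each lane wraps around independently."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane width must be 8, 16 or 32 bits")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        # Masking the sum discards any carry, so it never leaks into the next lane.
        lane = ((a >> shift) + (b >> shift)) & mask
        result |= lane << shift
    return result


# Four 16-bit lanes: 0x00FF, 0x0001, 0x7FFF, 0xFFFF, each incremented by 1.
total = packed_add(0x00FF_0001_7FFF_FFFF, 0x0001_0001_0001_0001, 16)
print(f"{total:016x}")
```

The lowest lane overflows to zero without disturbing its neighbour, which is the difference between a packed add and an ordinary 64-bit add.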
Interestingly, the old architecture is such a barrier to improvements that most of the Pentium compatible CPUs (NexGen Nx586/Nx686, AMD K5, IDT-C6), and even the "Pentium Pro" (Pentium's successor, late 1995) don't clone the Pentium, but emulate it with specialized hardware decoders like those introduced in the VAX 8700 and used in a simpler form by the National Semiconductor Swordfish, which convert Pentium instructions to RISC-like instructions which are executed on specially designed superscalar RISC-style cores faster than the Pentium itself. Intel also used BiCMOS in the Pentium and Pentium Pro to achieve clock rates competitive with CMOS load-store processors (the Pentium P55C (early 1997) version is a pure CMOS design).
IBM had been developing hardware or software to translate Pentium instructions for the PowerPC in a similar manner as part of the PowerPC 615 CPU (able to switch between 80x86, 32-bit PowerPC and 64-bit PowerPC instruction sets in five cycles (to drain the execution pipeline)), but the project was killed after significant development for marketing reasons. Rumor has it that engineers who worked on the project went on to Transmeta corporation.
The Cyrix 6x86 (early 1996), initially manufactured by IBM before Cyrix merged with National Semiconductor, still directly executes 80x86 instructions (in two integer and one FPU pipeline), but partly out of order, making it faster than a Pentium at the same clock speed. Cyrix also sold an integrated version with graphics and audio on-chip called the MediaGX. MMX instructions were added to the 6x86MX, and 3DNow! graphics instructions to the 6x86MXi. The M3 (mid 1998) turned to superpipelining (eleven stages compared to six (seven?) for the M2) for a higher clock rate (partly for marketing purposes, as MHz is often preferred to performance in the PC market), and was to provide dual floating point/MMX/3DNow! units. The Cyrix division of National Semiconductor was purchased by PC chipset maker Via, and the M3 was cancelled. National Semiconductor continued with the integrated Geode low-power/cost CPU.
The Pentium Pro (P6 execution core) is a 1 or 2-chip (CPU plus 256K or 512K L2 cache - I/D L1 cache (8K each) is on the CPU), 14-stage superpipelined processor. It uses extensive multiple branch prediction and speculative execution via register renaming. Three decoders (one for complex instructions (up to four micro-ops), two for simpler ones) each decode one 80x86 instruction into micro-ops (one per simple decoder + up to four from the complex decoder = three to six per cycle). Up to five (usually three) micro-ops can be issued in parallel and out of order (five units - integer+FPU ALU, integer ALU, two address, one load/store), but are held and retired (results written to registers or memory) as a group to prevent an inconsistent state (equivalent to half an instruction being executed when an interrupt occurs, for example). 80x86 instructions may produce several micro-ops in CPUs like this (and the Nx586 and AMD K5), so the actual instruction rate is lower. In fact, due to problems handling instruction alignment in the Pentium Pro, emulated 16-bit instructions execute slower than on a Pentium. The Pentium II (April 1997) added MMX instructions to the P6 core (both ALUs), doubled cache to 32K, and was packaged in a processor card instead of an IC package. The Pentium III added Streaming SIMD Extensions (SSE) to the P6 core (both ALUs), which included eight 128-bit registers which could be used as vectors of four 32-bit integer or floating point values (like the PowerPC AltiVec extensions, but with fewer operations or data types). Unlike MMX (and like AltiVec), the SSE registers need to be saved separately during context switches, requiring OS modifications.
In June 1998, Intel created two sub-brands of P6 CPUs, low cost (Celeron) and server oriented (Xeon). They differed in amount of cache and bus speeds.
The P7 was first released as the Pentium 4 in December 2000. This equivalent to AMD's K7 (see below) was late due to the decision to concentrate on the development of the IA-64 architecture. Intel used two teams for alternating 80x86 designs, the P5 team started work on the P7, originally a 64 bit version like the AMD K8, while the other team worked on the P6. When the 64-bit P7 was changed to the IA-64, the P6 team started on a scaled down P7 after the Pentium III was finished - meanwhile, Intel sold "overclocked" (small quantities able to run at a higher than designed clock rate) P6 CPUs to compete with the AMD K7, then later updated P6 designs.
The P7 extended the pipeline even further to over 20 stages (or 30 during cache misses), stressing clock speed over execution speed (for marketing reasons) - this led to some questionable design decisions. The three decoders are replaced by a single decoder and a trace cache - similar in concept to the decoded instruction cache of the AT&T Hobbit, but 80x86 instructions often decode into multiple micro-ops, so mapping the micro-ops to memory is more complex, and instructions are loaded ahead of time using branch prediction. This speeds execution within the cache, but the single decoder limits the external instruction stream to one at a time. Long micro-op sequences are stored in microcode ROM and fed to the dispatch unit without being stored in the cache.
There are seven execution units, one FPU/MMX/SSE, one FP register load/store unit, two add/subtract integer units, one logic (shift and rotate) unit, one load and one store unit. The add/subtract units run at double the clock rate, basically as a two stage pipe, allowing two results within a single clock cycle, meaning up to nine micro-ops could be dispatched each cycle to the seven units, but in practice the trace cache is limited to three per cycle. A slower logic unit replaces two faster address units in the P6, slowing most code. Since the stack-oriented FPU registers are difficult to use for superscalar or out-of-order execution, Intel added floating-point SSE instructions (called SSE2), so that floating point operations can use the flat SSE registers which will make future designs easier, and the old FPU design becomes less important.
The bottlenecks might have been a result of rushing the design, or due to cost. As a result, the P7 executing existing code is actually slower than a P6 at slightly lower clock speed, and much slower than the AMD K7, but the intent of the design was to allow clock speed to be increased enough to make up for the difference. Possibly the bottlenecks will be removed in a future version.
A server (Xeon) version of the P7 (March 2002) introduced vertical multithreading (called "Hyperthreading" by Intel), similar to the IBM Northstar CPU (or Sun MAJC) - the main difference being that the Northstar will wait for a cache miss delay before switching threads, while full multithreading used by Intel always interleaves a small number of threads in the normal execution pipeline. It was later expanded to 64 bits (see AMD K8 below).
AMD was a second source for Intel CPUs as far back as the AMD 9080 (AMD's version of the Intel 8080). The AMD K5 translates 80x86 code to ROPs (RISC OPerations), which execute on a RISC-style core based on the unproduced superscalar AMD 29K. Up to four ROPs can be dispatched to six units (two integer, one FPU, two load/store, one branch unit), and five can be retired at a time. The complexity led to low clock speeds for the K5, prompting AMD to buy NexGen and integrate its designs for the next generation K6.
The NexGen/AMD Nx586 (early 1995) is unique by being able to execute its micro-ops (called RISC86 code) directly, allowing optimized RISC86 programs to be written which are faster than an equivalent x86 program would be, but this feature is seldom used. It also features two 16K I/D L1 caches, a dedicated L2 cache bus (like that in the Pentium Pro 2-chip module) and an off-chip FPU (either separate chip, or later as in 2-chip module).
The Nx586 successor, the K6 (April 1997) actually has three caches - 32K each for data and instructions, and a half-size 16K cache containing instruction decode information. It also brings the FPU on-chip and eliminates the dedicated cache bus of the Nx586, allowing it to be pin-compatible with the P54C model Pentium. Another decoder is added (two complex decoders, compared to the Pentium Pro's one complex and two simple decoders) producing up to four micro-ops and issuing up to six (to seven units - load, store, complex/simple integer, FPU, branch, multimedia) and retiring four per cycle. It includes MMX instructions, licensed from Intel, and AMD has designed and added 3DNow! graphics extensions without waiting for Intel's SSE additions.
AMD aggressively pursued a superscalar (fourteen-stage pipeline) design for the Athlon (K7, mid 1999), decoding x86 instructions into 'MacroOps' (made up of one or two 'micro-ops', a process similar to the branch folding in the AT&T Hobbit or instruction grouping in the T9000 Transputer and the Motorola 54xx Coldfire CPU) in two decoders (one for simple and one for complex instructions) producing up to three MacroOps per cycle. Up to nine decoded operations per cycle can be issued in six MacroOps to six functional units (three integer, each able to execute one simple integer and one address op simultaneously, and three FPU/MMX/3DNow! instructions (FMUL mul/div/sqrt, FADD simple/comparisons, FSTORE load/store/move) with extensive stack and register renaming, and a separate integer multiply unit which follows integer ALU 0, and can forward results to either ALU 0 or 1). The K7 replaces the Intel-compatible bus of the K6 with the high speed Alpha EV6 bus because Intel decided to prevent competitors from using its own higher speed bus designs (Dirk Meyer was director of engineering for the K7, as well as co-architect of the Alpha EV4 and EV6). This makes it easier to use either Alpha or AMD K7 processors in a single design. At introduction, the K7 managed to out-perform Intel's fastest P6 CPU.
Centaur, a subsidiary of Integrated Device Technology, introduced the IDT-C6 WinChip (May 1997), which uses a much simpler (6-stage, 2 way integer/simple-FPU execution) design than Intel and AMD translation-based designs by using micro-ops more closely resembling 80x86 than RISC code, which allows for a higher clock rate and larger L1 (32K each I/D) and TLB caches in a lower cost, lower power consumption design. Simplifications include replacing branch prediction (less important with a short pipeline) with an eight entry call/return stack, depending more on caches. The FPU unit includes MMX support. The C6+ version adds a second FPU/MMX unit and 3D graphics enhancements.
Like Cyrix, Centaur opted for a superpipelined eleven-stage design for added performance, combined with sophisticated early branch prediction in its WinChip 4. The design also pays attention to supporting common code sequences - for example, loads occur earlier in the pipeline than stores, allowing load-alu-store sequences to be more efficient.
The Cyrix division of National Semiconductor and the Centaur division of IDT were bought by Taiwanese motherboard chipset maker Via. The Cyrix CPU was cancelled, and the Centaur design was given the "Cyrix III" brand instead.
Intel, with partner Hewlett-Packard, developed a next generation 64-bit processor architecture called IA-64 (the 80x86 design was renamed IA-32) - the first implementation was named Itanium. It was intended to be compatible in some way with both the PA-RISC and 80x86. This may finally produce the incentive to let the 80x86 architecture fade away.
On the other hand, the demand for compatibility will remain a strong market force. AMD announced its intention to extend the K7 design to produce an 80x86 compatible K8 (codenamed "Sledgehammer", then changed to just "Hammer" - variants indicate market segments, such as "Clawhammer" (keeping the Athlon brand name) for desktops, and "Sledgehammer" (named Opteron) for servers). It produced a 64-bit architecture called x86-64, in competition with the Intel IA-64.
When moving from the 80286 to the 80386 (IA-32), Intel took the opportunity to fix some of the least liked features remaining in the previous design. Moving to x86-64, AMD decided to further modernize the design, adding a cleaner 64-bit mode (selected by Code Segment Descriptor (CSD) register bits).
It's based on sixteen 64-bit integer and sixteen 128-bit vector/floating point (XMM) registers (the lower eight registers of each map to the original x86 integer and SSE/SSE2 registers) and the 8087 FPU/MMX registers, with a 64-bit program counter. In 64-bit mode, integer registers are uniform and can be 8-, 16-, 32-, or 64-bit. Address space is changed from mainly segmented to a flat space (keeping data segment registers in 32- or 16-bit sub-modes) with PC relative addressing, although code segments (within the address space) are still used to define the modes for each segment. Older 8086 modes are supported in a separate "legacy" mode. These changes give compilers a larger, more regular register set to use, making optimizations easier.
Rumors persisted that Intel was developing a CPU codenamed "Yamhill", originally based on the dusted-off 64-bit P7 plans, but then switching to the x86-64 architecture and instruction set (apparently under pressure from Microsoft to avoid creating yet another instruction set to support - ironically making Intel a follower of AMD, after driving 80x86 development from the beginning). Originally it was an unofficial project, then official when performance of the first Itanium disappointed, and K8 popularity exceeded expectations. It was finally released as an "enhanced" Pentium 4 Xeon (March 2004), despite being a new design. The 64-bit capability is designed as a 32-bit add-on (like old bit-slice processors) and is disabled in lower end versions (much as the FPU was disabled in the low-cost 80486SX). When enabled, the extended 32-bit pipeline operates 1/2 clock cycle later than the main pipeline.
It has the same registers, addressing modes and extensions as the AMD K8, but is otherwise similar to the 64-bit P7, with the same pipeline, including double-clocked add/subtract units, though fixing the bottlenecks and using larger caches, buffers, etc. In addition, there are separate SSE2 functional units, rather than using the older FPU units for SSE operations as the P7 did.
Other factors were the fact that the 8-bit 8088 could use existing low cost 8085-type components, and allowed the computer to be based on a modified 8085 design. 68000 components were not widely available, though it could use 6800 components to an extent. After the failure and expense of the IBM 5100 (1975, their first attempt at a personal computer - discrete random logic CPU with no bus, built in BASIC and APL as the OS, 16K RAM and 5 inch monochrome monitor - $10,000!), cost was a large factor in the design of the PC. Strategists were also not eager to have a microcomputer competing with IBM's low end minicomputers.
The availability of CP/M-86 was also likely a factor, since CP/M was the operating system standard for the computer industry at the time. However, Digital Research founder Gary Kildall was unhappy with the legal demands of IBM, so Microsoft, a programming language company, was hired instead to provide the operating system (initially known at varying times as QDOS, SCP-DOS, and finally 86-DOS, it was purchased by Microsoft from Seattle Computer Products and renamed MS-DOS).
Digital Research did eventually produce CP/M 68K for the 68000 series, making the operating system choice less relevant than other factors.
Intel bubble memory was on the market for a while, but faded away as better and cheaper memory technologies arrived.
Much of this section is taken from Phil Storr's (pstorr@iweb.net.au) page on PC busses, which may be found at http://members.iweb.net.au/~pstorr/pcbook/book2/busses.htm
The IBM AT introduced a 16 bit data bus and the expansion slots had to handle 16 data bits. The industry wanted to be able to use existing 8-bit cards, so the new "AT" slot had to be designed to be backward compatible with the PC slots. The AT extension connector was added to the end of the 62 pin edge connector of the original 8-bit bus slot. This extension is a 36 pin edge connector. This bus slot was later given the name Industry Standard Architecture (ISA) and has survived to this day. One important aspect of this bus was that IBM never made any specification about bus speeds.
In the original 6MHz IBM AT, and the subsequent 8MHz version, the bus simply ran along at the same speed as the CPU. It was not surprising that as clone vendors started looking for a marketing edge over IBM, they simply kept the bus running at the CPU speed as they boosted speeds to 10MHz, 12MHz, and even faster. This led to problems for users: boards that ran fine in a 6 or 8 MHz computer were not reliable in faster ones. The problem was especially severe with network cards. It turned out that they couldn't run at these higher clock speeds. The industry eventually settled on 8MHz as the standard maximum clock speed and the name Industry Standard Architecture.
One answer was to put the system memory on a local bus with the processor on the system board. The memory could be connected directly to the processor's data bus and have no buffer devices between it and the processor. This way it could be 32 bits wide and accessed at the processor's clock speed. At this stage in the development of RAM technology the industry was still using DIL package RAM chips of 256k bits or one megabit capacity and it took a lot of system board real estate to fit in more than a few megabytes of RAM.
Many companies decided to make special 32-bit expansion slots for proprietary memory boards that could be added later. This is where we can learn a lesson - many owners of computers with these system boards soon discovered that they could not find these proprietary boards for their computers only a few months after they purchased the computer. Many manufacturers realized that a standard 32-bit bus was a better answer than many proprietary designs, and if it could run at the processor bus speed that would be even better.


Not only would the video system have benefited from a faster and wider bus; faster hard drives, hard drive interfaces and network interface cards had also outgrown the ISA bus. One solution was to design a faster bus for video and other components. Bringing the Bus Slot speed up to the then typical Bus Clock speed of 33 MHz would provide a four-fold increase in data transfer rate. Double the width of the data bus from 16 to 32 bits, and the transfer rate could be up to eight times that of the ISA bus.
Some designers started by simply wiring video circuitry into the CPU bus on system boards. The system board already had a "local bus" between the processor and its RAM and this could be extended to include the video interface. This provided speed gains, but at the cost of flexibility. If you wished to upgrade the Video System all you could do was to disable the video on the system board, and resort to an ISA card in a bus slot.
The next solution was a throwback to the proprietary 32-bit memory cards of the early 386 systems. Designers created their own unique solutions for local bus video slots. This approach left the buyer dependent on the original vendor to develop and offer new video options as technologies changed and improved, and at the rate of development of PC hardware, this rarely happened.

The problem was first solved by the Video Electronics Standards Association. This is the group that made sense out of the mayhem that occurred when vendors tried to go beyond IBM's original specification for VGA. When you wanted to run a system at higher than VGA's original specification of 640 by 480 you had to get drivers that worked with your application programs and hardware.
The VESA standards for Super VGA signal timing and resolutions sorted out much of this trouble. The committee set some basic goals for a local bus specification. It had to be low cost, based on existing technology and system chip sets as much as possible. It had to offer significantly higher performance, handling not only the present data transfer loads, but the additional traffic expected from even higher resolution displays and multimedia applications. It had to be an open standard, so anyone could use it, and it had to be software transparent, so you would not need to use any troublesome drivers. It should also be extensible to handle future technology, such as the Pentium processor with its 64-bit data path.
The result was the VESA-Bus specification. This set forth the basic characteristics of the bus, such as mechanical, physical, timing, and protocol details. For maximum flexibility, it was designed in such a way that it could easily be added to ISA, EISA, or Micro Channel system boards. To keep the design simple, the committee designed the VESA-Bus as an extension of the internal bus used within the 80486 processor. As a result, the VESA-Bus could use the full address range of the 486 chip.
Many VESA slot equipped computer systems used a VESA IDE/FDC/SPG interface card. The IDE interface on this card was the only part of the card that used the VESA-bus slot. The Floppy Disk Controller and the SPG functions still used the ISA portion of the slot.
Local Bus devices can be implemented with devices either integrated into the system board, or plugged into an expansion slot. The problem with integrated devices is the higher bus clock speeds push technology to its engineering limits. As the signals travel around the traces of printed circuit boards faster and faster, it is more and more difficult to maintain accurate timings. If an electrical signal is slowed too much on the way to its destination, then critical events may not take place at the correct instant, and processing crashes to a halt.
The faster the CPU runs, the smaller the load it can handle (the load on its outputs). Sending a signal through an expansion slot rather than to a device located on the system board adds to the load on the bus. The VESA committee recommended only two VESA-Bus slots and two VESA-Bus devices (system board mounted devices) with a 33MHz (or slower) bus speed, one slot at 40MHz, and no slots at all for a 50MHz bus speed.


The PCI-Bus (Peripheral Component Interconnect) was originally designed to speed up the display of graphics on Intel-based personal computers, but the standard itself is processor independent and suitable for other hardware add-ons that require high bandwidth, including network, video and SCSI adaptors. PCI was developed by Intel but it did take some time to get it to work reliably. By the middle of 1993 the VESA-Bus had become firmly entrenched in the market place and almost all DOS computer systems had VESA-Bus slots as standard. The wide acceptance of local bus technology only took a few months and by default, VESA-Bus became the first Local Bus standard.
For a while, many people in the computer industry saw a local-bus war between the two competing local-bus standards (VESA-Bus and PCI-Bus) but in reality they were not in the same battlefield. The PCI and VESA Local-Busses did basically the same thing - both speed up PC computers by letting peripherals like graphics adaptors and hard disk controllers run at up to 33MHz, instead of the 8MHz that the ISA-Bus limited them to. The similarity breaks down when we start talking about how the two designs work.
The VESA-Bus bypassed the ISA bus by using the same bus that connects the CPU to its RAM, so it was relatively cheap and easy for system and peripheral makers to implement. Intel's PCI-Bus, on the other hand, was a whole new bus, in much the same way the EISA and MCA busses were. The PCI bus gave only a slight speed improvement when used with 486 based systems, but it was far ahead when used with the Pentium chip.
The PCI-Bus uses three elegant techniques to resolve local bus problems. The first, known as reflective wave signaling, reduces the amount of electrical amplification required on the signal paths and thus reduces noise and loading problems. The second is multiplexing. Multiplexing allows two different signals to use the same electrical path, reducing the number of pins required for peripheral chips and lowering manufacturing costs. The third is a protocol letting the PCI controller receive specific configuration information from the PCI devices themselves. Intel did not define a standard adaptor connector for the bus, leaving that job up to a PCI-Bus special-interest group who settled on the white 112 pin connector.
Other computer manufacturers are also using the PCI-Bus in their computer platforms, with Digital Equipment Corp. (DEC) with their Alpha RISC-based systems, and Hewlett-Packard and SUN Microsystems all including PCI-Bus slots in their products. Intel licensed its patents on the PCI Bus free of royalties to all who wished to use it.
By adopting an established industry standard the manufacturers of the other computer platforms are ensuring lower costs and more options for both users and developers who are no longer locked into their own proprietary options. The wide range of cards that have followed the use of the PCI-Bus on PC systems are available for the first time to users of other hardware. All that should be required is alternative driver software for the various platforms.
Many combinations of the various buses that have been available over the years are possible and some system board manufacturers produced boards with combinations of ISA, EISA, MCA, VESA and PCI-Bus. This was to allow users to make use of older exotic cards such as SCSI controllers and hardware cache boards in upgraded equipment.
Most system boards available today still have two ISA bus slots but there are PCI bus slot only boards, and EISA and PCI only boards available.
| Bus type | Bus data width | Bus speed | Data transfer rate (Mbytes/sec) |
|---|---|---|---|
| PC/XT | 8 bits | 4.7 - 8 MHz | 3.25 |
| ISA | 16 bits | 8 MHz | 6.5 |
| EISA | 32 bits | 8 MHz | 32 |
| MCA | 32 bits | 8 MHz | 20 |
| VESA | 32 bits | 33 MHz to 50 MHz | 132 and above |
| PCI | 32 bits | 33 MHz | 132 |
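
The PCI figure can be checked with simple arithmetic: bytes per transfer multiplied by transfers per second. The sketch below is illustrative only. It assumes one transfer on every clock cycle, which is a theoretical peak, and the function name is not from the source.

```python
def peak_transfer_rate_mbytes(width_bits, clock_mhz, transfers_per_clock=1.0):
    """Theoretical peak rate in Mbytes/sec, assuming every clock moves data."""
    return (width_bits / 8) * clock_mhz * transfers_per_clock

# 32-bit bus at 33 MHz: 4 bytes x 33 million transfers per second
pci_peak = peak_transfer_rate_mbytes(32, 33)

# 16-bit bus at 8 MHz, idealised to one transfer per clock
isa_ideal = peak_transfer_rate_mbytes(16, 8)

print(pci_peak, isa_ideal, pci_peak / isa_ideal)
```

The ratio of about eight between the two idealised figures matches the "eight times" estimate given earlier for a 32-bit, 33 MHz bus. The tabulated ISA rate is lower than the idealised value, presumably because an ISA transfer takes more than one clock cycle.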
This memory is referred to as AGP Memory. AGP in theory allows a peak data transfer rate of up to 528 Mbytes/second between the PC's main memory and the AGP graphics accelerator, compared to a transfer rate of only 132 Mbytes/second attainable by today's PCI bus. Doubts exist about this claim because this figure is the whole bandwidth of main memory and it has to be shared with CPU and other devices. AGP may never be able to get a throughput of 528 MB/s, but the trend to 100MHz bus speeds will speed main memory transfers and make this more likely.
Like most other modern PC developments, the chipset has to provide services for the AGP bus, in particular, the function to map the 'AGP memory' to normal main memory. Intel calls this GART (Graphics Address Remapping Table). This means the Video Interface can use some of the System Memory rather than having dedicated Video RAM on the card.
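
The remapping idea can be sketched as a page-table lookup. This is an illustration only: the 4 KB page size, the table contents and the function name are assumptions, and a real chipset performs the translation in hardware.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def gart_translate(gart, agp_offset):
    """Map an offset in the contiguous 'AGP memory' window to a physical address.

    gart[i] holds the physical base address of the i-th page of the window.
    """
    page, offset_in_page = divmod(agp_offset, PAGE_SIZE)
    return gart[page] + offset_in_page

# Three scattered pages of main memory appear as one contiguous 12 KB window.
gart = [0x00300000, 0x00124000, 0x02008000]

# Offset 0x1010 falls in the second page of the window, 0x10 bytes in.
print(hex(gart_translate(gart, 0x1010)))
```

The graphics accelerator sees a single linear region, while the operating system is free to hand out whatever main memory pages happen to be available.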
The benefits AGP is offering:
Software Considerations
Unfortunately, getting an AGP board plus an AGP graphic accelerator won't be enough to take advantage of AGP's new performance. The operating system has to support the DIME/GART part of AGP in particular: it has to provide main memory for the AGP RAM. This is achieved via DirectDraw in Windows98 and Windows NT 5.
Example of a Pentium II System Board with an AGP socket
Serial Ports can be used for:
The history of the UART chip
Over the years, since the introduction of the DOS computer, three types of UARTs have been used in this hardware. The first was the 8250 chip, this was followed by the 16450 chip, and then the 16550 chip.
The 8250 chip was used in the Serial Ports of PC or XT computers, and the 16450 in the Serial Ports of 286 (AT) and then 386, and 486 machines, until early 1995. Over the years the maximum data rate provided by devices connected to the Serial Ports has been steadily rising. Back in 1987 a 2.4Kbits/second Telephone Modem was considered fast. The most cost effective Telephone Modems today are transferring data at as fast as 56Kbits/sec with 33.6Kbits/sec modems being phased out rapidly. The Serial Ports must keep up with the modem and therefore the UART must be faster than the modem.
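
One way to see why the UART has to keep pace is to work out how often a byte arrives at a given line rate. The sketch assumes 8N1 framing, so each byte costs ten bit times on the wire; the function name is illustrative.

```python
def byte_interval_us(bit_rate, bits_per_frame=10):
    """Microseconds between successive bytes at a given line rate.

    bits_per_frame defaults to 10: start bit + 8 data bits + stop bit (8N1).
    """
    return bits_per_frame * 1_000_000 / bit_rate

for rate in (2400, 9600, 57600, 115200):
    print(rate, "bits/sec ->", round(byte_interval_us(rate), 1), "us per byte")
```

A UART that can hold only one received byte has to be serviced within roughly this interval, which shrinks from milliseconds to well under a hundred microseconds as line rates rise.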
UART stands for UNIVERSAL ASYNCHRONOUS RECEIVER TRANSMITTER
The main job of a UART is to convert the computer's parallel data from the bus into a serial flow for transmission and when information is being received, the UART collects it into bytes (8 bits) and passes those bytes onto the bus. The UART provides the shift registers for parallel to serial and for serial to parallel conversions and all the Flow Control (hand shaking) required to control the flow of data to and from the computer and some other device.
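
The parallel-to-serial conversion can be sketched in software. The example assumes the common 8N1 framing (one start bit, eight data bits sent least significant bit first, no parity, one stop bit). The function names are illustrative, and a real UART does this in hardware with shift registers.

```python
def frame_byte(value):
    """Serialize one byte as 8N1: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]

def unframe_bits(bits):
    """Reassemble a byte from a 10-bit 8N1 frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

frame = frame_byte(ord("A"))      # 0x41 goes out on the line bit by bit
print(frame)
print(chr(unframe_bits(frame)))   # the receiver collects the bits back into a byte
```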
What does the UART chip look like ?
The 16550 is otherwise identical to its predecessors and so it can be used as a 16450 or 8250 replacement. The older I/O cards found in DOS computer hardware had a 40 pin DIL UART chip mounted on a socket and so it would be simple to upgrade by just replacing the chip. A 40 pin 16550 chip usually costs more than a new I/O card and so it is not economical to replace the UART chip on old I/O cards. Some SPG cards had a 16450 chip soldered in, making it almost impossible to replace it. Another chip you will sometimes find on old I/O cards is the 16451. This chip is a 16450 with a Parallel Interface as well as the UART.
Starting with the SPG and IDE/FDC-SPG cards used in 386 and 486 hardware, the UARTs were in a chip called an ASIC. This was a custom VLSI chip that contained 2 UARTs, the Parallel Port, Games Port, and often the Floppy Disk Drive Controller.
Modern PC hardware has the UARTs built into the System Board's chip-set and all these provide 16550 type UARTs, capable of data rates in excess of 100Kbits/second. Diagnostic software is available that will detect the type of UARTs fitted.
The TTL logic levels from the UART device require Line Driver chips to convert the output signals to RS232 levels, and other Line Driver chips to convert the input signal RS232 levels back to TTL logic levels. Today these drivers are often built into the ASIC or the chip-set but the 1488 and 1489 line driver devices were used for many years. The line drivers require + and - 12 to 15 Volts supplies. Alternative Line Driver Chips are available that generate the + and - 12 Volts inside the chip and these are used in note-book type computers. The Line Driver Chips often fail due to near lightning strikes and ground potential faults.
The I/O assignments used for the Serial Ports
The Serial Port requires a small range of I/O addresses and an IRQ line. The original DOS assignments were like this.
With the introduction of DOS version 3.1 provision was made to have two more Serial Ports and the resources assigned to these were:
The problem with the extra two Serial Ports is that they do not have unique IRQ lines assigned to them and some hardware and/or software is not good at sharing such resources. Specialized Serial Interface Cards are available that provide four or eight Serial Ports and these are intended for use in UNIX systems and they may not have driver software for DOS systems. These cards are often used for "point of sale" computers in installations like Service Stations and Supermarkets.
If you wished to use a printer with a serial interface instead of one with a parallel interface when running DOS applications, you would have to add these 2 lines to the AUTOEXEC.BAT file. This is not required when using a MSWindows Operating System.
MODE COM1: 9600,N,8,1,P
MODE PRN = COM1: (The alternative to this line is MODE LPT1: = COM1:)
The actual values in the first line above will depend on the parameters required by the printer. This information is obtained from the printer handbook and will be similar to the listing below.
Bit rates with the latest UART devices used in PC computers can be far higher than 9600, with speeds of 19.2K, 28.8K, 33.6K and 56K being used today.
The "P" tells the Serial Port Service Routine to wait for the device on the other end and not to time out after a predetermined time. This is necessary if a slow device like a printer is connected.
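
To make the fields of the first MODE line concrete, the sketch below splits its parameter string into named parts. The function and field names are hypothetical. Only the order of the fields (bit rate, parity, data bits, stop bits, optional P) is taken from the example line above.

```python
def parse_mode_params(params):
    """Split a MODE-style 'baud,parity,data,stop[,P]' string into named fields."""
    fields = [f.strip() for f in params.split(",")]
    if len(fields) not in (4, 5):
        raise ValueError("expected baud,parity,data,stop[,P]")
    return {
        "baud": int(fields[0]),
        "parity": fields[1].upper(),        # N = none, E = even, O = odd
        "data_bits": int(fields[2]),
        "stop_bits": float(fields[3]),
        # True when the trailing P asks the port routine to keep waiting
        "wait_for_device": len(fields) == 5 and fields[4].upper() == "P",
    }

settings = parse_mode_params("9600,N,8,1,P")
print(settings)
```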
While each of the four Serial Ports defined above has its own unique I/O addresses, only two IRQs are assigned. PC hardware and software are not good at sharing IRQ lines, so the above assignments may lead to two devices on the same IRQ line interfering with each other and hindering the proper operation of one or both. This problem should be overcome with the full introduction of Plug and Play technology, but until that happens in both the hardware and the operating systems, we will have possible trouble with providing more than two Serial (Communication) Ports.
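
The conflict can be made explicit with a small lookup. The addresses and IRQs below are the conventional PC defaults, stated here as an assumption because the assignment tables themselves are not reproduced in this text; the helper name is illustrative.

```python
from collections import defaultdict

# Conventional PC defaults (assumed): port -> (I/O base address, IRQ)
COM_PORTS = {
    "COM1": (0x3F8, 4),
    "COM2": (0x2F8, 3),
    "COM3": (0x3E8, 4),
    "COM4": (0x2E8, 3),
}

def shared_irqs(ports):
    """Return {irq: [port names]} for every IRQ used by more than one port."""
    by_irq = defaultdict(list)
    for name, (_base, irq) in sorted(ports.items()):
        by_irq[irq].append(name)
    return {irq: names for irq, names in by_irq.items() if len(names) > 1}

print(shared_irqs(COM_PORTS))

# Moving COM3 and COM4 to otherwise unused IRQ lines removes the overlap.
reassigned = dict(COM_PORTS, COM3=(0x3E8, 5), COM4=(0x2E8, 7))
print(shared_irqs(reassigned))
```

Every port has a distinct base address, so the only collisions reported are on the interrupt lines.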
Why would you want more than two Serial Ports?
Serial Ports are used for a wide range of I/O functions. A computer in a CAD/CAM installation, for example, may have a Mouse, a Digitizer, a Plotter and a Modem fitted. The Mouse can make use of the PS/2 Mouse port fitted to most modern System Boards, but this uses one of the available IRQs and still leaves us with three Serial Ports required.
Another situation where more than two Serial Ports may be needed is where multiple modems or devices like Bar Code Readers are in use. This is common if a PC is used as a "point of sale" terminal, and with the cost of PCs being so low, they are often the most cost effective way of providing these facilities.
Overcoming the problem
The easiest way to overcome the lack of IRQ assignments is to move one port of each sharing pair (COM1 or COM3, and COM2 or COM4) to an alternative IRQ line. Modern PC hardware has the Serial Ports provided by the Chipset built into the System Board, and you can change the I/O addresses and IRQs assigned to these ports from the CMOS setup routines. Many older SPG and FDC/IDE-SPG cards had jumpers to select the I/O addresses and IRQ lines for each port and to turn each I/O function off. You could provide extra Serial Ports, selecting alternative I/O addresses and IRQs, using one of these cards. There is one problem: most of these cards were fitted with 16450 UARTs, which are too slow for modern Telephone Modems. You would have to use these extra Serial Ports for devices like the Mouse and a Digitizer.
Serial Port cards are available that provide one or two extra Serial Ports at all possible I/O addresses and IRQs via jumpers or some form of Soft-setup, but these are quite expensive for what they are.
Possible available IRQ assignments
The IRQ lines are not usually required by the Parallel Ports and will usually be available for other uses. IRQ5 was used for the Hard Disk Controller in an XT type (8 bit bus) computer but is available in modern PC Computers. This means IRQ5 and IRQ7 may be available.
Common default assignments to look out for
Which IRQ lines may be available?
From what I have said above it is clear you may have IRQ9, IRQ5 and/or IRQ7 available for use with COM3 and COM4. Watch out for what IRQ9 is actually called. In a 16 bit bus DOS computer the real IRQ2 is used to cascade a second interrupt controller device, and the IRQ9 input on this second device is wired to IRQ2's place on the ISA bus. Windows wants to call this hardware interrupt IRQ9 but some software is quite happy if it is called IRQ2.
In the past IRQ10, IRQ11, IRQ12 and IRQ15 have been available but recent advances in PC technology have led to these being assigned to standard uses. IRQ10 is often used by Sound Cards, IRQ11 by Network Interface cards, and IRQ12 is used if the PS/2 Mouse Port fitted to most System Boards is in use.
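The cascade arrangement is easier to follow as a lookup. The sketch below assumes the usual layout of two eight-input interrupt controllers, with IRQ0 to IRQ7 on the first and IRQ8 to IRQ15 on the second; the function names are hypothetical and only the IRQ2/IRQ9 alias described in the text is modelled.

```python
CASCADE_INPUT = 2  # input on the first controller that receives the second one

def controller_input(irq):
    """Map an IRQ number (0 to 15) to (controller, input number)."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ number must be between 0 and 15")
    if irq < 8:
        return ("first", irq)
    return ("second", irq - 8)

def names_for(irq):
    """Names that software may use for this hardware interrupt."""
    if irq == CASCADE_INPUT:
        return []                      # taken by the cascade, not usable by a card
    if irq == 9:
        return ["IRQ9", "IRQ2"]        # wired to the old IRQ2 bus pin
    return [f"IRQ{irq}"]

print(controller_input(9), names_for(9))
```

A card jumpered for "IRQ2" on a 16 bit bus machine therefore arrives on input 1 of the second controller, which is why a setup program may list it under either name.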
With the introduction of the second IDE interface channel these days, usually used for interfacing to a CDROM Drive, IRQ15 is assigned to this channel and is no longer available.
How to make use of these extra COM ports
Modern GUI Operating Systems have support for almost any combination of I/O address and IRQ built in and some DOS software packages have a facility to configure the COM ports for I/O address and IRQ line.
The Parallel Port I/O address assignment
Three addresses are available to the Parallel Ports. At boot-up, the setup routines in the BIOS ROM look for Parallel Ports on the I/O bus and assign the LPT numbers, starting from LPT 1, in this order: 03BC, 0378, then 0278.
Officially LPT1 uses I/O address 0378 to 037A but when the BIOS setup routine is looking for Parallel Ports it assigns the first one it finds (in the order given above) as LPT1. The address 03BC to 03BE was first provided by a Parallel Port on IBM's Mono Display Adaptor Video Card but today it is quite common to find this address available on Parallel Port hardware.
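The numbering rule can be sketched in a few lines. This assumes the usual BIOS probe order of 03BC, 0378 and then 0278; `PROBE_ORDER` and `assign_lpt_numbers` are names invented for the example, and the set of "present" addresses stands in for whatever the BIOS actually detects.

```python
PROBE_ORDER = (0x3BC, 0x378, 0x278)  # assumed order in which the BIOS looks

def assign_lpt_numbers(present):
    """Number the ports found, in probe order, starting from LPT1."""
    found = [base for base in PROBE_ORDER if base in present]
    return {f"LPT{n}": base for n, base in enumerate(found, start=1)}

# A machine with ports at 0378 and 0278 only
for name, base in assign_lpt_numbers({0x378, 0x278}).items():
    print(f"{name} at {base:04X}")
```

Adding a card at 03BC to the same machine would make that card LPT1 and push the port at 0378 down to LPT2, which is why the "official" address of LPT1 cannot be relied on.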
The Parallel Ports are assigned an IRQ line as follows.
In the eight bit PC computer (PC or PC/XT type) IRQ 7 was assigned to both LPT 1 and LPT 2 but in later generation hardware IRQ 5 is assigned to LPT 2.
| Port | IRQ |
|---|---|
| LPT 1 | IRQ 7 |
| LPT 2 | IRQ 7 or IRQ 5 |
The IRQ line is not usually used by software communicating with the LPT Ports, so IRQ 7 and IRQ 5 are usually available for other I/O functions. Sound Cards as a rule use IRQ 5, 7 or 10 as the default IRQ.
Parallel Ports can be used for:
The Parallel Port Standard is based on the Centronics Parallel Interface Standard but it has been modified to be bidirectional. Some older Parallel Port hardware in some DOS type computers is not fully bidirectional and will not work with some devices such as Pocket Hard Drives and Tape Backup Drives. The standard Parallel Cable has a DB25P (plug) on the computer end (a socket is used on the computer) and a 36 pin Centronics plug on the printer end. The cable should be shielded and should be no longer than 3 meters. When ASIC chips were first used to provide the Parallel Port, some of these had trouble driving long cables (over 3 m) because they had LSI outputs rather than TTL outputs and they did not like high capacitance loading.
By 1994 this development was getting out of hand, and so the IEEE set down standard modes of operation for the Parallel Port, in a document with the title IEEE 1284-1994, Standard Signaling Method for a Bi-directional Parallel Interface for Personal Computers. Before this time there were no set standards as to how the Parallel Port should behave when connected to devices such as Printers, Scanners, External Disk Drives etc. The IEEE defined five modes of operation. These modes take care of the various types of hardware that have developed over the years since the PC Computer was released.
This IEEE specification is aimed at standardizing the behavior between a PC Computer and an attached device. Although the specification deals mainly with Printers, devices like SCSI Adaptors, CDROM, High Capacity Disk Drive and Tape Backup Adaptors, Optical Scanners and simple LAN interfaces are also covered to some extent.
The Unidirectional (4 bit) Port was capable of data transfer rates of 40 to 60 KB/s in the reverse direction and up to 140 KB/s in the forward direction.
The Bi-directional Parallel Port opened up the way for eight bit communications between the computer and peripheral devices across the Parallel I/O Port. This was done by redefining some unused pins in the Parallel (Centronics) connector, and by defining a Status Bit, used to indicate which direction data was traveling across the interface.
The IEEE incorporated the EPP standard into its document 1284-1994, but because of some minor changes they made to the 1992 version of the standard, we now have two incompatible standards for EPP. There is the original EPP Standards Committee version 1.7, and the IEEE 1284 version. Because the differences were only minor, new peripherals can be designed to cope with the two variations, but older peripherals made to the original EPP 1.7 standard may not work with the newer IEEE 1284 ports.
Another feature of ECP is real time data compression. It uses Run Length Encoding (RLE) to achieve data compression ratios of up to 64:1. This is useful with devices such as Optical Scanners and Printers, where a good part of the data consists of long repetitive strings.
The Extended Capabilities Port supports a method of channel addressing. This is not intended to be used to daisy chain devices but rather to address multiple devices within one device. One example is some of the latest Fax machines on the market. They can be connected to a computer via a Parallel Port and can operate as separate devices such as the Scanner, Modem/Fax and Printer, where each part can be addressed separately, even if the other devices cannot accept data due to full buffers.
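To see where a figure like 64:1 can come from, here is a minimal Run Length Encoding sketch. It is not the ECP wire format: it assumes a simplified scheme in which every run is sent as a pair of bytes (run length minus one, then the value) and a run is limited to 128 bytes. The names `MAX_RUN`, `rle_encode` and `rle_decode` are invented for the example.

```python
MAX_RUN = 128  # assumed longest run that one (count, value) pair can describe

def rle_encode(data):
    """Encode bytes as (run length - 1, value) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < MAX_RUN:
            run += 1
        out.append(run - 1)  # 0..127 stands for a run of 1..128 bytes
        out.append(value)
        i += run
    return bytes(out)

def rle_decode(encoded):
    """Expand (run length - 1, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out.extend([value] * (count + 1))
    return bytes(out)

blank_line = bytes(128)              # e.g. 128 white pixels from a scanner
packed = rle_encode(blank_line)
print(len(blank_line) // len(packed))  # compression ratio for this best case
```

With these assumptions the best case is 128 bytes described by 2, which gives the 64:1 ratio. Note that this simplified pairing doubles the size of data with no repeats, so a practical scheme would only encode the runs.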
As originally designed, the Control lines were used as Interface Control and Flow Control (handshaking) signals from the PC to the printer. The Status lines were used for Flow Control signals and as Status Indicators for such things as paper empty, busy indication and interface or peripheral errors. The data lines were used to provide data from the PC to the printer, in that direction only. As we have already said, later implementations of the Parallel Port allowed for data to be driven from the peripheral to the PC.
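As an illustration of how software sees the Status lines, the sketch below decodes one byte read from the status register. The bit positions and polarities are the commonly documented ones for the standard port (status register at base address + 1, Busy inverted by the hardware, Error and Acknowledge active low); they are not given in this text, so treat them as assumptions. All names are invented for the example.

```python
# Assumed bit positions in the status register (base address + 1)
STATUS_ERROR       = 0x08  # reads 0 when the printer reports an error
STATUS_SELECT      = 0x10  # reads 1 when the printer is on line
STATUS_PAPER_EMPTY = 0x20  # reads 1 when the printer is out of paper
STATUS_ACK         = 0x40  # reads 0 during the acknowledge pulse
STATUS_BUSY        = 0x80  # reads 0 when the printer is busy (inverted)

def decode_status(value):
    """Turn a raw status byte into named True/False flags."""
    return {
        "error":       not (value & STATUS_ERROR),
        "on_line":     bool(value & STATUS_SELECT),
        "paper_empty": bool(value & STATUS_PAPER_EMPTY),
        "acknowledge": not (value & STATUS_ACK),
        "busy":        not (value & STATUS_BUSY),
    }

print(decode_status(0xD8))  # a value chosen to represent an idle, on line printer
```

Keeping the polarity handling in one function means the rest of a program can test `busy` or `error` directly without remembering which lines are inverted.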
The original Parallel Interface Port used open collector TTL devices on each side of the interface and these can be damaged by ESD.
The Parallel Ports in modern PC hardware use VLSI devices that are not open collector devices, but these are also easily damaged by ESD. These outputs often do not conform to the TTL standards, and they may have trouble driving older printers, long cables, and external signal-powered devices.
| Line name | DB25S | 36 pin Centronics | Notes |
|---|---|---|---|
| Strobe | 1 | 1 | a 1 usec pulse used to clock data into the printer |
| Data 0 | 2 | 2 | |
| Data 1 | 3 | 3 | |
| Data 2 | 4 | 4 | |
| Data 3 | 5 | 5 | |
| Data 4 | 6 | 6 | |
| Data 5 | 7 | 7 | |
| Data 6 | 8 | 8 | |
| Data 7 | 9 | 9 | |
| Acknowledge | 10 | 10 | acknowledge signal from printer to computer |
| Busy | 11 | 11 | used by the printer to stop the flow of data |
| Paper Empty | 12 | 12 | indicates the printer has run out of paper |
| Select Out | 13 | 13 | indicates the printer is "on line" |
| Auto Feed | 14 | 14 | not often implemented - wired to ground |
| Error | 15 | 32 | indicates a fault in the printer (motor jammed etc) |
| Initialization | 16 | 31 | clears the printer's buffers and resets defaults |
| Select input | 17 | 36 | a signal on this line is the same as "select button" |
| Ground | 18 to 25 | 16, 19 to 30, 33 | DB25 pins 18 to 25 are paired with the Data wires pins 2 to 9 as shields |
Note - the original specification included plus 5 volt on pin 18 and a "clock signal" from pin 15.
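When making up or testing a cable, the signal rows of the table above can be kept as a lookup. The sketch covers only the seventeen signal lines (Ground pins are left out because they do not map one to one), and the names `DB25_TO_CENTRONICS`, `SIGNAL_NAMES` and `centronics_pin` are invented for the example.

```python
# Signal lines only, taken from the table above: DB25 pin -> Centronics pin
DB25_TO_CENTRONICS = {pin: pin for pin in range(1, 15)}   # pins 1 to 14 match
DB25_TO_CENTRONICS.update({15: 32, 16: 31, 17: 36})       # the three that differ

SIGNAL_NAMES = {
    1: "Strobe", 10: "Acknowledge", 11: "Busy", 12: "Paper Empty",
    13: "Select Out", 14: "Auto Feed", 15: "Error",
    16: "Initialization", 17: "Select input",
}
SIGNAL_NAMES.update({2 + bit: f"Data {bit}" for bit in range(8)})

def centronics_pin(db25_pin):
    """Return the Centronics pin wired to a DB25 signal pin."""
    if db25_pin not in DB25_TO_CENTRONICS:
        raise ValueError(f"DB25 pin {db25_pin} is not a signal line in the table")
    return DB25_TO_CENTRONICS[db25_pin]

for pin in (15, 16, 17):
    print(f"{SIGNAL_NAMES[pin]}: DB25 pin {pin} -> Centronics pin {centronics_pin(pin)}")
```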
The IEEE 1284 standard specifies 3 different connectors for use with the Parallel Port. The first one (1284 Type A) is the DB25 connector found on the back of most computers, and the second (1284 Type B) is the 36 pin Centronics Connector found on most printers. The third, the IEEE 1284 Type C connector, is also a 36 conductor connector like the Centronics, but it is much smaller. IEEE 1284 Type C also defines two more pins for signals which each end can use to see whether the device at the other end has power applied.