DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY
(POLYTECHNIC)
BANGALORE
Affiliated to A.I.C.T.E.
NEW DELHI
COMMUNICATION AND ANALYSIS SKILL DEVELOPMENT PROGRAMME
REPORT ON 
“MICROPROCESSOR”
Submitted for the partial fulfillment of the requirement for the award of 
the 5th semester diploma in Electronics and Communication
By 
Amiyodyuti Ganguly (305EC10009)
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY 
SHAVIGE MALLESHWARA HILLS, KUMARASWAMY LAYOUT, BANASHANKARI
BANGALORE-560078 
DEPARTMENT OF ELECTRONICS AND COMMUNICATION 
CERTIFICATE 
This is to certify that the CASP seminar entitled 
“MICROPROCESSOR” 
Has been completed successfully 
Submitted By 
Amiyodyuti Ganguly (305EC10009)
as prescribed by the Board of Technical Education
GOVERNMENT OF KARNATAKA 
This is a bonafide work carried out by him at
DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY 
During the year 2014-2015 under the guidance of 
Mrs. Muyeen Reshma
GUIDE: PRINCIPAL:
ACKNOWLEDGEMENT 
This report would be incomplete if we did not take the opportunity to sincerely thank all
the people who contributed to the successful completion of the project.
We gratefully thank our principal Prof. PREM KUMAR of Dayananda Sagar 
Institute of Technology for providing this opportunity to work on this project. 
They have been a constant source of inspiration for us throughout this project.
We heartily thank them for their encouragement and support. 
We would like to thank our internal guide Mrs. MUYEEN RESHMA for her valuable
guidance and advice, which enabled the successful completion of this
project.
We would also like to thank the Head of the Department and all other staff
members for their moral support.
CONTENTS 
1. INTRODUCTION 
2. STRUCTURE 
3. SPECIAL-PURPOSE DESIGN 
4. EMBEDDED APPLICATIONS
5. HISTORY 
6. CADC 
7. TMS 1000 
8. INTEL 4004 
9. PICO/GENERAL INSTRUMENT 
10. FOUR-PHASE SYSTEMS AL 1 
11. 8-BIT DESIGNS 
12. 12-BIT DESIGNS 
13. 16-BIT DESIGNS 
14. 32-BIT DESIGNS 
15. 64-BIT DESIGNS IN PC 
16. MULTI-CORE DESIGNS 
17. RISC 
18. MICROARCHITECTURE 
19. ARITHMETIC LOGIC UNIT 
20. CENTRAL PROCESSING UNIT
ABSTRACT 
Microprocessor 
The Intel 4004, the first commercial microprocessor
A microprocessor incorporates the functions of a computer's central processing unit (CPU) 
on a single integrated circuit (IC), or at most a few integrated circuits. All modern CPUs are
microprocessors, making the micro- prefix redundant. The microprocessor is a multipurpose,
programmable device that accepts digital data as input, processes it according to 
instructions stored in its memory, and provides results as output. It is an example 
of sequential digital logic, as it has internal memory. Microprocessors operate on numbers 
and symbols represented in the binary numeral system. 
The integration of a whole CPU onto a single chip or on a few chips greatly reduced the cost 
of processing power. The integrated circuit processor was produced in large numbers by 
highly automated processes, so unit cost was low. Single-chip processors increase reliability 
as there are many fewer electrical connections to fail. As microprocessor designs get faster, 
the cost of manufacturing a chip (with smaller components built on a semiconductor chip the 
same size) generally stays the same. 
Before microprocessors, small computers had been implemented using racks of circuit 
boards with many medium- and small-scale integrated circuits. Microprocessors integrated 
this into one or a few large-scale ICs. Continued increases in microprocessor capacity have 
since rendered other forms of computers almost completely obsolete, with one or more
microprocessors used in everything from the smallest embedded systems and handheld 
devices to the largest mainframes and supercomputers.
STRUCTURE 
A block diagram of the internal architecture of the Z80 microprocessor, showing the 
arithmetic and logic section, register file, control logic section, and buffers to external 
address and data lines 
The internal arrangement of a microprocessor varies depending on the age of the design 
and the intended purposes of the processor. The complexity of an integrated circuit is 
bounded by physical limitations of the number of transistors that can be put onto one chip, 
the number of package terminations that can connect the processor to other parts of the 
system, the number of interconnections it is possible to make on the chip, and the heat that 
the chip can dissipate. Advancing technology makes more complex and powerful chips 
feasible to manufacture. 
A minimal hypothetical microprocessor might only include an arithmetic logic unit (ALU) and 
a control logic section. The ALU performs operations such as addition, subtraction, and
logical operations such as AND or OR. Each operation of the ALU sets one or more flags in a
status register, which indicate the results of the last operation (zero value, negative number,
overflow, or others). The control logic section retrieves instruction operation codes from
memory, and initiates whatever sequence of ALU operations is required to carry out the
instruction. A single operation code might affect many individual data paths, registers, and
other elements of the processor.
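The flag-setting behavior described above can be sketched in a few lines. The following is a hypothetical, illustrative model of an 8-bit ALU, not any particular processor's design; the operation names and the choice of flags are assumptions made for the example.

```python
def alu(op, a, b):
    """Perform an 8-bit ALU operation and return (result, flags)."""
    if op == "ADD":
        raw = a + b
    elif op == "SUB":
        raw = a - b
    elif op == "AND":
        raw = a & b
    elif op == "OR":
        raw = a | b
    else:
        raise ValueError("unknown operation")
    result = raw & 0xFF                    # keep only the low 8 bits
    flags = {
        "zero": result == 0,               # result was zero
        "negative": bool(result & 0x80),   # sign bit of the result is set
        "carry": raw > 0xFF or raw < 0,    # result did not fit in 8 bits
    }
    return result, flags

result, flags = alu("ADD", 200, 100)   # 300 wraps to 44, carry flag set
```

A later conditional-branch instruction would then test these flags rather than re-examine the data itself, which is exactly why the status register is central to the control logic.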
As integrated circuit technology advanced, it was feasible to manufacture more and more 
complex processors on a single chip. The size of data objects became larger; allowing more 
transistors on a chip allowed word sizes to increase from 4 and 8-bit words up to today’s 64- 
bit words. Additional features were added to the processor architecture; more on-chip 
registers sped up programs, and complex instructions could be used to make more compact 
programs. Floating-point arithmetic, for example, was often not available on 8-bit
microprocessors and had to be carried out in software. Integration of the floating-point
unit, first as a separate integrated circuit and then as part of the same microprocessor chip,
sped up floating-point calculations.
Occasionally, physical limitations of integrated circuits made such practices as a bit 
slice approach necessary. Instead of processing all of a long word on one integrated circuit, 
multiple circuits in parallel processed subsets of each data word. While this required extra 
logic to handle, for example, carry and overflow within each slice, the result was a system 
that could handle, say, 32-bit words using integrated circuits with a capacity for only four bits 
each.
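The bit-slice idea above can be sketched as follows: a 4-bit "slice" handles one nibble and passes its carry to the next slice, and eight of them chained together add full 32-bit words. This is an illustration of the principle only, not a model of any real bit-slice device.

```python
def slice_add(a4, b4, carry_in):
    """One 4-bit slice: add two nibbles plus carry, return (sum, carry_out)."""
    total = a4 + b4 + carry_in
    return total & 0xF, total >> 4

def add32(a, b):
    """Add two 32-bit words using eight 4-bit slices chained by their carries."""
    result, carry = 0, 0
    for i in range(8):                               # one slice per nibble
        nib, carry = slice_add((a >> 4 * i) & 0xF,
                               (b >> 4 * i) & 0xF, carry)
        result |= nib << 4 * i
    return result & 0xFFFFFFFF

add32(0xFFFFFFFF, 1)   # wraps to 0, just as 32-bit hardware would
```

The extra logic the text mentions is visible here as the `carry` value threaded from slice to slice; in hardware this is a physical signal between the chips.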
With the ability to put large numbers of transistors on one chip, it becomes feasible to 
integrate memory on the same die as the processor. This CPU cache has the advantage of 
faster access than off-chip memory, and increases the processing speed of the system for 
many applications. Processor clock frequency has increased more rapidly than external 
memory speed, except in the recent past, so cache memory is necessary if the processor is
not to be delayed by slower external memory.
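A rough back-of-the-envelope calculation shows why the on-chip cache matters; the cycle counts and hit rate below are assumed figures for illustration, not measurements of any real system.

```python
# Assumed figures: 1 cycle to hit the on-chip cache, 50 cycles to reach
# external memory, and a 95% cache hit rate.
hit_time = 1
miss_penalty = 50
hit_rate = 0.95

# Average access time with a cache: every access pays the hit time, and
# the fraction that misses also pays the external-memory penalty.
with_cache = hit_time + (1 - hit_rate) * miss_penalty
without_cache = miss_penalty

print(with_cache)      # 3.5 cycles on average
print(without_cache)   # 50 cycles on every access
```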
SPECIAL-PURPOSE DESIGN
A microprocessor is a general-purpose system. Several specialized processing devices have
followed from the technology. Microcontrollers integrate a microprocessor with peripheral 
devices in embedded systems. A digital signal processor (DSP) is specialized for signal 
processing. Graphics processing units may have no, limited, or general programming 
facilities. For example, GPUs through the 1990s were mostly non-programmable and have 
only recently gained limited facilities like programmable vertex shaders.
32-bit processors have more digital logic than narrower processors, so 32-bit (and wider) 
processors produce more digital noise and have higher static consumption than narrower 
processors. So 8-bit or 16-bit processors are better than 32-bit processors for system-on-a-chip
designs and microcontrollers that require extremely low-power electronics, or that are
mixed with noise-sensitive on-chip analog electronics such as high-resolution analog-to-digital
converters, or both.
When manufactured on a similar process, 8-bit micros use less power when operating and 
less power when sleeping than 32-bit micros. 
It is sometimes argued, however, that a 32-bit micro may use less average power than an
8-bit micro when the application requires certain operations, such as floating-point math, that
take many more clock cycles on an 8-bit micro than on a 32-bit micro, so that the 8-bit micro
spends more time in a high-power operating mode.
EMBEDDED APPLICATIONS
Thousands of items that were traditionally not computer-related include microprocessors. 
These include large and small household appliances, cars (and their accessory equipment 
units), car keys, tools and test instruments, toys, light switches/dimmers and electrical circuit 
breakers, smoke alarms, battery packs, and hi-fi audio/visual components (from DVD 
players to phonograph turntables). Such products as cellular telephones, DVD video systems
and HDTV broadcast systems fundamentally require consumer devices with powerful, low-cost
microprocessors. Increasingly stringent pollution control standards effectively require
automobile manufacturers to use microprocessor engine management systems, to allow 
optimal control of emissions over widely varying operating conditions of an automobile. Non-programmable 
controls would require complex, bulky, or costly implementation to achieve 
the results possible with a microprocessor. 
A microprocessor control program (embedded software) can be easily tailored to different 
needs of a product line, allowing upgrades in performance with minimal redesign of the
product. Different features can be implemented in different models of a product line at 
negligible production cost. 
Microprocessor control of a system can provide control strategies that would be impractical 
to implement using electromechanical controls or purpose-built electronic controls. For 
example, an engine control system in an automobile can adjust ignition timing based on 
engine speed, load on the engine, ambient temperature, and any observed tendency for 
knocking—allowing an automobile to operate on a range of fuel grades. 
HISTORY 
The advent of low-cost computers on integrated circuits has transformed modern society. 
General-purpose microprocessors in personal computers are used for computation, text editing, 
multimedia display, and communication over the Internet. Many more microprocessors are part 
of embedded systems, providing digital control over myriad objects from appliances to 
automobiles to cellular phones and industrial process control. 
The first use of the term "microprocessor" is attributed to Viatron Computer Systems, describing
the custom integrated circuit used in their System 21 small computer system announced in 1968.
Intel introduced its first 4-bit microprocessor 4004 in 1971 and its 8-bit microprocessor 8008 in 
1972. During the 1960s, computer processors were constructed out of small and medium-scale 
ICs—each containing from tens of transistors to a few hundred. These were placed and soldered 
onto printed circuit boards, and often multiple boards were interconnected in a chassis. The large 
number of discrete logic gates used more electrical power—and therefore produced more heat— 
than a more integrated design with fewer ICs. The distance that signals had to travel between 
ICs on the boards limited a computer's operating speed. 
In the NASA Apollo space missions to the moon in the 1960s and 1970s, all onboard 
computations for primary guidance, navigation and control were provided by a small custom 
processor called "The Apollo Guidance Computer". It used wire wrap circuit boards whose 
only logic elements were three-input NOR gates. 
The first microprocessors emerged in the early 1970s and were used for electronic calculators, 
using binary-coded decimal (BCD) arithmetic on 4-bit words. Other embedded uses of 4-bit and 
8-bit microprocessors, such as terminals, printers, various kinds of automation etc., followed 
soon after. Affordable 8-bit microprocessors with 16-bit addressing also led to the first general-purpose 
microcomputers from the mid-1970s on. 
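The BCD arithmetic mentioned above works on one 4-bit decimal digit at a time. The sketch below shows digit-at-a-time BCD addition with the classic decimal-adjust correction (adding 6 whenever a nibble exceeds 9, to skip the unused codes A-F); it illustrates the technique in general and is not modeled on any specific calculator chip.

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits (0-9) plus a carry; return (digit, carry_out)."""
    total = a + b + carry_in
    if total > 9:
        total += 6            # decimal adjust: skip the six unused codes
        return total & 0xF, 1
    return total, 0

bcd_add_digit(7, 5)   # (2, 1): 7 + 5 = 12 -> digit 2, carry 1
```

Multi-digit decimal numbers are then added one digit per step, threading the carry, which suits a 4-bit processor naturally: each BCD digit is exactly one machine word.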
Since the early 1970s, the increase in capacity of microprocessors has followed Moore's law; this 
originally suggested that the number of components that can be fitted onto a chip doubles every 
year. With present technology, the doubling actually occurs about every two years, and Moore
accordingly later revised the stated period to two years.
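Moore's law as restated here (component count doubling roughly every two years) can be turned into a quick projection. The starting figure, roughly 2,300 transistors for the Intel 4004, is used purely as an illustrative baseline.

```python
def transistors(start, start_year, year, period=2):
    """Project a transistor count assuming a doubling every `period` years."""
    return start * 2 ** ((year - start_year) / period)

# Ten years after the 4004's ~2,300 transistors, a two-year doubling
# period predicts 2300 * 2**5 transistors.
transistors(2300, 1971, 1981)   # -> 73600.0
```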
CADC 
In 1968, Garrett AiResearch (which employed designers Ray Holt and Steve Geller) was
invited to produce a digital computer to compete with electromechanical systems then under 
development for the main flight control computer in the US Navy's new F-14 Tomcat fighter. 
The design was complete by 1970, and used a MOS-based chipset as the core CPU. The 
design was significantly (approximately 20 times) smaller and much more reliable than the 
mechanical systems it competed against, and was used in all of the early Tomcat models. 
This system contained "a 20-bit, pipelined, parallel multi-microprocessor". The Navy refused 
to allow publication of the design until 1997. For this reason the CADC, and
the MP944 chipset it used, are fairly unknown. Holt graduated from California Polytechnic
University in 1968, and began his computer design career with the CADC. From its
inception, it was shrouded in secrecy until 1998 when at Holt's request, the US Navy allowed 
the documents into the public domain. Since then people have debated whether this was the 
first microprocessor. Holt has stated that no one has compared this microprocessor with 
those that came later. According to Parab et al. (2007), "The scientific papers and literature 
published around 1971 reveal that the MP944 digital processor used for the F-14 Tomcat 
aircraft of the US Navy qualifies as the first microprocessor. Although interesting, it was not a 
single-chip processor, nor was the Intel 4004 – they both were more like a set of parallel
building blocks you could use to make a general-purpose form. It contains a CPU, RAM,
ROM, and two other support chips like the Intel 4004. It was made from the same P-channel
technology, operated at military specifications and had larger chips -- an excellent
computer engineering design by any standards. Its design indicates a major advance over
Intel, and two years earlier. It actually worked and was flying in the F-14 when the Intel 4004
was announced. It indicates that today’s industry theme of converging DSP-microcontroller
architectures was started in 1971." This convergence of DSP and
microcontroller architectures is known as a digital signal controller.
TMS 1000 
The Smithsonian Institution says TI engineers Gary Boone and Michael Cochran succeeded 
in creating the first microcontroller (also called a microcomputer) and the first single-chip 
CPU in 1971. The result of their work was the TMS 1000, which went commercial in 1974. TI
stressed the 4-bit TMS 1000 for use in pre-programmed embedded applications, introducing
a version called the TMS1802NC on September 17, 1971, that implemented a calculator on a
chip. 
TI filed for a patent on the microprocessor. Gary Boone was awarded U.S. Patent 
3,757,306 for the single-chip microprocessor architecture on September 4, 1973. In 1971 
and again in 1976, Intel and TI entered into broad patent cross-licensing agreements, with 
Intel paying royalties to TI for the microprocessor patent. A history of these events is 
contained in court documentation from a legal dispute between Cyrix and Intel, with TI 
as intervener and owner of the microprocessor patent.
A computer-on-a-chip combines the microprocessor core (CPU), memory, and I/O 
(input/output) lines onto one chip. The computer-on-a-chip patent, called the "microcomputer 
patent" at the time, U.S. Patent 4,074,351, was awarded to Gary Boone and Michael J. 
Cochran of TI. Aside from this patent, the standard meaning of microcomputer is a computer 
using one or more microprocessors as its CPU(s), while the concept defined in the patent is 
more akin to a microcontroller. 
INTEL 4004 
The 4004 with cover removed (left) and as actually used (right) 
The Intel 4004 is generally regarded as the first commercially available microprocessor, and 
cost $60. The first known advertisement for the 4004 is dated November 15, 1971 and 
appeared in Electronic News. The project that produced the 4004 originated in 1969, 
when Busicom, a Japanese calculator manufacturer, asked Intel to build a chipset for high-performance 
desktop calculators. Busicom’s original design called for a programmable chip
set consisting of seven different chips. Three of the chips were to make a special-purpose 
CPU with its program stored in ROM and its data stored in shift register read-write 
memory. Ted Hoff, the Intel engineer assigned to evaluate the project, believed the Busicom 
design could be simplified by using dynamic RAM storage for data, rather than shift register 
memory, and a more traditional general-purpose CPU architecture. Hoff came up with a 
four-chip architectural proposal: a ROM chip for storing the programs, a dynamic RAM chip
for storing data, a simple I/O device and a 4-bit central processing unit (CPU). Although not 
a chip designer, he felt the CPU could be integrated into a single chip, but as he lacked the 
technical know-how the idea remained just a wish for the time being. 
While the architecture and specifications of the MCS-4 came from the interaction of Hoff 
with Stanley Mazor, a software engineer reporting to him, and with Busicom engineer 
Masatoshi Shima, during 1969, Mazor and Hoff moved on to other projects. In April 1970, 
Intel hired Italian-born engineer Federico Faggin as project leader, a move that ultimately 
made the single-chip CPU final design a reality (Shima meanwhile designed the Busicom 
calculator firmware and assisted Faggin during the first six months of the implementation). 
Faggin, who originally developed the silicon gate technology (SGT) in 1968 at Fairchild 
Semiconductor and designed the world’s first commercial integrated circuit using SGT, the 
Fairchild 3708, had the correct background to lead the project into what would become the 
first commercial general-purpose microprocessor. Since SGT was his own invention, it,
together with his new methodology for random logic design, made it possible to implement a
single-chip CPU with the proper speed, power dissipation and cost. The manager of Intel's
MOS Design Department was Leslie L. Vadász at the time of the MCS-4 development, but 
Vadasz's attention was completely focused on the mainstream business of semiconductor 
memories and he left the leadership and the management of the MCS-4 project to Faggin,
who was ultimately responsible for leading the 4004 project to its realization. Production 
units of the 4004 were first delivered to Busicom in March 1971 and shipped to other 
customers in late 1971. 
PICO/GENERAL INSTRUMENT 
The PICO1/GI250 chip introduced in 1971. This was designed by Pico Electronics 
(Glenrothes, Scotland) and manufactured by General Instrument of Hicksville NY. 
In 1971 Pico Electronics and General Instrument (GI) introduced their first collaboration in 
ICs, a complete single chip calculator IC for the Monroe/Litton Royal Digital III calculator. 
This chip could also arguably lay claim to be one of the first microprocessors or 
microcontrollers having ROM, RAM and a RISC instruction set on-chip. The layout for the 
four layers of the PMOS process was hand drawn at x500 scale on Mylar film, a significant 
task at the time given the complexity of the chip. 
Pico was a spinout by five GI design engineers whose vision was to create single chip 
calculator ICs. They had significant previous design experience on multiple calculator 
chipsets with both GI and Marconi-Elliott. The key team members had originally been tasked 
by Elliott Automation to create an 8-bit computer in MOS and had helped establish a MOS 
Research Laboratory in Glenrothes, Scotland in 1967. 
Calculators were becoming the largest single market for semiconductors and Pico and GI 
went on to have significant success in this burgeoning market. GI continued to innovate in 
microprocessors and microcontrollers with products including the CP1600, IOB1680 and 
PIC1650. In 1987 the GI Microelectronics business was spun out into the Microchip PIC 
microcontroller business. 
FOUR-PHASE SYSTEMS AL1
The Four-Phase Systems AL1 was an 8-bit bit slice chip containing eight registers and an 
ALU. It was designed by Lee Boysel in 1969. At the time, it formed part of a nine-chip, 24-bit 
CPU with three AL1s, but it was later called a microprocessor when, in response to 1990s 
litigation by Texas Instruments, a demonstration system was constructed where a single AL1 
formed part of a courtroom demonstration computer system, together with RAM, ROM, and 
an input-output device.
8-BIT DESIGNS
The Intel 4004 was followed in 1972 by the Intel 8008, the world's first 8-bit microprocessor. 
The 8008 was not, however, an extension of the 4004 design, but instead the culmination of 
a separate design project at Intel, arising from a contract with Computer Terminals 
Corporation, of San Antonio TX, for a chip for a terminal they were designing, the Datapoint
2200; fundamental aspects of the design came not from Intel but from CTC. In 1968,
CTC's Vic Poor and Harry Pyle developed the original design for the instruction set and 
operation of the processor. In 1969, CTC contracted two companies, Intel and Texas 
Instruments, to make a single-chip implementation, known as the CTC 1201. In late 1970 or 
early 1971, TI dropped out, being unable to make a reliable part. In 1970, with Intel yet to
deliver the part, CTC opted to use their own implementation in the Datapoint 2200, using
traditional TTL logic instead (thus the first machine to run “8008 code” was not in fact a
microprocessor at all, and was delivered a year earlier). Intel's version of the 1201
microprocessor arrived in late 1971, but was too late, slow, and required a number of 
additional support chips. CTC had no interest in using it. CTC had originally contracted Intel 
for the chip, and would have owed them $50,000 for their design work.[32] To avoid paying for 
a chip they did not want (and could not use), CTC released Intel from their contract and 
allowed them free use of the design. Intel marketed it as the 8008 in April, 1972, as the 
world's first 8-bit microprocessor. It was the basis for the famous "Mark-8" computer kit 
advertised in the magazine Radio-Electronics in 1974. 
The 8008 was the precursor to the very successful Intel 8080 (1974), which offered much 
improved performance over the 8008 and required fewer support chips, Zilog Z80 (1976), 
and derivative Intel 8-bit processors. The competing Motorola 6800 was released August 
1974 and the similar MOS Technology 6502 in 1975 (both designed largely by the same 
people). The 6502 family rivaled the Z80 in popularity during the 1980s. 
A low overall cost, small packaging, simple computer bus requirements, and sometimes the 
integration of extra circuitry (e.g. the Z80's built-in memory refresh circuitry) allowed the 
home computer "revolution" to accelerate sharply in the early 1980s. This delivered such 
inexpensive machines as the Sinclair ZX-81, which sold for US$99. A variation of the 6502, 
the MOS Technology 6510 was used in the Commodore 64 and yet another variant, 
the 8502, powered the Commodore 128. 
The Western Design Center, Inc. (WDC) introduced the CMOS 65C02 in 1982 and licensed 
the design to several firms. It was used as the CPU in the Apple IIe and IIc personal 
computers as well as in medical implantable grade pacemakers and defibrillators, 
automotive, industrial and consumer devices. WDC pioneered the licensing of 
microprocessor designs, later followed by ARM (32-bit) and other microprocessor intellectual 
property (IP) providers in the 1990s. 
Motorola introduced the MC6809 in 1978, an ambitious and well thought-through 8-bit 
design which was source compatible with the 6800 and was implemented using purely hard-wired 
logic. (Subsequent 16-bit microprocessors typically used microcode to some extent, as 
CISC design requirements were getting too complex for pure hard-wired logic.)
Another early 8-bit microprocessor was the Signetics 2650, which enjoyed a brief surge of
interest due to its innovative and powerful instruction set architecture. 
A seminal microprocessor in the world of spaceflight was RCA's RCA 1802 (aka CDP1802, 
RCA COSMAC) (introduced in 1976), which was used on board the Galileo probe to Jupiter 
(launched 1989, arrived 1995). RCA COSMAC was the first to implement CMOS technology. 
The CDP1802 was used because it could be run at very low power, and because a variant 
was available fabricated using a special production process, silicon on sapphire (SOS), 
which provided much better protection against cosmic radiation and electrostatic discharge than that of
any other processor of the era. Thus, the SOS version of the 1802 was said to be the first 
radiation-hardened microprocessor. 
The RCA 1802 had what is called a static design, meaning that the clock frequency could be 
made arbitrarily low, even to 0 Hz, a total stop condition. This let the spacecraft use minimum 
electric power for long uneventful stretches of a voyage. Timers or sensors would awaken 
the processor in time for important tasks, such as navigation updates, attitude control, data 
acquisition, and radio communication. Current versions of the Western Design Center 65C02 
and 65C816 have static cores, and thus retain data even when the clock is completely 
halted. 
12-BIT DESIGNS
The Intersil 6100 family consisted of a 12-bit microprocessor (the 6100) and a range of 
peripheral support and memory ICs. The microprocessor recognized the DEC PDP-8
minicomputer instruction set. As such it was sometimes referred to as the CMOS-PDP8.
Since it was also produced by Harris Corporation, it was also known as the Harris HM-6100. 
By virtue of its CMOS technology and associated benefits, the 6100 was being incorporated 
into some military designs until the early 1980s. 
16-BIT DESIGNS
The first multi-chip 16-bit microprocessor was the National Semiconductor IMP-16, introduced in 
early 1973. An 8-bit version of the chipset was introduced in 1974 as the IMP-8. 
Other early multi-chip 16-bit microprocessors include one that Digital Equipment Corporation 
(DEC) used in the LSI-11 OEM board set and the packaged PDP 11/03 minicomputer—and 
the Fairchild Semiconductor MicroFlame 9440, both introduced in 1975–1976. In 1975, National
introduced the first 16-bit single-chip microprocessor, the National Semiconductor PACE, which 
was later followed by an NMOS version, the INS8900. 
Another early single-chip 16-bit microprocessor was TI's TMS 9900, which was also compatible 
with their TI-990 line of minicomputers. The 9900 was used in the TI 990/4 minicomputer, the TI- 
99/4A home computer, and the TM990 line of OEM microcomputer boards. The chip was 
packaged in a large ceramic 64-pin DIP package, while most 8-bit microprocessors such as the 
Intel 8080 used the more common, smaller, and less expensive plastic 40-pin DIP. A follow-on 
chip, the TMS 9980, was designed to compete with the Intel 8080, had the full TI 990 16-bit
instruction set, used a plastic 40-pin package, moved data 8 bits at a time, but could only
address 16 KB. A third chip, the TMS 9995, was a new design. The family later expanded to 
include the 99105 and 99110. 
The Western Design Center (WDC) introduced the CMOS 65816 16-bit upgrade of the WDC 
CMOS 65C02 in 1984. The 65816 16-bit microprocessor was the core of the Apple IIgs and later
the Super Nintendo Entertainment System, making it one of the most popular 16-bit designs of all 
time. 
Intel "upsized" their 8080 design into the 16-bit Intel 8086, the first member of the x86 family, 
which powers most modern PC type computers. Intel introduced the 8086 as a cost-effective way 
of porting software from the 8080 lines, and succeeded in winning much business on that 
premise. The 8088, a version of the 8086 that used an 8-bit external data bus, was the 
microprocessor in the first IBM PC. Intel then released the 80186 and 80188, the 80286 and, in 
1985, the 32-bit 80386, cementing their PC market dominance with the processor family's 
backwards compatibility. The 80186 and 80188 were essentially versions of the 8086 and 8088, 
enhanced with some onboard peripherals and a few new instructions; they were not used in IBM-compatible 
PCs because the built-in peripherals and their locations in the memory map were 
incompatible with the IBM design. The 8086 and successors had an innovative but limited 
method of memory segmentation, while the 80286 introduced a full-featured segmented memory 
management unit (MMU). The 80386 introduced a flat 32-bit memory model with paged memory 
management. 
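The 8086's segmented addressing mentioned above combines a 16-bit segment and a 16-bit offset into a 20-bit physical address. A small sketch of the real-mode calculation (the example values are arbitrary):

```python
def physical_address(segment, offset):
    """8086 real mode: physical = segment * 16 + offset, a 20-bit result."""
    return ((segment << 4) + offset) & 0xFFFFF

physical_address(0x1234, 0x0010)   # -> 0x12350
```

Note that different segment:offset pairs can alias the same physical address (for example 1235:0000 and 1234:0010 both name 0x12350), one of the quirks that made the scheme "innovative but limited" compared with the 80386's flat 32-bit model.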
The Intel x86 processors up to and including the 80386 do not include floating-point units (FPUs). 
Intel introduced the 8087, 80287, and 80387 math coprocessors to add hardware floating-point 
and transcendental function capabilities to the 8086 through 80386 CPUs. The 8087 works with 
the 8086/8088 and 80186/80188, the 80187 works with the 80186/80188, the 80287 works with 
the 80286 and 80386, and the 80387 works with the 80386 (yielding better performance than the
80287). The combination of an x86 CPU and an x87 coprocessor forms a single multi-chip
microprocessor; the two chips are programmed as a unit using a single integrated instruction 
set. Though the 8087 coprocessor is interfaced to the CPU through I/O ports in the CPU's 
address space, this is transparent to the program, which does not need to know about or access 
these I/O ports directly; the program accesses the coprocessor and its registers through normal 
instruction op codes. Starting with the successor to the 80386, the 80486, the FPU was 
integrated with the control unit, MMU, and integer ALU in a pipelined design on a single chip (in 
the 80486DX version), or the FPU was eliminated entirely (in the 80486SX version). An 
ostensible coprocessor for the 80486SX, the 80487 was actually a complete 80486DX that
disabled and replaced the coprocessor-less 80486SX that it was installed to upgrade.
32-BIT DESIGNS
Upper interconnect layers on an Intel 80486DX2 die 
16-bit designs had only been on the market briefly when 32-bit implementations started to 
appear. 
The most significant of the 32-bit designs is the Motorola MC68000, introduced in 1979. The 
68k, as it was widely known, had 32-bit registers in its programming model but used 16-bit 
internal data paths, three 16-bit Arithmetic Logic Units, and a 16-bit external data bus (to 
reduce pin count), and externally supported only 24-bit addresses (internally it worked with 
full 32 bit addresses). In PC-based IBM-compatible mainframes the MC68000 internal 
microcode was modified to emulate the 32-bit System/370 IBM mainframe.[36] Motorola 
generally described it as a 16-bit processor, though it clearly has 32-bit capable architecture. 
The combination of high performance, large (16 megabytes, or 2^24 bytes) memory space and
fairly low cost made it the most popular CPU design of its class. The Apple Lisa and 
Macintosh designs made use of the 68000, as did a host of other designs in the mid-1980s, 
including the Atari ST and Commodore Amiga. 
The world's first single-chip fully 32-bit microprocessor, with 32-bit data paths, 32-bit buses, 
and 32-bit addresses, was the AT&T Bell Labs BELLMAC-32A, with first samples in 1980,
and general production in 1982.[37][38] After the divestiture of AT&T in 1984, it was renamed 
the WE 32000 (WE for Western Electric), and had two follow-on generations, the WE 32100 
and WE 32200. These microprocessors were used in the AT&T 3B5 and 3B15 
minicomputers; in the 3B2, the world's first desktop super microcomputer; in the 
"Companion", the world's first 32-bit laptop computer; and in "Alexander", the world's first 
book-sized super microcomputer, featuring ROM-pack memory cartridges similar to today's 
gaming consoles. All these systems ran the UNIX System V operating system. 
The first commercial, single chip, fully 32-bit microprocessor available on the market was 
the HP FOCUS. 
Intel's first 32-bit microprocessor was the IAPX 432, which was introduced in 1981 but was 
not a commercial success. It had an advanced capability-based object-oriented architecture, but 
poor performance compared to contemporary architectures such as Intel's own 80286 
(introduced 1982), which was almost four times as fast on typical benchmark tests. However,
the poor results for the iAPX 432 were partly due to a rushed and therefore 
suboptimal Ada compiler. 
Motorola's success with the 68000 led to the MC68010, which added virtual memory 
support. The MC68020, introduced in 1984, added full 32-bit data and address buses. The 
68020 became hugely popular in the Unix super microcomputer market, and many small 
companies (e.g., Altos, Charles River Data Systems, Cromemco) produced desktop-size 
systems. The MC68030 was introduced next, improving upon the previous design by 
integrating the MMU into the chip. The continued success led to the MC68040, which 
included an FPU for better math performance. A 68050 failed to achieve its performance 
goals and was not released, and the follow-up MC68060 was released into a market 
saturated by much faster RISC designs. The 68k family faded from the desktop in the early 
1990s. 
Other large companies designed the 68020 and follow-ons into embedded equipment. At 
one point, there were more 68020s in embedded equipment than there were Intel Pentiums 
in PCs.[39] The ColdFire processor cores are derivatives of the venerable 68020. 
During this time (early to mid-1980s), National Semiconductor introduced a very similar 
microprocessor with a 16-bit pinout and 32-bit internals, the NS 16032 (later renamed 32016); 
the full 32-bit version was named the NS 32032. Later, National Semiconductor produced the 
NS 32132, which allowed two CPUs to reside on the same memory bus with built-in arbitration. The 
NS32016/32 outperformed the MC68000/10, but the NS32332—which arrived at 
approximately the same time as the MC68020—did not have enough performance. The third 
generation chip, the NS32532, was different. It had about double the performance of the 
MC68030, which was released around the same time. The appearance of RISC processors 
like the AM29000 and MC88000 (now both dead) influenced the architecture of the final 
core, the NS32764. Technically advanced—with a superscalar RISC core, 64-bit bus, and 
internally overclocked—it could still execute Series 32000 instructions through real-time 
translation. 
When National Semiconductor decided to leave the Unix market, the chip was redesigned 
into the Swordfish Embedded processor with a set of on chip peripherals. The chip turned 
out to be too expensive for the laser printer market and was killed. The design team went to 
Intel and there designed the Pentium processor, which is very similar to the NS32764 core 
internally. The big success of the Series 32000 was in the laser printer market, where the 
NS32CG16 with microcoded BitBlt instructions had very good price/performance and was 
adopted by large companies like Canon. By the mid-1980s, Sequent introduced the first 
SMP server-class computer using the NS 32032. This was one of the design's few wins, and 
it disappeared in the late 1980s. The MIPS R2000 (1984) and R3000 (1989) were highly 
successful 32-bit RISC microprocessors. They were used in high-end workstations and 
servers by SGI, among others. Other designs included the Zilog Z80000, which arrived too 
late to market to stand a chance and disappeared quickly. 
The ARM first appeared in 1985. This is a RISC processor design, which has since come to 
dominate the 32-bit embedded systems processor space due in large part to its power 
efficiency, its licensing model, and its wide selection of system development tools.
Semiconductor manufacturers generally license cores and integrate them into their 
own system on a chip products; only a few such vendors are licensed to modify the ARM 
cores. Most cell phones include an ARM processor, as do a wide variety of other 
products. There are microcontroller-oriented ARM cores without virtual memory support, as 
well as symmetric multiprocessor (SMP) applications processors with virtual memory. 
In the late 1980s, the "microprocessor wars" started killing off some of the 
microprocessors. With only one major design win, Sequent, the NS 32032 simply 
faded out of existence, and Sequent switched to Intel microprocessors. 
From 1993 to 2003, the 32-bit x86 architectures became increasingly dominant in desktop, 
laptop, and server markets and these microprocessors became faster and more capable. 
Intel had licensed early versions of the architecture to other companies, but declined to 
license the Pentium, so AMD and Cyrix built later versions of the architecture based on their 
own designs. During this span, these processors increased in complexity (transistor count) 
and capability (instructions/second) by at least three orders of magnitude. Intel's Pentium 
line is probably the most famous and recognizable 32-bit processor model, at least with the 
general public. 
64-BIT DESIGNS IN PCs 
While 64-bit microprocessor designs have been in use in several markets since the early 
1990s (including the Nintendo 64 gaming console in 1996), the early 2000s saw the 
introduction of 64-bit microprocessors targeted at the PC market. 
With AMD's introduction of a 64-bit architecture backwards-compatible with x86, x86- 
64 (also called AMD64), in September 2003, followed by Intel's near fully compatible 64-bit 
extensions (first called IA-32e or EM64T, later renamed Intel 64), the 64-bit desktop era 
began. Both versions can run 32-bit legacy applications without any performance penalty as 
well as new 64-bit software. With operating systems such as Windows XP x64, Windows 
Vista x64, Windows 7 x64, Linux, BSD, and Mac OS X running natively in 64-bit mode, the 
software is also geared to fully utilize the capabilities of such processors. The move to 64 bits is more 
than just an increase in register size from the IA-32 as it also doubles the number of general-purpose 
registers. 
The move to 64 bits by PowerPC processors had been intended since the processors' 
design in the early 1990s and was not a major cause of incompatibility. Existing integer 
registers are extended as are all related data pathways, but, as was the case with IA-32, 
both floating point and vector units had been operating at or above 64 bits for several years. 
Unlike what happened when IA-32 was extended to x86-64, no new general purpose 
registers were added in 64-bit PowerPC, so any performance gained when using the 64-bit 
mode for applications making no use of the larger address space is minimal. 
In 2011, ARM introduced a new 64-bit ARM architecture.
MULTI-CORE DESIGNS 
A different approach to improving a computer's performance is to add extra processors, as 
in symmetric multiprocessing designs, which have been popular in servers and workstations 
since the early 1990s. Keeping up with Moore's Law is becoming increasingly challenging as 
chip-making technologies approach their physical limits. In response, microprocessor 
manufacturers look for other ways to improve performance so they can maintain the 
momentum of constant upgrades. 
A multi-core processor is a single chip that contains more than one microprocessor core. 
Each core can simultaneously execute processor instructions in parallel. This effectively 
multiplies the processor's potential performance by the number of cores, if the software is 
designed to take advantage of more than one processor core. Some components, such as 
bus interface and cache, may be shared between cores. Because the cores are physically 
close to each other, they can communicate with each other much faster than separate (off-chip) 
processors in a multiprocessor system, which improves overall system performance. 
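The performance benefit described above only materializes when software divides its work across cores. A minimal sketch in Python of that idea, splitting a summation across one worker process per core (the workload and chunking scheme are invented for illustration, not taken from any real system):

```python
# Sketch of exploiting multiple cores from software: the work is split
# across a pool of worker processes, typically one per core. Speedup only
# appears if the software is written to divide its work this way.
from multiprocessing import Pool, cpu_count

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    workers = cpu_count()                   # one task per available core
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)         # last chunk absorbs the remainder
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(range(n)))  # True: same answer, computed in parallel
```

The chunks cover the range without overlap, so the parallel result matches the serial one regardless of how many cores are present.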
In 2005, AMD released the first native dual-core processor, the Athlon X2. Intel's Pentium D 
had beaten the X2 to market by a few weeks, but it used two separate CPU dies and was 
less efficient than AMD's native design. As of 2012, dual-core and quad-core processors are 
widely used in home PCs and laptops, while quad, six, eight, ten, twelve, and sixteen-core 
processors are common in the professional and enterprise markets with workstations and 
servers. 
Sun Microsystems has released the Niagara and Niagara 2 chips, both of which feature an 
eight-core design. The Niagara 2 supports more threads and operates at 1.6 GHz. 
High-end Intel Xeon processors that are on the LGA 771, LGA 1366, and LGA 2011 sockets 
and high-end AMD Opteron processors that are on the C32 and G34 sockets are DP (dual 
processor) capable, as well as the older Intel Core 2 Extreme QX9775 also used in an older 
Mac Pro by Apple and the Intel Skulltrail motherboard. AMD's G34 motherboards can 
support up to four CPUs and Intel's LGA 1567 motherboards can support up to eight CPUs. 
Most modern desktop computers do not support multiple CPU sockets, and few applications 
outside of the professional market can make good use of more than four cores anyway. Both Intel 
and AMD currently offer fast quad- and six-core desktop CPUs, making multi-CPU systems 
obsolete for many purposes. AMD also offers the first and currently the only eight-core 
desktop CPUs, with the FX-8xxx line. 
The desktop market has been in a transition towards quad-core CPUs since Intel's Core 2 
Quads were released, and these are now common, although dual-core CPUs are still more 
prevalent. Older and mobile computers are less likely than newer desktops to have more than 
two cores. Not all software is optimized for multi-core CPUs, which can make fewer, more 
powerful cores preferable. AMD offers CPUs with more cores for a given amount of money than 
similarly priced Intel CPUs—but the AMD cores are somewhat slower, so the two trade 
blows in different applications depending on how well-threaded the programs running are.
For example, Intel's cheapest Sandy Bridge quad-core CPUs often cost almost twice as 
much as AMD's cheapest Athlon II, Phenom II, and FX quad-core CPUs but Intel has dual-core 
CPUs in the same price ranges as AMD's cheaper quad core CPUs. In an application 
that uses one or two threads, the Intel dual cores outperform AMD's similarly priced quad-core 
CPUs—and if a program supports three or four threads the cheap AMD quad-core 
CPUs outperform the similarly priced Intel dual-core CPUs. 
Historically, AMD and Intel have switched places as the company with the fastest CPU 
several times. Intel currently leads on the desktop side of the computer CPU market, with 
their Sandy Bridge and Ivy Bridge series. In servers, AMD's new Opterons seem to have 
superior performance for their price point. This means that AMD is currently more 
competitive in low- to mid-end servers and workstations that more effectively use fewer 
cores and threads. 
RISC 
In the mid-1980s to early 1990s, a crop of new high-performance reduced instruction set 
computer (RISC) microprocessors appeared, influenced by discrete RISC-like CPU designs 
such as the IBM 801 and others. RISC microprocessors were initially used in special-purpose 
machines and UNIX workstations, but then gained wide acceptance in other roles. 
In 1986, HP released its first system with a PA-RISC CPU. The first commercial RISC 
microprocessor design was released in 1984 by MIPS Computer Systems: the 32-bit R2000 
(the R1000 was not released). In 1987, the 32-bit, then cacheless, ARM2-based Acorn 
Archimedes became the first commercial success using the ARM architecture, then known 
as Acorn RISC Machine (ARM); the first silicon, the ARM1, appeared in 1985. The R3000 
made the design truly practical, and the R4000 introduced the world's first 
commercially available 64-bit RISC microprocessor. Competing projects would result in the 
IBM POWER and Sun SPARC architectures. Soon every major vendor was releasing a 
RISC design, including the AT&T CRISP, AMD 29000, Intel i860 and Intel i960, Motorola 
88000, and DEC Alpha. 
In the late 1990s, only two 64-bit RISC architectures were still produced in volume for non-embedded 
applications: SPARC and Power ISA. As ARM became increasingly powerful, in the 
early 2010s it became the third RISC architecture in the general computing 
segment. 
MARKET STATISTICS 
In 2003, about US$44 billion worth of microprocessors were manufactured and sold. Although 
about half of that money was spent on CPUs used in desktop or laptop personal computers, 
those account for only about 2% of all CPUs sold. 
About 55% of all CPUs sold in the world are 8-bit microcontrollers, over two billion of which were 
sold in 1997. 
In 2002, less than 10% of all the CPUs sold in the world were 32-bit or more. Of all the 32-bit 
CPUs sold, about 2% are used in desktop or laptop personal computers. Most microprocessors
are used in embedded control applications such as household appliances, automobiles, and 
computer peripherals. Taken as a whole, the average price for a microprocessor, microcontroller, 
or DSP is just over $6. 
About ten billion CPUs were manufactured in 2008. About 98% of new CPUs produced each 
year are embedded. 
MICROARCHITECTURE 
Intel Core microarchitecture 
In electronics engineering and computer engineering, microarchitecture (sometimes 
abbreviated to μarch or uarch), also called computer organization, is the way a 
given instruction set architecture (ISA) is implemented on a processor. A given ISA may be 
implemented with different microarchitectures; implementations may vary due to different 
goals of a given design or due to shifts in technology. 
Computer architecture is the combination of microarchitecture and instruction set design. 
RELATION TO INSTRUCTION SET 
ARCHITECTURE 
The ISA is roughly the same as the programming model of a processor as seen by 
an assembly language programmer or compiler writer. The ISA includes the execution 
model, processor registers, and address and data formats, among other things. The microarchitecture 
includes the constituent parts of the processor and how these interconnect and interoperate 
to implement the ISA.
Single bus organization microarchitecture 
The microarchitecture of a machine is usually represented as (more or less detailed) 
diagrams that describe the interconnections of the various microarchitectural elements of 
the machine, which may be everything from single gates and registers to 
complete arithmetic logic units (ALUs) and even larger elements. These diagrams generally 
separate the datapath (where data is placed) and the control path (which can be said to 
steer the data). 
The person designing a system usually draws the specific microarchitecture as a kind 
of data flow diagram. Like a block diagram, the microarchitecture diagram shows 
microarchitectural elements such as the arithmetic logic unit and the register file as single 
schematic symbols. Typically the diagram uses arrows, thick lines, and thin 
lines to distinguish between three-state buses -- which require a three-state 
buffer for each device that drives the bus; unidirectional buses -- always driven by a single 
source, such as the way the address bus on simpler computers is always driven by 
the memory address register; and individual control lines. Very simple computers have 
a single data bus organization -- they have a single three-state bus. The diagram of more 
complex computers usually shows multiple three-state buses, which help the machine do 
more operations simultaneously. 
Each microarchitectural element is in turn represented by a schematic describing the 
interconnections of logic gates used to implement it. Each logic gate is in turn represented 
by a circuit diagram describing the connections of the transistors used to implement it in 
some particular logic family. Machines with different microarchitectures may have the same 
instruction set architecture, and thus be capable of executing the same programs. New 
microarchitectures and/or circuitry solutions, along with advances in semiconductor 
manufacturing, are what allow newer generations of processors to achieve higher 
performance while using the same ISA. 
In principle, a single microarchitecture could execute several different ISAs with only minor 
changes to the microcode.
ASPECTS OF MICROARCHITECTURE 
Intel 80286 microarchitecture 
The pipelined data path is the most commonly used data path design in microarchitecture 
today. This technique is used in most modern microprocessors, microcontrollers, and DSPs. 
The pipelined architecture allows multiple instructions to overlap in execution, much like an 
assembly line. The pipeline includes several different stages which are fundamental in 
microarchitecture designs. Some of these stages include instruction fetch, instruction decode, 
execute, and write back. Some architectures include other stages, such as memory access. 
The design of pipelines is one of the central microarchitectural tasks. 
Execution units are also essential to microarchitecture. Execution units include arithmetic 
logic units (ALU), floating point units (FPU), load/store units, branch prediction, and SIMD. 
These units perform the operations or calculations of the processor. The choice of the 
number of execution units, and of their latency and throughput, is a central microarchitectural 
design task. The size, latency, throughput, and connectivity of memories within the system 
are also microarchitectural decisions. 
System-level design decisions, such as whether or not to include peripherals such 
as memory controllers, can be considered part of the microarchitectural design process. 
This includes decisions on the performance level and connectivity of these peripherals. 
Unlike architectural design, where achieving a specific performance level is the main goal, 
microarchitectural design pays closer attention to other constraints. Since microarchitecture 
design decisions directly affect what goes into a system, attention must be paid to such 
issues as: 
 Chip area/cost 
 Power consumption 
 Logic complexity 
 Ease of connectivity 
 Manufacturability 
 Ease of debugging 
 Testability 
MICROARCHITECTURAL CONCEPTS 
Instruction cycle 
In general, all CPUs, whether single-chip microprocessors or multi-chip implementations, run 
programs by performing the following steps: 
1. Read an instruction and decode it
2. Find any associated data that is needed to process the instruction 
3. Process the instruction 
4. Write the results out 
The instruction cycle is repeated continuously until the power is turned off. 
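The four steps above can be sketched as a toy interpreter loop. The instruction format, opcodes, and register names below are invented purely for illustration; a real processor performs these steps in hardware:

```python
# A toy interpreter illustrating the instruction cycle:
# fetch and decode, find operands, execute, write back.

def run(program, registers):
    pc = 0  # program counter
    while pc < len(program):
        instr = program[pc]                       # 1. fetch the instruction
        op, dst, src1, src2 = instr               # 1. ...and decode it
        a, b = registers[src1], registers[src2]   # 2. find the needed data
        if op == "add":                           # 3. process the instruction
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(op)
        registers[dst] = result                   # 4. write the result out
        pc += 1
    return registers

regs = {"r0": 5, "r1": 3, "r2": 0}
run([("add", "r2", "r0", "r1"),   # r2 = r0 + r1
     ("sub", "r0", "r2", "r1")],  # r0 = r2 - r1
    regs)
print(regs["r2"], regs["r0"])  # 8 5
```

Here the cycle simply stops at the end of the program list, whereas a real CPU repeats it until power-off.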
Increasing execution speed 
Complicating this simple-looking series of steps is the fact that the memory hierarchy, which 
includes caching, main memory and non-volatile storage like hard disks (where the program 
instructions and data reside), has always been slower than the processor itself. Step (2) 
often introduces a lengthy (in CPU terms) delay while the data arrives over the computer bus. A 
considerable amount of research has been put into designs that avoid these delays as much 
as possible. Over the years, a central goal was to execute more instructions in parallel, thus 
increasing the effective execution speed of a program. These efforts introduced complicated 
logic and circuit structures. Initially, these techniques could only be implemented on 
expensive mainframes or supercomputers due to the amount of circuitry needed for these 
techniques. As semiconductor manufacturing progressed, more and more of these 
techniques could be implemented on a single semiconductor chip. See Moore's law. 
Instruction set choice 
Instruction sets have shifted over the years, from originally very simple to sometimes very 
complex (in various respects). In recent years, load-store architectures, VLIW and EPIC 
types have been in fashion. Architectures that deal with data 
parallelism include SIMD and vector designs. Some labels used to denote classes of CPU 
architectures are not particularly descriptive, especially the CISC label; many early 
designs retroactively denoted "CISC" are in fact significantly simpler than modern RISC 
processors (in several respects). 
However, the choice of instruction set architecture may greatly affect the complexity of 
implementing high performance devices. The prominent strategy, used to develop the first 
RISC processors, was to simplify instructions to a minimum of individual semantic 
complexity combined with high encoding regularity and simplicity. Such uniform instructions 
were easily fetched, decoded, and executed in a pipelined fashion, and their regularity 
offered a simple strategy to reduce the number of logic levels in order to reach high 
operating frequencies; instruction 
cache-memories compensated for the higher operating frequency and inherently low code 
density while large register sets were used to factor out as much of the (slow) memory 
accesses as possible. 
Instruction pipelining 
One of the first, and most powerful, techniques to improve performance is the use of 
the instruction pipeline. Early processor designs would carry out all of the steps above for 
one instruction before moving onto the next. Large portions of the circuitry were left idle at 
any one step; for instance, the instruction decoding circuitry would be idle during execution 
and so on. 
Pipelines improve performance by allowing a number of instructions to work their way 
through the processor at the same time. In the same basic example, the processor would 
start to decode (step 1) a new instruction while the last one was waiting for results. This 
would allow up to four instructions to be "in flight" at one time, making the processor look 
four times as fast. Although any one instruction takes just as long to complete (there are still 
four steps) the CPU as a whole "retires" instructions much faster. 
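The arithmetic behind this speedup can be sketched directly. The four-stage, one-cycle-per-stage, no-stall pipeline below is an idealized textbook model, not any specific CPU:

```python
# Cycle counts for an idealized 4-stage pipeline (fetch, decode, execute,
# write back), assuming every stage takes one cycle and nothing stalls.

STAGES = 4

def cycles_unpipelined(n_instructions):
    # each instruction completes all stages before the next one starts
    return n_instructions * STAGES

def cycles_pipelined(n_instructions):
    # after the pipeline fills, one instruction completes per cycle
    return STAGES + (n_instructions - 1)

print(cycles_unpipelined(100))  # 400
print(cycles_pipelined(100))    # 103: nearly four times as fast
```

Each instruction still takes four cycles to complete, but the pipelined machine retires one instruction per cycle once the pipeline is full, which is exactly the "four times as fast" effect described above.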
RISC designs make pipelines smaller and much easier to construct by cleanly separating each 
stage of the instruction process and making each stage take the same amount of time (one cycle). The 
processor as a whole operates in an assembly line fashion, with instructions coming in one 
side and results out the other. Due to the reduced complexity of the Classic RISC pipeline, 
the pipelined core and an instruction cache could be placed on the same size die that would 
otherwise fit the core alone on a CISC design. This was the real reason that RISC was 
faster. Early designs like the SPARC and MIPS often ran over 10 times as fast 
as Intel and Motorola CISC solutions at the same clock speed and price.
Pipelines are by no means limited to RISC designs. By 1986 the top-of-the-line VAX 
implementation (VAX 8800) was a heavily pipelined design, slightly predating the first 
commercial MIPS and SPARC designs. Most modern CPUs (even embedded CPUs) are 
now pipelined, and microcoded CPUs with no pipelining are seen only in the most area-constrained 
embedded processors. Large CISC machines, from the VAX 8800 to the 
modern Pentium 4 and Athlon, are implemented with both microcode and pipelines. 
Improvements in pipelining and caching are the two major micro architectural advances that 
have enabled processor performance to keep pace with the circuit technology on which they 
are based. 
Cache 
It was not long before improvements in chip manufacturing allowed for even more circuitry to 
be placed on the die, and designers started looking for ways to use it. One of the most 
common was to add an ever-increasing amount of cache memory on-die. Cache is simply 
very fast memory, memory that can be accessed in a few cycles as opposed to many 
needed to "talk" to main memory. The CPU includes a cache controller which automates 
reading and writing from the cache: if the data is already in the cache it simply "appears", 
whereas if it is not, the processor is "stalled" while the cache controller reads it in. 
RISC designs started adding cache in the mid-to-late 1980s, often only 4 KB in total. This 
number grew over time, and typical CPUs now have at least 512 KB, while more powerful 
CPUs come with 1 or 2 or even 4, 6, 8 or 12 MB, organized in multiple levels of a memory 
hierarchy. Generally speaking, more cache means more performance, due to reduced 
stalling. 
Caches and pipelines were a perfect match for each other. Previously, it didn't make much 
sense to build a pipeline that could run faster than the access latency of off-chip memory. 
Using on-chip cache memory instead, meant that a pipeline could run at the speed of the 
cache access latency, a much smaller length of time. This allowed the operating frequencies 
of processors to increase at a much faster rate than that of off-chip memory. 
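The effect of hits and misses on stalling can be illustrated with a toy direct-mapped cache model. The line count and latencies below are invented round numbers, not taken from any real CPU:

```python
# A minimal direct-mapped cache model showing why cache hits avoid the
# long main-memory stall. Each cache line remembers which address it holds.

LINES = 8          # number of cache lines
HIT_CYCLES = 1     # cache access latency
MISS_CYCLES = 50   # main-memory access latency

cache = [None] * LINES   # each entry holds the cached address (the tag)
total_cycles = 0

def access(addr):
    global total_cycles
    line = addr % LINES          # index: which line this address maps to
    if cache[line] == addr:      # hit: the data "simply appears"
        total_cycles += HIT_CYCLES
        return "hit"
    cache[line] = addr           # miss: stall while the line is filled
    total_cycles += MISS_CYCLES
    return "miss"

access(0)   # miss (cold cache)
access(0)   # hit (same line, same tag)
access(8)   # miss: address 8 maps to line 0 and evicts address 0
print(total_cycles)  # 50 + 1 + 50 = 101
```

The third access shows the weakness of a direct-mapped design: two addresses that map to the same line keep evicting each other even when the rest of the cache is empty.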
Branch prediction 
One barrier to achieving higher performance through instruction-level parallelism stems from 
pipeline stalls and flushes due to branches. Normally, whether a conditional branch will be 
taken isn't known until late in the pipeline as conditional branches depend on results coming 
from a register. From the time that the processor's instruction decoder has figured out that it 
has encountered a conditional branch instruction to the time that the deciding register value 
can be read out, the pipeline needs to be stalled for several cycles, or if it's not and the 
branch is taken, the pipeline needs to be flushed. As clock speeds increase the depth of the 
pipeline increases with it, and some modern processors may have 20 stages or more. On 
average, every fifth instruction executed is a branch, so without any intervention, that's a 
high amount of stalling. 
Techniques such as branch prediction and speculative execution are used to lessen these 
branch penalties. Branch prediction is where the hardware makes educated guesses on 
whether a particular branch will be taken. In reality one side or the other of the branch will be 
called much more often than the other. Modern designs have rather complex statistical 
prediction systems, which watch the results of past branches to predict the future with 
greater accuracy. The guess allows the hardware to prefetch instructions without waiting for 
the register read. Speculative execution is a further enhancement in which the code along 
the predicted path is not just perfected but also executed before it is known whether the 
branch should be taken or not. This can yield better performance when the guess is good, 
with the risk of a huge penalty when the guess is bad because instructions need to be 
undone. 
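A classic concrete form of such a statistical predictor is the two-bit saturating counter. The sketch below is the generic textbook scheme, not any particular CPU's predictor: states 0-1 predict not-taken, states 2-3 predict taken, and each actual outcome nudges the counter one step:

```python
# A 2-bit saturating-counter branch predictor. It takes two wrong
# outcomes in a row to flip the prediction, so an occasional
# mispredicted loop exit does not disturb an otherwise stable branch.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start in "weakly taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 9 + [False] + [True] * 9  # a loop branch: mostly taken
correct = 0
for taken in outcomes:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct, "of", len(outcomes))  # 18 of 19
```

Only the single loop-exit branch is mispredicted; the counter absorbs it without flipping, so the following iterations are predicted correctly again.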
Superscalar 
Even with all of the added complexity and gates needed to support the concepts outlined 
above, improvements in semiconductor manufacturing soon allowed even more logic gates 
to be used.
In the outline above the processor processes parts of a single instruction at a time. 
Computer programs could be executed faster if multiple instructions were processed 
simultaneously. This is what superscalar processors achieve, by replicating functional units 
such as ALUs. The replication of functional units was only made possible when the die area 
of a single-issue processor no longer stretched the limits of what could be reliably 
manufactured. By the late 1980s, superscalar designs started to enter the market place. 
In modern designs it is common to find two load units, one store (many instructions have no 
results to store), two or more integer math units, two or more floating point units, and often 
a SIMD unit of some sort. The instruction issue logic grows in complexity by reading in a 
huge list of instructions from memory and handing them off to the different execution units 
that are idle at that point. The results are then collected and re-ordered at the end. 
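The headline arithmetic of superscalar issue can be sketched as follows, under the idealized assumption that all instructions are independent and the execution units are never starved:

```python
# With an issue width of W, up to W independent instructions can begin
# per cycle, so a run of n independent instructions finishes in
# ceil(n / W) cycles instead of n. The figures are purely illustrative.
import math

def cycles(n_instructions, issue_width):
    return math.ceil(n_instructions / issue_width)

print(cycles(100, 1))  # 100: scalar
print(cycles(100, 2))  # 50: dual-issue superscalar
print(cycles(100, 4))  # 25: quad-issue
```

Real programs fall short of this ideal because dependencies, branches, and memory stalls leave issue slots empty, which is why the issue logic described above grows so complex.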
Out-of-order execution 
The addition of caches reduces the frequency or duration of stalls due to waiting for data to 
be fetched from the memory hierarchy, but does not get rid of these stalls entirely. In early 
designs a cache miss would force the cache controller to stall the processor and wait. Of 
course there may be some other instruction in the program whose data is available in the 
cache at that point. Out-of-order execution allows that ready instruction to be processed 
while an older instruction waits on the cache, then re-orders the results to make it appear 
that everything happened in the programmed order. This technique is also used to avoid 
other operand dependency stalls, such as an instruction awaiting a result from a long latency 
floating-point operation or other multi-cycle operations. 
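A toy scheduler can illustrate the idea: an instruction may issue as soon as its operands are ready, even if an older instruction is still waiting on a long-latency operation. The (destination, sources, latency) instruction format and the latencies below are invented for illustration:

```python
# A toy out-of-order issue model: each instruction can start as soon as
# the registers it reads are available; independent younger instructions
# overtake a stalled older one.

def schedule(instructions):
    ready_at = {}       # register -> cycle its value becomes available
    issue_order = []
    for i, (dst, srcs, latency) in enumerate(instructions):
        start = max([ready_at.get(s, 0) for s in srcs], default=0)
        ready_at[dst] = start + latency
        issue_order.append((i, start))
    # issue by earliest possible start cycle, not by program order
    return [i for i, start in sorted(issue_order, key=lambda p: p[1])]

prog = [("a", ["mem"], 50),   # 0: long-latency load (e.g. a cache miss)
        ("b", ["a"], 1),      # 1: depends on the load - must wait
        ("c", [], 1)]         # 2: independent - can run right away
print(schedule(prog))  # [0, 2, 1]: instruction 2 runs ahead of 1
```

A real processor would additionally buffer the results and retire them in program order, so that the reordering stays invisible to the programmer.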
Register renaming 
Register renaming refers to a technique used to avoid unnecessary serialized execution of 
program instructions because of the reuse of the same registers by those instructions. 
Suppose we have two groups of instructions that will use the same register. One set of 
instructions is executed first to leave the register free for the other set; but if the other set is 
assigned to a different, similar register, both sets of instructions can be executed in parallel. 
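The renaming step itself can be sketched in a few lines. The register names, instruction format, and unbounded physical-register pool below are illustrative simplifications of what real rename hardware does:

```python
# A sketch of register renaming: architectural registers are mapped onto a
# larger pool of physical registers, so instructions that merely *reuse* a
# register name no longer falsely depend on each other.

def rename(instructions):
    mapping = {}      # architectural register -> current physical register
    next_phys = 0
    renamed = []
    for op, dst, *srcs in instructions:
        # sources read whichever physical register currently holds the value
        phys_srcs = [mapping.get(s, s) for s in srcs]
        # every new write gets a fresh physical register (removes WAW/WAR hazards)
        phys_dst = "p%d" % next_phys
        next_phys += 1
        mapping[dst] = phys_dst
        renamed.append((op, phys_dst, *phys_srcs))
    return renamed

prog = [("add", "r1", "r2", "r3"),   # writes r1
        ("add", "r4", "r1", "r5"),   # reads r1 (true dependency - kept)
        ("add", "r1", "r6", "r7")]   # reuses r1 (false dependency - removed)
for instr in rename(prog):
    print(instr)
# ('add', 'p0', 'r2', 'r3')
# ('add', 'p1', 'p0', 'r5')
# ('add', 'p2', 'r6', 'r7')   <- no longer conflicts with the first write
```

After renaming, the third instruction shares no registers with the first, so the two can execute in parallel; only the genuine data dependency between the first two instructions remains.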
Multiprocessing and multithreading 
Computer architects have become stymied by the growing mismatch in CPU operating 
frequencies and DRAM access times. None of the techniques that exploited instruction-level 
parallelism (ILP) within one program could make up for the long stalls that occurred when 
data had to be fetched from main memory. Additionally, the large transistor counts and high 
operating frequencies needed for the more advanced ILP techniques required power 
dissipation levels that could no longer be cheaply cooled. For these reasons, newer 
generations of computers have started to exploit higher levels of parallelism that exist 
outside of a single program or program thread. 
This trend is sometimes known as throughput computing. This idea originated in the 
mainframe market where online transaction processing emphasized not just the execution 
speed of one transaction, but the capacity to deal with massive numbers of transactions. 
With transaction-based applications such as network routing and web-site serving greatly 
increasing in the last decade, the computer industry has re-emphasized capacity and 
throughput issues. 
One technique by which this parallelism is achieved is multiprocessing: computer systems 
with multiple CPUs. Once reserved for high-end mainframes and supercomputers, 
small-scale (2-8 processor) multiprocessor servers have become commonplace for 
the small business market. For large corporations, large-scale (16-256 processor) multiprocessors are 
common. Even personal computers with multiple CPUs have appeared since the 1990s. 
With further transistor size reductions made available with semiconductor technology 
advances, multicore CPUs have appeared where multiple CPUs are implemented on the 
same silicon chip. This approach was initially used in chips targeting embedded markets, where 
simpler and smaller CPUs would allow multiple instantiations to fit on one piece of silicon. By 
2005, semiconductor technology allowed dual high-end desktop CPU CMP chips to be 
manufactured in volume. Some designs, such as Sun Microsystems' UltraSPARC T1, have 
reverted to simpler (scalar, in-order) designs in order to fit more processors on one piece of 
silicon. 
Another technique that has become more popular recently is multithreading. In 
multithreading, when the processor has to fetch data from slow system memory, instead of 
stalling while the data arrives, the processor switches to another program or program thread 
which is ready to execute. Though this does not speed up any one program/thread, it 
increases overall system throughput by reducing the time the CPU sits idle. 
Conceptually, multithreading is equivalent to a context switch at the operating system level. 
The difference is that a multithreaded CPU can do a thread switch in one CPU cycle instead 
of the hundreds or thousands of CPU cycles a context switch normally requires. This is 
achieved by replicating the state hardware (such as the register file and program counter) for 
each active thread. 
A further enhancement is simultaneous multithreading. This technique allows superscalar 
CPUs to execute instructions from different programs/threads simultaneously in the same 
cycle. 
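Hardware multithreading hides memory latency much the way operating-system threads hide I/O latency. As a rough software analogy (not a model of the hardware itself), the sketch below overlaps four simulated waits so the total time is close to one wait rather than four:

```python
import threading
import time

def worker(results, i):
    # Simulate a long-latency operation (the analogue of a slow memory access)
    time.sleep(0.05)
    results[i] = i * i

# Four units of work run concurrently: total wall time is close to one
# wait (~0.05 s), not the sum of four waits (~0.2 s).
results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(results)         # [0, 1, 4, 9]
print(elapsed < 0.15)  # True: the waits overlapped instead of serializing
```

The hardware version makes the same trade at a much finer grain: a thread switch in one CPU cycle instead of an OS context switch.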
ARITHMETIC LOGIC UNIT 
In digital electronics, an arithmetic logic unit (ALU) is a digital circuit that 
performs integer arithmetic and logical operations. The ALU is a fundamental building block 
of the central processing unit of a computer, and even the simplest microprocessors contain 
one for purposes such as maintaining timers. The processors found inside modern CPUs 
and graphics processing units (GPUs) accommodate very powerful and very complex ALUs; 
a single component may contain a number of ALUs. 
Mathematician John von Neumann proposed the ALU concept in 1945, when he wrote a 
report on the foundations for a new computer called the EDVAC. 
Arithmetic logic unit schematic symbol 
Cascadable 8-bit ALU, Texas Instruments SN74AS888
NUMERICAL SYSTEM 
An ALU must process numbers using the same formats as the rest of the digital circuit. The 
format of modern processors is almost always the two's complement binary number 
representation. Early computers used a wide variety of number systems, including ones' 
complement, two's complement, sign-magnitude format, and even true decimal systems, 
with various representations of the digits. 
The ones' complement and two's complement number systems allow for subtraction to be 
accomplished by adding the negative of a number in a very simple way which negates the 
need for specialized circuits to do subtraction; however, calculating the negative in two's 
complement requires adding a one to the low order bit and propagating the carry. An 
alternative way to do two's complement subtraction of A−B is to present a one to the carry 
input of the adder and use ¬B rather than B as the second input. Arithmetic, logic, and 
shift circuits can then be combined into one ALU with a common operation-select 
input. 
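The carry-in trick above can be checked numerically. A minimal sketch in Python, assuming an 8-bit word:

```python
WORD = 8
MASK = (1 << WORD) - 1   # 0xFF for an 8-bit word

def sub_via_adder(a, b):
    """Compute A - B as A + (~B) + 1, the way an ALU reuses its adder."""
    not_b = ~b & MASK              # ones' complement of B (the ¬B input)
    return (a + not_b + 1) & MASK  # carry-in of 1 completes the two's complement

print(sub_via_adder(100, 58))  # 42
print(sub_via_adder(5, 7))     # 254, i.e. -2 in 8-bit two's complement
```

No dedicated subtractor circuit is needed; the same adder handles both operations.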
PRACTICAL OVERVIEW 
Most of a processor's operations are performed by one or more ALUs. An ALU loads data 
from input registers. Then an external control unit tells the ALU what operation to perform on 
that data, and then the ALU stores its result into an output register. The control unit is 
responsible for moving the processed data between these registers, ALU and memory. 
Complex operations 
Engineers can design an arithmetic logic unit to calculate most operations. The more 
complex the operation, the more expensive the ALU is, the more space it uses in the 
processor, and the more power it dissipates. Therefore, engineers compromise. They make 
the ALU powerful enough to make the processor fast, yet not so complex as to become 
prohibitive. For example, computing the square root of a number might use: 
1. Calculation in a single clock: design an extraordinarily complex ALU that calculates 
the square root of any number in a single step. 
2. Calculation pipeline: design a very complex ALU that calculates the square root of 
any number in several steps. The intermediate results go through a series of circuits 
arranged like a factory production line. The ALU can accept new numbers to 
calculate even before having finished the previous ones, so it can produce 
results as fast as a single-clock ALU, although the results start to flow out of the 
ALU only after an initial delay. 
3. Iterative calculation: design a complex ALU that calculates the square root through 
several steps, usually under the control of a complex control unit with built-in 
microcode. 
4. Co-processor: design a simple ALU in the processor, and sell a separate, specialized, 
costly processor that the customer can install beside it, which 
implements one of the options above. 
5. Software libraries: tell the programmers that there is no co-processor and there is 
no emulation, so they will have to calculate square roots with their own software 
algorithms. 
6. Software emulation: emulate the existence of the co-processor. Whenever a 
program attempts to perform the square root calculation, make the processor check 
if there is a co-processor present and use it if there is one; if there is not 
one, interrupt the processing of the program and invoke the operating system to 
perform the square root calculation through some software algorithm. 
The options above run from the fastest and most expensive to the slowest and least 
expensive. Therefore, while even the simplest computer can evaluate the most 
complicated formula, the simplest computers will usually take a long time doing so because 
of the many steps involved. 
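Option 5 (a software library) can be sketched with Newton's method, the kind of iterative algorithm such a library might use. This is an illustrative example, not any particular vendor's routine:

```python
def sqrt_newton(x, iterations=20):
    """Approximate sqrt(x) by Newton's iteration: g <- (g + x/g) / 2."""
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0   # any positive starting guess converges
    for _ in range(iterations):
        guess = (guess + x / guess) / 2.0
    return guess

print(round(sqrt_newton(2.0), 6))  # 1.414214
print(sqrt_newton(81.0))           # 9.0
```

Each loop iteration corresponds to one "step" in options 2 and 3: the hardware variants simply build this repetition into circuits or microcode instead of software.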
Powerful processors like the Intel Core and AMD64 implement option #1 for several simple 
operations, #2 for the most common complex operations and #3 for the extremely complex 
operations. 
Inputs and outputs 
The inputs to the ALU are the data to be operated on (called operands) and a code from 
the control unit indicating which operation to perform. Its output is the result of the 
computation. One thing designers must keep in mind is whether the ALU will operate on big-endian 
or little-endian numbers. 
In many designs, the ALU also exchanges a set of condition codes with a status register. 
These codes indicate cases such as carry-in or 
carry-out, overflow, division by zero, etc. 
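Condition codes can be illustrated for an 8-bit addition. The flag names C/Z/N/V below follow common convention rather than any specific CPU:

```python
WORD = 8
MASK = (1 << WORD) - 1   # 0xFF
SIGN = 1 << (WORD - 1)   # 0x80

def add_with_flags(a, b):
    """8-bit add that also computes typical ALU condition codes."""
    total = a + b
    result = total & MASK
    flags = {
        "C": total > MASK,         # carry out of the top bit
        "Z": result == 0,          # result is zero
        "N": bool(result & SIGN),  # sign bit of the result
        # signed overflow: operands share a sign that the result lacks
        "V": bool(~(a ^ b) & (a ^ result) & SIGN),
    }
    return result, flags

print(add_with_flags(0x7F, 0x01))  # (128, {'C': False, 'Z': False, 'N': True, 'V': True})
print(add_with_flags(0xFF, 0x01))  # (0, {'C': True, 'Z': True, 'N': False, 'V': False})
```

The first call shows signed overflow (127 + 1 wraps to −128 in two's complement); the second shows an unsigned carry without signed overflow.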
A floating-point unit (FPU) also performs arithmetic operations between two values, but it does so 
for numbers in floating-point representation, which is much more complicated than the two's 
complement representation used in a typical ALU. To do these calculations, 
an FPU has several complex circuits built in, including some internal ALUs. 
In modern practice, engineers typically refer to the ALU as the circuit that performs integer 
arithmetic operations (like two's complement and BCD). Circuits that calculate more complex 
formats like floating point, complex numbers, etc. usually receive a more specific name such 
as floating-point unit (FPU).
CENTRAL PROCESSING UNIT 
An Intel 80486DX2 CPU, as seen from above. 
An Intel 80486DX2, as seen from below 
A central processing unit (CPU) is the hardware within a computer that carries out 
the instructions of a computer program by performing the basic arithmetical, logical, 
and input/output operations of the system. The term has been in use in the computer 
industry at least since the early 1960s. The form, design, and implementation of CPUs have 
changed over the course of their history, but their fundamental operation remains much the 
same. 
A computer can have more than one CPU; this is called multiprocessing. All modern CPUs 
are microprocessors, meaning they are contained on a single chip. Some integrated circuits (ICs) can 
contain multiple CPUs on a single chip; those ICs are called multi-core processors. An IC 
containing a CPU can also contain peripheral devices and other components of a computer 
system; this is called a system on a chip (SoC). 
Two typical components of a CPU are the arithmetic logic unit (ALU), which performs 
arithmetic and logical operations, and the control unit (CU), which extracts instructions 
from memory, decodes them, and executes them, calling on the ALU when necessary. 
Not all computational systems rely on a central processing unit. An array processor or vector 
processor has multiple parallel computing elements, with no one unit considered the 
"center". In the distributed computing model, problems are solved by a distributed 
interconnected set of processors.
TRANSISTOR AND INTEGRATED 
CIRCUIT CPU 
CPU, core memory, and external bus interface of a DEC PDP-8/I. Made of medium-scale integrated 
circuits. 
The design complexity of CPUs increased as various technologies facilitated building 
smaller and more reliable electronic devices. The first such improvement came with the 
advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had 
to be built out of bulky, unreliable, and fragile switching elements like vacuum 
tubes and electrical relays. With this improvement more complex and reliable CPUs 
were built onto one or several printed circuit boards containing discrete (individual) 
components. 
During this period, a method of manufacturing many interconnected transistors in a 
compact space was developed. The integrated circuit (IC) allowed a large number of 
transistors to be manufactured on a single semiconductor-based die, or "chip". At first 
only very basic non-specialized digital circuits such as NOR gates were miniaturized into 
ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale 
integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo guidance 
computer, usually contained up to a few score transistors. To build an entire CPU out of 
SSI ICs required thousands of individual chips, but still consumed much less space and 
power than earlier discrete transistor designs. As microelectronic technology advanced, 
an increasing number of transistors were placed on ICs, thus decreasing the quantity of 
individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale 
integration) ICs increased transistor counts to hundreds, and then thousands. 
In 1964 IBM introduced its System/360 computer architecture which was used in a 
series of computers that could run the same programs with different speed and 
performance. This was significant at a time when most electronic computers were 
incompatible with one another, even those made by the same manufacturer. To facilitate 
this improvement, IBM utilized the concept of a micro program (often called 
"microcode"), which still sees widespread usage in modern CPUs.[4] The System/360 
architecture was so popular that it dominated the mainframe computer market for 
decades and left a legacy that is still continued by similar modern computers like the 
IBM z Series. In the same year (1964), Digital Equipment Corporation (DEC) introduced 
another influential computer aimed at the scientific and research markets, the PDP-8. 
DEC would later introduce the extremely popular PDP-11 line that originally was built with 
SSI ICs but was eventually implemented with LSI components once these became 
practical. In stark contrast with its SSI and MSI predecessors, the first LSI 
implementation of the PDP-11 contained a CPU composed of only four LSI integrated 
circuits. 
Transistor-based computers had several distinct advantages over their predecessors. 
Aside from facilitating increased reliability and lower power consumption, transistors also 
allowed CPUs to operate at much higher speeds because of the short switching time of a
transistor in comparison to a tube or relay. Thanks to both the increased reliability as 
well as the dramatically increased speed of the switching elements (which were almost 
exclusively transistors by this time), CPU clock rates in the tens of megahertz were 
obtained during this period. Additionally while discrete transistor and IC CPUs were in 
heavy usage, new high-performance designs like SIMD (Single Instruction Multiple 
Data) processors began to appear. These early experimental designs later gave rise to 
the era of specialized supercomputers like those made by Cray Inc. 
MICROPROCESSORS 
Die of an Intel 80486DX2 microprocessor (actual size: 12 × 6.75 mm) in its packaging 
Intel Core i5 CPU on a Vaio E series laptop motherboard (on the right, beneath the heat pipe). 
In the 1970s the fundamental inventions by Federico Faggin (Silicon Gate MOS ICs 
with self-aligned gates along with his new random logic design methodology) changed 
the design and implementation of CPUs forever. Since the introduction of the first 
commercially available microprocessor (the Intel 4004) in 1971, and the first widely 
used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely 
overtaken all other central processing unit implementation methods. Mainframe and 
minicomputer manufacturers of the time launched proprietary IC development programs 
to upgrade their older computer architectures, and eventually produced instruction set 
compatible microprocessors that were backward-compatible with their older hardware 
and software. Combined with the advent and eventual success of the 
ubiquitous personal computer, the term CPU is now applied almost exclusively to 
microprocessors. Several CPUs (denoted 'cores') can be combined in a single 
processing chip. 
Previous generations of CPUs were implemented as discrete components and 
numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, 
on the other hand, are CPUs manufactured on a very small number of ICs; usually just 
one. The overall smaller CPU size as a result of being implemented on a single die 
means faster switching time because of physical factors like decreased gate parasitic
capacitance. This has allowed synchronous microprocessors to have clock rates ranging 
from tens of megahertz to several gigahertz. Additionally, as the ability to construct 
exceedingly small transistors on an IC has increased, the complexity and number of 
transistors in a single CPU has increased many fold. This widely observed trend is 
described by Moore's law, which has proven to be a fairly accurate predictor of the 
growth of CPU (and other IC) complexity. 
While the complexity, size, construction, and general form of CPUs have changed 
enormously since 1950, it is notable that the basic design and function has not changed 
much at all. Almost all common CPUs today can be very accurately described as von 
Neumann stored-program machines.[b] As the aforementioned Moore's law continues to 
hold true,[6] concerns have arisen about the limits of integrated circuit transistor 
technology. Extreme miniaturization of electronic gates is causing the effects of 
phenomena like electromigration and subthreshold leakage to become much more significant. 
These newer concerns are among the many factors causing researchers to investigate 
new methods of computing such as the quantum computer, as well as to expand the 
usage of parallelism and other methods that extend the usefulness of the classical von 
Neumann model. 
OPERATION 
The fundamental operation of most CPUs, regardless of the physical form they take, is 
to execute a sequence of stored instructions called a program. The instructions are kept 
in some kind of computer memory. There are four steps that nearly all CPUs use in their 
operation: fetch, decode, execute, and write back. 
Fetch 
The first step, fetch, involves retrieving an instruction (which is represented by a number 
or sequence of numbers) from program memory. The instruction's location (address) in 
program memory is determined by a program counter (PC), which stores a number that 
identifies the address of the next instruction to be fetched. After an instruction is fetched, 
the PC is incremented by the length of the instruction in terms of memory units so that it 
will contain the address of the next instruction in the sequence.[c] Often, the instruction to 
be fetched must be retrieved from relatively slow memory, causing the CPU to stall while 
waiting for the instruction to be returned. This issue is largely addressed in modern 
processors by caches and pipeline architectures (see below). 
Decode 
The instruction that the CPU fetches from memory is used to determine what the CPU is 
to do. In the decode step, the instruction is broken up into parts that have significance to 
other portions of the CPU. The way in which the numerical instruction value is 
interpreted is defined by the CPU's instruction set architecture (ISA).[d] Often, one group 
of numbers in the instruction, called the opcode, indicates which operation to perform. 
The remaining parts of the number usually provide information required for that 
instruction, such as operands for an addition operation. Such operands may be given as 
a constant value (called an immediate value), or as a place to locate a value: 
a register or a memory address, as determined by some addressing mode. In older 
designs the portions of the CPU responsible for instruction decoding were unchangeable 
hardware devices. However, in more abstract and complicated CPUs and ISAs, a micro 
program is often used to assist in translating instructions into various configuration 
signals for the CPU. This micro program is sometimes rewritable so that it can be 
modified to change the way the CPU decodes instructions even after it has been 
manufactured.
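Decoding amounts to slicing bit fields out of the instruction word. A toy sketch with a hypothetical 16-bit format (4-bit opcode, two 4-bit register fields, a 4-bit immediate — an illustrative encoding, not a real ISA):

```python
def decode(instr):
    """Split a hypothetical 16-bit instruction into its fields.
    Layout (illustrative): bits [15:12] opcode, [11:8] rd, [7:4] rs, [3:0] imm."""
    return {
        "opcode": (instr >> 12) & 0xF,
        "rd":     (instr >> 8) & 0xF,
        "rs":     (instr >> 4) & 0xF,
        "imm":    instr & 0xF,
    }

# 0x1234 -> opcode 1, destination register 2, source register 3, immediate 4
print(decode(0x1234))  # {'opcode': 1, 'rd': 2, 'rs': 3, 'imm': 4}
```

In fixed hardware these shifts and masks are just wires routing bit ranges to different parts of the CPU; a microprogrammed decoder performs the equivalent mapping in microcode.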
Execute 
After the fetch and decode steps, the execute step is performed. During this step, 
various portions of the CPU are connected so they can perform the desired operation. If, 
for instance, an addition operation was requested, the arithmetic logic unit (ALU) will be 
connected to a set of inputs and a set of outputs. The inputs provide the numbers to be 
added, and the outputs will contain the final sum. The ALU contains the circuitry to 
perform simple arithmetic and logical operations on the inputs (like addition and bitwise 
operations). If the addition operation produces a result too large for the CPU to handle, 
an arithmetic overflow flag in a flags register may also be set. 
The final step, write back, simply "writes back" the results of the execute step to some 
form of memory. Very often the results are written to some internal CPU register for 
quick access by subsequent instructions. In other cases results may be written to slower, 
but cheaper and larger, main memory. Some types of instructions manipulate the 
program counter rather than directly produce result data. These are generally called 
"jumps" and facilitate behavior like loops, conditional program execution (through the 
use of a conditional jump), and functions in programs. Many instructions will also change 
the state of digits in a "flags" register. These flags can be used to influence how a 
program behaves, since they often indicate the outcome of various operations. For 
example, one type of "compare" instruction considers two values and sets a number in 
the flags register according to which one is greater. This flag could then be used by a 
later jump instruction to determine program flow. 
After the execution of the instruction and write back of the resulting data, the entire 
process repeats, with the next instruction cycle normally fetching the next-in-sequence 
instruction because of the incremented value in the program counter. If the completed 
instruction was a jump, the program counter will be modified to contain the address of 
the instruction that was jumped to, and program execution continues normally. In more 
complex CPUs than the one described here, multiple instructions can be fetched, 
decoded, and executed simultaneously. This section describes what is generally referred 
to as the "classic RISC pipeline", which in fact is quite common among the simple CPUs 
used in many electronic devices (often called microcontrollers). It largely ignores the 
important role of CPU cache, and therefore the memory-access stage of the pipeline. 
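The fetch, decode, execute, and write-back steps above can be condensed into a toy simulator. Everything here (the four opcodes, the accumulator-style machine) is invented for illustration:

```python
# Toy stored-program machine: each instruction is an (opcode, operand) pair.
# Opcodes (invented for this sketch): LOAD n, ADD n, JMP addr, HALT.
def run(program):
    acc = 0   # single accumulator register (the "write back" target)
    pc = 0    # program counter
    while True:
        op, arg = program[pc]   # fetch: read the instruction the PC points at
        pc += 1                 # PC now holds the address of the next instruction
        if op == "LOAD":        # decode + execute + write back
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "JMP":       # a jump rewrites the PC instead of producing data
            pc = arg
        elif op == "HALT":
            return acc

prog = [("LOAD", 40), ("ADD", 2), ("HALT", 0)]
print(run(prog))  # 42
```

A real pipeline overlaps these steps for consecutive instructions instead of finishing one instruction before fetching the next, but the per-instruction sequence is the same.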
PERFORMANCE 
The performance or speed of a processor depends on, among many other factors, the 
clock rate (generally given in multiples of hertz) and the instructions per clock (IPC), 
which together are the factors for the instructions per second (IPS) that the CPU can 
perform. Many reported IPS values have represented "peak" execution rates on artificial 
instruction sequences with few branches, whereas realistic workloads consist of a mix of 
instructions and applications, some of which take longer to execute than others. The 
performance of the memory hierarchy also greatly affects processor performance, an 
issue barely considered in MIPS calculations. Because of these problems, various 
standardized tests, often called "benchmarks" — such as SPECint — 
have been developed to attempt to measure the real effective performance in commonly 
used applications. 
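The relationship between clock rate, IPC, and IPS can be written down directly. A small worked example with invented numbers:

```python
def instructions_per_second(clock_hz, ipc):
    """IPS = clock rate (cycles/second) * instructions per clock (IPC)."""
    return clock_hz * ipc

# A hypothetical 3 GHz core averaging 1.5 instructions per cycle:
ips = instructions_per_second(3_000_000_000, 1.5)
print(ips / 1e9)  # 4.5 (billion instructions per second)
```

This also shows why benchmarks matter: IPC is an average over a workload, so the same core can report very different IPS figures on branch-free "peak" sequences than on realistic instruction mixes.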
Processing performance of computers is increased by using multi-core processors, 
which essentially is plugging two or more individual processors (called cores in this 
sense) into one integrated circuit. Ideally, a dual core processor would be nearly twice 
as powerful as a single core processor. In practice, the performance gain is far smaller, 
only about 50%, due to imperfect software algorithms and implementation. Increasing 
the number of cores in a processor (i.e. dual-core, quad-core, etc.) increases the 
workload that can be handled. This means that the processor can handle numerous 
asynchronous events, interrupts, and similar demands that would otherwise 
overwhelm a single CPU. These cores can be thought of as different floors 
in a processing plant, each floor handling a different task; sometimes cores 
take on the same task as adjacent cores when a single core cannot keep up 
with the incoming work.

Lecture notes on microprocessor and microcomputer
 
Wireless energy meter monitoring with automated tariff calculation
Wireless energy meter monitoring with automated tariff calculationWireless energy meter monitoring with automated tariff calculation
Wireless energy meter monitoring with automated tariff calculation
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
 
Emb Sys Rev Ver1
Emb Sys   Rev Ver1Emb Sys   Rev Ver1
Emb Sys Rev Ver1
 
Design of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsDesign of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applications
 
IRJET-Design of ARM Based Data Acquisition and Control System for Engine Asse...
IRJET-Design of ARM Based Data Acquisition and Control System for Engine Asse...IRJET-Design of ARM Based Data Acquisition and Control System for Engine Asse...
IRJET-Design of ARM Based Data Acquisition and Control System for Engine Asse...
 
A STUDY OF AN ENTRENCHED SYSTEM USING INTERNET OF THINGS
A STUDY OF AN ENTRENCHED SYSTEM USING INTERNET OF THINGSA STUDY OF AN ENTRENCHED SYSTEM USING INTERNET OF THINGS
A STUDY OF AN ENTRENCHED SYSTEM USING INTERNET OF THINGS
 
MergeResult_2023_04_02_05_26_56.pptx
MergeResult_2023_04_02_05_26_56.pptxMergeResult_2023_04_02_05_26_56.pptx
MergeResult_2023_04_02_05_26_56.pptx
 
GOWTHAM REPORT
GOWTHAM REPORTGOWTHAM REPORT
GOWTHAM REPORT
 
embedded systems
embedded systemsembedded systems
embedded systems
 
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
 
Real time atomization of agriculture system for the modernization of indian a...
Real time atomization of agriculture system for the modernization of indian a...Real time atomization of agriculture system for the modernization of indian a...
Real time atomization of agriculture system for the modernization of indian a...
 
ETHERNET PACKET PROCESSOR FOR SOC APPLICATION
ETHERNET PACKET PROCESSOR FOR SOC APPLICATIONETHERNET PACKET PROCESSOR FOR SOC APPLICATION
ETHERNET PACKET PROCESSOR FOR SOC APPLICATION
 
Lecture 1 m&ca
Lecture 1 m&caLecture 1 m&ca
Lecture 1 m&ca
 
Introductiontomsp430 180105110420
Introductiontomsp430 180105110420Introductiontomsp430 180105110420
Introductiontomsp430 180105110420
 
Introduction to msp430
Introduction to msp430Introduction to msp430
Introduction to msp430
 
microprocessor
microprocessormicroprocessor
microprocessor
 
M&i(lec#01)
M&i(lec#01)M&i(lec#01)
M&i(lec#01)
 

Último

Design Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryDesign Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryWilliamVickery6
 
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书zdzoqco
 
shot list for my tv series two steps back
shot list for my tv series two steps backshot list for my tv series two steps back
shot list for my tv series two steps back17lcow074
 
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,Aginakm1
 
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一z xss
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxmapanig881
 
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一diploma 1
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造kbdhl05e
 
How to Empower the future of UX Design with Gen AI
How to Empower the future of UX Design with Gen AIHow to Empower the future of UX Design with Gen AI
How to Empower the future of UX Design with Gen AIyuj
 
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degreeyuu sss
 
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full NightCall Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full Nightssuser7cb4ff
 
How to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our SiteHow to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our Sitegalleryaagency
 
cda.pptx critical discourse analysis ppt
cda.pptx critical discourse analysis pptcda.pptx critical discourse analysis ppt
cda.pptx critical discourse analysis pptMaryamAfzal41
 
Mookuthi is an artisanal nose ornament brand based in Madras.
Mookuthi is an artisanal nose ornament brand based in Madras.Mookuthi is an artisanal nose ornament brand based in Madras.
Mookuthi is an artisanal nose ornament brand based in Madras.Mookuthi
 
ARt app | UX Case Study
ARt app | UX Case StudyARt app | UX Case Study
ARt app | UX Case StudySophia Viganò
 
3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdfSwaraliBorhade
 
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCRdollysharma2066
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree ttt fff
 
Passbook project document_april_21__.pdf
Passbook project document_april_21__.pdfPassbook project document_april_21__.pdf
Passbook project document_april_21__.pdfvaibhavkanaujia
 

Último (20)

Design Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryDesign Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William Vickery
 
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书
办理卡尔顿大学毕业证成绩单|购买加拿大文凭证书
 
shot list for my tv series two steps back
shot list for my tv series two steps backshot list for my tv series two steps back
shot list for my tv series two steps back
 
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
'CASE STUDY OF INDIRA PARYAVARAN BHAVAN DELHI ,
 
Call Girls in Pratap Nagar, 9953056974 Escort Service
Call Girls in Pratap Nagar,  9953056974 Escort ServiceCall Girls in Pratap Nagar,  9953056974 Escort Service
Call Girls in Pratap Nagar, 9953056974 Escort Service
 
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptx
 
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
办理(USYD毕业证书)澳洲悉尼大学毕业证成绩单原版一比一
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造
 
How to Empower the future of UX Design with Gen AI
How to Empower the future of UX Design with Gen AIHow to Empower the future of UX Design with Gen AI
How to Empower the future of UX Design with Gen AI
 
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
专业一比一美国亚利桑那大学毕业证成绩单pdf电子版制作修改#真实工艺展示#真实防伪#diploma#degree
 
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full NightCall Girls Aslali 7397865700 Ridhima Hire Me Full Night
Call Girls Aslali 7397865700 Ridhima Hire Me Full Night
 
How to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our SiteHow to Be Famous in your Field just visit our Site
How to Be Famous in your Field just visit our Site
 
cda.pptx critical discourse analysis ppt
cda.pptx critical discourse analysis pptcda.pptx critical discourse analysis ppt
cda.pptx critical discourse analysis ppt
 
Mookuthi is an artisanal nose ornament brand based in Madras.
Mookuthi is an artisanal nose ornament brand based in Madras.Mookuthi is an artisanal nose ornament brand based in Madras.
Mookuthi is an artisanal nose ornament brand based in Madras.
 
ARt app | UX Case Study
ARt app | UX Case StudyARt app | UX Case Study
ARt app | UX Case Study
 
3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf3D Printing And Designing Final Report.pdf
3D Printing And Designing Final Report.pdf
 
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR
8377877756 Full Enjoy @24/7 Call Girls in Nirman Vihar Delhi NCR
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Passbook project document_april_21__.pdf
Passbook project document_april_21__.pdfPassbook project document_april_21__.pdf
Passbook project document_april_21__.pdf
 

Casp report

  • 1. DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY (POLYTECHNIC) BANGALORE Affiliated to A.I.C.T.E., NEW DELHI COMMUNICATION AND ANALYSIS SKILL DEVELOPMENT PROGRAMME REPORT ON “MICROPROCESSOR” Submitted in partial fulfillment of the requirement for the award of the 5th semester diploma in Electronics and Communication By Amiyodyuti Ganguly (305EC10009) DEPARTMENT OF ELECTRONICS AND COMMUNICATION
  • 2. DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY SHAVIGE MALLESHWARA HILLS, KUMARASWAMY LAYOUT, BANASHANKARI, BANGALORE-560078 DEPARTMENT OF ELECTRONICS AND COMMUNICATION CERTIFICATE This is to certify that the CASP seminar entitled “MICROPROCESSOR”, submitted by Amiyodyuti Ganguly (305EC10009), has been completed successfully as prescribed by the Board of Technical Education, GOVERNMENT OF KARNATAKA. This is a bona fide work carried out by him in DAYANANDA SAGAR INSTITUTE OF TECHNOLOGY during the year 2014-2015 under the guidance of Mrs. Muyeen Reshma. GUIDE: PRINCIPAL:
  • 3. ACKNOWLEDGEMENT This report would be incomplete if we did not take the opportunity to sincerely thank all the people who contributed to the successful completion of the project. We gratefully thank our principal, Prof. PREM KUMAR of Dayananda Sagar Institute of Technology, for providing this opportunity to work on this project. He has been a constant source of inspiration for us throughout this project, and we heartily thank him for his encouragement and support. We would like to thank our internal guide, Mrs. MUYEEN RESHMA, for her valuable guidance and advice, which enabled the successful completion of this project. We would also like to thank the Head of the Department and all other staff members for their moral support.
  • 4. CONTENTS 1. INTRODUCTION 2. STRUCTURE 3. SPECIAL-PURPOSE DESIGN 4. EMBEDDED APPLICATIONS 5. HISTORY 6. CADC 7. TMS 1000 8. INTEL 4004 9. PICO/GENERAL INSTRUMENT 10. FOUR-PHASE SYSTEMS AL1 11. 8-BIT DESIGNS 12. 12-BIT DESIGNS 13. 16-BIT DESIGNS 14. 32-BIT DESIGNS 15. 64-BIT DESIGNS IN PC 16. MULTI-CORE DESIGNS 17. RISC 18. MICROARCHITECTURE 19. ARITHMETIC LOGIC UNIT 20. CENTRAL PROCESSING UNIT
  • 5. ABSTRACT The Intel 4004, the first commercial microprocessor. A microprocessor incorporates the functions of a computer's central processing unit (CPU) on a single integrated circuit (IC), or at most a few integrated circuits. All modern CPUs are microprocessors, making the micro- prefix redundant. The microprocessor is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and provides results as output. It is an example of sequential digital logic, as it has internal memory. Microprocessors operate on numbers and symbols represented in the binary numeral system. The integration of a whole CPU onto a single chip or a few chips greatly reduced the cost of processing power. The integrated circuit processor was produced in large numbers by highly automated processes, so unit cost was low. Single-chip processors also increase reliability, as there are many fewer electrical connections to fail. As microprocessor designs get faster, the cost of manufacturing a chip (with smaller components built on a semiconductor chip of the same size) generally stays the same. Before microprocessors, small computers had been implemented using racks of circuit boards with many medium- and small-scale integrated circuits. Microprocessors integrated this into one or a few large-scale ICs. Continued increases in microprocessor capacity have since rendered other forms of computers almost completely obsolete, with one or more microprocessors used in everything from the smallest embedded systems and handheld devices to the largest mainframes and supercomputers.
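The abstract's definition — a device that accepts input, processes it according to instructions stored in memory, and produces output — can be illustrated with a minimal sketch. This is a hypothetical three-instruction machine written for this report, not any real instruction set:

```python
# Toy fetch-decode-execute loop: instructions are read from "memory"
# (a list), decoded by opcode, and executed against an accumulator.
# The opcodes LOAD/ADD/HALT are invented for illustration.
LOAD, ADD, HALT = 0, 1, 2

def run(program):
    acc = 0            # accumulator register (internal memory)
    pc = 0             # program counter
    while True:
        opcode, operand = program[pc]    # fetch
        pc += 1
        if opcode == LOAD:               # decode and execute
            acc = operand
        elif opcode == ADD:
            acc += operand
        elif opcode == HALT:
            return acc                   # result as output

# Program stored in memory: compute 2 + 3
print(run([(LOAD, 2), (ADD, 3), (HALT, 0)]))  # prints 5
```

The loop is the essence of sequential digital logic described above: the same hardware behaves differently depending on the instructions held in memory.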
  • 6. STRUCTURE A block diagram of the internal architecture of the Z80 microprocessor, showing the arithmetic and logic section, register file, control logic section, and buffers to external address and data lines. The internal arrangement of a microprocessor varies depending on the age of the design and the intended purposes of the processor. The complexity of an integrated circuit is bounded by physical limitations: the number of transistors that can be put onto one chip, the number of package terminations that can connect the processor to other parts of the system, the number of interconnections it is possible to make on the chip, and the heat that the chip can dissipate. Advancing technology makes more complex and powerful chips feasible to manufacture. A minimal hypothetical microprocessor might only include an arithmetic logic unit (ALU) and a control logic section. The ALU performs operations such as addition and subtraction, and logical operations such as AND or OR. Each operation of the ALU sets one or more flags in a status register, which indicate the results of the last operation (zero value, negative number, overflow, or others). The control logic section retrieves instruction operation codes from memory and initiates whatever sequence of ALU operations is required to carry out the instruction. A single operation code might affect many individual data paths, registers, and other elements of the processor. As integrated circuit technology advanced, it was feasible to manufacture more and more complex processors on a single chip. The size of data objects became larger: allowing more transistors on a chip allowed word sizes to increase from 4- and 8-bit words up to today's 64-bit words. Additional features were added to the processor architecture; more on-chip registers sped up programs, and complex instructions could be used to make more compact programs. 
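The flag-setting behaviour described above can be sketched in a few lines. This is a minimal illustration of an 8-bit ALU ADD, with illustrative flag names (zero, negative, carry, overflow) not tied to any particular processor's status register:

```python
# Sketch of an 8-bit ALU addition that updates a status register.
def alu_add(a, b):
    raw = a + b
    result = raw & 0xFF                      # keep only 8 bits
    flags = {
        "zero": result == 0,
        "negative": bool(result & 0x80),     # top (sign) bit set
        "carry": raw > 0xFF,                 # unsigned overflow out of 8 bits
        # signed overflow: operands share a sign that the result lacks
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags

res, f = alu_add(0x7F, 0x01)   # 127 + 1 in 8-bit two's complement
print(hex(res), f)             # 0x80; negative and overflow flags set
```

A conditional branch instruction would then test one of these flags, which is how the control logic turns arithmetic results into decisions.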
Floating-point arithmetic, for example, was often not available on 8-bit microprocessors and had to be carried out in software. Integration of the floating-point unit, first as a separate integrated circuit and then as part of the same microprocessor chip, sped up floating-point calculations. Occasionally, physical limitations of integrated circuits made such practices as a bit-slice approach necessary. Instead of processing all of a long word on one integrated circuit, multiple circuits in parallel processed subsets of each data word. While this required extra logic to handle, for example, carry and overflow within each slice, the result was a system that could handle, say, 32-bit words using integrated circuits with a capacity for only four bits each.
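The bit-slice idea above — building a wide adder from narrow slices that pass a carry along — can be sketched directly. This is an illustrative model, not the circuitry of any particular bit-slice chip:

```python
# A 32-bit addition composed of eight 4-bit "slices", each taking a
# carry-in and producing a carry-out, as a bit-slice processor would.
def slice_add(a4, b4, carry_in):
    s = a4 + b4 + carry_in
    return s & 0xF, s >> 4            # 4-bit sum, 1-bit carry-out

def add32(a, b):
    result, carry = 0, 0
    for i in range(8):                # eight slices, low nibble first
        a4 = (a >> (4 * i)) & 0xF
        b4 = (b >> (4 * i)) & 0xF
        s4, carry = slice_add(a4, b4, carry)
        result |= s4 << (4 * i)
    return result & 0xFFFFFFFF        # final carry discarded, as in text

print(hex(add32(0x12345678, 0x11111111)))  # prints 0x23456789
print(hex(add32(0xFFFFFFFF, 1)))           # wraps around to 0x0
```

The carry variable threaded between slices is exactly the "extra logic to handle carry and overflow within each slice" mentioned above.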
  • 7. With the ability to put large numbers of transistors on one chip, it becomes feasible to integrate memory on the same die as the processor. This CPU cache has the advantage of faster access than off-chip memory, and increases the processing speed of the system for many applications. Processor clock frequency has increased more rapidly than external memory speed, except in the recent past, so cache memory is necessary so that the processor is not delayed by slower external memory. SPECIAL-PURPOSE DESIGN A microprocessor is a general-purpose system. Several specialized processing devices have followed from the technology. Microcontrollers integrate a microprocessor with peripheral devices in embedded systems. A digital signal processor (DSP) is specialized for signal processing. Graphics processing units may have no, limited, or general programming facilities. For example, GPUs through the 1990s were mostly non-programmable and have only recently gained limited facilities like programmable vertex shaders. 32-bit processors have more digital logic than narrower processors, so 32-bit (and wider) processors produce more digital noise and have higher static power consumption than narrower processors. So 8-bit or 16-bit processors are better than 32-bit processors for systems on a chip and microcontrollers that require extremely low-power electronics, or are mixed with noise-sensitive on-chip analog electronics such as high-resolution analog-to-digital converters, or both. When manufactured on a similar process, 8-bit micros use less power when operating and less power when sleeping than 32-bit micros. However, a 32-bit micro may use less average power than an 8-bit micro when the application requires certain operations, such as floating-point math, that take many more clock cycles on an 8-bit micro than on a 32-bit micro, so that the 8-bit micro spends more time in high-power operating mode. 
EMBEDDED APPLICATIONS Thousands of items that were traditionally not computer-related include microprocessors. These include large and small household appliances, cars (and their accessory equipment units), car keys, tools and test instruments, toys, light switches/dimmers and electrical circuit breakers, smoke alarms, battery packs, and hi-fi audio/visual components (from DVD players to phonograph turntables). Products such as cellular telephones, DVD video systems and HDTV broadcast systems fundamentally require consumer devices with powerful, low-cost microprocessors. Increasingly stringent pollution control standards effectively require automobile manufacturers to use microprocessor engine management systems to allow optimal control of emissions over the widely varying operating conditions of an automobile. Non-programmable controls would require complex, bulky, or costly implementation to achieve the results possible with a microprocessor. A microprocessor control program (embedded software) can be easily tailored to the different needs of a product line, allowing upgrades in performance with minimal redesign of the
  • 8. product. Different features can be implemented in different models of a product line at negligible production cost. Microprocessor control of a system can provide control strategies that would be impractical to implement using electromechanical controls or purpose-built electronic controls. For example, an engine control system in an automobile can adjust ignition timing based on engine speed, load on the engine, ambient temperature, and any observed tendency for knocking—allowing an automobile to operate on a range of fuel grades. HISTORY The advent of low-cost computers on integrated circuits has transformed modern society. General-purpose microprocessors in personal computers are used for computation, text editing, multimedia display, and communication over the Internet. Many more microprocessors are part of embedded systems, providing digital control over myriad objects from appliances to automobiles to cellular phones and industrial process control. The first use of the term "microprocessor" is attributed to Viatron Computer Systems, describing the custom integrated circuit used in their System 21 small computer system announced in 1968. Intel introduced its first 4-bit microprocessor, the 4004, in 1971 and its 8-bit microprocessor, the 8008, in 1972. During the 1960s, computer processors were constructed out of small and medium-scale ICs—each containing from tens of transistors to a few hundred. These were placed and soldered onto printed circuit boards, and often multiple boards were interconnected in a chassis. The large number of discrete logic gates used more electrical power—and therefore produced more heat—than a more integrated design with fewer ICs. The distance that signals had to travel between ICs on the boards limited a computer's operating speed. 
In the NASA Apollo space missions to the moon in the 1960s and 1970s, all onboard computations for primary guidance, navigation and control were provided by a small custom processor called the Apollo Guidance Computer. It used wire-wrap circuit boards whose only logic elements were three-input NOR gates. The first microprocessors emerged in the early 1970s and were used for electronic calculators, using binary-coded decimal (BCD) arithmetic on 4-bit words. Other embedded uses of 4-bit and 8-bit microprocessors, such as terminals, printers, and various kinds of automation, followed soon after. Affordable 8-bit microprocessors with 16-bit addressing also led to the first general-purpose microcomputers from the mid-1970s on. Since the early 1970s, the increase in capacity of microprocessors has followed Moore's law; this originally suggested that the number of components that can be fitted onto a chip doubles every year. With present technology it is actually every two years, and Moore later changed the stated period accordingly.
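The remark that the Apollo Guidance Computer's only logic elements were three-input NOR gates works because NOR is functionally complete: NOT, OR, and AND can all be built from it. A small sketch of that standard result, with unused gate inputs tied to 0:

```python
# NOR is a universal gate: every other Boolean function can be built
# from it. Illustrated with a three-input NOR, as used in the AGC.
def nor3(a, b, c=0):
    return 0 if (a or b or c) else 1

def not_(a):                       # NOT x  = NOR(x, x, x)
    return nor3(a, a, a)

def or_(a, b):                     # x OR y = NOT(NOR(x, y))
    return not_(nor3(a, b))

def and_(a, b):                    # x AND y = NOR(NOT x, NOT y)
    return nor3(not_(a), not_(b))

# Verify against the OR/AND truth tables
for a in (0, 1):
    for b in (0, 1):
        assert or_(a, b) == (a | b)
        assert and_(a, b) == (a & b)
print("NOR-only gates reproduce the OR/AND truth tables")
```

Building an entire flight computer this way traded gate count for uniformity: one thoroughly qualified part could implement every logic function needed.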
  • 9. CADC In 1968, Garrett AiResearch (which employed designers Ray Holt and Steve Geller) was invited to produce a digital computer to compete with electromechanical systems then under development for the main flight control computer in the US Navy's new F-14 Tomcat fighter. The design was complete by 1970 and used a MOS-based chipset as the core CPU. The design was significantly (approximately 20 times) smaller and much more reliable than the mechanical systems it competed against, and was used in all of the early Tomcat models. This system contained "a 20-bit, pipelined, parallel multi-microprocessor". The Navy refused to allow publication of the design until 1997. For this reason the CADC, and the MP944 chipset it used, are fairly unknown. Holt graduated from California Polytechnic University in 1968 and began his computer design career with the CADC. From its inception, it was shrouded in secrecy until 1998, when at Holt's request the US Navy allowed the documents into the public domain. Since then people have debated whether this was the first microprocessor. Holt has stated that no one has compared this microprocessor with those that came later. According to Parab et al. (2007), "The scientific papers and literature published around 1971 reveal that the MP944 digital processor used for the F-14 Tomcat aircraft of the US Navy qualifies as the first microprocessor. Although interesting, it was not a single-chip processor, and nor was the Intel 4004 – both were more like a set of parallel building blocks you could use to make a general-purpose form. It contains a CPU, RAM, ROM, and two other support chips, like the Intel 4004. It was made from the same P-channel technology, operated at military specifications and had larger chips – an excellent computer engineering design by any standards. Its design indicates a major advance over Intel, and two years earlier. It actually worked and was flying in the F-14 when the Intel 4004 was announced." 
It indicates that today's industry theme of converging DSP-microcontroller architectures was started in 1971. This convergence of DSP and microcontroller architectures is known as a digital signal controller. TMS 1000 The Smithsonian Institution says TI engineers Gary Boone and Michael Cochran succeeded in creating the first microcontroller (also called a microcomputer) and the first single-chip CPU in 1971. The result of their work was the TMS 1000, which went commercial in 1974. TI stressed the 4-bit TMS 1000 for use in pre-programmed embedded applications, introducing a version called the TMS1802NC on September 17, 1971, that implemented a calculator on a chip. TI filed for a patent on the microprocessor, and Gary Boone was awarded U.S. Patent 3,757,306 for the single-chip microprocessor architecture on September 4, 1973. In 1971 and again in 1976, Intel and TI entered into broad patent cross-licensing agreements, with Intel paying royalties to TI for the microprocessor patent. A history of these events is contained in court documentation from a legal dispute between Cyrix and Intel, with TI as intervener and owner of the microprocessor patent.
  • 10. A computer-on-a-chip combines the microprocessor core (CPU), memory, and I/O (input/output) lines onto one chip. The computer-on-a-chip patent, called the "microcomputer patent" at the time, U.S. Patent 4,074,351, was awarded to Gary Boone and Michael J. Cochran of TI. Aside from this patent, the standard meaning of microcomputer is a computer using one or more microprocessors as its CPU(s), while the concept defined in the patent is more akin to a microcontroller. INTEL 4004 The 4004 with cover removed (left) and as actually used (right). The Intel 4004 is generally regarded as the first commercially available microprocessor, and cost $60. The first known advertisement for the 4004 is dated November 15, 1971 and appeared in Electronic News. The project that produced the 4004 originated in 1969, when Busicom, a Japanese calculator manufacturer, asked Intel to build a chipset for high-performance desktop calculators. Busicom's original design called for a programmable chip set consisting of seven different chips. Three of the chips were to make a special-purpose CPU with its program stored in ROM and its data stored in shift-register read-write memory. Ted Hoff, the Intel engineer assigned to evaluate the project, believed the Busicom design could be simplified by using dynamic RAM storage for data, rather than shift-register memory, and a more traditional general-purpose CPU architecture. Hoff came up with a four-chip architectural proposal: a ROM chip for storing the programs, a dynamic RAM chip for storing data, a simple I/O device, and a 4-bit central processing unit (CPU). Although not a chip designer, he felt the CPU could be integrated into a single chip, but as he lacked the technical know-how the idea remained just a wish for the time being. 
While the architecture and specifications of the MCS-4 came from the interaction of Hoff with Stanley Mazor, a software engineer reporting to him, and with Busicom engineer Masatoshi Shima during 1969, Mazor and Hoff moved on to other projects. In April 1970, Intel hired Italian-born engineer Federico Faggin as project leader, a move that ultimately made the single-chip CPU final design a reality (Shima meanwhile designed the Busicom calculator firmware and assisted Faggin during the first six months of the implementation). Faggin, who originally developed the silicon gate technology (SGT) in 1968 at Fairchild Semiconductor and designed the world's first commercial integrated circuit using SGT, the Fairchild 3708, had the correct background to lead the project into what would become the first commercial general-purpose microprocessor. SGT was his own invention, and together with his new methodology for random logic design it made it possible to implement a single-chip CPU with the proper speed, power dissipation and cost. The manager of Intel's MOS Design Department at the time of the MCS-4 development was Leslie L. Vadász, but Vadász's attention was completely focused on the mainstream business of semiconductor memories, and he left the leadership and management of the MCS-4 project to Faggin,
  • 11. who was ultimately responsible for leading the 4004 project to its realization. Production units of the 4004 were first delivered to Busicom in March 1971 and shipped to other customers in late 1971. PICO/GENERAL INSTRUMENT The PICO1/GI250 chip, introduced in 1971, was designed by Pico Electronics (Glenrothes, Scotland) and manufactured by General Instrument of Hicksville, NY. In 1971 Pico Electronics and General Instrument (GI) introduced their first collaboration in ICs, a complete single-chip calculator IC for the Monroe/Litton Royal Digital III calculator. This chip could also arguably lay claim to be one of the first microprocessors or microcontrollers, having ROM, RAM and a RISC instruction set on-chip. The layout for the four layers of the PMOS process was hand drawn at x500 scale on Mylar film, a significant task at the time given the complexity of the chip. Pico was a spinout by five GI design engineers whose vision was to create single-chip calculator ICs. They had significant previous design experience on multiple calculator chipsets with both GI and Marconi-Elliott. The key team members had originally been tasked by Elliott Automation to create an 8-bit computer in MOS, and had helped establish a MOS Research Laboratory in Glenrothes, Scotland in 1967. Calculators were becoming the largest single market for semiconductors, and Pico and GI went on to have significant success in this burgeoning market. GI continued to innovate in microprocessors and microcontrollers with products including the CP1600, IOB1680 and PIC1650. In 1987 the GI Microelectronics business was spun out into the Microchip PIC microcontroller business. FOUR-PHASE SYSTEMS AL1 The Four-Phase Systems AL1 was an 8-bit bit-slice chip containing eight registers and an ALU. It was designed by Lee Boysel in 1969. 
At the time, it formed part of a nine-chip, 24-bit CPU with three AL1s, but it was later called a microprocessor when, in response to 1990s litigation by Texas Instruments, a demonstration system was constructed where a single AL1 formed part of a courtroom demonstration computer system, together with RAM, ROM, and an input-output device.
  • 12. 8-BIT DESIGNS The Intel 4004 was followed in 1972 by the Intel 8008, the world's first 8-bit microprocessor. The 8008 was not, however, an extension of the 4004 design, but instead the culmination of a separate design project at Intel, arising from a contract with Computer Terminals Corporation, of San Antonio TX, for a chip for a terminal they were designing, the Datapoint 2200 — fundamental aspects of the design came not from Intel but from CTC. In 1968, CTC's Vic Poor and Harry Pyle developed the original design for the instruction set and operation of the processor. In 1969, CTC contracted two companies, Intel and Texas Instruments, to make a single-chip implementation, known as the CTC 1201. In late 1970 or early 1971, TI dropped out, being unable to make a reliable part. In 1970, with Intel yet to deliver the part, CTC opted to use their own implementation in the Datapoint 2200, using traditional TTL logic instead (thus the first machine to run "8008 code" was not in fact a microprocessor at all, and was delivered a year earlier). Intel's version of the 1201 microprocessor arrived in late 1971, but was too late, slow, and required a number of additional support chips. CTC had no interest in using it. CTC had originally contracted Intel for the chip, and would have owed them $50,000 for their design work.[32] To avoid paying for a chip they did not want (and could not use), CTC released Intel from their contract and allowed them free use of the design. Intel marketed it as the 8008 in April 1972, as the world's first 8-bit microprocessor. It was the basis for the famous "Mark-8" computer kit advertised in the magazine Radio-Electronics in 1974. The 8008 was the precursor to the very successful Intel 8080 (1974), which offered much improved performance over the 8008 and required fewer support chips; the Zilog Z80 (1976); and derivative Intel 8-bit processors. 
The competing Motorola 6800 was released August 1974 and the similar MOS Technology 6502 in 1975 (both designed largely by the same people). The 6502 family rivaled the Z80 in popularity during the 1980s. A low overall cost, small packaging, simple computer bus requirements, and sometimes the integration of extra circuitry (e.g. the Z80's built-in memory refresh circuitry) allowed the home computer "revolution" to accelerate sharply in the early 1980s. This delivered such inexpensive machines as the Sinclair ZX-81, which sold for US$99. A variation of the 6502, the MOS Technology 6510 was used in the Commodore 64 and yet another variant, the 8502, powered the Commodore 128. The Western Design Center, Inc. (WDC) introduced the CMOS 65C02 in 1982 and licensed the design to several firms. It was used as the CPU in the Apple IIe and IIc personal computers as well as in medical implantable grade pacemakers and defibrillators, automotive, industrial and consumer devices. WDC pioneered the licensing of microprocessor designs, later followed by ARM (32-bit) and other microprocessor intellectual property (IP) providers in the 1990s. Motorola introduced the MC6809 in 1978, an ambitious and well thought-through 8-bit design which was source compatible with the 6800 and was implemented using purely hard-wired logic. (Subsequent 16-bit microprocessors typically used microcode to some extent, as CISC design requirements were getting too complex for pure hard-wired logic.)
  • 13. Another early 8-bit microprocessor was the Signetics 2650, which enjoyed a brief surge of interest due to its innovative and powerful instruction set architecture. A seminal microprocessor in the world of spaceflight was RCA's RCA 1802 (also known as the CDP1802 or RCA COSMAC), introduced in 1976, which was used on board the Galileo probe to Jupiter (launched 1989, arrived 1995). The RCA COSMAC was the first to implement CMOS technology. The CDP1802 was used because it could be run at very low power, and because a variant was available fabricated using a special production process, silicon on sapphire (SOS), which provided much better protection against cosmic radiation and electrostatic discharge than any other processor of the era. Thus, the SOS version of the 1802 was said to be the first radiation-hardened microprocessor. The RCA 1802 had what is called a static design, meaning that the clock frequency could be made arbitrarily low, even to 0 Hz, a total stop condition. This let the spacecraft use minimum electric power for long uneventful stretches of a voyage. Timers or sensors would awaken the processor in time for important tasks, such as navigation updates, attitude control, data acquisition, and radio communication. Current versions of the Western Design Center 65C02 and 65C816 have static cores, and thus retain data even when the clock is completely halted. 12-BIT DESIGNS The Intersil 6100 family consisted of a 12-bit microprocessor (the 6100) and a range of peripheral support and memory ICs. The microprocessor recognized the DEC PDP-8 minicomputer instruction set. As such it was sometimes referred to as the CMOS-PDP8. Since it was also produced by Harris Corporation, it was also known as the Harris HM-6100. By virtue of its CMOS technology and associated benefits, the 6100 was being incorporated into some military designs until the early 1980s. 16-BIT DESIGNS The first multi-chip 16-bit microprocessor was the National Semiconductor IMP-16, introduced in early 1973. 
An 8-bit version of the chipset was introduced in 1974 as the IMP-8. Other early multi-chip 16-bit microprocessors include one that Digital Equipment Corporation (DEC) used in the LSI-11 OEM board set and the packaged PDP-11/03 minicomputer—and the Fairchild Semiconductor MicroFlame 9440, both introduced in 1975–1976. In 1975, National introduced the first 16-bit single-chip microprocessor, the National Semiconductor PACE, which was later followed by an NMOS version, the INS8900. Another early single-chip 16-bit microprocessor was TI's TMS 9900, which was also compatible with their TI-990 line of minicomputers. The 9900 was used in the TI 990/4 minicomputer, the TI-99/4A home computer, and the TM990 line of OEM microcomputer boards. The chip was packaged in a large ceramic 64-pin DIP package, while most 8-bit microprocessors such as the Intel 8080 used the more common, smaller, and less expensive plastic 40-pin DIP. A follow-on chip, the TMS 9980, was designed to compete with the Intel 8080, had the full TI 990 16-bit instruction set, used a plastic 40-pin package, moved data 8 bits at a time, but could only
  • 14. address 16 KB. A third chip, the TMS 9995, was a new design. The family later expanded to include the 99105 and 99110. The Western Design Center (WDC) introduced the CMOS 65816 16-bit upgrade of the WDC CMOS 65C02 in 1984. The 65816 16-bit microprocessor was the core of the Apple IIGS and later the Super Nintendo Entertainment System, making it one of the most popular 16-bit designs of all time. Intel "upsized" their 8080 design into the 16-bit Intel 8086, the first member of the x86 family, which powers most modern PC-type computers. Intel introduced the 8086 as a cost-effective way of porting software from the 8080 lines, and succeeded in winning much business on that premise. The 8088, a version of the 8086 that used an 8-bit external data bus, was the microprocessor in the first IBM PC. Intel then released the 80186 and 80188, the 80286 and, in 1985, the 32-bit 80386, cementing their PC market dominance with the processor family's backwards compatibility. The 80186 and 80188 were essentially versions of the 8086 and 8088, enhanced with some onboard peripherals and a few new instructions; they were not used in IBM-compatible PCs because the built-in peripherals and their locations in the memory map were incompatible with the IBM design. The 8086 and successors had an innovative but limited method of memory segmentation, while the 80286 introduced a full-featured segmented memory management unit (MMU). The 80386 introduced a flat 32-bit memory model with paged memory management. The Intel x86 processors up to and including the 80386 do not include floating-point units (FPUs). Intel introduced the 8087, 80287, and 80387 math coprocessors to add hardware floating-point and transcendental function capabilities to the 8086 through 80386 CPUs. The 8087 works with the 8086/8088 and 80186/80188, the 80187 works with the 80186/80188, the 80287 works with the 80286 and 80386, and the 80387 works with the 80386 (yielding better performance than the 80287). 
The combination of an x86 CPU and an x87 coprocessor forms a single multi-chip microprocessor; the two chips are programmed as a unit using a single integrated instruction set. Though the 8087 coprocessor is interfaced to the CPU through I/O ports in the CPU's address space, this is transparent to the program, which does not need to know about or access these I/O ports directly; the program accesses the coprocessor and its registers through normal instruction opcodes. Starting with the successor to the 80386, the 80486, the FPU was integrated with the control unit, MMU, and integer ALU in a pipelined design on a single chip (in the 80486DX version), or the FPU was eliminated entirely (in the 80486SX version). An ostensible coprocessor for the 80486SX, the 80487 was actually a complete 80486DX that disabled and replaced the coprocessor-less 80486SX it was installed to upgrade.
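The "innovative but limited" 8086 segmentation mentioned above can be made concrete: in real mode, a 20-bit physical address is formed by shifting the 16-bit segment left four bits and adding the 16-bit offset. A minimal sketch (illustrative, not any vendor's code):

```python
# Sketch of 8086 real-mode address translation: physical address =
# (segment << 4) + offset, wrapping within the 1 MB (20-bit) address space.
def physical_address(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF  # mask to 20 bits

# The classic reset vector FFFF:0000 area illustrates the arithmetic:
assert physical_address(0xF000, 0xFFF0) == 0xFFFF0
# Many different segment:offset pairs alias the same physical byte:
assert physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000)
```

The aliasing shown in the last line is one reason the scheme was considered limited: the same memory location has 4096 possible segment:offset names, and no protection separates segments (that arrived with the 80286's MMU).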
  • 15. 32-BIT DESIGNS Upper interconnect layers on an Intel 80486DX2 die 16-bit designs had only been on the market briefly when 32-bit implementations started to appear. The most significant of the 32-bit designs is the Motorola MC68000, introduced in 1979. The 68k, as it was widely known, had 32-bit registers in its programming model but used 16-bit internal data paths, three 16-bit arithmetic logic units, and a 16-bit external data bus (to reduce pin count), and externally supported only 24-bit addresses (internally it worked with full 32-bit addresses). In PC-based IBM-compatible mainframes the MC68000 internal microcode was modified to emulate the 32-bit System/370 IBM mainframe.[36] Motorola generally described it as a 16-bit processor, though it clearly has a 32-bit capable architecture. The combination of high performance, large (16 megabytes, or 2^24 bytes) memory space and fairly low cost made it the most popular CPU design of its class. The Apple Lisa and Macintosh designs made use of the 68000, as did a host of other designs in the mid-1980s, including the Atari ST and Commodore Amiga. The world's first single-chip fully 32-bit microprocessor, with 32-bit data paths, 32-bit buses, and 32-bit addresses, was the AT&T Bell Labs BELLMAC-32A, with first samples in 1980, and general production in 1982.[37][38] After the divestiture of AT&T in 1984, it was renamed the WE 32000 (WE for Western Electric), and had two follow-on generations, the WE 32100 and WE 32200. These microprocessors were used in the AT&T 3B5 and 3B15 minicomputers; in the 3B2, the world's first desktop super microcomputer; in the "Companion", the world's first 32-bit laptop computer; and in "Alexander", the world's first book-sized super microcomputer, featuring ROM-pack memory cartridges similar to today's gaming consoles. All these systems ran the UNIX System V operating system. The first commercial, single-chip, fully 32-bit microprocessor available on the market was the HP FOCUS. 
Intel's first 32-bit microprocessor was the iAPX 432, which was introduced in 1981 but was not a commercial success. It had an advanced capability-based object-oriented architecture, but poor performance compared to contemporary architectures such as Intel's own 80286 (introduced 1982), which was almost four times as fast on typical benchmark tests. However,
  • 16. the poor results for the iAPX 432 were partly due to a rushed and therefore suboptimal Ada compiler. Motorola's success with the 68000 led to the MC68010, which added virtual memory support. The MC68020, introduced in 1984, added full 32-bit data and address buses. The 68020 became hugely popular in the Unix super microcomputer market, and many small companies (e.g., Altos, Charles River Data Systems, Cromemco) produced desktop-size systems. The MC68030 was introduced next, improving upon the previous design by integrating the MMU into the chip. The continued success led to the MC68040, which included an FPU for better math performance. A 68050 failed to achieve its performance goals and was not released, and the follow-up MC68060 was released into a market saturated by much faster RISC designs. The 68k family faded from the desktop in the early 1990s. Other large companies designed the 68020 and follow-ons into embedded equipment. At one point, there were more 68020s in embedded equipment than there were Intel Pentiums in PCs.[39] The ColdFire processor cores are derivatives of the venerable 68020. During this time (early to mid-1980s), National Semiconductor introduced a very similar 16-bit pinout, 32-bit internal microprocessor called the NS 16032 (later renamed 32016), and a full 32-bit version named the NS 32032. Later, National Semiconductor produced the NS 32132, which allowed two CPUs to reside on the same memory bus with built-in arbitration. The NS32016/32 outperformed the MC68000/10, but the NS32332—which arrived at approximately the same time as the MC68020—did not have enough performance. The third-generation chip, the NS32532, was different. It had about double the performance of the MC68030, which was released around the same time. The appearance of RISC processors like the AM29000 and MC88000 (now both dead) influenced the architecture of the final core, the NS32764. 
Technically advanced—with a superscalar RISC core, 64-bit bus, and internally overclocked—it could still execute Series 32000 instructions through real-time translation. When National Semiconductor decided to leave the Unix market, the chip was redesigned into the Swordfish embedded processor with a set of on-chip peripherals. The chip turned out to be too expensive for the laser printer market and was killed. The design team went to Intel and there designed the Pentium processor, which is very similar to the NS32764 core internally. The big success of the Series 32000 was in the laser printer market, where the NS32CG16 with microcoded BitBlt instructions had very good price/performance and was adopted by large companies like Canon. By the mid-1980s, Sequent introduced the first SMP server-class computer using the NS 32032. This was one of the design's few wins, and it disappeared in the late 1980s. The MIPS R2000 (1984) and R3000 (1989) were highly successful 32-bit RISC microprocessors. They were used in high-end workstations and servers by SGI, among others. Other designs included the Zilog Z80000, which arrived too late to market to stand a chance and disappeared quickly. The ARM first appeared in 1985. This is a RISC processor design, which has since come to dominate the 32-bit embedded systems processor space due in large part to its power efficiency, its licensing model, and its wide selection of system development tools.
  • 17. Semiconductor manufacturers generally license cores and integrate them into their own system on a chip products; only a few such vendors are licensed to modify the ARM cores. Most cell phones include an ARM processor, as do a wide variety of other products. There are microcontroller-oriented ARM cores without virtual memory support, as well as symmetric multiprocessor (SMP) applications processors with virtual memory. In the late 1980s, "microprocessor wars" started killing off some of the microprocessors. Apparently, with only one bigger design win, Sequent, the NS 32032 just faded out of existence, and Sequent switched to Intel microprocessors. From 1993 to 2003, the 32-bit x86 architectures became increasingly dominant in desktop, laptop, and server markets, and these microprocessors became faster and more capable. Intel had licensed early versions of the architecture to other companies, but declined to license the Pentium, so AMD and Cyrix built later versions of the architecture based on their own designs. During this span, these processors increased in complexity (transistor count) and capability (instructions/second) by at least three orders of magnitude. Intel's Pentium line is probably the most famous and recognizable 32-bit processor model, at least with the public at large. 64-BIT DESIGNS IN PCs While 64-bit microprocessor designs have been in use in several markets since the early 1990s (including the Nintendo 64 gaming console in 1996), the early 2000s saw the introduction of 64-bit microprocessors targeted at the PC market. With AMD's introduction of a 64-bit architecture backwards-compatible with x86, x86-64 (also called AMD64), in September 2003, followed by Intel's near fully compatible 64-bit extensions (first called IA-32e or EM64T, later renamed Intel 64), the 64-bit desktop era began. Both versions can run 32-bit legacy applications without any performance penalty as well as new 64-bit software. 
With operating systems Windows XP x64, Windows Vista x64, Windows 7 x64, Linux, BSD, and Mac OS X that run 64-bit native, the software is also geared to fully utilize the capabilities of such processors. The move to 64 bits is more than just an increase in register size from the IA-32 as it also doubles the number of general-purpose registers. The move to 64 bits by PowerPC processors had been intended since the processors' design in the early 90s and was not a major cause of incompatibility. Existing integer registers are extended as are all related data pathways, but, as was the case with IA-32, both floating point and vector units had been operating at or above 64 bits for several years. Unlike what happened when IA-32 was extended to x86-64, no new general purpose registers were added in 64-bit PowerPC, so any performance gained when using the 64-bit mode for applications making no use of the larger address space is minimal. In 2011, ARM introduced a new 64-bit ARM architecture.
  • 18. MULTI-CORE DESIGNS A different approach to improving a computer's performance is to add extra processors, as in symmetric multiprocessing designs, which have been popular in servers and workstations since the early 1990s. Keeping up with Moore's Law is becoming increasingly challenging as chip-making technologies approach their physical limits. In response, microprocessor manufacturers look for other ways to improve performance so they can maintain the momentum of constant upgrades. A multi-core processor is a single chip that contains more than one microprocessor core. Each core can simultaneously execute processor instructions in parallel. This effectively multiplies the processor's potential performance by the number of cores, if the software is designed to take advantage of more than one processor core. Some components, such as bus interface and cache, may be shared between cores. Because the cores are physically close to each other, they can communicate with each other much faster than separate (off-chip) processors in a multiprocessor system, which improves overall system performance. In 2005, AMD released the first native dual-core processor, the Athlon X2. Intel's Pentium D had beaten the X2 to market by a few weeks, but it used two separate CPU dies and was less efficient than AMD's native design. As of 2012, dual-core and quad-core processors are widely used in home PCs and laptops, while quad, six, eight, ten, twelve, and sixteen-core processors are common in the professional and enterprise markets with workstations and servers. Sun Microsystems has released the Niagara and Niagara 2 chips, both of which feature an eight-core design. The Niagara 2 supports more threads and operates at 1.6 GHz. 
High-end Intel Xeon processors that are on the LGA 771, LGA 1366, and LGA 2011 sockets and high-end AMD Opteron processors that are on the C32 and G34 sockets are DP (dual processor) capable, as is the older Intel Core 2 Extreme QX9775, also used in an older Mac Pro by Apple and the Intel Skulltrail motherboard. AMD's G34 motherboards can support up to four CPUs and Intel's LGA 1567 motherboards can support up to eight CPUs. Modern desktop computers do not support systems with multiple CPUs, but few applications outside of the professional market can make good use of more than four cores. Both Intel and AMD currently offer fast quad- and six-core desktop CPUs, making multi-CPU systems obsolete for many purposes. AMD also offers the first and currently the only eight-core desktop CPUs with the FX-8xxx line. The desktop market has been in a transition towards quad-core CPUs since Intel's Core 2 Quads were released, and these are now common, although dual-core CPUs are still more prevalent. Older and mobile computers are less likely than newer desktops to have more than two cores. Not all software is optimized for multi-core CPUs, making fewer, more powerful cores preferable. AMD offers CPUs with more cores for a given amount of money than similarly priced Intel CPUs—but the AMD cores are somewhat slower, so the two trade blows in different applications depending on how well-threaded the programs running are.
  • 19. For example, Intel's cheapest Sandy Bridge quad-core CPUs often cost almost twice as much as AMD's cheapest Athlon II, Phenom II, and FX quad-core CPUs, but Intel has dual-core CPUs in the same price ranges as AMD's cheaper quad-core CPUs. In an application that uses one or two threads, the Intel dual-core CPUs outperform AMD's similarly priced quad-core CPUs—and if a program supports three or four threads, the cheap AMD quad-core CPUs outperform the similarly priced Intel dual-core CPUs. Historically, AMD and Intel have switched places as the company with the fastest CPU several times. Intel currently leads on the desktop side of the computer CPU market, with their Sandy Bridge and Ivy Bridge series. In servers, AMD's new Opterons seem to have superior performance for their price point. This means that AMD is currently more competitive in low- to mid-end servers and workstations that more effectively use fewer cores and threads. RISC In the mid-1980s to early 1990s, a crop of new high-performance reduced instruction set computer (RISC) microprocessors appeared, influenced by discrete RISC-like CPU designs such as the IBM 801 and others. RISC microprocessors were initially used in special-purpose machines and UNIX workstations, but then gained wide acceptance in other roles. In 1986, HP released its first system with a PA-RISC CPU. The first commercial RISC microprocessor design was released in 1984 by MIPS Computer Systems, the 32-bit R2000 (the R1000 was not released). In 1987, the non-Unix Acorn computers' 32-bit, then cacheless, ARM2-based Acorn Archimedes became the first commercial success using the ARM architecture, then known as Acorn RISC Machine (ARM); the first silicon, the ARM1, had appeared in 1985. The R3000 made the design truly practical, and the R4000 introduced the world's first commercially available 64-bit RISC microprocessor. Competing projects would result in the IBM POWER and Sun SPARC architectures. 
Soon every major vendor was releasing a RISC design, including the AT&T CRISP, AMD 29000, Intel i860 and Intel i960, Motorola 88000, and DEC Alpha. In the late 1990s, only two 64-bit RISC architectures were still produced in volume for non-embedded applications: SPARC and Power ISA, but as ARM has become increasingly powerful, in the early 2010s it became the third RISC architecture in the general computing segment. MARKET STATISTICS In 2003, about US$44 billion worth of microprocessors were manufactured and sold. Although about half of that money was spent on CPUs used in desktop or laptop personal computers, those account for only about 2% of all CPUs sold. About 55% of all CPUs sold in the world are 8-bit microcontrollers, over two billion of which were sold in 1997. In 2002, less than 10% of all the CPUs sold in the world were 32-bit or more. Of all the 32-bit CPUs sold, about 2% are used in desktop or laptop personal computers. Most microprocessors
  • 20. are used in embedded control applications such as household appliances, automobiles, and computer peripherals. Taken as a whole, the average price for a microprocessor, microcontroller, or DSP is just over $6. About ten billion CPUs were manufactured in 2008. About 98% of new CPUs produced each year are embedded. MICROARCHITECTURE Intel Core microarchitecture In electronics engineering and computer engineering, microarchitecture (sometimes abbreviated to μarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology. Computer architecture is the combination of microarchitecture and instruction set design. RELATION TO INSTRUCTION SET ARCHITECTURE The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA.
  • 21. Single bus organization microarchitecture The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALUs) and even larger elements. These diagrams generally separate the datapath (where data is placed) and the control path (which can be said to steer the data). The person designing a system usually draws the specific microarchitecture as a kind of data flow diagram. Like a block diagram, the microarchitecture diagram shows microarchitectural elements such as the arithmetic and logic unit and the register file as a single schematic symbol. Typically the diagram connects those elements with arrows, and uses thick lines and thin lines to distinguish between three-state buses -- which require a three-state buffer for each device that drives the bus; unidirectional buses -- always driven by a single source, such as the way the address bus on simpler computers is always driven by the memory address register; and individual control lines. Very simple computers have a single data bus organization -- they have a single three-state bus. The diagram of more complex computers usually shows multiple three-state buses, which help the machine do more operations simultaneously. Each microarchitectural element is in turn represented by a schematic describing the interconnections of logic gates used to implement it. Each logic gate is in turn represented by a circuit diagram describing the connections of the transistors used to implement it in some particular logic family. Machines with different microarchitectures may have the same instruction set architecture, and thus be capable of executing the same programs. 
New microarchitectures and/or circuitry solutions, along with advances in semiconductor manufacturing, are what allow newer generations of processors to achieve higher performance while using the same ISA. In principle, a single microarchitecture could execute several different ISAs with only minor changes to the microcode.
  • 22. ASPECTS OF MICROARCHITECTURE Intel 80286 microarchitecture The pipelined data path is the most commonly used data path design in microarchitecture today. This technique is used in most modern microprocessors, microcontrollers, and DSPs. The pipelined architecture allows multiple instructions to overlap in execution, much like an assembly line. The pipeline includes several different stages which are fundamental in microarchitecture designs. Some of these stages include instruction fetch, instruction decode, execute, and write back. Some architectures include other stages such as memory access. The design of pipelines is one of the central microarchitectural tasks. Execution units are also essential to microarchitecture. Execution units include arithmetic logic units (ALU), floating point units (FPU), load/store units, branch prediction, and SIMD. These units perform the operations or calculations of the processor. The choice of the number of execution units, their latency and throughput is a central microarchitectural design task. The size, latency, throughput and connectivity of memories within the system are also microarchitectural decisions. System-level design decisions such as whether or not to include peripherals, such as memory controllers, can be considered part of the microarchitectural design process. This includes decisions on the performance-level and connectivity of these peripherals. Unlike architectural design, where achieving a specific performance level is the main goal, microarchitectural design pays closer attention to other constraints. 
Since microarchitecture design decisions directly affect what goes into a system, attention must be paid to such issues as: • Chip area/cost • Power consumption • Logic complexity • Ease of connectivity • Manufacturability • Ease of debugging • Testability MICROARCHITECTURAL CONCEPTS Instruction cycle In general, all CPUs, single-chip microprocessors or multi-chip implementations, run programs by performing the following steps: 1. Read an instruction and decode it
  • 23. 2. Find any associated data that is needed to process the instruction 3. Process the instruction 4. Write the results out The instruction cycle is repeated continuously until the power is turned off. Increasing execution speed Complicating this simple-looking series of steps is the fact that the memory hierarchy, which includes caching, main memory and non-volatile storage like hard disks (where the program instructions and data reside), has always been slower than the processor itself. Step (2) often introduces a lengthy (in CPU terms) delay while the data arrives over the computer bus. A considerable amount of research has been put into designs that avoid these delays as much as possible. Over the years, a central goal was to execute more instructions in parallel, thus increasing the effective execution speed of a program. These efforts introduced complicated logic and circuit structures. Initially, these techniques could only be implemented on expensive mainframes or supercomputers due to the amount of circuitry needed for these techniques. As semiconductor manufacturing progressed, more and more of these techniques could be implemented on a single semiconductor chip. See Moore's law. Instruction set choice Instruction sets have shifted over the years, from originally very simple to sometimes very complex (in various respects). In recent years, load-store architectures, VLIW and EPIC types have been in fashion. Architectures that are dealing with data parallelism include SIMD and vectors. Some labels used to denote classes of CPU architectures are not particularly descriptive, especially so the CISC label; many early designs retroactively denoted "CISC" are in fact significantly simpler than modern RISC processors (in several respects). However, the choice of instruction set architecture may greatly affect the complexity of implementing high-performance devices. 
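The four-step instruction cycle listed above can be sketched as a toy interpreter. The three-instruction ISA here (LOAD/ADD/STORE with a single accumulator) is entirely hypothetical and exists only to make the fetch, data-fetch, process, and write-back steps visible:

```python
# Toy fetch-decode-execute loop illustrating the four-step instruction
# cycle. The single-accumulator ISA is invented for illustration only.
def run(program, memory):
    acc, pc = 0, 0                      # accumulator and program counter
    while pc < len(program):
        op, arg = program[pc]           # 1. read the instruction and decode it
        pc += 1
        if op == "LOAD":
            acc = memory[arg]           # 2. find the associated data
        elif op == "ADD":
            acc = acc + memory[arg]     # 3. process the instruction
        elif op == "STORE":
            memory[arg] = acc           # 4. write the results out
    return memory

mem = {0: 2, 1: 3, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2)], mem)
assert mem[2] == 5  # 2 + 3 stored back to memory
```

A real CPU repeats this cycle continuously in hardware; step 2 is where the memory-hierarchy delay discussed above bites.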
The prominent strategy, used to develop the first RISC processors, was to simplify instructions to a minimum of individual semantic complexity combined with high encoding regularity and simplicity. Such uniform instructions were easily fetched, decoded and executed in a pipelined fashion, and a simple strategy was used to reduce the number of logic levels in order to reach high operating frequencies; instruction cache-memories compensated for the higher operating frequency and inherently low code density, while large register sets were used to factor out as much of the (slow) memory accesses as possible. Instruction pipelining One of the first, and most powerful, techniques to improve performance is the use of the instruction pipeline. Early processor designs would carry out all of the steps above for one instruction before moving onto the next. Large portions of the circuitry were left idle at any one step; for instance, the instruction decoding circuitry would be idle during execution and so on. Pipelines improve performance by allowing a number of instructions to work their way through the processor at the same time. In the same basic example, the processor would start to decode (step 1) a new instruction while the last one was waiting for results. This would allow up to four instructions to be "in flight" at one time, making the processor look four times as fast. Although any one instruction takes just as long to complete (there are still four steps) the CPU as a whole "retires" instructions much faster. RISC designs make pipelines smaller and much easier to construct by cleanly separating each stage of the instruction process and making them take the same amount of time — one cycle. The processor as a whole operates in an assembly line fashion, with instructions coming in one side and results out the other. 
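The "four times as fast" claim above follows from simple cycle counting. Assuming an idealized pipeline with one-cycle stages and no stalls (a deliberate simplification; real pipelines suffer hazards), the arithmetic looks like this:

```python
# Back-of-envelope cycle counts for an idealized pipeline: every stage
# takes one cycle and there are no stalls (hazards are ignored).
def unpipelined_cycles(n_instructions, n_stages):
    # Each instruction runs all stages alone before the next begins.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # Fill the pipeline once, then retire one instruction per cycle.
    return n_stages + (n_instructions - 1)

n, stages = 1000, 4
assert unpipelined_cycles(n, stages) == 4000
assert pipelined_cycles(n, stages) == 1003
# Speedup 4000/1003 ≈ 3.99: it approaches the stage count as n grows.
```

Each individual instruction still takes four cycles; only the rate of retirement improves, which is exactly the "in flight" effect described above.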
Due to the reduced complexity of the Classic RISC pipeline, the pipelined core and an instruction cache could be placed on the same size die that would otherwise fit the core alone on a CISC design. This was the real reason that RISC was faster. Early designs like the SPARC and MIPS often ran over 10 times as fast as Intel and Motorola CISC solutions at the same clock speed and price.
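The throughput argument above can be checked with a little arithmetic; this sketch assumes an idealized four-stage pipeline with no stalls, so one instruction retires per cycle once the pipeline is full:

```python
# Back-of-envelope comparison of an unpipelined processor versus an
# ideally pipelined one with the same four stages
# (fetch, decode, execute, write back).
STAGES = 4

def cycles_unpipelined(n):
    return n * STAGES              # each instruction runs alone, start to finish

def cycles_pipelined(n):
    return STAGES + (n - 1)        # fill the pipe once, then retire 1 per cycle

n = 1000
print(cycles_unpipelined(n))       # 4000
print(cycles_pipelined(n))         # 1003
print(cycles_unpipelined(n) / cycles_pipelined(n))  # close to the ideal 4x
```

The speedup approaches the number of stages as the instruction count grows, which is exactly the "four times as fast" figure quoted above, though real pipelines fall short of this because of stalls and flushes.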
  • 24. Pipelines are by no means limited to RISC designs. By 1986 the top-of-the-line VAX implementation (VAX 8800) was a heavily pipelined design, slightly predating the first commercial MIPS and SPARC designs. Most modern CPUs (even embedded CPUs) are now pipelined, and microcoded CPUs with no pipelining are seen only in the most area-constrained embedded processors. Large CISC machines, from the VAX 8800 to the modern Pentium 4 and Athlon, are implemented with both microcode and pipelines. Improvements in pipelining and caching are the two major microarchitectural advances that have enabled processor performance to keep pace with the circuit technology on which it is based.

Cache

It was not long before improvements in chip manufacturing allowed even more circuitry to be placed on the die, and designers started looking for ways to use it. One of the most common was to add an ever-increasing amount of cache memory on-die. Cache is simply very fast memory that can be accessed in a few cycles, as opposed to the many needed to "talk" to main memory. The CPU includes a cache controller which automates reading and writing from the cache; if the data is already in the cache it simply "appears", whereas if it is not, the processor is "stalled" while the cache controller reads it in. RISC designs started adding cache in the mid-to-late 1980s, often only 4 KB in total. This number grew over time, and typical CPUs now have at least 512 KB, while more powerful CPUs come with 1, 2, 4, 6, 8 or even 12 MB, organized in multiple levels of a memory hierarchy. Generally speaking, more cache means more performance, due to reduced stalling. Caches and pipelines were a perfect match for each other. Previously, it didn't make much sense to build a pipeline that could run faster than the access latency of off-chip memory. Using on-chip cache memory instead meant that a pipeline could run at the speed of the cache access latency, a much smaller length of time.
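A toy model of the cache behaviour described above, assuming a direct-mapped cache with an arbitrary eight lines (real caches are larger and usually set-associative):

```python
# Toy direct-mapped cache: each address maps to exactly one line,
# and a miss simply evicts whatever that line held before.
LINES = 8
cache = [None] * LINES             # the tag currently held by each line
hits = misses = 0

def access(addr):
    global hits, misses
    line, tag = addr % LINES, addr // LINES
    if cache[line] == tag:
        hits += 1                  # data "appears" in a few cycles
    else:
        misses += 1                # the processor would stall here
        cache[line] = tag          # controller reads the line in

for addr in [0, 1, 2, 0, 1, 2, 16, 0]:
    access(addr)
print(hits, misses)                # 3 5
```

Note how address 16 evicts address 0 (both map to line 0), turning the final access to 0 back into a miss; reduced stalling comes precisely from keeping the working set resident.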
This allowed the operating frequencies of processors to increase at a much faster rate than those of off-chip memory.

Branch prediction

One barrier to achieving higher performance through instruction-level parallelism stems from pipeline stalls and flushes due to branches. Normally, whether a conditional branch will be taken isn't known until late in the pipeline, as conditional branches depend on results coming from a register. From the time that the processor's instruction decoder has figured out that it has encountered a conditional branch instruction to the time that the deciding register value can be read out, the pipeline needs to be stalled for several cycles; or, if it is not and the branch is taken, the pipeline needs to be flushed. As clock speeds increase, the depth of the pipeline increases with them, and some modern processors have 20 stages or more. On average, every fifth instruction executed is a branch, so without any intervention that is a large amount of stalling. Techniques such as branch prediction and speculative execution are used to lessen these branch penalties. Branch prediction is where the hardware makes an educated guess on whether a particular branch will be taken. In reality, one side of the branch will be taken much more often than the other. Modern designs have rather complex statistical prediction systems, which watch the results of past branches to predict the future with greater accuracy. The guess allows the hardware to prefetch instructions without waiting for the register read. Speculative execution is a further enhancement in which the code along the predicted path is not just prefetched but also executed before it is known whether the branch should be taken or not. This can yield better performance when the guess is good, with the risk of a huge penalty when the guess is bad, because instructions need to be undone.
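The statistical predictors mentioned above are often built from two-bit saturating counters; a minimal sketch of one such counter follows (the branch outcome history is made up):

```python
# Two-bit saturating counter, a classic branch-prediction scheme:
# states 0-1 predict "not taken", states 2-3 predict "taken", and
# each actual outcome nudges the counter one step in its direction.
counter = 2                        # start in weakly "taken"
correct = total = 0

def predict_and_train(taken):
    global counter, correct, total
    total += 1
    if (counter >= 2) == taken:    # did the prediction match reality?
        correct += 1
    if taken:
        counter = min(3, counter + 1)
    else:
        counter = max(0, counter - 1)

# A typical loop branch: taken nine times, then falls through once.
for outcome in [True] * 9 + [False]:
    predict_and_train(outcome)
print(correct, total)              # 9 10
```

The two-bit hysteresis is what lets the predictor stay right through nine loop iterations and be wrong only on the final exit, which is why such counters work well on loop-heavy code.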
Superscalar

Even with all of the added complexity and gates needed to support the concepts outlined above, improvements in semiconductor manufacturing soon allowed even more logic gates to be used.
  • 25. In the outline above the processor processes parts of a single instruction at a time. Computer programs could be executed faster if multiple instructions were processed simultaneously. This is what superscalar processors achieve, by replicating functional units such as ALUs. The replication of functional units was only made possible when the die area of a single-issue processor no longer stretched the limits of what could be reliably manufactured. By the late 1980s, superscalar designs started to enter the marketplace. In modern designs it is common to find two load units, one store unit (many instructions have no results to store), two or more integer math units, two or more floating-point units, and often a SIMD unit of some sort. The instruction issue logic grows in complexity by reading in a huge list of instructions from memory and handing them off to the different execution units that are idle at that point. The results are then collected and re-ordered at the end.

Out-of-order execution

The addition of caches reduces the frequency or duration of stalls due to waiting for data to be fetched from the memory hierarchy, but does not get rid of these stalls entirely. In early designs a cache miss would force the cache controller to stall the processor and wait. Of course there may be some other instruction in the program whose data is available in the cache at that point. Out-of-order execution allows that ready instruction to be processed while an older instruction waits on the cache, then re-orders the results to make it appear that everything happened in the programmed order. This technique is also used to avoid other operand dependency stalls, such as an instruction awaiting a result from a long-latency floating-point operation or other multi-cycle operations.

Register renaming

Register renaming refers to a technique used to avoid unnecessary serialized execution of program instructions caused by the reuse of the same registers by those instructions.
Suppose we have two groups of instructions that will use the same register. One set of instructions is executed first to leave the register to the other set, but if the other set is assigned to a different, similar register, both sets of instructions can be executed in parallel or in series.

Multiprocessing and multithreading

Computer architects have become stymied by the growing mismatch between CPU operating frequencies and DRAM access times. None of the techniques that exploited instruction-level parallelism (ILP) within one program could make up for the long stalls that occurred when data had to be fetched from main memory. Additionally, the large transistor counts and high operating frequencies needed for the more advanced ILP techniques required power dissipation levels that could no longer be cheaply cooled. For these reasons, newer generations of computers have started to exploit higher levels of parallelism that exist outside of a single program or program thread. This trend is sometimes known as throughput computing. This idea originated in the mainframe market, where online transaction processing emphasized not just the execution speed of one transaction, but the capacity to deal with massive numbers of transactions. With transaction-based applications such as network routing and web-site serving greatly increasing in the last decade, the computer industry has re-emphasized capacity and throughput issues. One technique through which this parallelism is achieved is multiprocessing: computer systems with multiple CPUs. Once reserved for high-end mainframes and supercomputers, small-scale (2-8) multiprocessor servers have become commonplace in the small-business market. For large corporations, large-scale (16-256) multiprocessors are common. Even personal computers with multiple CPUs have appeared since the 1990s.
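Returning to the register-renaming scenario above, a hypothetical sketch: both instruction groups write architectural register r1, a false dependency, and mapping each write to a fresh physical register removes it. The register names and pool size are invented:

```python
# Toy register renaming: every new write to an architectural register
# is assigned a fresh physical register from a free pool, so two
# instruction groups that both "use r1" no longer conflict.
phys_free = iter(["p0", "p1", "p2", "p3"])  # free physical registers
rename_map = {}                             # architectural -> physical

def rename(dest):
    rename_map[dest] = next(phys_free)      # allocate a fresh register
    return rename_map[dest]

group_a_dest = rename("r1")        # first group's write to r1
group_b_dest = rename("r1")        # second group's write to r1
print(group_a_dest, group_b_dest)  # p0 p1 -- no longer the same register
```

Because the two groups now target distinct physical registers, the issue logic is free to execute them in parallel and only the final mapping is committed in program order.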
With further transistor size reductions made available by semiconductor technology advances, multicore CPUs have appeared, in which multiple CPUs are implemented on the same silicon chip. They were initially used in chips targeting embedded markets, where simpler and smaller CPUs allowed multiple instantiations to fit on one piece of silicon. By 2005, semiconductor technology allowed dual-core high-end desktop CMP chips to be
  • 26. manufactured in volume. Some designs, such as Sun Microsystems' UltraSPARC T1, have reverted to simpler (scalar, in-order) designs in order to fit more processors on one piece of silicon. Another technique that has become more popular recently is multithreading. In multithreading, when the processor has to fetch data from slow system memory, instead of stalling while waiting for the data to arrive, the processor switches to another program or program thread which is ready to execute. Though this does not speed up a particular program/thread, it increases overall system throughput by reducing the time the CPU is idle. Conceptually, multithreading is equivalent to a context switch at the operating-system level. The difference is that a multithreaded CPU can do a thread switch in one CPU cycle instead of the hundreds or thousands of CPU cycles a context switch normally requires. This is achieved by replicating the state hardware (such as the register file and program counter) for each active thread. A further enhancement is simultaneous multithreading. This technique allows superscalar CPUs to execute instructions from different programs/threads simultaneously in the same cycle.

ARITHMETIC LOGIC UNIT

In digital electronics, an arithmetic logic unit (ALU) is a digital circuit that performs integer arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers. The processors found inside modern CPUs and graphics processing units (GPUs) accommodate very powerful and very complex ALUs; a single component may contain a number of ALUs. Mathematician John von Neumann proposed the ALU concept in 1945, when he wrote a report on the foundations for a new computer called the EDVAC.

[Figures: arithmetic logic unit schematic symbol; cascadable 8-bit ALU, Texas Instruments SN74AS888]
  • 27. NUMERICAL SYSTEM

An ALU must process numbers using the same formats as the rest of the digital circuit. The format of modern processors is almost always the two's complement binary number representation. Early computers used a wide variety of number systems, including ones' complement, two's complement, sign-magnitude format, and even true decimal systems, with various representations of the digits. The ones' complement and two's complement number systems allow subtraction to be accomplished by adding the negative of a number in a very simple way, which negates the need for specialized subtraction circuits; however, calculating the negative in two's complement requires adding a one to the low-order bit and propagating the carry. An alternative way to do two's complement subtraction of A−B is to present a one to the carry input of the adder and use ¬B rather than B as the second input. The arithmetic, logic and shift circuits introduced in previous sections can be combined into one ALU with common selection.

PRACTICAL OVERVIEW

Most of a processor's operations are performed by one or more ALUs. An ALU loads data from input registers. An external control unit then tells the ALU what operation to perform on that data, and the ALU stores its result in an output register. The control unit is responsible for moving the processed data between these registers, the ALU, and memory.

Complex operations

Engineers can design an arithmetic logic unit to calculate most operations. The more complex the operation, the more expensive the ALU is, the more space it uses in the processor, and the more power it dissipates. Therefore, engineers compromise. They make the ALU powerful enough to make the processor fast, yet not so complex as to become prohibitive. For example, computing the square root of a number might use:

1. Calculation in a single clock: design an extraordinarily complex ALU that calculates the square root of any number in a single step.
2. Calculation pipeline: design a very complex ALU that calculates the square root of any number in several steps. The intermediate results go through a series of circuits arranged like a factory production line. The ALU can accept new numbers to calculate even before having finished the previous ones. The ALU can now produce numbers as fast as a single-clock ALU, although the results start to flow out of the ALU only after an initial delay.
3. Iterative calculation: design a complex ALU that calculates the square root through several steps. This usually relies on control from a complex control unit with built-in microcode.
4. Co-processor: design a simple ALU in the processor, and sell a separate specialized and costly processor that the customer can install beside it, which implements one of the options above.
  • 28. 5. Software libraries: tell the programmers that there is no co-processor and no emulation, so they will have to write their own algorithms to calculate square roots in software.
6. Software emulation: emulate the existence of the co-processor. Whenever a program attempts to perform the square root calculation, the processor checks whether a co-processor is present and uses it if there is one; if there is not, it interrupts the processing of the program and invokes the operating system to perform the square root calculation through some software algorithm.

The options above go from the fastest and most expensive to the slowest and least expensive. Therefore, while even the simplest computer can calculate the most complicated formula, the simplest computers will usually take a long time doing so because of the several steps involved. Powerful processors like the Intel Core and AMD64 implement option #1 for several simple operations, #2 for the most common complex operations and #3 for the extremely complex operations.

Inputs and outputs

The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform. Its output is the result of the computation. One thing designers must keep in mind is whether the ALU will operate on big-endian or little-endian numbers. In many designs, the ALU also takes or generates a set of condition codes from or to a status register. These codes are used to indicate cases such as carry-in or carry-out, overflow, divide-by-zero, etc. A floating-point unit (FPU) also performs arithmetic operations between two values, but it does so for numbers in floating-point representation, which is much more complicated than the two's complement representation used in a typical ALU. In order to do these calculations, an FPU has several complex circuits built in, including some internal ALUs.
In modern practice, engineers typically refer to the ALU as the circuit that performs integer arithmetic operations (like two's complement and BCD). Circuits that calculate more complex formats like floating point, complex numbers, etc. usually receive a more specific name such as floating-point unit (FPU).
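The two's complement trick described in the NUMERICAL SYSTEM section, subtracting A−B by adding ¬B with a carry-in of one, can be demonstrated with a small sketch of an 8-bit ALU; the operation names and word size are chosen for illustration:

```python
# 8-bit ALU sketch. Subtraction is done the way the text describes:
# invert B and present a 1 on the adder's carry input, rather than
# building a separate subtraction circuit.
MASK = 0xFF                        # 8-bit word

def alu(op, a, b):
    if op == "ADD":
        return (a + b) & MASK
    if op == "SUB":                # A + (~B) + carry-in of 1
        return (a + (~b & MASK) + 1) & MASK
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(op)

print(alu("SUB", 5, 7))            # 254, i.e. -2 in 8-bit two's complement
print(alu("SUB", 7, 5))            # 2
```

The same adder thus serves both addition and subtraction, which is exactly why two's complement won out over sign-magnitude representations.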
  • 29. CENTRAL PROCESSING UNIT

[Figures: an Intel 80486DX2 CPU, seen from above and from below]

A central processing unit (CPU) is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system. The term has been in use in the computer industry at least since the early 1960s. The form, design, and implementation of CPUs have changed over the course of their history, but their fundamental operation remains much the same. A computer can have more than one CPU; this is called multiprocessing. All modern CPUs are microprocessors, meaning they are contained on a single chip. Some integrated circuits (ICs) can contain multiple CPUs on a single chip; those ICs are called multi-core processors. An IC containing a CPU can also contain peripheral devices and other components of a computer system; this is called a system on a chip (SoC). Two typical components of a CPU are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them, calling on the ALU when necessary. Not all computational systems rely on a central processing unit. An array processor or vector processor has multiple parallel computing elements, with no one unit considered the "center". In the distributed computing model, problems are solved by a distributed interconnected set of processors.
  • 30. TRANSISTOR AND INTEGRATED CIRCUIT CPU

[Figure: CPU, core memory, and external bus interface of a DEC PDP-8/I, made of medium-scale integrated circuits]

The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tubes and electrical relays. With this improvement, more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete (individual) components. During this period, a method of manufacturing many interconnected transistors in a compact space was developed. The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor-based die, or "chip". At first only very basic non-specialized digital circuits such as NOR gates were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo guidance computer, usually contained up to a few score transistors. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete-transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands. In 1964, IBM introduced its System/360 computer architecture, which was used in a series of computers that could run the same programs at different speeds and performance levels.
This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram (often called "microcode"), which still sees widespread usage in modern CPUs.[4] The System/360 architecture was so popular that it dominated the mainframe computer market for decades and left a legacy that is still continued by similar modern computers like the IBM zSeries. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC would later introduce the extremely popular PDP-11 line, which was originally built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits. Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a
  • 31. transistor in comparison to a tube or relay. Thanks to both the increased reliability and the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally, while discrete-transistor and IC CPUs were in heavy usage, new high-performance designs like SIMD (Single Instruction, Multiple Data) processors began to appear. These early experimental designs later gave rise to the era of specialized supercomputers like those made by Cray Inc.

MICROPROCESSORS

[Figures: die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging; Intel Core i5 CPU on a Vaio E series laptop motherboard (on the right, beneath the heat pipe)]

In the 1970s the fundamental inventions by Federico Faggin (silicon-gate MOS ICs with self-aligned gates, along with his new random logic design methodology) changed the design and implementation of CPUs forever. Since the introduction of the first commercially available microprocessor (the Intel 4004) in 1971, and the first widely used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures, and eventually produced instruction-set-compatible microprocessors that were backward-compatible with their older hardware and software. Combined with the advent and eventual success of the ubiquitous personal computer, the term CPU is now applied almost exclusively to microprocessors. Several CPUs (denoted "cores") can be combined in a single processing chip. Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards.
Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic
  • 32. capacitance. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU have increased manyfold. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity. While the complexity, size, construction, and general form of CPUs have changed enormously since 1950, it is notable that the basic design and function have not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines.[b] As the aforementioned Moore's law continues to hold true,[6] concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer, as well as to expand the usage of parallelism and other methods that extend the usefulness of the classical von Neumann model.

OPERATION

The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The instructions are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and write back.

Fetch

The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The instruction's location (address) in program memory is determined by a program counter (PC), which stores a number that identifies the address of the next instruction to be fetched.
After an instruction is fetched, the PC is incremented by the length of the instruction in terms of memory units so that it will contain the address of the next instruction in the sequence.[c] Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).

Decode

The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA).[d] Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.
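A sketch of the decode step, assuming a made-up 16-bit instruction word whose top four bits are the opcode, followed by two 4-bit register fields and a 4-bit immediate (real ISAs define their own layouts):

```python
# Decoding a hypothetical 16-bit instruction word into its fields.
# Layout (invented): [15:12] opcode, [11:8] rd, [7:4] rs, [3:0] imm.
def decode(word):
    opcode = (word >> 12) & 0xF    # which operation to perform
    rd     = (word >> 8) & 0xF     # destination register number
    rs     = (word >> 4) & 0xF     # source register number
    imm    = word & 0xF            # small immediate value
    return opcode, rd, rs, imm

# 0x1234 -> opcode 1, rd 2, rs 3, imm 4
print(decode(0x1234))
```

In hardware, these shifts and masks are just wires routed from fixed bit positions to the register file and the ALU's operation-select inputs, which is why fixed, regular encodings decode so cheaply.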
  • 33. Execute

After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, the arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set. The final step, write back, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly producing result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs. Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow. After the execution of the instruction and the write back of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter.
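The compare-then-conditional-jump pattern described above can be sketched as follows; the flag names follow common convention (zero, negative), but the machine itself is invented:

```python
# A compare instruction sets flags but produces no result value; a
# later conditional jump reads those flags to decide the next PC.
flags = {"zero": False, "negative": False}

def compare(a, b):                 # the "compare" instruction
    flags["zero"] = (a == b)
    flags["negative"] = (a < b)

def jump_if_less(target, pc):      # conditional jump: reads the flags
    return target if flags["negative"] else pc + 1

compare(3, 7)
taken = jump_if_less(100, 10)      # 3 < 7, so jump: next PC is 100
compare(7, 7)
not_taken = jump_if_less(100, 10)  # equal, fall through: next PC is 11
print(taken, not_taken)            # 100 11
```

This is how the incremented program counter of the normal cycle gets overridden: the jump writes a new value into the PC instead of producing data.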
If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "classic RISC pipeline", which is in fact quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.

PERFORMANCE

The performance or speed of a processor depends on, among many other factors, the clock rate (generally given in multiples of hertz) and the instructions per clock (IPC), which together are the factors for the instructions per second (IPS) that the CPU can perform. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches, whereas realistic workloads consist of a mix of instructions and applications, some of which take longer to execute than others. The performance of the memory hierarchy also greatly affects processor performance, an issue barely considered in MIPS calculations. Because of these problems, various standardized tests, often called "benchmarks", such as SPECint, have been developed to attempt to measure the real effective performance in commonly used applications. Processing performance of computers is increased by using multi-core processors, which essentially is plugging two or more individual processors (called cores in this sense) into one integrated circuit. Ideally, a dual-core processor would be nearly twice as powerful as a single-core processor. In practice, the performance gain is far smaller, only about 50%, due to imperfect software algorithms and implementation. Increasing the number of cores in a processor (i.e. dual-core, quad-core, etc.)
increases the workload that can be handled. This means that the processor can now handle numerous asynchronous events, interrupts, etc. which can take a toll on the CPU (Central
  • 34. Processing Unit) when overwhelmed. These cores can be thought of as different floors in a processing plant, with each floor handling a different task. Sometimes, these cores will handle the same tasks as cores adjacent to them if a single core is not enough to handle the information.
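The relationships in the PERFORMANCE section, IPS as the product of clock rate and IPC, and the roughly 50% real-world dual-core gain, amount to simple arithmetic; the clock rate and IPC below are arbitrary example figures, not measurements of any real chip:

```python
# Rough throughput arithmetic: instructions per second equals
# clock rate times instructions per clock, and a typical dual-core
# gain of about 50% rather than the ideal 2x.
clock_hz = 3.0e9                   # example: a 3 GHz clock
ipc = 2.0                          # example: 2 instructions per clock

single_core_ips = clock_hz * ipc
dual_core_ideal = 2 * single_core_ips
dual_core_typical = 1.5 * single_core_ips   # ~50% real-world gain

print(single_core_ips)             # 6e9 instructions per second
print(dual_core_typical)           # 9e9, well short of the ideal 12e9
```

The gap between the ideal and typical figures is exactly the "imperfect software algorithms and implementation" penalty the text describes: workloads rarely parallelize perfectly across cores.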