The purpose of computer technology. The history of the development of computer technology. Classification of computers. The composition of a computing system. Hardware and software. Classification of applied and utility software.

3. Computing

3.1 History of the development of computer technology

3.2 Computer classification methods

3.3 Other classifications of computers

3.4 Composition of a computing system

3.4.1 Hardware

3.4.2 Software

3.5 Classification of applied software tools

3.6 Classification of utility software

3.7 The concept of information and mathematical support of computing systems

3.8 Debriefing

3. Computing

3.1 History of the development of computer technology

Computing system, computer

Finding means and methods for mechanizing and automating work is one of the main tasks of the technical disciplines. Automating work with data has its own characteristics and differs from the automation of other kinds of work. Special classes of devices, most of them electronic, are used for such tasks. A set of devices designed for automatic or automated data processing is called computer technology. A specific collection of interacting devices and programs designed to serve one area of work is called a computing system. The heart of most computing systems is the computer.

A computer is an electronic device designed to automate the creation, storage, processing and transmission of data.

The principle of operation of the computer

In the definition of a computer as a device we pointed out a defining feature: it is electronic. However, automatic calculations were not always performed by electronic devices; mechanical devices capable of performing calculations automatically are also known.

Analyzing the early history of computer technology, some foreign researchers cite the abacus, a mechanical counting device, as an ancient predecessor of the computer. The "from the abacus" approach reflects a deep methodological error, since the abacus lacks the ability to perform calculations automatically, and for a computer that ability is decisive.

The abacus is the earliest mechanical calculating device: originally a clay tablet with grooves in which pebbles representing numbers were laid out. The appearance of the abacus is dated to the fourth millennium BC; its place of origin is Asia. In medieval Europe the abacus was replaced by ruled tables, and calculations with them were called counting on the lines. In Russia in the 16th-17th centuries a much more advanced device appeared that is still in use today: the Russian abacus.

At the same time, we are well aware of another device that performs calculations automatically: the clock. Regardless of the operating principle, all types of clocks (sand, water, mechanical, electrical, electronic, etc.) are able to generate movements or signals at regular intervals and register the changes that occur, that is, to perform automatic summation of signals or movements. This principle can be traced even in the sundial, which contains only a recording device (the role of the generator is played by the Earth-Sun system).

A mechanical clock is a device consisting of a mechanism that automatically performs movements at regular intervals and a device for recording those movements. The origin of the first mechanical clock is unknown; the earliest examples date back to the 14th century and belonged to monasteries (tower clocks).

At the heart of any modern computer, as in an electronic watch, is a clock generator that produces electrical signals at regular intervals; these signals drive all the devices of the computer system. Controlling a computer essentially comes down to controlling the distribution of signals between devices. Such control can be performed automatically (in which case one speaks of program control) or manually, with external controls: buttons, switches, jumpers, etc. (in early models). In modern computers external control is largely automated by special hardware-logic interfaces to which control and data-input devices (keyboard, mouse, joystick, and others) are connected. In contrast to program control, such control is called interactive.

Mechanical primary sources

The world's first automatic device for performing addition was created on the basis of a mechanical clock. It was developed in 1623 by Wilhelm Schickard, professor of Oriental languages at the University of Tübingen (Germany). In modern times a working model of the device was reconstructed from his drawings and confirmed its operability. The inventor himself called the machine a "summing clock" in his letters.

In 1642 the French mechanic Blaise Pascal (1623-1662) developed a more compact adding device, which became the world's first mass-produced mechanical calculator (mainly for the needs of Parisian moneylenders and money changers). In 1673 the German mathematician and philosopher G. W. Leibniz (1646-1716) created a mechanical calculator that could perform multiplication and division by repeated addition and subtraction.

During the 18th century, known as the Age of Enlightenment, new and more advanced models appeared, but the principle of mechanical control of computational operations remained the same. The idea of programming computational operations came from the same clock industry: ancient monastery tower clocks were set up to switch on the mechanism connected to the bell system at a given time. Such programming was rigid: the same operation was performed at the same time.

The idea of flexible programming of mechanical devices using punched paper tape was first realized in 1804 in the Jacquard loom, after which only one step remained to the program control of computational operations.

This step was taken by the outstanding English mathematician and inventor Charles Babbage (1791-1871) in his Analytical Engine, which, unfortunately, was never fully built during the inventor's lifetime, but has been reproduced in our day from his drawings, so that today we have every right to speak of the Analytical Engine as a device that actually exists. A distinctive feature of the Analytical Engine was that it was the first to implement the principle of separating information into commands and data. The Analytical Engine contained two large units: a "store" and a "mill". Data were entered into the mechanical memory of the store by setting blocks of gears, and then processed in the mill using commands entered from punched cards (as in the Jacquard loom).

Researchers of Charles Babbage's work invariably note the special role played in the Analytical Engine project by Countess Augusta Ada Lovelace (1815-1852), daughter of the famous poet Lord Byron. It was she who proposed using punched cards for programming computational operations (1843). In one of her letters she wrote: "The Analytical Engine weaves algebraic patterns just as the loom weaves flowers and leaves." Lady Ada can rightly be called the world's first programmer, and today one of the well-known programming languages, Ada, bears her name.

Charles Babbage's idea of treating commands and data separately proved extraordinarily fruitful. In the 20th century it was developed in the principles of John von Neumann (1945), and today the principle of treating programs and data separately is very important in computing: it is taken into account both in the design of modern computer architectures and in the development of computer programs.

Mathematical sources

If we consider what objects the first mechanical precursors of the modern electronic computer worked with, we must recognize that numbers were represented either as linear movements of chain and rack mechanisms or as angular movements of gear and lever mechanisms. In both cases these were movements, which inevitably affected the size of the devices and the speed of their operation. Only the transition from registering movements to registering signals made it possible to reduce the dimensions significantly and increase the speed. However, on the way to this achievement several more important principles and concepts had to be introduced.

The binary system of Leibniz. In mechanical devices gears can have a large number of fixed and, importantly, mutually distinguishable positions; the number of such positions is at least equal to the number of gear teeth. In electrical and electronic devices what is registered is not the position of structural elements but the state of the device's elements, and only two states are stable and reliably distinguishable: on or off, open or closed, charged or discharged, and so on. Therefore the traditional decimal system used in mechanical calculators is inconvenient for electronic computing devices.

The possibility of representing any number (and not only numbers) with binary digits was first proposed by Gottfried Wilhelm Leibniz in 1666. He came to the binary number system while researching the philosophical concept of the unity and struggle of opposites. The attempt to present the universe as a continuous interaction of two principles ("black" and "white", male and female, good and evil) and to apply the methods of "pure" mathematics to its study prompted Leibniz to study the properties of binary representation. It must be said that even then Leibniz conceived of the possibility of using the binary system in a computing device, but since mechanical devices had no need for it, he did not use the binary system in his calculator (1673).
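The key property can be illustrated with a short sketch (Python is used here purely for illustration): any non-negative integer can be written using only the two digits 0 and 1.

```python
def to_binary(n: int) -> str:
    """Represent a non-negative integer using only the digits 0 and 1."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(bits))

# Any number can be encoded with just two symbols, which is exactly
# the property that suits two-state electronic elements.
print(to_binary(13))  # 1101, i.e. 8 + 4 + 1
```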

The mathematical logic of George Boole. Speaking about the work of George Boole, researchers of the history of computer technology invariably emphasize that this outstanding 19th-century English scientist was self-taught. Perhaps it was precisely because he lacked a "classical" (by the standards of the time) education that George Boole introduced revolutionary changes into logic as a science.

Studying the laws of thought, he applied in logic a system of formal notation and rules close to the mathematical one. This system was later called the algebra of logic, or Boolean algebra. The rules of the system are applicable to a wide variety of objects and groups of objects (sets, in the author's terminology). The main purpose of the system, as conceived by J. Boole, was to encode logical statements and reduce the structure of logical inferences to simple expressions close in form to mathematical formulas. The result of the formal evaluation of a logical expression is one of two logical values: true or false.

The significance of Boolean algebra was ignored for a long time, since its techniques and methods offered no practical benefit to the science and technology of that time. However, when the creation of computing facilities on an electronic basis became possible in principle, the operations introduced by Boole turned out to be very useful, for they are oriented from the outset toward working with only two entities: true and false. It is easy to see how useful they became for working with binary code, which in modern computers is likewise represented by just two signals: zero and one.

Not all of George Boole's system (nor all of the logical operations he proposed) was used in the creation of electronic computers, but four basic operations underlie the work of all types of processors in modern computers: AND (intersection), OR (union), NOT (negation) and EXCLUSIVE OR.

Fig. 3.1. Basic Boolean algebra operations
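As an illustrative sketch (not part of the original figure), the four operations and their truth tables fit in a few lines of Python:

```python
def AND(a, b): return a and b   # intersection
def OR(a, b):  return a or b    # union
def NOT(a):    return not a     # negation
def XOR(a, b): return a != b    # true when exactly one operand is true

# Truth table for all combinations of two logical values.
for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), "->", int(AND(a, b)), int(OR(a, b)), int(XOR(a, b)))
```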

Electronic computers are usually classified according to a number of criteria, in particular: by functionality and the nature of the tasks solved, by the way the computing process is organized, and by architectural features and computing power.

By functionality and the nature of the tasks being solved, there are:

universal (general-purpose) computers;

problem-oriented computers;

specialized computers.

Universal (general-purpose) computers are designed to solve a wide variety of engineering and technical problems characterized by complex algorithms and large volumes of processed data.

Problem-oriented computers are designed to solve a narrower range of tasks, connected with the registration, accumulation and processing of small amounts of data.

Specialized computers are used to solve a narrow range of tasks (for example, microprocessors and controllers that perform the functions of controlling technical devices).

By the way the computing process is organized, computers are divided into uniprocessor and multiprocessor, and also into sequential and parallel.

Uniprocessor. The computer has one central processor and all computational operations and operations to control input-output devices are carried out on this processor.

Multiprocessor. The computer has several processors between which the functions of organizing the computing process and controlling information input-output devices are redistributed.

Sequential. They work in a single-program mode, when the computer is designed in such a way that it can execute only one program, and all its resources are used only in the interests of the executable program.

Parallel. They work in a multiprogram mode, when several user programs are being executed in the computer, and resources are shared between these programs, ensuring their parallel execution.

By architectural features and computing power, computers fall into several classes. Let us consider the classification scheme by this criterion (Fig. 1).

Fig. 1. Classification of computers by architectural features and computing power

Supercomputers are the most powerful computers in terms of speed and performance. Supercomputers include the Cray and IBM SP2 machines (USA). They are used for large-scale computational problems and simulations, for complex calculations in aerodynamics, meteorology and high-energy physics, and also find application in the financial sphere.

Large machines, or mainframes, are used in the financial sphere and the defense industry, and equip departmental, territorial and regional computing centers.

Medium general-purpose computers are used to control complex technological production processes.

Minicomputers are oriented toward use as control computing complexes and as network servers.

Microcomputers are computers in which a microprocessor serves as the central processing unit. These include embedded microcomputers (built into various equipment, apparatus or devices) and personal computers (PCs).

Personal computers have developed rapidly over the past 20 years. A personal computer (PC) is designed to serve a single workplace and is able to meet the needs of small enterprises and individuals. With the advent of the Internet the popularity of the PC has increased significantly, since a personal computer gives access to scientific, reference, educational and entertainment information.

Personal computers include desktop and portable PCs. Portable computers include notebooks (Notebook) and personal digital assistants (Handheld PC, Personal Digital Assistant (PDA) and Palmtop).

Embedded computers are computers used in various devices, systems and complexes to implement specific functions, for example car diagnostics.

Since 1999 an international certification standard, the PC 99 specification, has been used to classify PCs. According to this specification, PCs are divided into the following groups:

Mass PCs (Consumer PC);

Business PCs (Office PC);

Portable PCs (Mobile PC);

Workstations (WorkStation);

Entertainment PCs (Entertainment PC).

Most PCs are mass-market machines and include a standard (minimally required) set of hardware: system unit, display, keyboard and mouse. If necessary, this set can easily be extended with other devices at the user's request, for example a printer.

Business PCs include minimal means for reproducing graphics and sound.

Portable PCs are distinguished by the presence of remote-access communication facilities.

Workstations meet increased requirements for main memory and data storage devices.

Entertainment PCs are focused on high-quality reproduction of graphics and sound.

By design features PCs are divided into:

stationary (desktop, Desktop);

portable:

Portable (Laptop);

notebooks (Notebook);

pocket (Palmtop).

The main characteristics of computer technology include its operational and technical characteristics, such as speed, memory capacity, calculation accuracy, etc.

Computer speed is considered in two aspects. On the one hand, it is characterized by the number of elementary operations performed by the central processor per second, where an elementary operation is any simple operation such as addition, transfer or comparison. On the other hand, the performance of a computer depends essentially on the organization of its memory: the time spent searching for the necessary information in memory significantly affects the computer's speed.

Depending on the field of application, computers are produced with speeds from several hundred thousand to billions of operations per second. To solve complex problems, several computers can be combined into a single computing complex with the required total speed.

Along with speed, the concept of performance is often used. While the former is determined mainly by the element system used in the computer, the latter is connected with its architecture and the types of tasks being solved. Even for a single computer, a characteristic such as speed is not a constant value. In this regard, one distinguishes:

    peak performance, determined by the clock frequency of the processor without taking into account access to random access memory;

    rated speed, determined taking into account the time of access to RAM;

    system speed, determined taking into account the system costs for the organization of the computing process;

    operational speed, determined taking into account the nature of the tasks being solved (the composition of operations or their "mix").

Memory capacity is determined by the maximum amount of information that can be placed in the computer's memory, and is usually measured in bytes. As already noted, computer memory is divided into internal and external. Internal (random-access) memory varies in size for different classes of machines and is determined by the computer's addressing system. The capacity of external memory, thanks to the block structure and removable storage designs, is practically unlimited.

Calculation accuracy depends on the number of digits used to represent a single number. Modern computers are equipped with 32- or 64-bit microprocessors, which is quite enough to ensure high accuracy of calculations in a wide variety of applications. When this is not enough, double or triple precision can be used.
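The effect of word length on accuracy can be seen in a small sketch: Python's `struct` module lets us round-trip a 64-bit value through a 32-bit representation (an illustrative example, not tied to any particular processor).

```python
import struct

def as_32bit(x: float) -> float:
    """Round-trip a value through a 32-bit float to expose the precision loss."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 1.0 + 2**-30          # representable exactly with a 64-bit word
print(x == as_32bit(x))   # False: the 24-bit significand drops the small term
print(as_32bit(x) == 1.0) # True: in 32 bits the value collapses to 1.0
```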

The command system is the list of instructions that the computer's processor is capable of executing. It establishes which specific operations the processor can perform, how many operands must be specified in an instruction, and what form (format) an instruction must have to be recognized. The number of basic instruction types is small; with their help computers are able to perform addition, subtraction, multiplication, division, comparison, writing to memory, transferring a number from register to register, converting from one number system to another, and so on. When necessary, instructions are modified to take into account the specifics of the calculations. A computer usually uses from tens to hundreds of instructions (taking their modifications into account). At the present stage of development of computer technology, two main approaches to forming a processor's instruction set are used. On the one hand, there is the traditional approach connected with developing processors with a full set of instructions: the CISC architecture (Complex Instruction Set Computer). On the other hand, there is the implementation in the computer of a reduced set of the simplest but most frequently used instructions, which makes it possible to simplify the processor hardware and increase its speed: the RISC architecture (Reduced Instruction Set Computer).

The cost of a computer depends on many factors, in particular speed, memory capacity and instruction set. The cost is strongly affected by the specific configuration of the computer and, above all, by the external devices included in the machine. Finally, the cost of software significantly affects the cost of a computer.

Computer reliability is the ability of a machine to retain its properties under given operating conditions over a certain period of time. The following indicators can serve as quantitative measures of the reliability of a computer containing elements whose failure leads to the failure of the entire machine:

    the probability of failure-free operation for a certain time under given operating conditions;

    mean time between failures;

    average recovery time of the machine, etc.
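Under the common assumption of a constant failure rate (the exponential reliability model, which is standard in reliability theory but not stated in the text), the first indicator can be estimated from the mean time between failures:

```python
import math

def p_no_failure(hours: float, mtbf_hours: float) -> float:
    """Probability of failure-free operation over `hours`,
    assuming a constant failure rate: R(t) = exp(-t / MTBF)."""
    return math.exp(-hours / mtbf_hours)

# A machine with a 10,000-hour MTBF run continuously for 100 hours:
print(round(p_no_failure(100, 10_000), 4))  # 0.99
```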

For more complex structures, such as a computing complex or system, the concept of "failure" makes little sense: in such systems failures of individual elements lead to some decrease in operating efficiency rather than to a complete loss of operability.

Other characteristics of computer technology are also important, for example: versatility, software compatibility, weight, dimensions, power consumption, etc. They are taken into account when evaluating specific areas of computer application.

Methods of organizing software and hardware in AWS (automated workstation) complexes should be determined in the general context of the processes of operational production management (OUP) at industrial enterprises, whose objective function is to minimize the expenditure of all types of resources on manufacturing the established nomenclature of objects of labor.

The synthesis of methods and models for organizing software and hardware, when the automated OUP system is presented as AWS complexes of self-supporting production teams, must pass through two stages: determining the rational composition of computer equipment (VT), and solving the problem of distributing the resources of the computing system of the AWS complexes among its end users.

Technical (hardware) compatibility of new VT equipment with the customer's existing VT fleet and with the VT fleet planned for future acquisition. Practice shows that this is one of the most important indicators taken into account when choosing VT. The tendency to acquire hardware compatible with the existing equipment has many objective and subjective causes, among which the customer's psychology, a feeling of confidence in the success of using this particular class of hardware, plays no small role. Software compatibility is determined by the compatibility of the hardware-implemented instruction set, of data representation formats, of translators, of the DBMS, and so on. The significant effect of this indicator on resource consumption is explained by the presence of large volumes of previously prepared normative, archival and statistical data, as well as by the specialization of the enterprise's trained personnel, who have experience with specific basic software tools.

Interoperability within the acquired complex of VT equipment, which allows, if individual workstation modules fail, either quickly replacing the failed module or reassigning the devices among specific workstations within the computing resources of all complexes (within a workshop complex, within an inter-shop complex, within the system of the whole enterprise).

Reliability of VT equipment according to its technical specifications, and its suitability for specific operating conditions: vibration, oxidation, dust, gas pollution, power surges and the like may require additional protection.

The total speed of solving functional tasks by the workstation types of the complex, that is, the speed of processing the existing volumes of data in various operating modes. Usually, to determine the values of this indicator it is not enough to know only the volume of the information base of a specific workstation, its rated characteristics and the computing resources provided.

Therefore, for an approximate (order-of-magnitude) assessment of this indicator, what matters is either operating experience at similar facilities with VT of the same class, or results obtained on simulation models whose databases correspond in volume and data structure to the real ones. Extrapolating data obtained on test examples can lead to results that differ by an order of magnitude from the real estimates obtained later during system operation. The source of error is most often the ambiguous behavior of work algorithms, operating system utilities, communication protocols, drivers and basic language tools when systems operate in multi-user, multitasking mode at the limits of the computing systems' resources. In such cases, direct calculation from the rated characteristics of processors, intra-machine communication channels, network communication channels and the data access speeds of the various external devices cannot be used effectively. At present the capacity of many processors, and of the language tools oriented toward them, does not allow the entire potential set of tasks of the OUP control system to be provided with the necessary computational accuracy. Therefore, when determining the values of this indicator, it is necessary to introduce a breakdown by task classes of specific workstation types, with reference to the combination of VT equipment and basic software under consideration.

The cost of implementing a "friendly interface", including both training programs and the possibility of obtaining help, while working at the workstation, on how to continue or end the dialogue.

The possibility of changing the composition and content of the functions implemented at specific workstations, including redistribution between personnel.

Ensuring the requirements of protection against unauthorized access for knowledge bases and databases, as well as ensuring their "transparency" if necessary.


Classification of computer equipment

1. Hardware

The composition of a computing system is called its configuration. Computer hardware and software are considered separately, and accordingly the hardware configuration of computing systems and their software configuration are considered separately. This principle of separation is of particular importance in computer science, because very often the same problem can be solved both by hardware and by software. The criteria for choosing a hardware or software solution are performance and efficiency. It is generally accepted that hardware solutions are on average more expensive, while the implementation of software solutions requires more highly qualified personnel.

The hardware of computing systems includes the devices and instruments that form the hardware configuration. Modern computers and computing systems have a block-modular design: the hardware configuration needed for a particular kind of work can be assembled from ready-made units and blocks.

The main hardware components of a computing system are the memory, the central processor and the peripheral devices, which are interconnected by the system bus (Fig. 1). The main memory is designed to store programs and data in binary form and is organized as an ordered array of cells, each of which has a unique numeric address. Typically a cell is 1 byte in size. The typical operations on main memory are reading and writing the contents of a cell with a specific address.
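The cell model described above can be sketched as follows (a simplified software illustration; real memory is, of course, a hardware device):

```python
class MainMemory:
    """Main memory as an ordered array of 1-byte cells with numeric addresses."""
    def __init__(self, size: int):
        self.cells = bytearray(size)        # every cell starts out as zero

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value & 0xFF  # a cell holds exactly one byte

    def read(self, address: int) -> int:
        return self.cells[address]

mem = MainMemory(1024)
mem.write(42, 0xA5)   # write the byte 0xA5 to the cell at address 42
print(mem.read(42))   # 165
print(mem.read(43))   # 0, an untouched neighbouring cell
```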

2. CPU

The central processing unit is the central device of the computer; it performs data-processing operations and controls the computer's peripheral devices. The central processing unit includes:

Control device - organizes the process of executing programs and coordinates the interaction of all devices of the computing system during its operation;

Arithmetic logic unit - performs arithmetic and logical operations on data: addition, subtraction, multiplication, division, comparison, etc.;

The storage device is the processor's internal memory, consisting of registers that the processor uses to perform calculations and store intermediate results; to speed up work with RAM, a cache memory is used, into which the commands and data the processor will need for subsequent operations are fetched from RAM ahead of time;

The clock generator generates electrical pulses that synchronize the operation of all the computer's units.

The central processor performs various data operations using specialized cells for storing key variables and temporary results - internal registers. Registers are divided into two types (Fig. 2.):

General-purpose registers are used for temporary storage of key local variables and intermediate results of calculations; they include data registers and pointer registers. Their main function is to provide quick access to frequently used data (usually without memory accesses).

Specialized registers - used to control the operation of the processor, the most important of them are: the instruction register, the stack pointer, the flag register and the register containing information about the state of the program.

The programmer can use data registers at his own discretion to store any objects temporarily (data or addresses) and perform the required operations on them. Index registers, like data registers, can be used arbitrarily; their main purpose is to store indexes or offsets of data and instructions relative to a base address (when fetching operands from memory). The base address may be held in base registers.

Segment registers are a critical element of the processor architecture; they provide a 20-bit address space with 16-bit operands. The main segment registers are: CS, the code segment register; DS, the data segment register; SS, the stack segment register; ES, the additional segment register. Memory is accessed through segments: logical formations superimposed on any part of the physical address space. The starting address of a segment divided by 16 (that is, without its least significant hexadecimal digit) is entered into one of the segment registers, after which access is granted to the section of memory beginning at that segment address.

The address of any memory cell consists of two words: one determines the location of the corresponding segment in memory, the other the offset within that segment. The size of a segment is determined by the amount of data it contains, but can never exceed 64 KB, which follows from the maximum possible offset value. The segment address of the instruction segment is stored in the CS register, and the offset of the addressed byte in the instruction pointer register IP.

Fig. 2. 32-bit processor registers
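The segment-plus-offset scheme can be made concrete with a small sketch (the register values are hypothetical, chosen only for illustration):

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address from a 16-bit segment value and a 16-bit offset.
    The segment register holds the segment start address divided by 16."""
    assert 0 <= offset <= 0xFFFF    # the offset can never exceed 64 KB - 1
    return (segment << 4) + offset  # shifting left by 4 bits multiplies by 16

# A segment register value of 0x2000 and an offset of 0x0010
# address the physical location 0x20010.
print(hex(physical_address(0x2000, 0x0010)))  # 0x20010
```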

After the program is loaded, the offset of its first instruction is entered into IP. As the processor reads the instruction from memory, it increments IP by exactly the length of that instruction (Intel processor instructions can be from 1 to 6 bytes long), so that IP then points to the program's second instruction. After executing the first instruction, the processor reads the second from memory, again increasing IP. As a result, IP always contains the offset of the next instruction, the one following the instruction being executed. This algorithm is violated only when executing jump instructions, subroutine calls and interrupt servicing.
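The advance of IP can be traced with a toy fragment (the instruction lengths below are invented purely for illustration):

```python
# Lengths in bytes of four hypothetical instructions; real Intel 8086
# instructions are from 1 to 6 bytes long.
lengths = [2, 3, 1, 2]

ip = 0x0100  # offset of the first instruction after the program is loaded
trace = []
for size in lengths:
    ip += size        # IP advances by the length of the instruction just read,
    trace.append(ip)  # so it always holds the offset of the *next* instruction

print([hex(a) for a in trace])  # ['0x102', '0x105', '0x106', '0x108']
```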

The segment address of the data segment is stored in the DS register, the offset can be in one of the general purpose registers. An additional ES segment register is used to access data fields that are not part of the program, such as the video buffer or system cells. However, if necessary, it can be configured for one of the segments of the program. For example, if the program works with a large amount of data, you can provide two segments for them and access one of them through the DS register, and the other through the ES register.

The stack pointer register SP points to the top of the stack. A stack is a program area for temporary storage of arbitrary data. The convenience of the stack is that its area is reused: storing data on the stack and fetching it back are done with push and pop instructions without specifying any names or addresses. The stack is traditionally used to save the contents of the registers a program is using before calling a subroutine that will, in turn, use the processor registers for its own purposes; the original contents of the registers are popped off the stack upon return from the subroutine. Another common technique is passing a subroutine its parameters via the stack: the subroutine, knowing in what order the parameters were placed on the stack, can take them from there and use them during its execution.

A distinctive feature of the stack is the peculiar order of fetching the data contained in it: at any time, only the top element is available on the stack, that is, the element loaded onto the stack last. Popping the top element from the stack makes the next element available. The elements of the stack are located in the memory area allocated for the stack, starting from the bottom of the stack (from its maximum address) to successively decreasing addresses. The address of the top accessible element is stored in the stack pointer register SP.
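The behavior described here can be modeled in a few lines (a toy sketch, not a real CPU; the 16-bit word size and starting SP value are assumptions for the example):

```python
# Toy model of a stack growing toward lower addresses.
class Stack:
    def __init__(self, bottom: int):
        self.memory = {}          # address -> stored word
        self.sp = bottom          # SP starts at the bottom (maximum address)

    def push(self, word: int):
        self.sp -= 2              # 16-bit words: SP decreases by 2
        self.memory[self.sp] = word

    def pop(self) -> int:
        word = self.memory[self.sp]   # only the top element is accessible
        self.sp += 2
        return word

s = Stack(bottom=0xFFFE)
s.push(0x1234)             # e.g. save a register before a subroutine call
s.push(0x5678)
assert s.pop() == 0x5678   # last loaded element comes off first (LIFO)
assert s.pop() == 0x1234
```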

Special registers are available only in privileged mode and are used by the operating system. They control various blocks of the computing system: cache memory, main memory, input-output devices, and so on.

There is one register that is available in both privileged and user modes: the PSW (Program Status Word) register, also called the flag register. The flag register contains various bits needed by the CPU, the most important being the condition codes used in comparisons and conditional jumps. They are set on every cycle of the processor's ALU and reflect the state of the result of the previous operation. The contents of the flag register depend on the type of computing system and may include additional fields that indicate the machine mode (for example, user or privileged), the trace bit (used for debugging), the processor priority level, and the interrupt enable status. The flag register can usually be read in user mode, but some fields can be written only in privileged mode (for example, the bit that specifies the mode).

The instruction pointer register contains the address of the next instruction in the queue for execution. After an instruction is selected from memory, the instruction register is updated and the pointer moves to the next instruction. The instruction pointer keeps track of the execution of the program, indicating at each moment the relative address of the instruction following the one being executed. The register is programmatically inaccessible; the address is incremented by the microprocessor, taking into account the length of the current instruction. Instructions for jumps, interrupts, calling subroutines and returning from them change the contents of the pointer, thereby making jumps to the required points in the program.

The accumulator register is used in the vast majority of commands. Frequently used commands that use this register have a shortened format.

To process information, data is usually transferred from memory cells to general-purpose registers, the operation is performed by the central processor, and the results are transferred to the main memory. Programs are stored as a sequence of machine instructions to be executed by the CPU. Each command consists of an operation field and operand fields - the data on which this operation is performed. The set of machine instructions is called machine language. Program execution is carried out as follows. The machine instruction pointed to by the program counter is read from memory and copied into the instruction register, where it is decoded and then executed. After it is executed, the program counter points to the next instruction, and so on. These actions are called a machine cycle.
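The machine cycle can be illustrated with a toy interpreter (the instruction set here is invented for the example and is not any real machine language):

```python
# A toy machine cycle: fetch the instruction at the program counter,
# decode the operation field and operand fields, execute, advance the counter.
def run(program, registers):
    pc = 0
    while pc < len(program):
        op, *operands = program[pc]   # operation field + operand fields
        if op == "LOAD":              # LOAD reg, value
            registers[operands[0]] = operands[1]
        elif op == "ADD":             # ADD dst, src
            registers[operands[0]] += registers[operands[1]]
        elif op == "HALT":
            break
        pc += 1                       # counter now points to the next instruction
    return registers

regs = run([("LOAD", "A", 2), ("LOAD", "B", 3), ("ADD", "A", "B"), ("HALT",)],
           {"A": 0, "B": 0})
print(regs["A"])  # 5
```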

Most CPUs have two modes of operation: kernel mode and user mode, which is specified by a bit in the processor status word (flag register). When the processor is running in kernel mode, it can execute all the instructions in the instruction set and use all the hardware's capabilities. The operating system runs in kernel mode and provides access to all hardware. User programs run in user mode, which allows many instructions to be executed, but makes only a portion of the hardware available.

To communicate with the operating system, the user program must issue a system call that provides a transition to kernel mode and activates the functions of the operating system. The trap instruction (emulated interrupt) switches the processor mode from user mode to kernel mode and transfers control to the operating system. After completion of work, control returns to the user program, to the instruction following the system call.

In computers, in addition to instructions for making system calls, there are interrupts that are called in hardware to warn of exceptional situations, for example, an attempt to divide by zero or an overflow during floating point operations. In all such cases, control passes to the operating system, which must decide what to do next. Sometimes you need to terminate the program with an error message, sometimes you can ignore it (for example, if the number loses its significance, you can take it equal to zero) or transfer control to the program itself to handle certain types of conditions.

According to the way the devices are located relative to the central processor, internal and external devices are distinguished. External devices typically include most I/O devices (also called peripherals) and some devices designed for long-term data storage.

Coordination between individual nodes and blocks is performed by transitional hardware-logic devices called hardware interfaces. Standards for hardware interfaces in computing are called protocols: sets of technical conditions that device developers must meet in order for their devices to work successfully with others.

Numerous interfaces present in the architecture of any computer system can be conditionally divided into two large groups: serial and parallel. Through a serial interface, data is transmitted sequentially, bit by bit, and through a parallel interface, simultaneously in groups of bits. The number of bits involved in one package is determined by the bit width of the interface, for example, eight-bit parallel interfaces transmit one byte (8 bits) per cycle.

Parallel interfaces are usually more complex than serial ones but provide better performance. They are used where data transfer speed matters: for connecting printing devices, graphics input devices, devices for recording data on external media, and so on. The performance of parallel interfaces is measured in bytes per second (bytes/s; KB/s; MB/s).

Serial interface devices are simpler; as a rule, they do not need to synchronize the transmitting and receiving devices (which is why they are often called asynchronous interfaces), but their bandwidth is lower and their efficiency is poorer. Because serial devices exchange bits rather than bytes, their performance is measured in bits per second (bps, Kbps, Mbps). Despite the apparent simplicity of converting a serial transfer rate into a parallel one by mechanically dividing by 8, such a conversion is not performed: it is incorrect because of the service data that accompanies each transfer. At best, adjusted for service data, the speed of serial devices is sometimes expressed in characters per second (cps), but this value is a consumer-oriented reference figure rather than a technical one.
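The arithmetic behind this caveat is easy to show. Assuming a common asynchronous framing of one start bit and one stop bit around 8 data bits:

```python
# Why bps/8 overstates serial throughput: each character carries framing
# overhead (assumed here: 1 start bit and 1 stop bit around 8 data bits).
def chars_per_second(bps: int, data_bits: int = 8,
                     start_bits: int = 1, stop_bits: int = 1) -> float:
    bits_per_char = start_bits + data_bits + stop_bits   # 10 bits per character
    return bps / bits_per_char

print(chars_per_second(9600))   # 960.0 characters/s, not 9600 / 8 = 1200
```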

Serial interfaces are used to connect slow devices (simple printing devices, input and output devices for character and signal information, control sensors, low-performance communication devices, and so on), as well as in cases where there are no significant restrictions on the duration of data exchange (digital cameras).

The second main component of a computer is memory. The memory system is designed as a hierarchy of layers (Fig. 3.). The top layer consists of the internal registers of the CPU. Internal registers provide the ability to store 32 x 32 bits on a 32-bit processor and 64 x 64 bits on a 64-bit processor, which is less than one kilobyte in both cases. Programs themselves can manage registers (that is, decide what to store in them) without hardware intervention.

Fig.3. Typical hierarchical memory structure

The next layer is the cache memory, mostly controlled by the hardware. RAM is divided into cache lines, usually of 64 bytes each: addresses 0 to 63 form line 0, addresses 64 to 127 form line 1, and so on. The most frequently used cache lines are kept in a high-speed cache located inside or very close to the CPU. When a program needs to read a word from memory, the cache chip checks whether the desired line is in the cache. If it is, the request is satisfied entirely from the cache (a cache hit) and no memory request is placed on the bus. A cache hit typically takes about two clock cycles, while a miss leads to a memory access with a significant loss of time. Cache memory is limited in size because of its high cost. Some machines have two or even three levels of cache, each one slower and larger than the one before.
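The line layout described above amounts to simple integer arithmetic (a sketch; the 64-byte line size matches the text):

```python
# Mapping a byte address to a 64-byte cache line: bytes 0-63 fall on
# line 0, bytes 64-127 on line 1, and so on.
LINE_SIZE = 64

def cache_line(address: int):
    # Returns (line number, offset within the line).
    return address // LINE_SIZE, address % LINE_SIZE

assert cache_line(0)   == (0, 0)
assert cache_line(127) == (1, 63)
assert cache_line(200) == (3, 8)
```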

This is followed by the main memory (RAM, Random Access Memory). This is the main working area of the computing system's storage. All CPU requests that cannot be satisfied by the cache go to main memory. When several programs run on a computer at once, it is desirable to keep them all in RAM. Protecting programs from each other and relocating them in memory are implemented in hardware using two specialized registers: a base register and a limit register.

In the simplest case (Fig. 4a), when a program starts running, the address of the beginning of the program's executable module is loaded into the base register, and the limit register records how much space the executable module occupies together with its data. When an instruction is fetched from memory, the hardware checks the instruction counter and, if it is less than the limit register, adds the value of the base register to it and transfers the sum to memory. When the program wants to read a word of data (for example, from address 10000), the hardware automatically adds the contents of the base register (for example, 50000) to this address and transfers the sum (60000) to memory. The base register thus lets the program refer to any part of memory following the address stored in it, while the limit register prevents the program from accessing any memory beyond its own area. This scheme solves both problems: protection and relocation of programs.
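A minimal sketch of this check-and-add scheme, using the example numbers from the text (base 50000; the limit value is an assumption):

```python
# Base/limit relocation and protection, as described above.
BASE, LIMIT = 50000, 16384   # where the program was loaded, and its size

def translate(virtual_address: int) -> int:
    if virtual_address >= LIMIT:
        raise MemoryError("access beyond the program's area")  # protection
    return BASE + virtual_address                              # relocation

print(translate(10000))   # 60000, as in the example in the text
```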

As a result of this checking and conversion, the address generated by the program (the virtual address) is translated into the address used by the memory (the physical address). The device that performs the checking and conversion is called the memory management unit (MMU). It resides in or near the processor circuitry but logically sits between the processor and the memory.

A more complex memory manager consists of two pairs of base and limit registers: one pair for the program text, the other for the data. Instruction fetches and all references to the program text use the first pair of registers; data references use the second pair. This mechanism makes it possible to share one program among several users while keeping only one copy of it in RAM, which is impossible in the simple scheme. When program No. 1 is running, the four registers are set as shown in Fig. 4b on the left; when program No. 2 is running, as on the right. Managing the memory manager is a function of the operating system.

Next in the memory hierarchy is the magnetic disk (hard disk, HDD). Disk memory is two orders of magnitude cheaper than RAM per bit and far larger in capacity, but accessing data located on disk takes about three orders of magnitude longer. The reason for the hard drive's low speed is that the disk is a mechanical structure. A hard disk consists of one or more metal platters rotating at 5400, 7200, or 10800 rpm (Fig. 5). Information is recorded on the platters in concentric circles. At each given position, the read/write heads can read a ring on the platter called a track. Together, the tracks for a given arm position form a cylinder.

Each track is divided into a number of sectors, typically 512 bytes per sector. On modern disks, the outer cylinders contain more sectors than the inner ones. Moving the head from one cylinder to another takes about 1 ms, and moving to an arbitrary cylinder takes 5 to 10 ms, depending on the disk. When the head is located over the desired track, you need to wait until the engine turns the disk so that the required sector becomes under the head. This takes an additional 5 to 10 ms, depending on the disk rotation speed. When the sector is under the head, the process of reading or writing occurs at a speed of 5 MB / s (for low-speed disks) to 160 MB / s (for high-speed disks).
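Putting these figures together gives a feel for why random disk access is slow (the specific numbers below are assumptions drawn from the ranges above):

```python
# Rough cost of one random 512-byte read from a 7200 rpm disk.
seek_ms = 7.5                        # average seek, within the 5-10 ms range
rpm = 7200
rotation_ms = 60_000 / rpm           # one full revolution: ~8.33 ms
latency_ms = rotation_ms / 2         # on average, wait half a revolution
transfer_ms = 512 / 50e6 * 1000      # 512 bytes at an assumed 50 MB/s

total_ms = seek_ms + latency_ms + transfer_ms
print(round(total_ms, 2))  # ~11.68 ms: dominated by mechanics, not transfer
```

The transfer itself takes about 0.01 ms; virtually the entire delay is mechanical positioning.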

The last layer is occupied by magnetic tape. This medium was often used to create backups of hard disk contents or to store large data sets. To access the information, the tape was placed in a tape drive and rewound to the requested block; the whole process could take minutes. The memory hierarchy described here is typical, but particular implementations may lack some levels or include other types (for example, an optical disk). In any case, moving down the hierarchy, the random access time increases significantly from device to device, and the capacity grows comparably.

In addition to the types described above, many computers have read-only memory (ROM, Read Only Memory), which does not lose its contents when the computing system is turned off. ROM is programmed during manufacture, and its contents cannot be changed afterwards. On some computers, the ROM holds the bootstrap programs used to start the computer; some I/O cards also use ROM to control low-level devices.

Electrically erasable ROM (EEPROM, Electrically Erasable ROM) and flash RAM (flash RAM) are also non-volatile, but unlike ROM, their contents can be erased and rewritten. However, writing data to them takes much more time than writing to RAM. Therefore, they are used in the same way as ROM.

There is another type of memory - CMOS memory, which is volatile and is used to store the current date and current time. The memory is powered by a battery built into the computer, and may contain configuration parameters (for example, an indication of which hard drive to boot from).

3. I/O devices

Other devices that interact closely with the operating system are I/O devices, which consist of two parts: the controller and the device itself. The controller is a microchip (chipset) on a plug-in board that receives and executes commands from the operating system.

For example, the controller receives a command to read a specific sector from the disk. To execute it, the controller converts the linear sector number into a cylinder, sector, and head number. The conversion is complicated by the fact that the outer cylinders may have more sectors than the inner ones. The controller then determines which cylinder the head is currently over and issues a sequence of pulses to move the head the required number of cylinders. After that, the controller waits for the disk to rotate until the required sector is under the head. The bits are then read and stored as they arrive from the disk, the header is removed, and the checksum is calculated. Finally, the controller assembles the received bits into words and stores them in memory. To carry out this work, controllers contain built-in firmware.

The I/O device itself has a simple interface that must comply with a single standard, for example IDE (Integrated Drive Electronics). Since the device interface is hidden by the controller, the operating system sees only the controller interface, which may differ from the device interface.

Since the controllers of different I/O devices differ from each other, managing them requires appropriate software: drivers. Each controller manufacturer must therefore supply drivers for the operating systems it supports. There are three ways to install a driver into the operating system:

Relink the kernel with the new driver and then reboot the system, this is how many UNIX systems work;

Create an entry in a file included in the operating system stating that the driver is required, and reboot the system; during the initial boot, the operating system will find the correct driver and load it. This is how the Windows operating system works;

Accept new drivers and quickly install them using the operating system while it is running; the method is used by removable USB and IEEE 1394 buses, which always need dynamically loaded drivers.

There are specific registers to communicate with each controller. For example, a minimal disk controller might have registers for specifying the disk address, the memory address, the sector number, and the direction of the operation (read or write). To activate the controller, the driver receives a command from the operating system, then translates it into values ​​suitable for writing to the device registers.

On some computers, I/O device registers are mapped into the address space, so they can be read and written like ordinary words in memory. The register addresses are placed beyond the reach of user programs (for example, fenced off with base and limit registers) to keep user programs away from the hardware.

On other computers, device registers are located in special I/O ports, and each register has its own port address. On such machines, IN and OUT instructions are available in privileged mode, which allow drivers to read and write registers. The first scheme eliminates the need for special I/O commands, but uses some address space. The second scheme does not affect the address space, but requires the presence of special instructions. Both schemes are widely used. Input and output of data is carried out in three ways.

1. The user program issues a system request, which the kernel translates into a procedure call to the appropriate driver. The driver then starts the I/O process. While it is in progress, the driver runs a very short program loop, constantly polling the device it is working with for readiness (usually there is some bit indicating that the device is still busy). When the I/O operation completes, the driver places the data where it is needed and returns to its original state, after which the operating system returns control to the program that made the call. This method is called polling or busy waiting, and it has one disadvantage: the processor must poll the device until the device finishes its work.
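Busy waiting can be sketched with a simulated device (the Device class and its busy bit are invented for the illustration):

```python
# Polling (busy-wait) I/O from method 1, with a simulated device.
class Device:
    def __init__(self, ticks_until_done: int):
        self.ticks = ticks_until_done
        self.data = None

    @property
    def busy(self) -> bool:          # the "busy" status bit the driver polls
        self.ticks -= 1
        if self.ticks <= 0:
            self.data = b"sector-0"  # operation finished: data is ready
        return self.data is None

def polling_read(device) -> bytes:
    while device.busy:               # the processor does nothing useful here,
        pass                         # it just keeps checking the status bit
    return device.data

print(polling_read(Device(ticks_until_done=1000)))  # b'sector-0'
```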

2. The driver starts the device and asks it to issue an interrupt when the I/O finishes. The driver then returns, the operating system blocks the caller if necessary and begins other tasks. When the controller detects the end of the data transfer, it generates an interrupt to signal the completion of the operation. The I/O mechanism works as follows (Fig. 6a):

Step 1: the driver sends a command to the controller, writing information to the device registers; the controller starts the I/O device.

Step 2: After finishing reading or writing, the controller sends a signal to the interrupt controller chip.

Step 3: If the interrupt controller is ready to receive an interrupt, then it sends a signal to a specific pin on the CPU.

Step 4: The interrupt controller puts the I/O device number on the bus so that the CPU can read it and know which device has finished. When the interrupt is received by the CPU, the contents of the program counter (PC) and the processor status word (PSW) are pushed onto the current stack, and the processor switches to the privileged mode of operation (operating system kernel mode). The I/O device number can be used as an index into a region of memory that holds the addresses of the interrupt handlers for each device; this region is called the interrupt vector. When the interrupt handler (the part of the device driver for the device that raised the interrupt) starts, it removes the program counter and processor status word from the stack, saves them, and queries the device for information about its state. After the interrupt has been processed, control returns to the previously running user program, to the instruction whose execution had not yet been completed (Fig. 6b).
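Step 4 can be sketched as a table lookup (the device numbers and handler names are assumed examples):

```python
# Dispatching an interrupt through an interrupt vector, simplified.
interrupt_vector = {            # device number -> handler address (assumed)
    1: "keyboard_handler",
    6: "floppy_handler",
    14: "disk_handler",
}

def dispatch(device_number: int, pc: int, psw: int, stack: list) -> str:
    stack.append(pc)            # hardware pushes PC and PSW onto the stack
    stack.append(psw)
    return interrupt_vector[device_number]   # jump to this device's handler

stack = []
handler = dispatch(14, pc=0x1000, psw=0x0202, stack=stack)
print(handler)   # disk_handler
print(stack)     # [4096, 514]: saved so the handler can return later
```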

3. A direct memory access (DMA, Direct Memory Access) controller is used, which manages the flow of bits between RAM and certain controllers without constant intervention by the central processor. The processor calls the DMA chip, tells it how many bytes to transfer along with the device and memory addresses and the direction of transfer, and leaves the chip to manage the transfer on its own. Upon completion, the DMA raises an interrupt, which is handled in the usual way.

Interrupts can occur at inopportune times, such as while processing another interrupt. For this reason, the CPU has the ability to disable interrupts and enable them later. While interrupts are disabled, all devices that have completed their work continue to send their signals, but the processor is not interrupted until interrupts are enabled. If multiple devices terminate at once while interrupts are disabled, the interrupt controller decides which one should be handled first, usually based on the static priorities assigned to each device.

The Pentium computer system has eight buses (cache bus, local bus, memory bus, PCI, SCSI, USB, IDE, and ISA). Each bus has its own data rate and its own functions. The operating system must have information about all buses in order to manage the computer and its configuration.

ISA bus (Industry Standard Architecture) - first appeared on IBM PC/AT computers; it operates at 8.33 MHz and can transfer two bytes per clock, for a maximum speed of 16.67 MB/s. It is included for backward compatibility with older slow I/O cards.

PCI bus (Peripheral Component Interconnect) - created by Intel as a successor to the ISA bus; it can operate at 66 MHz and transfer 8 bytes per clock, for a speed of 528 MB/s. Currently most high-speed I/O devices use the PCI bus, as do computers with non-Intel processors, since many I/O cards are compatible with it.

The local bus on the Pentium system is used by the CPU to send data to the PCI bridge chip, which accesses the memory over a dedicated memory bus, often running at 100 MHz.

The cache bus is used to connect an external cache, since Pentium systems have a first level cache (L1 cache) built into the processor and a large external second level cache (L2 cache).

The IDE bus is used to connect peripheral devices: disks and CD-ROM drives. The bus is a descendant of the PC/AT disk controller interface and is now standard on all Pentium-based systems.

USB bus (Universal Serial Bus, universal serial bus) is designed to connect slow I/O devices (keyboards, mice) to the computer. It uses a small four-wire connector, two wires of which supply power to USB devices.

The USB bus is a centralized bus, in which a host controller polls the I/O devices every millisecond to see whether they have data. Its total bandwidth is 1.5 MB/s. All USB devices use the same driver, so they can be connected to the system without rebooting it.

The SCSI bus (Small Computer System Interface, system interface of small computers) is a high-performance bus used for fast drives, scanners, and other devices that require significant bandwidth. Its performance reaches 160 MB / s. The SCSI bus is used on Macintosh systems and is popular on UNIX systems and other Intel-based systems.

The IEEE 1394 (FireWire) bus is a bit-serial bus and supports burst data transfer rates up to 50 MB/s. This feature allows you to connect portable digital camcorders and other multimedia devices to your computer. Unlike the USB bus, the IEEE 1394 bus does not have a central controller.

The operating system must be able to recognize hardware components and configure them. This requirement led Intel and Microsoft to develop a personal computer standard called plug and play. Before it, each I/O board had fixed I/O register addresses and a fixed interrupt request level. For example, the keyboard used interrupt 1 and addresses 0x60 to 0x64; the floppy disk controller used interrupt 6 and addresses 0x3F0 to 0x3F7; the printer used interrupt 7 and addresses 0x378 to 0x37A.

If a user purchased a sound card and a modem, it could happen that both devices used the same interrupt. The resulting conflict prevented the devices from working together. A possible solution was to build a set of DIP switches (jumpers) into each board, so that each board could be configured to keep its port addresses and interrupt numbers from conflicting with those of other devices.

Plug and play allows the operating system to automatically collect information about I/O devices, centrally assign interrupt levels and I/O addresses, and then report this information to each board. Such a system runs on Pentium computers. Every computer with a Pentium processor has a motherboard containing a program called the BIOS (Basic Input Output System). The BIOS contains low-level I/O routines, including procedures for reading the keyboard, displaying information on the screen, performing disk input/output, and so on.

When the computer boots, the BIOS starts. It checks the amount of RAM installed in the system and verifies that the keyboard and other main devices are connected and working correctly. Next, the BIOS scans the ISA and PCI buses and all devices attached to them. Some of these devices are legacy (pre-plug-and-play): they have fixed interrupt levels and I/O port addresses (for example, set with switches or jumpers on the I/O board, which the operating system cannot change). These devices are registered first; then the plug-and-play devices are registered. If the devices present differ from those recorded at the last boot, the new devices are configured.

The BIOS then determines which device to boot from, trying each in turn from a list stored in CMOS memory. The user can modify this list by entering the BIOS configuration program immediately after power-on. Usually an attempt is made to boot from a floppy disk first; if that fails, the CD is tried; if the computer has neither, the system boots from the hard disk. From the boot device, the first sector is read into memory and executed. This sector contains a program that checks the partition table at the end of the boot sector to determine which partition is active. The secondary bootloader is then read from that partition; it reads the operating system from the active partition and starts it.

The operating system then polls the BIOS for information about the computer's configuration and checks for a driver for each device. If the driver is not present, the operating system prompts the user to insert a floppy disk or CD containing the driver (these disks are supplied by the device manufacturer). If all the drivers are in place, the operating system loads them into the kernel. It then initializes the driver tables, creates any necessary background processes, and starts the password entry program or GUI at each terminal.

5. History of the development of computer technology

All IBM-compatible personal computers are equipped with Intel-compatible processors. Briefly, the history of the Intel family of microprocessors is as follows. Intel's first general-purpose microprocessor appeared in 1971. It was called the Intel 4004; it was four-bit and could input, output, and process four-bit words. Its speed was 8000 operations per second. The Intel 4004 microprocessor was designed for use in programmable calculators with 4 KB of memory.

Three years later, Intel released the 8080, an eight-bit processor with a 16-bit address bus that could therefore address up to 64 KB of memory (2^16 = 65536). 1978 was marked by the release of the 8086, with a word size of 16 bits (two bytes) and a 20-bit address bus; it could operate with 1 MB of memory (2^20 = 1048576, or 1024 KB) divided into blocks (segments) of 64 KB each. The 8086 equipped computers compatible with the IBM PC and IBM PC/XT. The next major step was the 80286 processor, which appeared in 1982: it had a 24-bit address bus, could handle 16 MB of address space, and was installed in computers compatible with the IBM PC/AT. In October 1985, the 80386DX was released with a 32-bit address bus (a maximum address space of 4 GB), and in June 1988 the 80386SX, which was cheaper than the 80386DX and had a 24-bit address bus. Then, in April 1989, the 80486DX microprocessor appeared, and in May 1993 the first version of the Pentium processor (both with a 32-bit address bus).

In May 1995, at the Komtek-95 international exhibition in Moscow, Intel presented a new processor, the P6.

One of the most important design goals for the P6 was to double the performance of the Pentium processor. At the same time, the first versions of the P6 were to be manufactured with the semiconductor technology that Intel had already debugged and was using in production of the latest Pentium versions (0.6 µm, 3.3 V).

Using the same manufacturing process ensures that mass production of the P6 can be achieved without major problems. However, this means that doubling the performance is achieved only through comprehensive improvements in the microarchitecture of the processor. The P6 microarchitecture was developed using a carefully thought out and tuned combination of various architectural methods. Some of them were previously tested in the processors of "large" computers, some were proposed by academic institutions, the rest were developed by engineers from the Intel company. This unique combination of architectural features, which Intel refers to as "dynamic execution," allowed the first P6 chips to exceed their originally intended performance levels.

When compared with alternative "Intel" processors of the x86 family, it turns out that the P6 microarchitecture has much in common with the microarchitecture of the Nx586 processors from NexGen and K5 from AMD, and, although to a lesser extent, with the M1 from Cyrix. This commonality is explained by the fact that the engineers of the four companies were solving the same problem: introducing elements of RISC technology while maintaining compatibility with the Intel x86 CISC architecture.

Two dies in one package

The main advantage and unique feature of the P6 is a 256 KB secondary static cache placed in the same package as the processor and connected to it by a dedicated bus. This design should significantly simplify the design of P6-based systems. The P6 is the first mass-produced microprocessor to contain two dies in one package.

The CPU die in the P6 contains 5.5 million transistors; the second-level cache die, 15.5 million. By comparison, the latest Pentium model included about 3.3 million transistors, and its L2 cache was implemented with an external set of memory chips.

Such a large transistor count in the cache is due to its static design. The static memory in the P6 uses six transistors to store one bit, where dynamic memory would use one transistor per bit. Static memory is faster but more expensive. Although the secondary-cache die has roughly three times as many transistors as the processor die, its physical dimensions are smaller: 202 square millimeters versus 306 for the processor. Both dies are housed together in a 387-pin ceramic package (a "dual-cavity pin-grid array") and are manufactured with the same process (0.6 µm, 4-layer metal BiCMOS, 2.9 V). Estimated maximum power consumption: 20 W at 133 MHz.
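The six-transistor figure accounts for most of the cache die's transistor budget. A back-of-envelope check (attributing the remainder to tags and control logic is our assumption, not a figure from the article):

```python
KB = 1024
cache_bits = 256 * KB * 8        # 256 KB of data storage, in bits
transistors_per_bit = 6          # six-transistor SRAM cell
data_transistors = cache_bits * transistors_per_bit
print(data_transistors)          # 12582912, i.e. ~12.6 million
# The quoted die total is 15.5 million transistors; the remaining
# ~3 million would plausibly be tag storage and control logic.
```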

The first reason for combining the processor and secondary cache in one package is to ease the design and manufacture of high-performance P6-based systems. The performance of a computing system built around a fast processor depends heavily on fine-tuning the chips in the processor's environment, in particular the secondary cache, and not all computer manufacturers can afford the necessary research. In the P6, the secondary cache is already optimally tuned to the processor, which simplifies motherboard design.

The second reason for the combination is improved performance. The second-level cache is connected to the processor by a dedicated 64-bit bus and operates at the same clock frequency as the processor.

The first 60 and 66 MHz Pentium processors accessed the secondary cache over a 64-bit bus at the full clock speed. However, as Pentium clock speeds increased, it became too difficult and expensive for designers to sustain that frequency on the motherboard, so frequency dividers came into use. For example, for a 100 MHz Pentium the external bus operates at 66 MHz (for a 90 MHz Pentium, at 60 MHz). The Pentium uses this single bus both for secondary cache accesses and for accesses to main memory and other devices, such as the PCI chip set.

Using a dedicated bus to access the secondary cache improves system performance. First, the processor and bus speeds are fully synchronized; second, contention with other I/O operations, and the delays it causes, is eliminated. The L2 cache bus is completely separate from the external bus through which memory and external devices are accessed. The 64-bit external bus can run at one-half, one-third, or one-fourth the processor speed, while the secondary cache bus operates independently at full speed.
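The payoff of the dedicated bus can be put in numbers. Assuming one 64-bit transfer per clock (a simplification; real burst behavior differs), peak bandwidth is simply bus width times frequency:

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth: bytes per transfer times transfers per second."""
    return (width_bits / 8) * clock_mhz   # MB/s, taking 1 MB = 10**6 bytes

print(peak_bandwidth_mb_s(64, 133.0))  # dedicated L2 bus at core clock: 1064.0
print(peak_bandwidth_mb_s(64, 66.5))   # external bus at half the core clock: 532.0
```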

Combining the processor and secondary cache in one package, communicating over a dedicated bus, is a step toward the performance techniques used in the most powerful RISC processors. In Digital's Alpha 21164, for example, a 96 KB second-level cache sits on the processor die itself, like the primary cache. This gives very high cache performance at the cost of raising the chip's transistor count to 9.3 million. The Alpha 21164 delivers 330 SPECint92 at 300 MHz. The P6's performance is lower (Intel estimates 200 SPECint92 at 133 MHz), but the P6 offers the best cost/performance ratio for its intended market.

When evaluating the cost/performance ratio, one should keep in mind that, although the P6 may be more expensive than its competitors, most other processors must be surrounded by an additional set of memory chips and a cache controller. In addition, to achieve comparable cache performance, other processors would need a cache larger than 256 KB.

Intel usually offers numerous variations of its processors, both to meet the diverse requirements of system designers and to leave less room for competitors' models. It can therefore be assumed that soon after the P6's release, modifications with a larger secondary cache will appear, as well as cheaper modifications with an externally located secondary cache that retains the dedicated bus between cache and processor.

Pentium as a starting point

The Pentium processor, with its pipelined, superscalar architecture, reached an impressive level of performance. The Pentium contains two 5-stage pipelines that can run in parallel and execute two integer instructions per machine clock. Only a pair of instructions that follow one another in the program and satisfy certain rules can be executed in parallel, for example when there are no register dependencies of the write-after-read type.
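The dependency side of the pairing rules can be illustrated with a toy check. This is a deliberate simplification (the real Pentium rules also restrict which instruction types may pair, and which pipe each goes to); the function and register sets are ours:

```python
def can_pair(writes1: set, reads2: set, writes2: set) -> bool:
    """Reject the pair if the second instruction reads (read-after-write)
    or writes (write-after-write) a register the first instruction writes."""
    return not (writes1 & reads2) and not (writes1 & writes2)

# mov eax, 1 ; add ebx, eax  -> add reads eax just written: cannot pair
print(can_pair({"eax"}, {"eax", "ebx"}, {"ebx"}))  # False
# mov eax, 1 ; mov ecx, 2    -> independent: can pair
print(can_pair({"eax"}, set(), {"ecx"}))           # True
```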

In the P6, to increase throughput, a transition was made to a single 12-stage pipeline. Increasing the number of stages reduces the work performed at each stage and, as a result, shortens the time an instruction spends at each stage by 33 percent compared with the Pentium. This means that manufacturing the P6 with the same technology used for the 100 MHz Pentium yields a P6 clocked at 133 MHz.
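The clock figures follow from the per-stage time, since the clock period cannot be shorter than the slowest pipeline stage. A quick, idealized check (real stage balancing is more complicated):

```python
def clock_mhz(stage_time_ns: float) -> float:
    """Clock frequency is the reciprocal of the per-stage delay."""
    return 1000.0 / stage_time_ns

print(clock_mhz(10.0))  # 100 MHz Pentium: 10 ns per stage
print(clock_mhz(7.5))   # ~133 MHz P6: ~7.5 ns per stage
```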

The capabilities of the Pentium's superscalar architecture, with its ability to execute two instructions per clock, would be hard to surpass without a completely new approach. The new approach applied in the P6 eliminates the rigid relationship between the traditional "fetch" and "execute" phases, in which the sequence of instructions passing through these two phases corresponds to their sequence in the program.

The new approach involves a so-called instruction pool together with new, effective methods of predicting the future behavior of the program. Here the traditional "execute" phase is replaced by two phases: "dispatch/execute" and "retire." As a result, instructions can start executing in any order, but they always finish in their original program order. The P6 core is implemented as three independent units interacting through the instruction pool (Fig. 1).

The main obstacle to higher performance

The decision to organize the P6 as three independent units interacting through an instruction pool was made after a thorough analysis of the factors limiting the performance of modern microprocessors. The fundamental fact, true for the Pentium and many other processors, is that real programs do not use the full power of the processor.

While processor speeds have increased at least tenfold over the past ten years, main memory access times have decreased by only 60 percent. This growing lag of memory performance behind processor speed was the fundamental problem that had to be addressed in designing the P6.
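A back-of-envelope calculation from the article's own figures shows why this gap dominates: measured in processor clock cycles, a memory access became roughly four times more expensive over the decade.

```python
proc_speedup = 10.0        # processor speed grew at least 10x
mem_latency_ratio = 0.4    # access time fell by 60%, i.e. to 0.4 of its old value

# Latency in clock cycles scales with the clock rate and with the access
# time, so the cost of a memory access, counted in cycles, grew by:
gap_growth = proc_speedup * mem_latency_ratio
print(gap_growth)  # 4.0
```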

One possible approach to the problem is to shift the burden onto high-performance components surrounding the processor. However, mass production of systems that include both a high-performance processor and high-speed dedicated support chips would be too costly.

One could also try to solve the problem by brute force: increasing the size of the second-level cache to reduce the percentage of cases in which the needed data is not in the cache.

This solution is effective but also extremely expensive, especially given today's speed requirements for L2 cache components. The P6 was designed with an efficient implementation of a complete computing system in mind, and high performance of the system as a whole had to be achieved with an inexpensive memory subsystem.

Thus, the P6's combination of architectural techniques, namely improved branch prediction (which nearly always determines the next instruction sequence correctly), data-flow analysis (which determines the optimal order of instruction execution), and speculative execution (the predicted instruction sequence is executed without idle time, in the optimal order), doubled performance relative to the Pentium on the same manufacturing technology. Intel calls this combination of methods dynamic execution.
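To give a flavor of the first ingredient, branch prediction, here is the classic two-bit saturating-counter predictor. This is a textbook sketch of the general idea, not the P6's actual, more sophisticated mechanism:

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken; each actual outcome nudges the counter."""
    def __init__(self):
        self.state = 2                  # start in "weakly taken"

    def predict(self):
        return self.state >= 2          # True means "predict taken"

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 8 + [False] + [True] * 8   # a loop branch with one exit
hits = 0
for taken in outcomes:
    if p.predict() == taken:
        hits += 1
    p.update(taken)
print(hits, len(outcomes))  # 16 17: only the single loop exit mispredicts
```

The two-bit hysteresis is what makes the predictor recover immediately after the one not-taken outcome, instead of mispredicting twice as a one-bit scheme would.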

Intel is currently developing a new 0.35-micron manufacturing process that will allow production of P6 processors with core clock speeds above 200 MHz.

P6 as a platform for building powerful servers

Among the most significant trends in computer development in recent years are the increasing use of x86-based systems as application servers and Intel's growing role as a supplier of non-processor technologies such as buses, networking, video compression, flash memory, and system administration tools.

The release of the P6 continues Intel's policy of bringing capabilities previously reserved for more expensive computers to the mass market. The P6's internal registers are parity-protected, and the 64-bit bus connecting the processor core to the second-level cache has error detection and correction. New diagnostic capabilities built into the P6 allow manufacturers to design more reliable systems. Through processor pins or through software, the P6 can report more than 100 processor variables and events, such as cache misses, register contents, the appearance of self-modifying code, and so on. The operating system and other programs can read this information to determine the processor's state. The P6 also has improved checkpoint support: it can roll the machine back to a previously saved state when an error occurs.
