ECC support. Types of RAM. Unbuffered memory with ECC, registered memory with ECC. Why do values in RAM memory cells get corrupted?

Explain what “ECC support” means for random access memory

  1. On-the-fly checking of memory for errors.
  2. This is an error correction function. Such memory is installed in servers, because they must not lag, shut down or be overloaded because of errors. For a home computer it is not necessary, although it is useful. If you decide to install it, make sure that your motherboard supports this type of RAM with ECC.
  3. So can we limit ourselves to the memtest program? Or does this technology constantly monitor and correct small errors in memory data?
  4. ECC (Error-Correcting Code; other expansions of the abbreviation exist) is an error detection and correction algorithm that replaced parity checking. Unlike parity, each bit is included in more than one checksum, which makes it possible, when a single bit is in error, to recover the position of the error and correct it. Errors in two bits are typically also detected, although they are not corrected. To implement this, an additional memory chip is installed on the module, which becomes 72 bits wide, as opposed to the 64 data bits of a conventional module. ECC is supported by all modern motherboards designed for servers, as well as by some “general purpose” chipsets. Some types of memory (Registered, Fully Buffered) are available only in ECC versions. Note that ECC is not a cure for defective memory: it is meant to correct random errors, reducing the risk of computer problems from random changes in the contents of memory cells caused by external factors such as background radiation.
    Registered memory modules are recommended for use in systems that require (or support) 4 GB or more of RAM. They are always 72-bit wide, i.e. they are ECC modules, and contain additional register chips for partial buffering.
    PLL (Phase Locked Loop) is a circuit that automatically adjusts the frequency and phase of a signal. It reduces the electrical load on the memory controller and improves stability when a large number of memory chips are used; it is found in all buffered memory modules.
    Buffered: a buffered module. Because of the high total electrical capacitance of modern memory modules, their long “charging” time makes write operations slow. To avoid this, some modules (usually 168-pin DIMMs) are equipped with a special chip (a buffer) that stores incoming data relatively quickly, which frees up the controller. Buffered DIMMs are generally incompatible with unbuffered ones. Partially buffered modules are also called “Registered”, and fully buffered modules are called “FB-DIMM”. “Unbuffered” here means ordinary memory modules without any buffering.
    Parity: modules with parity checking, or parity checking itself. A fairly old method of verifying data integrity. A checksum is calculated for each data byte when it is written and stored as a parity bit in a separate chip. When the data is read, the checksum is calculated again and compared with the parity bit. If they match, the data is considered intact; otherwise a parity error is raised (usually halting the system). The obvious disadvantages are the cost of the extra memory needed to store parity bits, the inability to catch double errors (and false alarms when the error is in the parity bit itself), and the fact that the system halts even on a minor error (say, in a video frame). No longer used.
    SPD is a chip on a DIMM that contains all the data about the module (in particular, its performance parameters) needed for normal operation. This data is read during the computer's self-test, long before the operating system boots, and makes it possible to configure memory access parameters even when different types of memory modules are installed at the same time. Some motherboards refuse to work with modules that have no SPD chip, but such modules are now very rare and are mostly PC-66 modules.
  5. A memtest run may not reveal errors, but Test 1 in memtest (Address test, own address), a deep test for errors in memory addressing, identifies such errors well. So if you are getting blue screens, the cause is most likely the RAM or the hard drive.
  6. We already said here, use windowsfix.ru

ECC data protection schemes can also be used for memory built into microprocessors: cache memory and the register file. Sometimes checking is added to the computing circuits themselves as well.

Description of the problem

There are concerns that the trend toward physically smaller memory modules will lead to higher error rates, because lower-energy particles become able to flip bits. On the other hand, the more compact the memory, the less likely a particle is to hit it. In addition, a switch to technologies such as silicon-on-insulator may make memory more resilient.

A study conducted on a large number of Google servers showed that the error rate could range from 25,000 to 70,000 errors per billion device hours per megabit (that is, 2.5-7.0 × 10^-11 errors per bit-hour).
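The unit conversion in the figure above can be checked with a few lines of Python. This is only a sketch: it assumes a decimal megabit (10^6 bits), and the 8 GiB module is a hypothetical example that does not come from the study. The yearly figure is a naive average over all modules, not a prediction for any single one.

```python
# Convert "errors per billion device hours per megabit" to errors per bit-hour.
BITS_PER_MEGABIT = 10**6   # assumption: decimal megabit
DEVICE_HOURS = 10**9       # "per billion device hours"
HOURS_PER_YEAR = 24 * 365


def errors_per_bit_hour(errors_per_billion_hours_per_mbit):
    return errors_per_billion_hours_per_mbit / (DEVICE_HOURS * BITS_PER_MEGABIT)


def expected_errors_per_year(rate_per_bit_hour, capacity_bytes):
    # Naive average: rate x number of bits x hours in a year.
    return rate_per_bit_hour * capacity_bytes * 8 * HOURS_PER_YEAR


low = errors_per_bit_hour(25_000)
high = errors_per_bit_hour(70_000)
module_bytes = 8 * 2**30   # hypothetical 8 GiB module

print(f"{low:.1e} to {high:.1e} errors per bit-hour")
print(f"{expected_errors_per_year(low, module_bytes):.0f} to "
      f"{expected_errors_per_year(high, module_bytes):.0f} errors per year (average)")
```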

Technology

One solution to this problem is parity - the use of an extra bit that records the parity of the remaining bits. This approach allows you to detect errors, but does not allow you to correct them. Thus, if an error is detected, you can only interrupt the execution of the program.
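A minimal sketch of the parity scheme, assuming even parity over one byte (the function names are illustrative, not taken from any real memory controller). It shows why a single flipped bit is detected while two flipped bits slip through:

```python
def parity_bit(byte):
    """Even parity: returns 1 if the byte has an odd number of set bits."""
    return bin(byte & 0xFF).count("1") % 2


def write_byte(byte):
    # Store the data together with its parity bit.
    return byte, parity_bit(byte)


def read_ok(byte, stored_parity):
    # Recompute the parity on read and compare it with the stored bit.
    return parity_bit(byte) == stored_parity


data, p = write_byte(0b1011_0010)
one_flip = data ^ 0b0000_0100    # one bit changed
two_flips = data ^ 0b0001_0100   # two bits changed

print(read_ok(data, p))        # True: data intact
print(read_ok(one_flip, p))    # False: single-bit error detected
print(read_ok(two_flips, p))   # True: double-bit error goes unnoticed
```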

A more reliable approach is to use error correction codes. The most commonly used error correction code is the Hamming code. Most error-correcting memories used in modern computers can correct a single-bit error in a single 64-bit machine word and detect, but not correct, a two-bit error in a single 64-bit word.
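The single-error-correcting, double-error-detecting behavior described above can be sketched with a textbook extended Hamming code. This is an illustration under stated assumptions, not the circuit of any particular memory controller: real hardware may use a different code with the same parameters, and the bit layout here (overall parity at index 0, check bits at power-of-two indices) is chosen only for clarity.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:        # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)               # index 0 is the overall parity bit
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i                     # check bit p covers positions with bit p set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2            # makes the total number of ones even
    return code


def decode(code):
    """Return (status, data_bits); corrects one flipped bit, detects two."""
    code = code[:]
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos            # XOR of the positions of all set bits
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        code[syndrome] ^= 1            # syndrome 0 means the overall parity bit itself
        status = "corrected"
    elif overall_odd:
        status = "uncorrectable"       # three or more flipped bits
    else:
        status = "double error detected"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [1 if i % 3 == 0 else 0 for i in range(64)]
stored = encode(word)
print(len(stored))                     # 72 bits for 64 data bits

hit = stored[:]
hit[17] ^= 1
print(decode(hit)[0])                  # corrected
hit[40] ^= 1
print(decode(hit)[0])                  # double error detected
```

With 64 data bits the code needs 7 check bits plus one overall parity bit, which gives the 72-bit word mentioned earlier.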

The most effective approach to error correction depends on the type of errors expected. It is often assumed that changes to different bits occur independently, in which case the probability of two errors in one word is negligible. However, this assumption does not hold for modern computers. Memory based on Chipkill error correction technology (IBM) can correct multiple errors, including the failure of an entire memory chip. Other memory correction technologies that do not assume independent bit errors include Extended ECC (Sun Microsystems), Chipspare (Hewlett-Packard) and SDDC (Intel).

Many older systems did not report corrected errors, only errors that were detected but could not be corrected. Modern systems record both correctable errors (CE) and uncorrectable errors (UE). This makes it possible to replace damaged memory in time: although a large number of corrected errors, in the absence of uncorrectable ones, does not affect the correct operation of the memory, it may indicate that the module is more likely to produce uncorrectable errors in the future.
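As one example of such record keeping, the Linux EDAC subsystem exposes per-memory-controller counters through sysfs. The sketch below assumes the usual layout (`/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count`). On machines without ECC memory or without an EDAC driver loaded the directory is simply absent, and the function returns an empty result.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts                      # no EDAC support on this machine
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue                       # skip controllers we cannot read
        counts[mc.name] = (ce, ue)
    return counts


for name, (ce, ue) in read_edac_counts().items():
    print(f"{name}: corrected={ce} uncorrected={ue}")
```

A steadily growing corrected count on one controller is the kind of early warning the paragraph above describes.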

Advantages and Disadvantages

Error-correcting memory protects a computer system against incorrect operation caused by memory corruption and reduces the likelihood of a fatal system failure. However, such memory is more expensive; a motherboard, chipset and processor that support error-correcting memory can also be more expensive. Such memory is therefore used in systems where uninterrupted and correct operation is important, such as file servers and scientific and financial applications.

Error-correcting memory runs 2-3% slower than conventional memory, depending on the application (it often needs one extra memory controller clock cycle to verify the checksums). The additional logic that computes, checks and applies ECC requires logic resources and time, either in the memory controller itself or in the interface between the CPU and the memory controller.



On the Internet, you can often see questions on thematic forums regarding error-correcting memory, namely, its impact on system performance. Today's testing will answer this question.

Before reading this material, we recommend that you familiarize yourself with the materials on the LGA1151 platform.

Theory

Before testing, we'll tell you about memory errors.
Errors that occur in memory can be divided into two types - hardware and random. The first ones are caused by defective DRAM chips. The latter arise due to the effects of electromagnetic interference, radiation, alpha and elementary particles, etc. Accordingly, hardware errors can be corrected only by replacing DRAM chips, and random errors can be corrected using special technologies, for example, ECC (Error-Correcting Code). ECC error correction has two methods in its arsenal: SEC (Single Error Correction) and DED (Double Error Detection). The first one corrects single-bit errors in a 64-bit word, and the second one detects double-bit errors.
The hardware implementation of ECC consists of placing additional memory chips that are needed to write 8-bit checksums. Thus, an error correction memory module with a single-sided design will have 9 memory chips instead of 8 (as in a standard module), and with a double-sided design - 18 instead of 16. At the same time, the width of the module increases from 64 to 72 bits.
When data is read from memory, the checksum is recalculated and compared with the stored one. If the error is in one bit, it is corrected; if in two, it is detected.
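Why the checksum is 8 bits wide for a 64-bit word follows from the standard Hamming bound, sketched below. The chip count assumes x8 DRAM chips (each chip contributes 8 bits of the module width), which matches the 8-versus-9-chip example above. The function name is illustrative.

```python
def secded_check_bits(data_bits):
    """Check bits needed for single-error correction plus double-error detection."""
    r = 0
    while (1 << r) < data_bits + r + 1:   # Hamming bound: 2**r >= m + r + 1
        r += 1
    return r + 1                          # one extra bit for overall parity


CHIP_WIDTH = 8                            # assumption: x8 DRAM chips

for width in (8, 16, 32, 64):
    print(width, "data bits ->", secded_check_bits(width), "check bits")

check = secded_check_bits(64)
print("module width:", 64 + check, "bits")
print("chips per side:", 64 // CHIP_WIDTH, "+", check // CHIP_WIDTH)
```

Protecting a 64-bit word costs 8 extra bits (12.5%), whereas protecting each byte separately would cost 5 bits per byte, which is why the check is done over the whole word.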

Practice

In theory, everything is fine - error-correcting memory increases system reliability, which is very important when building a server or workstation. But in practice, there is also a financial side to this issue. If the server requires memory with error correction, then the workstation can easily do without ECC (many ready-made workstations from different manufacturers are equipped with conventional RAM). How much more expensive is memory with error correction?
A typical 8GB DDR4-2133 module costs about $39, while a typical ECC module costs $48 (at the time of writing). The difference in cost is about 23%, which is quite significant at first glance. But if you look at the total cost of the workstation, this difference will not exceed 5% of it. Thus, purchasing ECC memory only slightly increases the cost of the workstation. The only question that remains is how ECC memory affects processor performance.
In order to answer this question, the editors of the site took for testing Samsung DDR4-2133 ECC and Kingston DDR4-2133 memory modules with the same timings 15-15-15-36 and a capacity of 8 GB.

Samsung M391A1G43DB0-CPB memory modules with error correction have 9 chips soldered on each side.

Regular Kingston KVR21N15D8/8 memory modules, by contrast, have 8 chips soldered on each side.

Test bench: Intel Xeon E3-1275v5, Supermicro X11SAE-F, Samsung DDR4-2133 ECC 8GB, Kingston DDR4-2133 non-ECC 8GB

Detailing

- Processor: Intel Xeon E3-1275v5 (HT on; TB off);
- Motherboard: Supermicro X11SAE-F;
- RAM: 2x Samsung DDR4-2133 ECC 8GB (M391A1G43DB0-CPB), 2x Kingston DDR4-2133 non-ECC 8GB (KVR21N15D8/8);
- OS: .

Testing methodology

- 3DMark06 1.21;
- 7zip 15.14;
- AIDA64 5.60;
- Cinebench R15;
- Fritz 4.2;
- Geekbench 3.4.1;
- LuxMark v3.1;
- MaxxMEMI 1.99;
- PassMark v8;
- RealBench v2.43;
- SiSoftware Sandra 2016;
- SVPmark v3.0.3b;
- TrueCrypt 7.1a;
- WinRAR 5.30;
- wPrime 2.10;
- x264 v5.0.1;
- x265 v0.1.4;
- Kraken;
- Octane;
- Octane 2.0;
- Peacekeeper;
- SunSpider;
- WebXPRT.

More and more people are running into the problem of RAM that is incompatible with their computer. They install the memory, but it does not work and the computer does not turn on. Many users simply do not know that there are several types of memory, or which type suits their computer and which does not. In this guide I will briefly share my personal experience with RAM and explain where each type is used.

Don't know what the U in a RAM marking means, or what E, R or F mean? These letters indicate the type of memory: U (Unbuffered), E (error-correcting memory, ECC), R (Registered memory), F (FB-DIMM, Fully Buffered DIMM). Now let's look at all these types in more detail.

Types of memory used in computers:

1. Unbuffered memory. Ordinary memory for ordinary desktop computers, also called UDIMM. A memory stick usually has 2, 4, 8 or 16 memory chips on one or both sides. The marking of such memory usually ends with the letter U (Unbuffered) or has no letter at all, for example DDR2 PC-6400, DDR2 PC-6400U, DDR3 PC-8500U or DDR3 PC-10600. For laptop memory the marking ends with the letter S, apparently short for SO-DIMM, for example DDR2 PC-6400S. A photo of unbuffered memory can be seen below.

2. Error-correcting memory (ECC memory). Regular unbuffered memory with error correction. Such memory is usually installed in brand-name computers sold in Europe (NOT SERVERS); its advantage is greater reliability in operation. Most memory errors that do appear can be corrected during operation without losing data. Each stick of such memory typically has 9 or 18 memory chips, that is, one or two chips are added. Most regular computers (not servers) and motherboards can handle ECC memory. The marking of such memory usually ends with the letter E (ECC), for example DDR2 PC-4200E, DDR2 PC-6400E, DDR3 PC-8500E or DDR3 PC-10600E. A photo of unbuffered ECC memory can be seen below.

The difference between memory with ECC and memory without ECC can be seen in the photo:

Although most boards on sale support this memory, it is better to check compatibility with the specific board and processor before buying. From personal experience, 90-95% of motherboards and processors can handle ECC memory. Among those that cannot: boards based on the Intel G31, Intel G33, Intel G41, Intel G43 and Intel 865PE chipsets. From the first generation of Intel Core onward, all motherboards and processors can work with ECC memory, and this does not depend on the motherboard. With AMD processors, almost all motherboards can work with ECC memory, apart from individual cases of incompatibility (which are very rare).

3. Registered memory. A SERVER memory type. It is usually produced with ECC (error correction) and with a “buffer” chip. The “buffer” chip makes it possible to increase the maximum number of memory sticks that can be connected to the bus without overloading it, but that is more detail than we need, so we will not go into the theory. These days the terms buffered and registered are hardly distinguished. To simplify: registered memory = buffered memory. This memory works ONLY on server motherboards that can work with memory through a “buffer” chip.

Typically, registered memory sticks with ECC have 9, 18 or 36 memory chips plus 1, 2 or 4 “buffer” chips (these are usually in the center and differ in size from the memory chips). The marking of such memory usually ends with the letter R (Registered), for example DDR2 PC-4200R, DDR2 PC-6400R, DDR3 PC-8500R or DDR3 PC-10600R. The marking of registered (server, buffered) memory also usually contains REG, an abbreviation of Registered. A photo of buffered (registered) memory with ECC can be seen below.

Remember! Registered memory with ECC is 100% likely NOT to work on regular motherboards. It only works on servers!

4. FB-DIMM (Fully Buffered DIMM) is a computer memory standard intended to improve the reliability, speed and density of the memory subsystem. In traditional memory standards, the data lines run from the memory controller directly to the data lines of each DRAM module (sometimes through buffer registers, one register chip per 1-2 memory chips). As the channel width or data rate increases, signal quality on the bus deteriorates and the bus layout becomes more complicated, which limits memory speed and density. FB-DIMM takes a different approach to these problems. It develops the idea of registered modules further: the Advanced Memory Buffer buffers not only the address signals but also the data, and uses a serial bus to the memory controller instead of a parallel one.

The FB-DIMM has 240 pins and is the same length as other DDR DIMMs, but differs in the shape of the tabs. Suitable for server platforms only.

FB-DIMM specifications, like other memory standards, are published by JEDEC.

Intel used FB-DIMM memory in systems with Xeon 5000 and 5100 series processors and later (2006-2008). FB-DIMM memory is supported by the 5000, 5100, 5400 and 7300 server chipsets, and only with Xeon processors based on the Core microarchitecture (socket LGA771).

In September 2006, AMD abandoned its plans to use FB-DIMM memory.

If you find it difficult to choose memory for your computer, check with the seller and tell him the model of the motherboard and processor model.
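The marking rules described in this guide (a trailing U, E, R, F or S, no letter for ordinary unbuffered memory, REG for registered modules) can be sketched as a small lookup. It covers only the marking style used in the examples here. Real labels vary between manufacturers, so treat the result as a hint, not an identification. The function and table names are illustrative.

```python
SUFFIX_TYPES = {
    "U": "unbuffered (UDIMM)",
    "E": "unbuffered with ECC",
    "R": "registered (server)",
    "F": "fully buffered (FB-DIMM, server)",
    "S": "SO-DIMM (laptop)",
}


def module_type(marking):
    """Guess the module type from the last letter of a marking string."""
    text = marking.strip().upper()
    if not text:
        return "unknown"
    if text.endswith("REG"):
        return SUFFIX_TYPES["R"]
    last = text[-1]
    if last.isdigit():
        return SUFFIX_TYPES["U"]      # no letter: ordinary unbuffered memory
    return SUFFIX_TYPES.get(last, "unknown")


for label in ("DDR2 PC-6400", "DDR3 PC-10600E", "DDR3 PC-8500R", "DDR2 PC-6400S"):
    print(label, "->", module_type(label))
```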

P.S.: Recently another cheap and interesting type of memory has appeared; I call it the “Chinese fake”. If you have not come across it yet, let me explain. This kind of memory can always be recognized by its contacts: they are usually oxidized, and even if you clean them, within a month or two they oxidize again and become cloudy and dirty, and the memory may start to fail. There is not a trace of gold on the contacts of this memory. Another difference from the original is that it works only with certain motherboards or processors, for example ONLY with AMD, or only with certain chipsets, and there are very few such chipsets. What the secret of this “memory” is I still do not know, but many people buy it, since it is 40-50% cheaper than the equivalent. Most surprising of all, a new “Chinese fake” usually costs less than used original memory :) I will not go into its reliability and durability; everything is clear there.


ECC (error-correcting code) is a technology built into flash drive controllers to detect and correct errors during data transfer. ECC can cope only with minor problems; in severe cases the flash drive will be locked against writing.

WHY IS THIS NEEDED?

In the era of high-quality SLC and MLC flash memory chips, there was little point in paying attention to this error correction mechanism. Now, when the overwhelming majority of flash drives either have TLC memory or some kind of MLC DownGrade installed, you should not neglect the settings of the ECC mechanism.

This technology lets a flash drive work longer before its next glitch, because you do not want to reflash your flash drive every month.

Another benefit is the chance of reaching the maximum possible capacity of the flash drive. It can be even higher than the drive originally had, especially for flash drives with rejected chips.

DISADVANTAGES

The higher you set the ECC parameter, the more load it puts on the flash drive controller. This, in turn, can hurt performance, that is, the speed of the drive. Another noticeable disadvantage of the high load is that the flash drive heats up more.

Most utilities do not use the values given in flash lists (for example, 7b/512B and 72b/1K) but sums of certain parameters. These are usually in the range from 0 to 15; in some production programs, which support extremely low-quality memory, from 0 to 20.

ECC Value
MEMORY TYPE:                        ECC:
SLC                                 1
MLC 32nm, 35nm, 42nm, 50nm, …       3-4
MLC 24nm, 25nm, 26nm, 32nm          4-8
MLC 21nm, 20nm, 19nm, …             8-12
TLC 27nm, 32nm, 43nm, …             8
TLC 24nm, 21nm, 19nm, …             12-15

Some utilities use a different coordinate system, for example the Dyna production complex for SMI controllers. In this case, just below you can find a link to the specific settings of specific manufacturers.

Let me explain briefly how to use the table above. If your flash drive is of good quality (a well-established brand), select the minimum value for your memory type. For gift and fake flash drives, I strongly recommend using the maximum value of the ECC parameter for your memory type.
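The selection rule above can be written down as a lookup. This is only a sketch of the rule as stated: the row labels are shorthand for the table rows (the original rows overlap at 32nm and some are open-ended), and the function name and the `trusted_brand` flag are illustrative, not part of any production utility.

```python
# (minimum, maximum) ECC value per table row; labels are shorthand, not official names.
ECC_TABLE = {
    "SLC": (1, 1),
    "MLC 32-50nm": (3, 4),
    "MLC 24-32nm": (4, 8),
    "MLC 19-21nm": (8, 12),
    "TLC 27-43nm": (8, 8),
    "TLC 19-24nm": (12, 15),
}


def recommended_ecc(memory_class, trusted_brand):
    """Minimum of the range for good-quality drives, maximum for gift or fake ones."""
    low, high = ECC_TABLE[memory_class]
    return low if trusted_brand else high


print(recommended_ecc("MLC 24-32nm", trusted_brand=True))    # 4
print(recommended_ecc("TLC 19-24nm", trusted_brand=False))   # 15
```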

IMPLEMENTATION IN PRODUCTION UTILITIES

Not all utilities allow manual adjustment of the ECC option. We can say that ECC is a feature of the Sorting component of production utilities. I’ll try to briefly express this in a table for the main manufacturers of USB controllers.

ECC Compatible Software
Company:        Tools:
ALCOR           AlcorMP_UFD
                FC MpTool
                AAMP
CHIPSBANK       Chipsbank UMPTool
                CBM2093 UMPTool
                CBM2098 UMPTool
                umptool209X
                V68 Building Tools
INNOSTOR        Innostor MPTool
                Innostor 917 LFA MP Tool
PHISON          UPTool
                UP19_CTool
                UP21_CTool
                UP23_CTool
SILICON GO      KingStore Manufacture Tool
                SiliconGo MPTools
                SiliconGo MPTool2
SKYMEDI         SK6221 MPTool
SMI             Dyna Mass Storage Production Tool