


BadRAM: Results

Results up to now

Well... what to say except... it works!

It is included by default in some distributions: Mandrake 9.2, Debian, Caldera (I think).

I have run tests for several years, without flaws. Many people around the world have expressed their gratitude, clearly stating that it works for them as well. And everybody thinks this is so cool they want to get their hands on bad memory, just to tease the less fortunate.

I used the very thorough RAM checker Memtest86 to find the erroneous addresses on my chips, but basically any tester would do. Specifically for memtest86, I made an extension that derives badram=... arguments for the LILO command line (go to the configure/printmode menu). Since LILO can also offer memtest86, handling your bad RAM can be done with two reboots and no screwdriver. I've seen operating systems that need more reboots for far less impressive results ;-)
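For illustration only (the addresses below are made up), the patterns come out as pairs of an address and a mask, which go straight onto the kernel command line, for example in /etc/lilo.conf:

  append="badram=0x00a01e90,0xffffdffc,0x07d21388,0xfffffffc"

Each address/mask pair describes a whole set of faulty addresses at once, which keeps the argument short even for chips with hundreds of holes.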
Hey, why is this thing not part of standard distributions?!? Oh well, Debian is going to do it soon, perhaps others will follow later on...

Benchmarks

I performed benchmarks to prove that BadRAM has no influence on system performance. This is indeed the case, as shown on the benchmark page that provides all the information of interest (and probably much more too).

I performed benchmarks to compare a system with 64 MB of flawless memory and 64 MB of faulty memory. The result: Performance differences are negligible. The conclusion: BadRAM forms a good extension to the Linux kernel.

Read about the benchmarks below.

Terms: I shall coin the term hole for a faulty byte in RAM, and refer to such faulty RAM modules as BadRAM. By contrast, classical (hole-free) RAM will be referred to here as OK RAM. Many BadRAMs contain holes all over, spread in a regular pattern, but I have developed a patch that makes Linux run smoothly on such RAMs.
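To give an idea of how such a regular spread of holes can be captured, here is a small standalone C sketch (not the patch itself) of the address/mask idea behind the badram=F,M arguments: a byte address A counts as bad when it agrees with F on every bit position selected by M. The pattern below is made up; it merely shows how a single F,M pair can cover hundreds of holes.

  #include <stdio.h>

  /* One badram=F,M pair: byte address a is faulty when
   * (a & m) == (f & m), i.e. a agrees with f on every bit selected by m. */
  static int is_bad(unsigned long a, unsigned long f, unsigned long m)
  {
      return (a & m) == (f & m);
  }

  int main(void)
  {
      /* Made-up pattern for a 32 MB module: one bad byte every 16 kB
       * throughout its 8 MB - 16 MB range (module-relative addresses). */
      unsigned long f = 0x00800123UL, m = 0x01803fffUL;
      unsigned long a, holes = 0;

      for (a = 0; a < (32UL << 20); a++)   /* scan the whole 32 MB module */
          if (is_bad(a, f, m))
              holes++;
      /* here every hole falls on its own 4 kB page */
      printf("%lu holes, %lu kB lost in 4 kB pages\n", holes, holes * 4);
      return 0;
  }

Running this prints 512 holes, i.e. two numbers suffice to describe a hole in every 16 kB block of an 8 MB region.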

Description of available hardware

My computer used to run with 128 MB of flawless RAM, with a CAS timing of 2. It has a TLB and caches, as does any Pentium-II system. In the new situation, I added two RAM modules of 32 MB each, each with holes, and each with a CAS timing of 3.

The first BadRAM has 512 holes, spread through the 8MB-16MB range of its 32MB. The second BadRAM has 256 holes, spread through the 0MB-8MB range of its 32MB.

The two interesting cases to compare would be:

  1. The OK RAM only,
  2. The BadRAM only.

Correction of Influential Factors

Factors of influence on this measurement are:

  1. The memory size influences buffering and so on,
  2. The different CAS timing for the OK RAM and the BadRAM,
  3. The pages sacrificed because they contain a hole reduce the size of available RAM,
  4. Networking, daemons, the weather and quantum-mechanical non-determinism.

These factors are dealt with as follows:

  1. The usable memory size will be made equal for the two tests (using the LILO boot option mem=...),
  2. The BIOS will be instructed to assume a CAS timing of 3 in all cases; leaving a BadRAM module installed beyond the used region of RAM helps to convince the BIOS that this is the right value,
  3. The amount of flawless memory offered to Linux will be reduced to the amount actually available in the BadRAM case (the 512 + 256 = 768 holes each sacrifice one 4 kB page on i386, so 64MB minus 3MB in this case),
  4. The tests are performed in single user mode (no networking) by root, and the reported figures are the averages of 5 independent measurements.
I hope and expect this accounts for all possible problems.

Note: Why reduce the flawless memory by the pages that are sacrificed from the BadRAMs? Well, the point I intend to demonstrate is that BadRAM performs just as well as normal RAM after the bad pages have been taken out. So, I should compare the 61MB of usable BadRAM with 61MB out of the flawless RAM.
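For instance (the kernel image name below is made up, the size is the one discussed above), the OK RAM boot can be limited to the same 61MB with a lilo.conf entry along these lines:

  image=/boot/vmlinuz-badram
      label=ok61
      append="mem=61M"

so that both boots offer Linux the same amount of usable memory.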

Software: The measurements are performed with lmbench-2alpha10.
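For those who want to repeat this: lmbench of that vintage is driven from its top-level Makefile, roughly as follows (exact targets may vary slightly between versions):

  make results   # run the benchmark suite
  make see       # format the result tables referred to below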

Measurements

The raw results of these measurements (dmesg output and lmbench tables) are available separately; the subsections below discuss the interesting parts.

The following subsections deal with the latency tables in the lmbench make see results. Bandwidths are not discussed, as they are more likely to be influenced by the fact that different RAMs are addressed than by the BadRAM code (which takes no part in them).

The dmesg values reported for memory differ between the two boots.

The first line shows that no pages received a `BadRAM' treatment, and therefore, that no influence of BadRAM routines on runtime performance is possible. Note the difference in data segment for the kernel; no doubt, this is because bad pages are stored in the page tables, even though the memory is never made available.

Processor, Processes

From the make see results, one table of interest is the process(or) table, which comes out almost equal for both measurements. The only notable difference is in the last line, which reads 8K for OK RAM and 9K for BadRAM. What does that mean?

Context Switching

The tables for context switching times, for OK RAM and for BadRAM, again come out very close. The last measurement came out a little lower, but the result was rounded; I have some difficulty believing in more than 3 significant digits for a measurement of 5 minutes. To my utter surprise, BadRAM even seems to improve the other values! I am inclined to attribute that to measurement noise.

Local Communication Latencies

The latency tables for local communication, for OK RAM and for BadRAM, are downright boring: any differences fall below the benchmark's resolution :).

Virtual Memory Latencies

The tables for virtual memory latencies, for OK RAM and for BadRAM, show no signs of worse performance caused by BadRAM either. We are not interested in whether BadRAM performs better than OK RAM, only in whether there is a performance loss when replacing OK RAM with the same amount of usable memory on BadRAM.

Memory Latency

The tables for memory latency, for OK RAM and for BadRAM, show no distinction between OK RAM and BadRAM performance either.

Note: I am unsure what to do with the `check graphs' message.

Conclusion

BadRAM performs just as well as normal RAM after the bad pages have been taken out.

This is as expected. No influence is to be expected, because the bad pages of a BadRAM are never supplied to the kernel allocation routines of Linux. Although the regular occurrence of holes in a RAM leads to increased fragmentation of page ranges, this is not a major problem: most memory is user-space memory, which is allocated page by page anyway, and in user space contiguous memory regions are assembled from single pages through the MMU.
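Conceptually, all the work happens once, at boot time, roughly as in the standalone sketch below (the function name free_page_to_allocator is made up, not the kernel's): every page frame is tested against the badram patterns, and only clean pages are handed to the allocator, so the bad pages simply do not exist as far as the allocator is concerned.

  #include <stdio.h>

  #define PAGE_SIZE 4096UL

  struct badram_pattern { unsigned long addr, mask; };

  /* A 4 kB page contains a faulty byte as soon as its address bits above
   * the page offset agree with the pattern on all masked bits there; the
   * page-offset bits can always be chosen to complete the match.        */
  static int page_is_bad(unsigned long page,
                         const struct badram_pattern *p, int n)
  {
      int i;
      for (i = 0; i < n; i++) {
          unsigned long himask = p[i].mask & ~(PAGE_SIZE - 1);
          if (((page ^ p[i].addr) & himask) == 0)
              return 1;
      }
      return 0;
  }

  /* Stand-in for handing a clean page to the kernel's free lists. */
  static void free_page_to_allocator(unsigned long page) { (void)page; }

  int main(void)
  {
      /* Same made-up pattern as before: 512 holes in a 32 MB module. */
      struct badram_pattern pat[] = { { 0x00800123UL, 0x01803fffUL } };
      unsigned long page, freed = 0, skipped = 0;

      for (page = 0; page < (32UL << 20); page += PAGE_SIZE) {
          if (page_is_bad(page, pat, 1)) {
              skipped++;                 /* never reaches the allocator */
          } else {
              free_page_to_allocator(page);
              freed++;
          }
      }
      printf("freed %lu pages, skipped %lu bad pages\n", freed, skipped);
      return 0;
  }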

Later amendments

From version 2.4.0 on, there is not even any code overhead in the BadRAM patch: all the code is put into the __init segment of the kernel, which is flushed out of memory after booting. From this version onward, the only overhead of BadRAM is therefore a little boot time (milliseconds) and kernel size (kilobytes).
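In kernel terms this means, as a sketch rather than a literal quote from the patch, that the boot-parameter handler carries the __init marker, so it lives in a section the kernel throws away once booting is complete:

  #include <linux/init.h>

  /* Handler for the badram= boot option; the __init marker puts it in
   * the init section, which is freed after booting, so it costs no RAM
   * at runtime.  Parsing and page reservation are left out here.       */
  static int __init badram_setup(char *str)
  {
          /* parse the "F,M,F,M,..." pairs and reserve matching pages */
          return 1;
  }
  __setup("badram=", badram_setup);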