Well... what to say except... it works!
It is included by default on some distributions: Mandrake 9.2, Debian, Caldera (I think).
I have run tests for several years without a flaw. Many people around the world have expressed their gratitude, clearly stating that it works for them as well. And everybody thinks this is so cool that they want to get their hands on bad memory, just to tease the less fortunate.
I used the very thorough RAM checker Memtest86 to find the erroneous addresses on my chips, but basically any tester would do.
Specifically for memtest86, I made an extension to derive badram=... arguments for the LILO command line (go to the configure/printmode menu).
Since LILO can also offer memtest86 as a boot option, handling your bad RAM can be done with two reboots and no screwdriver.
I've seen operating systems that need more reboots for far less impressive results ;-)
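To give an idea of what ends up in the boot loader configuration, a lilo.conf image section could carry the derived arguments in its append line, roughly as sketched below. The address/mask pairs and file names are invented for the example; the real pairs come out of memtest86's printmode screen for your particular chips.

```
image = /boot/vmlinuz-badram
    label = badram
    root  = /dev/hda1
    # invented example patterns; substitute the pairs memtest86 derives for your RAM
    append = "badram=0x00a7e000,0xfffff000,0x01253000,0xffff3000"
    read-only
```

After that, every boot masks the holes out again without further intervention.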
Hey, why is this thing not part of standard distributions?!?
Oh well, Debian is going to do it soon, perhaps others will follow later on...
I performed benchmarks to prove that BadRAM has no influence on system performance. This is indeed the case, as shown on the benchmark page that provides all the information of interest (and probably much more too).
I performed benchmarks to compare a system with 64 MB of flawless memory against one with 64 MB of faulty memory. The result: the performance differences are negligible. The conclusion: BadRAM forms a good extension to the Linux kernel.
Read about the benchmarks:
Terms: I shall coin the term hole for a faulty byte in RAM, and refer to such faulty RAM modules as BadRAM. By contrast, classical (hole-free) RAM will be referred to here as OK RAM. Many BadRAMs contain holes all over, spread in a regular pattern, but I have developed a patch that makes Linux run smoothly on such RAMs.
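The regular spreading is exactly what makes such modules salvageable: a whole family of holes can be captured by a single address/mask pair, which is the form the badram= boot argument takes. The following stand-alone C snippet is only my illustration of that matching idea with an invented pattern, not code taken from the patch itself.

```c
#include <stdio.h>

/* One badram pair: an address is a hole when it agrees with `addr' on
 * every bit where `mask' is 1.  Zero bits in the mask are "don't care",
 * which is what lets a single pair cover a regular pattern of holes. */
struct badram_pair {
    unsigned long addr;
    unsigned long mask;
};

static int is_hole(unsigned long a, const struct badram_pair *p)
{
    return (a & p->mask) == (p->addr & p->mask);
}

int main(void)
{
    /* Invented pattern: one bad byte at offset 0x1f0 in every 64 KB
     * block of the 8 MB - 16 MB range. */
    struct badram_pair p = { 0x008001f0UL, 0xff80ffffUL };
    unsigned long a, holes = 0;

    for (a = 0; a < 32UL * 1024 * 1024; a++)    /* scan 32 MB of addresses */
        if (is_hole(a, &p))
            holes++;

    printf("the single pair 0x%08lx,0x%08lx describes %lu holes\n",
           p.addr, p.mask, holes);
    return 0;
}
```

A handful of such pairs is usually enough to describe hundreds of holes, which keeps the boot command line short.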
My computer used to run with 128 MB of flawless RAM with a CAS timing of 2. It has a TLB and caches, like any Pentium-II system. In the new situation, I added two 32 MB RAM modules, each with holes and each with a CAS timing of 3.
The first BadRAM has 512 holes, spread through the 8MB-16MB range of its 32MB. The second BadRAM has 256 holes, spread through the 0MB-8MB range of its 32MB.
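Assuming each hole falls in a distinct 4 KB page, those 512 + 256 = 768 holes cost 768 × 4 KB = 3072 KB = 3 MB of the 64 MB, which matches the 3072k badram figure reported by dmesg further down.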
The two interesting cases to compare are:

  * [A] 61 MB of OK RAM
  * [B] 64 MB of BadRAM, of which 3 MB is bad

Factors of influence on this measurement include the amount of memory actually available to the kernel and the different CAS timings of the modules. The memory amount is dealt with by limiting the flawless RAM with the kernel's mem=... boot parameter, so that both cases offer the same amount of usable memory.
Note: Why reduce the flawless memory by the pages that are sacrificed from the BadRAMs? Well, the point I intend to demonstrate is that BadRAM performs as well as normal RAM after the bad pages have been taken out. So, I should compare the 61 MB of usable BadRAM with 61 MB taken out of the flawless RAM.
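In practice this means case [A] boots with something like mem=61M on the kernel command line (61 × 1024 KB = 62464 KB, the memory ceiling visible in the dmesg output below), while case [B] boots with the badram=... pattern list and no mem= restriction; the exact command lines are preserved in the /proc/cmdline entries listed in the table below.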
Software: The measurements are performed with lmbench-2alpha10.
These are the available results for these measurements:

|  | [A] 61MB of OK RAM | [B] 64MB of BadRAM, 3MB wrong |
|---|---|---|
| Measurements | Resultset #1 | Resultset #1 |
| | Resultset #2 | Resultset #2 |
| | Resultset #3 | Resultset #3 |
| | Resultset #4 | Resultset #4 |
| | Resultset #5 | Resultset #5 |
| Linux information | dmesg | dmesg |
| | /proc/meminfo | /proc/meminfo |
| | /proc/cmdline | /proc/cmdline |
| LMbench's make | see | see |
| | stats | stats |
The discussion below is based on the make see results. Bandwidths are not discussed, as they are more likely to be influenced by the fact that different RAM modules are being addressed than by the BadRAM code (which plays no part in them).
The dmesg values reported for memory are different:

    Memory: 60244k/62464k available (940k kernel code, 416k reserved, 804k data, 60k init, 0k badram)
    Memory: 60212k/65536k available (940k kernel code, 416k reserved, 836k data, 60k init, 3072k badram)

The first line shows that no pages received a `BadRAM' treatment, and therefore that no influence of BadRAM routines on runtime performance is possible. Note the difference in the kernel's data segment; no doubt this is because bad pages are still entered in the page tables, even though the memory is never made available.
From the make see results, one table of interest is the process(or) table, which is almost identical for both measurements.
These tables are:

    Processor, Processes - times in microseconds - smaller is better
    ----------------------------------------------------------------
    Host                 OS  Mhz null null      open selct  sig  sig fork exec   sh
                                 call  I/O stat clos       inst hndl proc proc proc
    --------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K

for OK RAM, and for BadRAM it is:

    Processor, Processes - times in microseconds - smaller is better
    ----------------------------------------------------------------
    Host                 OS  Mhz null null      open selct  sig  sig fork exec   sh
                                 call  I/O stat clos       inst hndl proc proc proc
    --------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.5    3 0.3K   2K   9K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
    i686-linu Linux 2.2.14   351  0.8  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
    i686-linu Linux 2.2.14   351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K

The difference is mainly in the last column, which is 8K for OK RAM and 9K for BadRAM. What does that mean?
The tables for context switching times are:

    Context switching - times in microseconds - smaller is better
    -------------------------------------------------------------
    Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                            ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
    --------- ------------- ----- ------ ------ ------ ------ ------- -------
    i686-linu Linux 2.2.14      1     19     58     19    106      22     192
    i686-linu Linux 2.2.14      1     19     58     19    125      23     192
    i686-linu Linux 2.2.14      1     19     58     19     97      26     192
    i686-linu Linux 2.2.14      1     18     58     19    125      22     192
    i686-linu Linux 2.2.14      1     19     58     19    108      26     192

for OK RAM, and for BadRAM it is:

    Context switching - times in microseconds - smaller is better
    -------------------------------------------------------------
    Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                            ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
    --------- ------------- ----- ------ ------ ------ ------ ------- -------
    i686-linu Linux 2.2.14      1     18     58     19    131      23     192
    i686-linu Linux 2.2.14      1     19     58     19    104      24     191
    i686-linu Linux 2.2.14      1     18     58     19     94      22     192
    i686-linu Linux 2.2.14      1     18     58     19     92      24     192
    i686-linu Linux 2.2.14      1     19     58     19    112      24     192

The averages for these columns are:

    Context switching - times in microseconds - smaller is better
    -------------------------------------------------------------
    Measurement             2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                            ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
    --- ------------------- ----- ------ ------ ------ ------ ------- -------
    [A] 61MB OK RAM             1   18.8   58.0   19.0    112    23.8     192
    [B] 64MB-3MB BadRAM         1   18.4   58.0   19.0    107    23.4     192

The last column came out a little lower for BadRAM, but the difference disappears in rounding; I have some difficulty believing in more than 3 significant digits for a 5 minute measurement. To my utter surprise, BadRAM even seems to cause improvements in some of the other values! I tend to attribute that to measurement noise.
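As a quick sanity check on these averages, the 2p/16K column works out to (19 + 19 + 19 + 18 + 19) / 5 = 18.8 microseconds for OK RAM and (18 + 19 + 18 + 18 + 19) / 5 = 18.4 microseconds for BadRAM, matching the table.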
The latency tables for local communication are, for OK RAM:

    *Local* Communication latencies in microseconds - smaller is better
    --------------------------------------------------------------------
    Host                 OS 2p/0K  Pipe AF    UDP  RPC/   TCP  RPC/ TCP
                            ctxsw       UNIX        UDP         TCP conn
    --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17

and for BadRAM:

    *Local* Communication latencies in microseconds - smaller is better
    --------------------------------------------------------------------
    Host                 OS 2p/0K  Pipe AF    UDP  RPC/   TCP  RPC/ TCP
                            ctxsw       UNIX        UDP         TCP conn
    --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17
    i686-linu Linux 2.2.14      1     9   17

How boring; any differences fall under the benchmark's threshold :)
The tables for file and VM system latencies are, for OK RAM:

    File & VM system latencies in microseconds - smaller is better
    --------------------------------------------------------------
    Host                 OS   0K File      10K File     Mmap    Prot   Page
                            Create Delete Create Delete Latency Fault  Fault
    --------- ------------- ------ ------ ------ ------ ------- ----- -----
    i686-linu Linux 2.2.14      19      2     85      3    4624     1  0.8K
    i686-linu Linux 2.2.14      19      2     85      3    4603     1  0.8K
    i686-linu Linux 2.2.14      19      2     82      3    4656     1  0.8K
    i686-linu Linux 2.2.14      19      2     81      3    4690     1  0.8K
    i686-linu Linux 2.2.14      19      2     76      3    4642     1  0.8K

and for BadRAM:

    File & VM system latencies in microseconds - smaller is better
    --------------------------------------------------------------
    Host                 OS   0K File      10K File     Mmap    Prot   Page
                            Create Delete Create Delete Latency Fault  Fault
    --------- ------------- ------ ------ ------ ------ ------- ----- -----
    i686-linu Linux 2.2.14      19      2     85      3    4647     1  0.7K
    i686-linu Linux 2.2.14      19      2     85      3    4361     1  0.7K
    i686-linu Linux 2.2.14      19      2     85      3    4472     1  0.7K
    i686-linu Linux 2.2.14      19      2     79      3    4444     1  0.7K
    i686-linu Linux 2.2.14      19      2     76      3    4500     1  0.7K

The averages for these columns are:

    File & VM system latencies in microseconds - smaller is better
    ---------------------------------------------------------------
    Measurement               0K File      10K File     Mmap    Prot   Page
                            Create Delete Create Delete Latency Fault  Fault
    --- ------------------- ------ ------ ------ ------ ------- ----- -----
    [A] 61MB OK RAM             19      2   81.8      3    4643     1  0.8K
    [B] 64MB-3MB BadRAM         19      2   82.0      3    4485     1  0.7K

Here too, there are no signs of worse performance caused by BadRAM. We are not interested in whether BadRAM performs better than OK RAM, only in whether there is a performance loss when OK RAM is replaced by the same amount of usable memory on BadRAM modules.
The tables for memory latency are, for OK RAM:

    Memory latencies in nanoseconds - smaller is better
        (WARNING - may not be correct, check graphs)
    ---------------------------------------------------
    Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
    --------- -------------   ---  ----   ----    --------    -------
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     78         163

and for BadRAM:

    Memory latencies in nanoseconds - smaller is better
        (WARNING - may not be correct, check graphs)
    ---------------------------------------------------
    Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
    --------- -------------   ---  ----   ----    --------    -------
    i686-linu Linux 2.2.14    351     8     78         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163
    i686-linu Linux 2.2.14    351     8     62         163

And these results show no distinction between OK RAM and BadRAM performance either.
Note: I am unsure what to do with the `check graphs' message.
BadRAM performs as well as normal RAM once the bad pages have been taken out.
This is as expected: no influence should show up, because the bad pages of a BadRAM are never supplied to the kernel's allocation routines. Although the regular pattern of holes in a BadRAM leads to increased fragmentation of page ranges, this is not a major problem, because most memory is user-space memory, which is allocated page by page anyway; in user space, contiguous memory regions are assembled from single pages by the MMU.
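To make the mechanism concrete, here is a small stand-alone C sketch of the boot-time idea: classify every page frame once against the address/mask patterns and simply never hand matching pages to the allocator. The patterns and sizes are invented, and this is user-space illustration code, not the patch itself; the real patch withholds such pages while the kernel's memory map is being set up, which is exactly why no runtime cost remains.

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define MEM_BYTES (64UL * 1024 * 1024)   /* pretend machine with 64 MB */

struct badram_pair {
    unsigned long addr;
    unsigned long mask;
};

/* Invented patterns, standing in for what memtest86 would report. */
static const struct badram_pair patterns[] = {
    { 0x008001f0UL, 0xff80ffffUL },   /* regular sprinkling between 8 and 16 MB */
    { 0x014a3000UL, 0xfffff000UL },   /* one isolated bad page */
};

/* A page must be withheld when any address inside it can match a
 * pattern; clearing the in-page offset bits of the mask tests that. */
static int page_is_bad(unsigned long page_base)
{
    unsigned long i;

    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        unsigned long m = patterns[i].mask & ~(PAGE_SIZE - 1);
        if ((page_base & m) == (patterns[i].addr & m))
            return 1;
    }
    return 0;
}

int main(void)
{
    unsigned long page, bad = 0;

    /* One pass over all page frames, as a boot-time routine would do;
     * afterwards the bad pages are simply never offered for allocation. */
    for (page = 0; page < MEM_BYTES; page += PAGE_SIZE)
        if (page_is_bad(page))
            bad++;

    printf("%luk badram, %luk left for the allocator\n",
           bad * PAGE_SIZE / 1024, (MEM_BYTES - bad * PAGE_SIZE) / 1024);
    return 0;
}
```

The cost of this pass is paid once, at boot, which fits the dmesg figures above: the withheld pages show up as the badram count, and everything else is handed to Linux as usual.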
From version 2.4.0 on, there is not even any code overhead in the BadRAM patch: all the code is placed in the __init segment of the kernel, which is freed from memory after booting. From that version onward, the only overhead of BadRAM is a little boot time (milliseconds) and kernel size (kilobytes).