Rick van Rein
BadRAM on Xen
This is a new development for the BadRAM patch: integrating it into the Xen hypervisor. Why, you ask? For several good reasons.
I personally haven't used BadRAM on my machines for quite a while,
because it interferes with setting up new operating systems. In
short, it can be a maintenance problem that requires constant
attention. Perhaps Linux is not the ideal place for BadRAM.
The code base of Xen is much simpler, making the BadRAM support
a lot more straightforward.
It is lovely to have BadRAM support for Linux, but it would be even
nicer to have it on FreeBSD, OpenBSD and whatever else we may care
to run on a machine, including even closed-source operating systems.
Doing that as part of each of these operating systems is an unfair
amount of work.
There have been voices that recommend putting BadRAM in the BIOS,
or in the GRUB bootloader. This does not scale well, as the memory
map is passed between BIOS and Linux (or GRUB and Linux) as an e820
map, which lists segments of contiguous memory; but this was never
designed to carry the long lists that can come from, say, applying
a regular pattern of 4096 memory holes across a DIMM's memory range.
Such patterns are very common among BadRAM users, which is why
BadRAM support must live at a place where the level of abstraction
is suitable. Xen is such a place.
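To make the scaling problem concrete, here is a minimal Python sketch of what happens when one BadRAM pattern is flattened into an e820-style list of good segments. It assumes the usual BadRAM convention that an address is bad when it agrees with the pattern address on every bit set in the mask. The address range, the pattern values and the function names are made up for illustration, and the mask is assumed to have no bits set below the page size.

```python
PAGE_SIZE = 4096

def is_bad(addr, pattern_addr, mask):
    """An address is bad when it matches pattern_addr on all bits set in mask."""
    return (addr & mask) == (pattern_addr & mask)

def good_segments(start, end, pattern_addr, mask):
    """Return the [first, past_last) byte ranges of good memory in [start, end).

    Testing one address per page is enough here only because the
    (assumed) mask has no bits set below PAGE_SIZE.
    """
    segments = []
    seg_start = None
    for page in range(start, end, PAGE_SIZE):
        if is_bad(page, pattern_addr, mask):
            if seg_start is not None:
                segments.append((seg_start, page))  # close the running segment
                seg_start = None
        elif seg_start is None:
            seg_start = page                        # open a new good segment
    if seg_start is not None:
        segments.append((seg_start, end))
    return segments

# Hypothetical example: a 64 MB range in which every fourth page is bad.
START, END = 0x10000000, 0x14000000
PATTERN_ADDR, MASK = 0x10001000, 0xFC003000

holes = sum(is_bad(p, PATTERN_ADDR, MASK) for p in range(START, END, PAGE_SIZE))
segments = good_segments(START, END, PATTERN_ADDR, MASK)
print(holes, "holes,", len(segments), "good segments")
```

With these made-up values, a single address/mask pair turns into 4096 holes and 4097 separate good segments, which is the kind of list a segment map was never meant to carry.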
Xen already has a badpage boot option, which supports listing
individual bad pages, or ranges of pages. Compared to the patterns
that are commonly reported by BadRAM users, this is a very limited
facility, and hardly useful as a hardware
abstraction. BadRAM takes the general patterns of failing columns
or rows into account, and this makes it a much better match with
memory failures in real hardware.
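The difference in expressiveness can be sketched in Python by spelling out a pattern in the badpage style. This assumes, as I understand the Xen command line documentation, that badpage takes a comma-separated list of page numbers and page-number ranges. The helper names below are hypothetical and are not part of Xen or BadRAM.

```python
PAGE_SHIFT = 12  # 4 KB pages

def bad_pfns(start, end, pattern_addr, mask):
    """Page numbers in [start, end) whose address matches the address/mask pattern."""
    return [addr >> PAGE_SHIFT
            for addr in range(start, end, 1 << PAGE_SHIFT)
            if (addr & mask) == (pattern_addr & mask)]

def compress_runs(pfns):
    """Collapse sorted page numbers into inclusive (first, last) runs."""
    runs = []
    for pfn in pfns:
        if runs and pfn == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], pfn)   # extend the current run
        else:
            runs.append((pfn, pfn))         # start a new run
    return runs

def badpage_option(pfns):
    """Format page numbers as a badpage-style list (assumed syntax)."""
    parts = [f"{a:#x}" if a == b else f"{a:#x}-{b:#x}"
             for a, b in compress_runs(pfns)]
    return "badpage=" + ",".join(parts)

print(badpage_option([1, 2, 3, 7, 9, 10]))  # badpage=0x1-0x3,0x7,0x9-0xa

# Hypothetical pattern: every fourth page of a 64 MB range is bad.
option = badpage_option(bad_pfns(0x10000000, 0x14000000, 0x10001000, 0xFC003000))
print(option.count(",") + 1, "entries")
```

Ranges only help when the bad pages are adjacent. A regularly spaced pattern never forms a run, so every bad page becomes its own entry, where BadRAM needs just one address/mask pair.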
The code base of Xen is very general, and when changed with care,
it can implement BadRAM for any piece of hardware in just one bit
of code. So if the code is tested to work on i386, it would be
fairly safe to assume that it will not only work on AMD64 as well,
but also on ARM, SPARC and other platforms that are harder to test.
The advantages of BadRAM to embedded environments may be the start
of very interesting developments in which broken memory is actually
used in end-user devices! And the extra code is just a few hundred
bytes, which are reclaimed after booting Xen.
I have an adapted version of Xen ready, and if I poke a BadRAM page
somewhere in the region that it normally allocates for Dom0, I see the
address allocations of Dom0 shift to another place. So it looks like
it is working.
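The shift I observed is what one would expect from any allocator that wants a contiguous block and has to steer around a bad page. The following Python toy model shows the effect. It is a plain first-fit search over page numbers, not Xen's actual allocator, and the function name and sizes are made up.

```python
def first_fit(total_pages, want, bad):
    """Return the first page of the lowest run of `want` consecutive
    good pages, or None if no such run exists."""
    run_start = 0
    for page in range(total_pages):
        if page in bad:
            run_start = page + 1            # a bad page breaks the run
        elif page - run_start + 1 == want:
            return run_start                # run is long enough
    return None

print(first_fit(1024, 256, set()))   # 0: nothing in the way
print(first_fit(1024, 256, {100}))   # 101: the block moves past the bad page
```

One bad page inside the preferred region is enough to move the whole block, which matches what I see with Dom0.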
What I need right now is a broken DDR1 DIMM that I can use to test this
on the AMD64 platform. I have older DIMMs, but not anything in this class.
If you have one that is damaged by (presumably) static electricity and that
you can do without, please consider sending it to me!
Things that still need resolution: protecting against faults in the base
640k of memory; learning why Dom0 is allocated contiguous memory; learning
whether Xen could suffer from having its physical memory map fragmented; and
learning how the Xen community feels about a BadRAM option for Xen.