Defects on High-Speed Memory – Part 2

Last month, we saw how defects on memory data lines can
cause a system to fail, and yet escape detection by the system boot loader or
BIOS. Let’s examine this in more technical detail.

In that blog post,
we examined the case of a short circuit between two DQ lines, and how this
might escape detection by the BIOS. This is fundamentally because a voltage
bias on the memory controller may cause the value read out to match the value
written in. For example, if DQ0 is written as a ‘1’, and DQ1 is written as a
‘0’, the resultant values stored in the memory cells will be indeterminate,
because the short circuit yields a level midway between high and low.
Depending on how that level resolves, the fault might escape the
simple testing that is part of the memory training algorithm within the BIOS.
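
The DQ0/DQ1 example can be made concrete with a small simulation. This is only a sketch under stated assumptions, not how any particular BIOS or memory controller behaves: it assumes that when the two shorted lines are driven to different levels, both settle to the same value, and that a hypothetical `bias_high` flag decides which way. The function names and the 8-bit lane width are likewise illustrative. Under that model, a check that writes only all-zeros and all-ones never drives DQ0 and DQ1 apart, so it passes, while a walking-ones pattern exposes the short.

```python
from typing import Callable, List

DQ_WIDTH = 8  # one byte lane, chosen only for illustration


def make_shorted_bus(bit_a: int, bit_b: int, bias_high: bool) -> Callable[[int], int]:
    """Return a function mapping a written byte to the byte read back.

    Assumed fault model: if the two shorted lines are driven to different
    levels, both settle to the same value, selected by the receiver bias.
    """
    mask = (1 << bit_a) | (1 << bit_b)

    def write_then_read(value: int) -> int:
        a = (value >> bit_a) & 1
        b = (value >> bit_b) & 1
        if a == b:
            return value  # lines agree, so the short has no visible effect
        # Lines disagree: both bits resolve high or low according to the bias.
        return (value | mask) if bias_high else (value & ~mask)

    return write_then_read


def simple_test(bus: Callable[[int], int]) -> bool:
    """All-zeros / all-ones check; the shorted lines never differ here."""
    return all(bus(pattern) == pattern for pattern in (0x00, 0xFF))


def walking_ones_test(bus: Callable[[int], int]) -> List[int]:
    """Return every single-bit pattern that reads back incorrectly."""
    return [1 << i for i in range(DQ_WIDTH) if bus(1 << i) != (1 << i)]


# DQ0 shorted to DQ1, with the midway level resolving high.
bus = make_shorted_bus(0, 1, bias_high=True)
print(simple_test(bus))        # True: the defect escapes the simple check
print(walking_ones_test(bus))  # [1, 2]: the patterns for DQ0 and DQ1 fail
```

The two failing patterns point directly at the shorted pair, which is the kind of diagnostic detail a pass/fail training check does not provide.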

Figure: DIMM block diagram with two DQ lines shorted

If the memory training sequence does complete in this
instance, the system will fail soon thereafter, as more data is read in
and out of main memory during the remainder of the board boot-up process. On
some systems, this results in the infamous “blue screen”, which yields very
little diagnostic information. And, of course, many test routines themselves
rely on being run within main system RAM, which would not be possible. It is
best to try to catch the failure during the BIOS memory training.

An implementation of cache-based instrumented memory testing
routines can be reviewed in our white paper, “Cache-as-RAM to bring up non-booting boards”.

Alan Sguigna