MAJOR trouble with 32 MB SIMMs in UDB

Joshua M. Thompson (invid@optera.com)
Tue, 5 Nov 1996 00:06:08 -0500 (EST)

This problem has actually been going on for quite some time, but not until my
recent upgrades did it become serious. I *DESPERATELY* need to figure out
what the hell is causing this.

Background: UDB 166 MHz with 128 MB RAM. Originally had a 2 GB boot drive
and two 4 GB drives in a RAID-0 setup (using md) as the news spool. New
setup has a 4 GB boot drive and an 18 GB RAID-5 array connected to an
Adaptec 2940U (not UW) in the PCI slot. The SIMMs are true-parity 8M x
36-bit (32 MB) SIMMs, 60 ns speed.
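
For readers checking the memory arithmetic: a SIMM organized as 8M words of 36 bits carries 32 data bits plus 4 parity bits per word, which works out to 32 MB of usable memory. The short Python sketch below just does that sum. The function name is made up for illustration, and the count of four SIMMs is an assumption inferred from the 128 MB total.

```python
def simm_capacity_mb(words_m: int, width_bits: int) -> int:
    """Usable capacity in MB of a SIMM organized as words_m M words x width_bits.

    Assumes one parity bit per 8 data bits, so a 36-bit word holds 32 data bits.
    """
    if width_bits % 9 != 0:
        raise ValueError("expected a parity width such as 9 or 36 bits")
    data_bits = width_bits // 9 * 8     # drop the parity bits
    return words_m * data_bits // 8     # M words * bytes per word = MB


per_simm = simm_capacity_mb(8, 36)      # one 8M x 36 SIMM
total = 4 * per_simm                    # assumption: four such SIMMs installed
print(per_simm, total)
```

The same function gives 8 MB for a 2M x 36 SIMM, so a mix of two of each size comes to 80 MB.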

Originally I ran kernel 2.0.18 (no axp-diffs), Milo 2.0.12, and a beta of
Colgate (late September...right before the final version). Now I'm running
kernel 2.0.24, Milo 2.0.22 (from the test images directory), and the
official Colgate plus the recently released update RPMs.

Problem: originally, we'd get lots of correctable memory errors. Replacing
SIMMs and motherboards did not cure the problem. The errors would only
start popping up once INN was started, or when something else
memory-intensive started running (such as a drive check after a crash).

On the new setup, everything ran perfectly until I started INN. Suddenly
the screen went berserk with what looked like machine checks, though they
went by so fast I couldn't read them. Eventually they stopped and the
machine was frozen. Judging from the hex dumps they look like an "unknown
errlog size 206" machine check.

Upon reboot the system would eventually die the same way after it started
checking the now dirty hard disk partitions.

Moving SIMMs around did not help. Tried reseating, rearranging, even going
down to 80 MB (2 x 32 plus 2 x 8). Also tried artificially forcing Linux
down to less memory than was actually installed using a "mem=64M" command
line option. Same result.
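
One way to confirm that a "mem=" limit really took effect is to compare the total the kernel reports against the limit. The Python sketch below is a minimal illustration, not something from the original setup. It assumes /proc/meminfo has a "MemTotal: N kB" line as on current Linux kernels (the 2.0-era layout may differ). The function names and the sample text are hypothetical.

```python
import re


def mem_total_mb(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, converted to MB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1)) / 1024


def limit_respected(meminfo_text: str, limit_mb: int) -> bool:
    """True if the reported total does not exceed the mem= limit."""
    return mem_total_mb(meminfo_text) <= limit_mb


# Hypothetical sample text, not output captured from the machine in question.
sample = "MemTotal:        63412 kB\nMemFree:         10240 kB\n"
print(limit_respected(sample, 64))
```

On a live system the text would come from open("/proc/meminfo").read(). The reported total normally comes in a little under the limit because the kernel reserves some memory for itself.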

Eventually I managed to boot the machine on just two 8 MB SIMMs.
For whatever reason, it seems that 32 MB SIMMs just _do_not_ work right in
there. As I said, I've had both the SIMMs and motherboards replaced and
nothing changed. The conclusion is that I simply can't use ANY 32 MB SIMMs
in this darn thing.

Also I seem to recall that on the old setup, I was running ELF without
axp-diffs because I had this same disastrous set of scrolling messages
when I tried a kernel with axp-diffs. Back then I still had some ECOFF
machines, so I compiled unpatched kernels as ECOFF and ran those. Now I
_have_ to build ELF kernels, so I have to have the axp-diffs installed to
compile. Also since I have the Adaptec installed I can't use any standard
ready-built kernels. I should note that when I tried axp-diff kernels
previously, I was running standard Milo 2.0.12 (no test images).

The SRM console and ARC firmware see the memory just fine and never
complain. I'm not an Alpha guru (especially not at the hardware
level), but this makes me suspect that perhaps something in the PALcode
and/or kernel is messing up the memory controller somehow.

I'm also curious if this problem is the same problem that prevents me from
running INN as an ELF binary (it gets a single unaligned access trap and
just dies with other errors on the screen or in the logs). I've been
running it as ECOFF for all this time, but if I have to recompile it for
any reason it's going to end up ELF now, and that worries me.

-- 
invid@optera.com             | We are Grey
http://www.optera.com/~invid | We stand between the Candle and the Star
                             | Between the Darkness and the Light
