Re: EB164 scsi hang with Redhat 4.0

Gerard Roudier (groudier@club-internet.fr)
Fri, 1 Nov 1996 14:35:27 +0000 (GMT)

Mike,

On Thu, 31 Oct 1996, Mike Cruse wrote:

> Hi all,
>
> I have an EB164 machine with an NCR 53c825 fast wide scsi controller and
> a Quantum XP34300W fast wide drive. Redhat 3.0.3 was working fine. I'm
> now trying to install 4.0 from scratch. I've tried more than 20 times to
> do this but the machine dies every time while trying to prepare the swap
> partition. I have a 64Meg swap partition. I tried a 32 Meg swap for
> grins. That works but things hang during mke2fs. I even tried doing all
> the prep stuff on another Intel machine then moving the drive back but
> again things grind to a halt as soon as the first large package gets
> installed.

I hope that the RedHat boot disk does not enable tagged command
queueing. Can you check this on the console at boot time and let me know?
The driver prints a message like "Enabling tagged command queuing ..."
if tagged queuing is enabled.
It has been reported that some (perhaps all) XP34300(W)s return QUEUE FULL
status, especially when lots of small I/Os are done.
My guess (just speculation) is that the firmware tries to cluster write
operations and may hit some internal queue-full condition even when only
a few SCSI commands are queued to it.
Disabling tagged queuing (TGQ) prevents this status from being returned.
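
The boot-time check described above could be scripted roughly as follows.
This is only a sketch: the helper name tagged_queuing_enabled is made up
for illustration, and the pattern assumes the driver message contains the
words "tagged command queu..." as quoted above (either spelling).

```shell
#!/bin/sh
# Hypothetical helper: reads kernel messages on stdin and succeeds
# if any line looks like the driver's tagged-queuing message.
# Matching is case-insensitive and covers "queuing" and "queueing".
tagged_queuing_enabled() {
    grep -qi 'tagged command queu'
}

# Example use (assumes dmesg still holds the boot messages):
#   dmesg | tagged_queuing_enabled && echo "tagged queuing is enabled"
```

If the boot messages have already scrolled out of the kernel buffer, the
same helper can be fed a saved console log instead.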

The driver (1.12a/b/c) included in linux-2.0.X does not handle QUEUE FULL
status properly and does not allow tagged queuing to be disabled once it
has been enabled.
The fix for those problems was too large a patch for 2.0.X, so it has
been included in 2.1.X (driver release 1.14a).
The driver sources are available at sunsite:
/pub/Linux/kernel/patches/scsi/ncrBsd2Linux-1.14a.tar.tgz

If the RedHat boot disk just uses the default config options, the problem
is elsewhere. But very probably the system did not hang silently, and the
driver printed some log messages related to the problem.
Restarting klogd with level=7 will print all messages to the console:
klogd -c 7

> I also tried tweaking the scsi transfer stuff, i.e.
>
> > echo "setsync #target period" >/proc/scsi/ncr53c8xx/0
> >
> > - echo "setsync 0 255" >... renegotiate asynchronous transfer with target 0
> > - echo "setsync 0 50" >... 5 MHz synchronous transfer
> > - echo "setsync 0 25" >... 10 MHz synchronous transfer (max speed)
>
> but to no avail.

The default value for sync transfers should be 5 MHz. That will be
changed in the next patch I will post.
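
For reference, the period values quoted above (50 for 5 MHz, 25 for
10 MHz) are consistent with a period counted in units of 4 ns, with 255
reserved for asynchronous transfers. A small sketch of that arithmetic,
assuming the 4 ns unit holds for the setsync argument; the function name
setsync_period_to_khz is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a setsync period value to a rate in kHz.
# ASSUMPTION: the period is counted in units of 4 ns, and 255 means
# "asynchronous", as suggested by the values quoted in this thread.
setsync_period_to_khz() {
    period=$1
    if [ "$period" -eq 255 ]; then
        echo "async"
        return 0
    fi
    # rate = 1 / (period * 4 ns) = 250000 / period, expressed in kHz
    echo $(( 250000 / period ))
}

for p in 255 50 25; do
    echo "setsync period $p -> $(setsync_period_to_khz "$p")"
done
```

With the values from the quoted commands this gives 5000 kHz for a period
of 50 and 10000 kHz for a period of 25, matching the 5 MHz and 10 MHz
figures above.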

> I have now seen the same problem on an Intel PPro-200 box also with an
> NCR controller (not wide though).

My working configuration is the following:
P133 Triton II / Tyan NCR 825 / XP32150W / 0662S12 / Tosh 3401B (old)
Linux-2.0.22 / ncr53c8xx-1.12c / 10 MHz sync / 8 tagged commands max.
I do not encounter any SCSI problems even under heavy load.

My test configuration just uses the latest linux revision + next driver
revision.

> > Seems that there is a problem with the NCR driver.

If you get log messages on the console just before your system hangs,
could you try to capture them and report them to me?
Thanks. That will help me.

> I would really like to get 4.0 running on my Alpha.
>
> Can anyone shed some light on this problem?
>
> Thanks in advance

Gerard.



