Axp-List Archive
Re: 2.2.16 Alpha: "Fix SMP rescheduling with lock held"

Subject: Re: 2.2.16 Alpha: "Fix SMP rescheduling with lock held"
From: W Bauske (wsb@paralleldata.com)
Date: Tue Jun 13 15:15:04 2000


Jay Estabrook wrote:
>
> On Tue, Jun 13, 2000 at 01:12:10PM -0500, W Bauske wrote:
> >
> > They happen a lot on my UP2K's in production.
> >
> > socket.c:43 spinlock grabbed in pvmd3 at fffffc00003c4368(0) 2032 ticks
> > select.c:43 spinlock stuck in pvmd3 at fffffc000035f630(0) owner zm32mig_s_pvm
> > at fffffc000034fb78(1) read_write.c:43
> > select.c:43 spinlock grabbed in pvmd3 at fffffc000035f630(0) 2033 ticks
> > select.c:43 spinlock stuck in pvmd3 at fffffc000035f630(0) owner zm32mig_s_pvm
> > at fffffc000034fb78(1) read_write.c:43
>
> Sigh...
>
> Not knowing whether x86/sparc/whatever code has any sort of similar
> debug output, it's hard to know if this is "normal" behavior.

I don't bother trying to run this code on Intels. Maybe when
Willamette/Itanium arrive, I'll look at them again, but that's
a couple of months off.

>
> Perhaps the PVM stuff could be better organized or rewritten to cause
> the holding of locks (at its behest) to be shorter?
>

PVM is just doing what I asked it to do. I send large chunks of
data because I use very large datasets. Anywhere from 100GB to
over a TB. This code can keep all my systems busy for a month
straight on a single problem.

> As a last resort, if one tires of seeing those messages, one could up
> the default "timeout" value in the debugging locks code to wait for a
> longer time before giving the message... :-\
>

I don't look at the system console much unless there's trouble
or I need to shut things down, so I rarely see these messages.
The key point is that they are not specific to boot time. I take it
the messages indicate a hang of around 2 seconds, which makes sense
given that I send 16MB over 100Mb Ethernet: that would mean PVM is
sustaining around 8MB/sec, which is actually quite good from my point
of view. I'm surprised that one processor has to wait while another
is receiving network data. The lock must protect large
portions of the kernel code. At some point I'll go back and try
the 2.4.0-test series, but not real soon.
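
The arithmetic behind that estimate can be sketched as follows. This is only a rough check, and it assumes the Alpha kernel timer runs at HZ = 1024 (not stated in the messages themselves), so that the reported 2032 ticks come to just under 2 seconds. The function names are made up for illustration.

```python
# Rough check of the stall-time and throughput estimate.
# ASSUMPTION: HZ = 1024 timer ticks per second on Alpha; the log
# lines only report a tick count, not the tick rate.
HZ = 1024

def ticks_to_seconds(ticks, hz=HZ):
    """Convert a kernel tick count to seconds (hypothetical helper)."""
    return ticks / hz

def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate for size_mb megabytes moved in `seconds`."""
    return size_mb / seconds

stall = ticks_to_seconds(2032)         # tick count from the spinlock message
rate = throughput_mb_per_s(16, stall)  # 16MB message sent through PVM
wire_limit = 100 / 8                   # 100Mb/s Ethernet is 12.5MB/s at best

print(f"stall: {stall:.2f} s")
print(f"rate:  {rate:.2f} MB/s ({rate / wire_limit:.0%} of wire speed)")
```

Under that assumption the stall is about 1.98 seconds and the sustained rate is about 8MB/sec, roughly two thirds of the theoretical 12.5MB/sec of the link.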

Wes



