Greg Lindahl wrote:
> OK, so I finally got a node to hang:
>
> sched.c:30 spinlock stuck in pbs_mom at fffffc000032cd64(1) owner
> a.out.system14. at fffffc000032cd64(0) sched.c:30
> sched.c:30 spinlock stuck in pbs_mom at fffffc000032cd64(1) owner
> a.out.system14. at fffffc000032cd64(0) sched.c:30
> sched.c:30 spinlock stuck in pbs_mom at fffffc000032cd64(1) owner
> a.out.system14. at fffffc000032cd64(0) sched.c:30
>
Hmm...that comes from this code right here (sched.c:821):
...
kstat.context_swtch++;
get_mmu_context(next);
switch_to(prev, next, prev);
__schedule_tail(prev);
same_process:
reacquire_kernel_lock(current);
return;
...
So, that means it's our good friend the kernel_lock again. How long did the
system hang before you killed it? I'm a little curious whether it would
have recovered "eventually". Are you doing either 1) a lot of data
transfers or 2) a lot of heavily threaded work?
The problem is that 2.3/2.4 will probably do the same thing here - that
particular section of code is roughly the same.
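For anyone following along, here is a rough userspace sketch of the pattern in question: a task that holds the big lock drops it when it schedules away and has to win it back when it is switched in again. This is only a model under stated assumptions, not the kernel source. The struct, the *_model names and the "budget" parameter are made up for illustration; the real lock spins until it succeeds, and the bounded spin here just stands in for the point where a debug build would complain that the spinlock is stuck.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state that matters here. */
struct task {
    const char *comm;   /* process name, e.g. "pbs_mom" */
    int lock_depth;     /* -1: not holding the big lock; >= 0: holding it */
};

/* Hypothetical stand-in for the single global kernel lock. */
static atomic_flag kernel_flag = ATOMIC_FLAG_INIT;

/* Spin for at most `budget` attempts; returns true if the lock was taken. */
static bool spin_lock_bounded(atomic_flag *lock, long budget)
{
    while (budget-- > 0) {
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;
    }
    return false;       /* a debug build would report "spinlock stuck" here */
}

/* Enter big-lock territory; nested calls only bump the depth. */
static bool lock_kernel_model(struct task *t, long budget)
{
    if (t->lock_depth < 0 && !spin_lock_bounded(&kernel_flag, budget))
        return false;
    t->lock_depth++;
    return true;
}

/* Leave big-lock territory; the outermost call frees the lock. */
static void unlock_kernel_model(struct task *t)
{
    assert(t->lock_depth >= 0);
    if (--t->lock_depth < 0)
        atomic_flag_clear_explicit(&kernel_flag, memory_order_release);
}

/* Before switching away: free the lock but keep lock_depth, so the task
 * remembers that it has to take the lock back later. */
static void release_kernel_lock_model(struct task *t)
{
    if (t->lock_depth >= 0)
        atomic_flag_clear_explicit(&kernel_flag, memory_order_release);
}

/* After being switched back in: a task that held the lock must win it again. */
static bool reacquire_kernel_lock_model(struct task *t, long budget)
{
    if (t->lock_depth < 0)
        return true;    /* never held it, nothing to do */
    return spin_lock_bounded(&kernel_flag, budget);
}
```

If one task takes the lock while another is switched out and then never lets go, the second task's reacquire_kernel_lock_model() call never succeeds, which is the shape of the hang reported above.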
>
> This is printed once every few minutes on the console of the hung system.
> There are two copies of a.out.system14. running.
>
That's some of the problem with the magic of the kernel_lock - it makes it
a little too easy to use... :(
>
> The kernel is the RedHat-compiled 2.2.14-6.0smp.
>
> My question is: is this bug fixed by the 2.2.16 patch to alpha/smp.c ? I
> suppose I should try it, but it's nice to understand these things well
> enough to know...
>
It doesn't look like it should be. That code is in pointer_lock(), which is
only called by smp_call_function() (at least as far as cscope is concerned,
anyway). It's entirely possible that it would help, however, if your
code is somehow making heavy use of smp_call_function(). (I'm
guessing a bit here, but hey - it's better than nothing. ;) Obviously Jay
or rth (or whoever submitted the change) would know better.
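As a point of reference, a "pointer lock" in general is a slot that one CPU claims by swapping its own data pointer in where a null pointer used to be. The sketch below is a generic C11 model of that idea and has not been checked against the actual alpha/smp.c code. The names pointer_lock_model, pointer_unlock_model and call_data_slot, and the budget parameter, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared slot; NULL means "free". */
static _Atomic(void *) call_data_slot = NULL;

/* Try to install `data` in the slot via compare-and-swap against NULL.
 * Without `retry`, fail at once if the slot is busy; with it, keep
 * trying until the slot frees up or `budget` retries are used. */
static bool pointer_lock_model(_Atomic(void *) *slot, void *data,
                               bool retry, long budget)
{
    assert(slot != NULL && data != NULL);
    for (;;) {
        void *expected = NULL;
        if (atomic_compare_exchange_strong(slot, &expected, data))
            return true;
        if (!retry || budget-- <= 0)
            return false;
    }
}

/* Mark the slot free again. */
static void pointer_unlock_model(_Atomic(void *) *slot)
{
    atomic_store(slot, NULL);
}
```

A lock like this only comes into play for callers that go through the one function guarding the slot, which is why a fix there would matter only to workloads that hit that path heavily.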
- Pete
This archive was generated by hypermail version 2a22 on Sat Jul 1 05:31:31 2000 PDT