
(ITS#4360) hdb livelock, probably on write



Full_Name: Aaron Richton
Version: 2.3.18
OS: Solaris 9
URL: 
Submission from: (NULL) (68.197.28.208)


One of my colleagues reported this behavior to me the other day on an extremely
lightly loaded 2.3.17. This is the first time I'm seeing it myself (on a more
heavily loaded 2.3.18). slapd pegs the CPU:
(from top)
 4283 root      18   0    0  937M  522M run   512:58 95.85% slapd

  18 LWP 2  0xfed1dbb4 in _poll () from /usr/lib/libc.so.1
  17 LWP 3  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  16 LWP 4  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  15 LWP 5  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  14 LWP 6  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  13 LWP 7  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  12 LWP 8  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  11 LWP 9  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  10 LWP 10  0xff214b60 in __lock_get_internal (lt=0x38d8d0, locker=240,
    flags=1, obj=0xd57ffce0, lock_mode=DB_LOCK_WRITE, timeout=0,
    lock=0xd57ffd7c) at ../dist/../lock/lock.c:845
  9 LWP 11  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  8 LWP 12  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  7 LWP 13  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  6 LWP 14  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  5 LWP 15  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  4 LWP 16  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  3 LWP 17  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  2 LWP 18  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
* 1 LWP 1  0xfed1f8e0 in _lwp_wait () from /usr/lib/libc.so.1
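
For anyone triaging a listing like the one above, here is a rough sketch (not part of the original report) that tallies the innermost frame of each LWP and filters out threads that are merely waiting. The helper names `top_frames` and `suspects` are made up for illustration, and treating `__lwp_park`, `_poll` and `_lwp_wait` as "idle" is an assumption about this particular process, not a general rule.

```python
import re
from collections import Counter

# Thread listing copied from the report (it looks like gdb "info threads" output).
LISTING = """\
  18 LWP 2  0xfed1dbb4 in _poll () from /usr/lib/libc.so.1
  17 LWP 3  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  16 LWP 4  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  15 LWP 5  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  14 LWP 6  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  13 LWP 7  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  12 LWP 8  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  11 LWP 9  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  10 LWP 10  0xff214b60 in __lock_get_internal (lt=0x38d8d0, locker=240,
    flags=1, obj=0xd57ffce0, lock_mode=DB_LOCK_WRITE, timeout=0,
    lock=0xd57ffd7c) at ../dist/../lock/lock.c:845
  9 LWP 11  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  8 LWP 12  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  7 LWP 13  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  6 LWP 14  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  5 LWP 15  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  4 LWP 16  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  3 LWP 17  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
  2 LWP 18  0xfec65994 in __lwp_park () from /usr/lib/libthread.so.1
* 1 LWP 1  0xfed1f8e0 in _lwp_wait () from /usr/lib/libc.so.1
"""

# Matches "LWP <n>  0x<addr> in <function> (" and ignores continuation lines.
FRAME = re.compile(r"LWP\s+(\d+)\s+0x[0-9a-f]+ in (\w+) \(")

# Assumption: threads whose innermost frame is one of these are just waiting.
WAITING = {"__lwp_park", "_poll", "_lwp_wait"}


def top_frames(listing):
    """Return (lwp_id, innermost_function) pairs in listing order."""
    return [(int(m.group(1)), m.group(2)) for m in FRAME.finditer(listing)]


def suspects(listing):
    """Return the threads that are not sitting in a known wait call."""
    return [(lwp, fn) for lwp, fn in top_frames(listing) if fn not in WAITING]


if __name__ == "__main__":
    print(Counter(fn for _, fn in top_frames(LISTING)))
    print(suspects(LISTING))
```

On this listing the filter leaves only LWP 10, the thread inside `__lock_get_internal` requesting a `DB_LOCK_WRITE` lock. That fits the "probably on write" guess in the subject, though a full backtrace is still needed to confirm it.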


I'll need to install dbx to get a backtrace (it appears that Sun Studio 11
doesn't play with gdb as well as 10 did), but I captured a core with gcore, so
the backtrace should be forthcoming (hopefully later today).
slapd remained listening the whole time, so numerous clients issued operations
that just timed out (which made for unhappy users). The only way out of this
was a kill -9.