Hello,
Since updating from OpenLDAP 2.4.23 to OpenLDAP 2.4.32, a slapd process crashes with a core dump about one to three times a week.
It seems to be triggered by LDAP requests, since only some of our servers are affected and all of them are in the same network zone.
What I have found out so far:
Syslog:
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 14 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 17 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:01 vg0092 last message repeated 15 times
Mar 8 20:13:01 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:02 vg0092 last message repeated 18 times
Mar 8 20:13:02 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:11 vg0092 last message repeated 1091 times
Mar 8 20:13:11 vg0092 slapd[220]: [ID 870088 local4.debug] get_filter: unknown filter type=48
Mar 8 20:13:20 vg0092 last message repeated 1057 times
Mar 8 20:14:14 vg0092 genunix: [ID 603404 kern.notice] NOTICE: core_log: slapd[220] core dumped: /dpool/vg0092-data/ldap/core/core.slapd.220
Mar 8 20:14:14 vg0092 slapd[7288]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:14:14 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:14:14 vg0092 slapd[7299]: [ID 643551 local4.debug] hdb_db_open: database "dc=scom": unclean shutdown detected; attempting recovery.
Mar 8 20:14:31 vg0092 last message repeated 2 times
Mar 8 20:14:42 vg0092 last message repeated 5 times
Mar 8 20:15:03 vg0092 slapd[8246]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.32 (Aug 5 2012 00:09:28) $
Mar 8 20:15:03 vg0092 steve@sunblade2500:/bigdisk/SOURCES/S10/openldap-2.4.32/servers/slapd
Mar 8 20:15:03 vg0092 ldap: [ID 702911 user.warning] vg0092 slapd maintenance, rebuilding, WARNING
The ‘unknown filter’ messages are caused by HP-UX clients. After the crash the Berkeley DB database is corrupt and has to be rebuilt.
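Because syslog collapses duplicates into "last message repeated N times" lines, the real number of rejected filters is easy to underestimate. Below is a minimal Python sketch for totalling them. The function name and the idea of feeding it the raw syslog lines are my own assumptions, not part of any OpenLDAP tooling.

```python
import re

# Matches the syslog duplicate-suppression line, e.g. "last message repeated 14 times".
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")
# Text of the slapd message we want to count (taken from the log excerpt).
MARKER = "get_filter: unknown filter type="


def count_unknown_filter(lines):
    """Count 'unknown filter' messages, including suppressed repeats.

    A 'last message repeated N times' line is only added to the total
    when the most recent real message was an 'unknown filter' message.
    """
    total = 0
    last_was_match = False
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_was_match:
                total += int(repeat.group(1))
            continue
        last_was_match = MARKER in line
        if last_was_match:
            total += 1
    return total
```

For the excerpt above this should give 2218 (6 logged lines plus 2212 suppressed repeats) in roughly 20 seconds. The repeats that follow the hdb_db_open message are not counted.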
Coredump:
# adb /usr/local/libexec/slapd core.slapd.220
core file = core.slapd.220 -- program ``/usr/local/libexec/slapd'' on platform SUNW,SPARC-Enterprise-T5120
SIGABRT: Abort
$c
libc.so.1`_lwp_kill+8(6, 0, fed87080, fecede54, ffffffff, 6)
libc.so.1`abort+0x110(b07ff4e8, 1, fed833f0, ffba0, fed85518, 0)
libc.so.1`_assert+0x64(12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c)
connection_next+0x138(0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000)
0x112574(8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
monitor_entry_create+0x94(714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084)
0xe1eec(714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
monitor_back_search+0x248(714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8)
fe_op_search+0x420(714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20)
do_search+0x618(714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38)
0x3da44(b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
0x3e3d0(0, 2f, fed87940, 0, fd17ba00, 2330ec)
libldap_r-2.4.so.2`ldap_int_thread_pool_wrapper+0x190(2330a8, b0800000, 0, 0, ff30ed80, 1)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
pflags shows that LWP 25 might be the culprit:
# pflags core.slapd.220
core 'core.slapd.220' of 220: /usr/local/libexec/slapd -4 -u ldap -g ldap -f /dpool/vg0092-data/ldap
data model = _ILP32 flags = MSACCT|MSFORK
/1: flags = STOPPED lwp_wait(0x4,0xffbffb34)
why = PR_SUSPENDED
/2: flags = STOPPED pollsys(0x4,0x9f,0x0,0x0)
why = PR_SUSPENDED
/3: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/4: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/5: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/6: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/7: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/8: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/9: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/10: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/11: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/12: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/13: flags = DETACH|STOPPED
why = PR_SUSPENDED
/14: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/15: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/16: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/17: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/18: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/19: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/20: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/21: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/22: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/23: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/24: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/25: flags = DETACH
sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT
/26: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/27: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/28: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/29: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/30: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/31: flags = DETACH|STOPPED
why = PR_SUSPENDED
/32: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/33: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
/34: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
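With this many LWPs the one holding the signal is easy to miss. A small Python sketch that picks it out of saved pflags output follows. It assumes the layout shown above (an "/N:" line per LWP and a "cursig = ..." field on that line or a following one), and the function name is hypothetical.

```python
import re

# Start of an LWP entry in pflags output, e.g. "/25: flags = DETACH".
LWP_RE = re.compile(r"^\s*/(\d+):")
# Pending/current signal field, e.g. "cursig = SIGABRT".
CURSIG_RE = re.compile(r"cursig = (\w+)")


def lwps_with_signal(pflags_lines):
    """Return a list of (lwp_id, signal_name) for LWPs that report a cursig."""
    current = None
    found = []
    for line in pflags_lines:
        header = LWP_RE.match(line)
        if header:
            current = int(header.group(1))
        signal = CURSIG_RE.search(line)
        if signal and current is not None:
            found.append((current, signal.group(1)))
    return found
```

Applied to the listing above, the only LWP it should report is 25 with SIGABRT, which matches the thread shown in the pstack output.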
pstack:
----------------- lwp# 25 / thread# 25 --------------------
fed0e8cc _lwp_kill (6, 0, fed87080, fecede54, ffffffff, 6) + 8
fec82950 abort (b07ff4e8, 1, fed833f0, ffba0, fed85518, 0) + 110
fec82b8c _assert (12d0d0, 12c9d0, 3a8, 0, ff8bc, 19418c) + 64
0003cc64 connection_next (0, b07ff7c4, b07ff7c0, 199d1c, fd17ba00, 1a2000) + 138
00112574 ???????? (8000, b07ffcb8, 5e9bb4, 199d1c, b07ff8a8, 1c77a8)
00114670 monitor_entry_create (714ba50, b07ffcb8, 0, 545d64, b07ff8a8, 546084) + 94
000e1eec ???????? (714ba50, b07ffcb8, 545d3c, 0, 1, 1a2400)
000e2200 monitor_back_search (714ba50, b07ffcb8, 0, 142a7da8, e1fb8, 1971d8) + 248
0004005c fe_op_search (714ba50, b07ffcb8, 12d838, 0, 1a2928, 1a2a20) + 420
0003f70c do_search (714ba50, b07ffcb8, fed87940, 0, 3f0f4, b07ffa38) + 618
0003da44 ???????? (b07ffe08, 714ba50, fed87940, 0, fd17ba00, 0)
0003e3d0 ???????? (0, 2f, fed87940, 0, fd17ba00, 2330ec)
ff30ef10 ldap_int_thread_pool_wrapper (2330a8, b0800000, 0, 0, ff30ed80, 1) + 190
fed0abd8 _lwp_start (0, 0, 0, 0, 0, 0)
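One more detail can possibly be read from the trace. If the libc `_assert` routine takes its arguments as (assertion text, file name, line number), which is my assumption here, then the third value in the `_assert` frame is the source line of the failed assertion. The Python sketch below only does the hex conversion. The function name is hypothetical, and values printed for a SPARC frame are register contents, so the result should be treated as a hint rather than a fact.

```python
import re

# Matches the _assert frame in either pstack ("_assert (a, b, c, ...)")
# or adb ("_assert+0x64(a, b, c, ...)") output and captures the argument list.
FRAME_RE = re.compile(r"\b_assert(?:\+0x[0-9a-fA-F]+)?\s*\(([^)]*)\)")


def assert_line_number(frame):
    """Return the third _assert argument as a decimal line number, or None.

    Assumes the argument order (assertion, file, line); the debugger
    prints the values in hex without a 0x prefix.
    """
    match = FRAME_RE.search(frame)
    if not match:
        return None
    args = [arg.strip() for arg in match.group(1).split(",")]
    if len(args) < 3:
        return None
    return int(args[2], 16)
```

For the frame above the third value is 3a8, i.e. 936 decimal, so under that assumption the failing assert would be at line 936 of the file that contains connection_next in this build.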
Questions:
- Is this a known problem?
- If yes: Is it already fixed in OpenLDAP 2.4.34, or can it be worked around?
- If no: Is there any additional info I can provide which might be helpful?
Sending the core dump is not an option for now, as it contains all password hashes etc.
Regards
Jürgen Sprenger