Re: (ITS#7587) slapd crashes when using pcache overlay applied to a translucent proxy
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#7587) slapd crashes when using pcache overlay applied to a translucent proxy
- From: hyc@symas.com
- Date: Tue, 18 Mar 2014 18:50:37 GMT
- Auto-submitted: auto-generated (OpenLDAP-ITS)
amoneger@cisco.com wrote:
> Full_Name: Alex
> Version: 2.4.35
> OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (64.103.25.102)
>
>
> When using the pcache overlay over a translucent proxy, the slapd daemon crashes
> after the second LDAP request that misses the cache; missing the cache is the
> important part. Assuming nothing is cached for the aaaa and bbbb uids, the
> following requests trigger the issue (172.16.206.156 being the OpenLDAP server):
Thanks for the report, this is now fixed in git master.
> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid st
> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid st
>
> Whether aaaa and bbbb exist or not does not matter.
>
> The following config is used:
>
> include /usr/local/etc/openldap/schema/core.schema
> include /usr/local/etc/openldap/schema/cosine.schema
> include /usr/local/etc/openldap/schema/inetorgperson.schema
> include /usr/local/etc/openldap/schema/misc.schema
> include /usr/local/etc/openldap/schema/nis.schema
>
> moduleload pcache.la
> moduleload translucent.la
>
> database bdb
> suffix "o=xxxx"
> #checkpoint 1024 15
> rootdn "uid=amoneger,ou=yyyy,o=xxxx"
> overlay translucent
> translucent_local uidNumber,gidNumber,homeDirectory,loginShell
> translucent_strict
> rootdn "uid=amoneger,ou=yyyy,o=xxxx"
> uri ldap://zzzz/
> #tls ldaps tls_reqcert=demand tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
> overlay pcache
> pcache bdb 10000 1 50 100
> pcacheAttrset 0 *
> pcacheTemplate (uid=) 0 3600
> pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
> pcacheOffline TRUE
> pcachePersist TRUE
> pcacheValidate FALSE
> directory /var/cache/ldap
> cachesize 1000
> index pcacheQueryid eq
>
> The crash is a SIGABRT raised by libc's free() after it detects a double free.
> Here is the backtrace:
> Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
> (gdb) c
> Continuing.
>
> Program received signal SIGABRT, Aborted.
> 0x000000312aa328a5 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0 0x000000312aa328a5 in raise () from /lib64/libc.so.6
> #1 0x000000312aa34085 in abort () from /lib64/libc.so.6
> #2 0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
> #3 0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
> #4 0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at search.c:263
> #5 0x0000000000421449 in connection_operation (ctx=0x7f1366764a90, arg_v=0x7f135c000b80) at connection.c:1155
> #6 0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90, argv=<value optimized out>) at connection.c:1291
> #7 0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at tpool.c:688
> #8 0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000312aae890d in clone () from /lib64/libc.so.6
>
> I was unable to track down the particular piece of code triggering the double
> free, but the same pointer p is freed twice by ber_memfree_x() in memory.c:
>
> (gdb) delete 3
> (gdb) break search.c:263
> Breakpoint 4 at 0x5645df: file search.c, line 263.
> (gdb) c
> Continuing.
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263 op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493 {
> (gdb) s
> 498 if (!ptr)
> (gdb) s
> 501 if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502 ber_memfree_x(ptr, NULL);
> (gdb) s
> 649 }
> (gdb) s
> 502 ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127 {
> (gdb) s
> 128 if( p == NULL ) {
> (gdb) s
> 134 if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160 }
> (gdb) s
> 152 free( p );
> (gdb) print p
> $1 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10: 0x00000001 0x00000000 0x005ffd31 0x00000000
> 0x1d64a20: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x1d64a30: 0x00000000 0x00000000
> (gdb) c
> Continuing.
> [New Thread 0x7fa561994700 (LWP 5823)]
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263 op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493 {
> (gdb) s
> 498 if (!ptr)
> (gdb) s
> 501 if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502 ber_memfree_x(ptr, NULL);
> (gdb) s
> 649 }
> (gdb) s
> 502 ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127 {
> (gdb) s
> 128 if( p == NULL ) {
> (gdb) s
> 134 if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160 }
> (gdb) s
> 152 free( p );
> (gdb) print P
> No symbol "P" in current context.
> (gdb) print p
> $2 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10: 0x00000000 0x00000000 0x005ffd31 0x00000000
> 0x1d64a20: 0x00000000 0x00000000 0x00000000 0x00000000
> 0x1d64a30: 0x00000000 0x00000000
>
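The gdb session above steps through line 501 of sl_malloc.c: when a pointer lies outside the thread's slab (or slab allocation is off), slap_sl_free() hands it to ber_memfree_x() with a NULL context, which ends in a plain free(). That is why the second release of 0x1d64a10 reaches glibc, which then aborts. Below is a minimal sketch of that range-check dispatch, assuming a single contiguous slab; `struct slab` and `slab_free()` are hypothetical names, and this is a simplification rather than slapd's actual allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a per-thread slab covering the
 * address range [base, end). Not slapd's sl_malloc.c. */
struct slab {
    char *base;
    char *end;
    size_t slab_frees;   /* blocks released back into the slab */
    size_t heap_frees;   /* blocks forwarded to the global heap */
};

/* Same shape as the traced check: pointers the slab does not own
 * are passed to free(), so freeing one of them twice reaches glibc. */
void slab_free(struct slab *sh, void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;

    if (ptr == NULL)
        return;
    if (sh == NULL || p < (uintptr_t)sh->base || p >= (uintptr_t)sh->end) {
        free(ptr);               /* global heap, analogous to ber_memfree_x(ptr, NULL) */
        if (sh != NULL)
            sh->heap_frees++;
        return;
    }
    assert(sh->base <= sh->end);
    /* A real slab would put the block back on a free list;
     * this sketch only counts the release. */
    sh->slab_frees++;
}
```

In this model, a block that was malloc()ed from the global heap never touches the slab bookkeeping, so nothing stops a second slab_free() on it from becoming a second free().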
> So the same pointer is freed twice by the two connections that miss the
> cache. I can't figure out who is responsible for that call, but the same
> op->ors_attrs is freed by do_search():
>
> if ( op->ors_attrs != NULL ) {
>     op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> }
>
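One way two requests can end up freeing the same op->ors_attrs is if both are pointed at a single shared array that each one releases in its own end-of-request cleanup. The sketch below illustrates that ownership bug and a common remedy (give each operation a private copy). The types and functions are hypothetical stand-ins, not slapd's Operation or AttributeName, and not the actual ITS#7587 fix, which this thread does not describe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT slapd's Operation/AttributeName. */
struct attr_list {
    size_t n;
    const char **names;
};

struct op {
    struct attr_list *attrs;   /* released by op_cleanup() when the op ends */
};

/* Buggy pattern: every op is pointed at one shared list, so each
 * op's cleanup would free the same pointer. */
void attach_shared(struct op *o, struct attr_list *shared)
{
    o->attrs = shared;
}

/* Safer pattern: each op gets its own heap copy, so its cleanup
 * frees memory nobody else references. Returns 0 on success. */
int attach_copy(struct op *o, const struct attr_list *shared)
{
    assert(o != NULL && shared != NULL);
    struct attr_list *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    copy->n = shared->n;
    copy->names = NULL;
    if (shared->n > 0) {
        copy->names = malloc(shared->n * sizeof *copy->names);
        if (copy->names == NULL) {
            free(copy);
            return -1;
        }
        memcpy(copy->names, shared->names, shared->n * sizeof *copy->names);
    }
    o->attrs = copy;
    return 0;
}

/* End-of-request cleanup; clearing the pointer keeps a repeated
 * call on the same op harmless. */
void op_cleanup(struct op *o)
{
    if (o->attrs != NULL) {
        free(o->attrs->names);
        free(o->attrs);
        o->attrs = NULL;
    }
}
```

With attach_shared() on a heap-allocated list, calling op_cleanup() for both operations would free the same block twice, which matches the failure glibc reports in the trace; with attach_copy() each cleanup releases distinct memory.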
> Parameters seem correct in both cases:
> (gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
> $12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
> (gdb) print op->o_request.oq_search.rs_attrs
> $15 = (AttributeName *) 0x1f85a10
>
> The call is made via connection_operation(), but that part of the code is a bit
> over my head, so I'm unable to track this further.
>
> I thought this could be due to a threading problem, but building slapd with
> --with-threads=no does not make a difference.
>
> I tried uploading the core dump to your FTP server, but there seems to be an
> issue with ftp.openldap.org:
> [root@centos63 tmp]# ftp ftp.openldap.org
> Trying 204.152.186.57...
> Connected to ftp.openldap.org (204.152.186.57).
> 220- OpenLDAP FTP Service
> 220 boole.openldap.org FTP server (Version 6.00LS) ready.
> Name (ftp.openldap.org:cisco): anonymous
> 331 Guest login ok, send your email address as password.
> Password:
> 230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
> 230- COPYING RESTRICTIONS APPLY, see:
> 230- ftp://ftp.openldap.org/COPYRIGHT
> 230- ftp://ftp.openldap.org/LICENSE
> 230 Guest login ok, access restrictions apply.
> Remote system type is UNIX.
> Using binary mode to transfer files.
> ftp> cd incoming
> 250 CWD command successful.
> ftp> binary
> 200 Type set to I.
> ftp> put core-slapd-6-55-55-26548-1368063267
> local: core-slapd-6-55-55-26548-1368063267 remote: core-slapd-6-55-55-26548-1368063267
> 227 Entering Passive Mode (204,152,186,57,242,33)
> 553 core-slapd-6-55-55-26548-1368063267: No space left on device.
>
> Let me know if you need anything. I can provide further debugs or cores. I'm
> also happy to try things out.
>
> Cheers,
> Alex
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/