Re: better malloc strategies
- To: openldap-devel@OpenLDAP.org
- Subject: Re: better malloc strategies
- From: Howard Chu <hyc@symas.com>
- Date: Tue, 21 Nov 2006 23:14:58 -0800
- In-reply-to: <44F64AB0.7080007@symas.com>
- References: <200608282343.k7SNhOjt061559@cantor.openldap.org> <44F3896A.9080002@symas.com> <Pine.SOC.4.64.0608282050580.7225@toolbox.rutgers.edu> <44F64AB0.7080007@symas.com>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060911 Netscape/7.2 (ax) Firefox/1.5 SeaMonkey/1.5a
Howard Chu wrote:
> I've tested again with libhoard 3.5.1, and it's actually superior to
> libumem for both speed and fragmentation. Here are some results against
> glibc 2.3.3, libhoard, and libumem:
I've retested with current HEAD, adding Google's tcmalloc to the mix.
           glibc            hoard            umem             tcmalloc
           size  time       size  time       size  time       size  time
initial     731              741              741              744
single     1368  2:03.02    1809  1:26.67    1778  1:43.49    1453  0:58.77
single     1369  2:23.61    1827  0:25.68    1825  0:18.40    1512  0:21.74
single     1368  1:48.41    1828  0:16.52    1826  0:21.07    1529  0:22.97
single     1368  1:48.59    1829  0:16.59    1827  0:16.95    1529  0:17.07
single     1368  1:48.72    1829  0:16.53    1827  0:16.61    1529  0:17.01
single     1368  1:48.39    1829  0:20.70    1827  0:16.56    1529  0:16.99
single     1368  1:48.63    1830  0:16.56    1828  0:17.48    1529  0:17.29
single     1384  1:48.14    1829  0:16.64    1828  0:22.17    1529  0:16.94
four       1967  1:20.21    1918  0:35.96    1891  0:29.95    1606  0:42.48
four       2002  1:10.58    1919  0:30.07    1911  0:29.00    1622  0:42.38
four       2009  1:33.45    1920  0:42.06    1911  0:40.01    1628  0:40.68
four       1998  1:32.94    1920  0:35.62    1911  0:39.11    1634  0:30.41
four       1995  1:35.47    1920  0:34.20    1911  0:28.40    1634  0:40.80
four       1986  1:34.38    1920  0:28.92    1911  0:31.16    1634  0:40.42
four       1989  1:33.23    1920  0:31.48    1911  0:33.73    1634  0:33.97
four       1999  1:33.04    1920  0:33.47    1911  0:38.33    1634  0:40.91
slapd CPU        26:31.56         8:34.78          8:33.39          9:19.87
> The initial size is the size of the slapd process right after startup,
> with the process totally idle. The id2entry DB is 1.3GB with about
> 360,000 entries, BDB cache at 512M, entry cache at 70,000 entries,
> cachefree 7000. The subsequent statistics are the size of the slapd
> process after running a single search filtering on an unindexed
> attribute, basically spanning the entire DB. The entries range in size
> from a few K to a couple megabytes.
The BDB and slapd cache configurations are the same, but the machine now
has 4GB of RAM so the entire DB fits in the filesystem buffer cache. As
such there is no disk I/O during these tests.
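For anyone trying to reproduce this, the cache settings quoted above
translate roughly into the following slapd.conf / DB_CONFIG fragments.
This is just an illustrative sketch, not the actual config from these
tests; the suffix and directory are made up:

  # slapd.conf, back-bdb section (suffix/directory are placeholders):
  database      bdb
  suffix        "dc=example,dc=com"
  directory     /var/openldap-data
  # entry cache of 70,000 entries; free 7,000 at a time when it fills
  cachesize     70000
  cachefree     7000

  # DB_CONFIG in that directory, for the 512MB BDB cache:
  set_cachesize 0 536870912 1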
> After running the single ldapsearch 4 times, I then ran the same search
> again with 4 jobs in parallel. There should of course be some process
> growth for the resources for 3 additional threads (about 60MB is about
> right since this is an x86_64 system).
This time I ran the tests 8 times each. Basically I was looking for the
slapd process size to stabilize at a constant number...
> The machine only had 2GB of RAM, and you can see that with glibc malloc
> the kswapd got really busy in the 4-way run. The times might improve
> slightly after I add more RAM to the box. But clearly glibc malloc is
> fragmenting the heap like crazy. The current version of libhoard looks
> like the winner here.
There was no swap activity (or any other disk activity) this time, so
the glibc numbers don't explode like they did before. But even so, the
other allocators are 5-6 times faster. The Google folks claim tcmalloc
is the fastest allocator they have ever seen. These tests show it is
fast, but it is not the fastest. It definitely is the most
space-efficient multi-threaded allocator though. It's hard to judge just
by the execution times of the ldapsearch commands, so I also recorded
the amount of CPU time the slapd process consumed by the end of each
test. That gives a clearer picture of the performance differences for
each allocator.
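Something along these lines (just a sketch, not part of the test
harness; Linux-specific) is enough to pull a process's size and
accumulated CPU time out of /proc:

/* proc_stats.c -- sketch: report a process's VmSize/VmRSS and CPU time
 * from /proc (Linux only).  Build: cc -O2 -o proc_stats proc_stats.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[64], line[512];
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    /* Process size: the VmSize/VmRSS lines of /proc/<pid>/status */
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    if ((fp = fopen(path, "r")) == NULL) { perror(path); return 1; }
    while (fgets(line, sizeof(line), fp))
        if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
            fputs(line, stdout);
    fclose(fp);

    /* CPU time: utime and stime (fields 14 and 15) of /proc/<pid>/stat,
     * counted in clock ticks */
    snprintf(path, sizeof(path), "/proc/%s/stat", argv[1]);
    if ((fp = fopen(path, "r")) == NULL) { perror(path); return 1; }
    if (fgets(line, sizeof(line), fp)) {
        unsigned long utime, stime;
        char *p = strrchr(line, ')');   /* comm field may contain spaces */
        if (p && sscanf(p + 1,
                " %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
                &utime, &stime) == 2) {
            long tck = sysconf(_SC_CLK_TCK);
            printf("CPU: %.2fs user + %.2fs sys\n",
                   (double)utime / tck, (double)stime / tck);
        }
    }
    fclose(fp);
    return 0;
}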
These numbers aren't directly comparable to the ones I posted on August
30, because I was using a hacked-up RE23 there and HEAD here.
Going through these tests is a little disturbing; I would expect the
single-threaded results to be 100% repeatable, but they're not quite. In
one run I saw glibc grow to 2.1GB during the 4-way test and never shrink
back down. I later re-ran the same test (because I hadn't recorded the
CPU usage the first time around) and got these numbers instead. The
other thing that's disturbing is just how bad glibc's malloc really is,
even in the single-threaded case, which is supposed to be its ideal
situation.
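For anyone who wants to see the allocator differences without setting
up a full slapd, a toy multi-threaded allocation churn along these
lines (purely a sketch; the sizes only loosely mimic the entry workload
above) can be run as-is against glibc malloc and then again with
libhoard, libumem, or tcmalloc preloaded via LD_PRELOAD, comparing the
resulting process size and run time:

/* fragtest.c -- sketch of a malloc stress test: each thread churns
 * through allocations from a few KB up to a couple of MB, freeing them
 * in effectively random order.
 * Build: cc -O2 -pthread -o fragtest fragtest.c
 * Run it as-is for glibc, or preload another allocator via LD_PRELOAD,
 * and watch the process size with ps/top. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4
#define SLOTS    128          /* live allocations per thread */
#define ROUNDS   100000       /* alloc/free operations per thread */

static void *worker(void *arg)
{
    unsigned seed = (unsigned)(long)arg;
    void *slot[SLOTS] = { 0 };
    int i, j;

    for (i = 0; i < ROUNDS; i++) {
        j = rand_r(&seed) % SLOTS;
        free(slot[j]);                  /* may be NULL, which is fine */
        /* sizes from 2KB up to ~2MB */
        slot[j] = malloc(2048 + rand_r(&seed) % (2 * 1024 * 1024));
        if (!slot[j]) { perror("malloc"); break; }
    }
    for (j = 0; j < SLOTS; j++)
        free(slot[j]);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(long)(i + 1));
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    puts("done -- check the process size now (e.g. ps), then press Enter");
    getchar();
    return 0;
}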
The other thing to note is that for this run with libhoard, I doubled
its SUPERBLOCK_SIZE from 64K to 128K. That's probably why it's less
space-efficient here than umem, whereas it was more efficient in the
August results. I guess I'll have to recompile it with the original size
and see what difference that makes. For now I'd say my preference would
be tcmalloc...
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/