
Re: large write amplification

To: "Shu, Xinxin" <xinxin.shu@intel.com>, "leo@yuriev.ru" <leo@yuriev.ru>
Subject: Re: large write amplification
From: Howard Chu <hyc@symas.com>
Date: Tue, 5 May 2015 08:58:35 +0100
Cc: "openldap-technical@openldap.org" <openldap-technical@openldap.org>
In-reply-to: <75674D092A819E4189E91166C74CB90D01537D49@shsmsx102.ccr.corp.intel.com>
References: <75674D092A819E4189E91166C74CB90D01537600@shsmsx102.ccr.corp.intel.com> <CAO2+NUDgzUjeL2uuX=dvB7Fvm-VyqHbLurJEw1fgZ6OyHr-HHA@mail.gmail.com> <75674D092A819E4189E91166C74CB90D01537D49@shsmsx102.ccr.corp.intel.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0 SeaMonkey/2.37a1

Shu, Xinxin wrote:

Hi Leonid,

Thanks for your reply. I observed another scenario: I also tested
"overwrite" mode. I modified the source code slightly to change the default
behavior (set dbflags_ = SYNC, so data is flushed to disk once a transaction
is committed) and collected iostat data. The overwrite rate is ~521 ops/sec,
but iostat shows that w/s is ~4666, so the write amplification is ~9. To
my understanding, overwriting an existing value does not adjust the btree,
so why is the write amplification so large? Could you help explain? Thanks.

Your understanding is wrong. LMDB is clearly documented as a copy-on-write design. It does not modify values in-place.
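
To illustrate the copy-on-write point, here is a minimal sketch of path copying in a toy binary tree. It is not LMDB code: the `Leaf`, `Branch` and `cow_put` names are hypothetical, and a real B-tree page holds many keys. The idea it shows is only that overwriting one existing value produces a new leaf plus a new copy of every node on the path up to the root, while the old snapshot stays intact.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    key: int
    value: str


@dataclass(frozen=True)
class Branch:
    pivot: int            # keys < pivot live in the left subtree
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def cow_put(node: Node, key: int, value: str) -> Tuple[Node, int]:
    """Overwrite an existing key; return (new_root, nodes_copied).

    The old tree is never mutated, so readers holding the old root
    still see the old value.
    """
    if isinstance(node, Leaf):
        if node.key != key:
            raise KeyError(key)
        return Leaf(key, value), 1
    if key < node.pivot:
        child, copied = cow_put(node.left, key, value)
        return Branch(node.pivot, child, node.right), copied + 1
    child, copied = cow_put(node.right, key, value)
    return Branch(node.pivot, node.left, child), copied + 1


# A toy tree of depth 3 (hypothetical data).
old_root = Branch(
    2,
    Branch(1, Leaf(0, "a"), Leaf(1, "b")),
    Branch(3, Leaf(2, "c"), Leaf(3, "d")),
)
new_root, copied = cow_put(old_root, 2, "C")
print(copied)  # one new node per level of the tree
```

The untouched left subtree is shared between the old and new roots; only the path to the modified leaf is duplicated, and each duplicated node corresponds to a page that has to be written.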

Cheers,
xinxin

-----Original Message-----
From: Леонид Юрьев [mailto:leo@yuriev.ru]
Sent: Monday, May 04, 2015 6:59 PM
To: Shu, Xinxin
Cc: openldap-technical@openldap.org
Subject: Re: large write amplification

Hi, Xinxin.

I will try to answer briefly, without going into details:

- So that readers are never blocked by a writer, LMDB provides a snapshot of the data, indexes and directory for each completed transaction.

- Most db pages (those not changed by a particular transaction) are "shared" between such snapshots. But any change to the data itself, and its reflection in the btree indexes (including the particular table, free-db, main-db and so forth), requires new pages to be used and written to disk.

- In a large db, a small "one-byte" change may make a lot of db pages (usually 4K each) "dirty". For example, one add/del/mod operation in an LDAP db a few GB in size requires about 50-100 page-level I/O operations.
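
The points above can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative: the fanout of 100, the 2 GiB database size, and the depths given for free-db and main-db are assumptions, not measured LMDB values, and the whole file is treated as leaf pages for simplicity. It counts one new page per level of every btree a commit touches, plus one meta page written at commit.

```python
def btree_depth(leaf_pages: int, fanout: int) -> int:
    """Smallest number of levels whose fanout covers all leaf pages."""
    depth, capacity = 1, 1
    while capacity < leaf_pages:
        capacity *= fanout
        depth += 1
    return depth


def pages_per_commit(tree_depths: dict, meta_pages: int = 1) -> int:
    """One copied page per level of each touched tree, plus the meta page."""
    return sum(tree_depths.values()) + meta_pages


PAGE_SIZE = 4096                              # "usually 4K each"
leaf_pages = (2 * 1024**3) // PAGE_SIZE       # assumed 2 GiB database
table_depth = btree_depth(leaf_pages, fanout=100)  # assumed fanout

# Trees touched by a single-key update; the last two depths are illustrative.
touched = {"table": table_depth, "free-db": 2, "main-db": 1}
total = pages_per_commit(touched)
print(table_depth, total)
```

An LDAP add/del/mod usually touches several index tables as well, so the same sum taken over more trees grows accordingly.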

Leonid.

P.S.
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we call "LIFO reclaiming".
It gives us a 10-50x performance boost, especially by exploiting the write-back cache of the storage subsystem.
We now use it in our production (telco) environment.
But currently it is not safe in all cases; see
https://github.com/ReOpen/ReOpenLDAP/issues/2 and https://github.com/ReOpen/ReOpenLDAP/issues/1.

2015-05-04 5:31 GMT+03:00 Shu, Xinxin <xinxin.shu@intel.com>:

Hi list,

Recently I ran micro tests of LMDB on a DC3700 (200GB), using the bench code
https://github.com/hyc/leveldb/tree/benches. I tested fillrandsync mode and collected iostat data, and found that the write amplification is large. For the fillrandsync case:

IOPS: 1020 ops/sec

The iostat data shows that w/s on that SSD is 8093, avgqu-sz is ~1, and
await time is about 0.16 ms, so the write amplification is ~8, which
seems large to me. Can someone help explain why the write amplification is
so large? Thanks.
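
For reference, the write amplification figures quoted in this thread are simply the device write rate from iostat divided by the application operation rate. A small sketch of that arithmetic, using the numbers reported in the thread (the function name is hypothetical):

```python
def write_amplification(device_writes_per_sec: float,
                        app_ops_per_sec: float) -> float:
    """Ratio of device write requests (iostat w/s) to application ops/sec."""
    return device_writes_per_sec / app_ops_per_sec


fillrandsync = write_amplification(8093, 1020)   # fillrandsync run
overwrite = write_amplification(4666, 521)       # overwrite run with SYNC
print(round(fillrandsync, 1), round(overwrite, 1))
```

Note that this ratio counts write requests, not bytes, so it depends on how the kernel merges or splits requests before they reach the device.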


Cheers,
xinxin



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
