Antw: Re: LMDB test assertion failures on Linux/MIPS
Hi!
I think a problem with your test program is that you don't wait for the writer thread to finish before you read through the mmap() mapping. See how locking in a producer-consumer (or reader-writer) relationship is usually implemented (if you don't have that at hand, I could send you the algorithms).
Regards,
Ulrich
>>> Martin Lucina <martin@lucina.net> wrote on 10.03.2014 at 22:10 in message
<20140310211032.GA22062@nodbug.moloch.sk>:
> hyc@symas.com said:
>> Martin Lucina wrote:
>> >That still doesn't explain the MIPS issues, any suggestions on how to
>> >proceed there? I can give someone access to a MIPS host if that would help.
>>
>> Copying back to the list:
>>
>> Martin Lucina wrote:
>> > hyc@symas.com said:
>> >> It appears that this system also lacks a coherent FS cache, like
>> >> some BSDs. I changed mtest.c to use MDB_WRITEMAP and it now runs
>> >> fine.
>> >>
>> >> The unmodified mtest.c also worked when single-stepping thru gdb,
>> >> which apparently gives time for the cache to sort itself out between
>> >> mdb function calls.
>> >
>> > Interesting. What you're saying is that without MDB_WRITEMAP pages are
>> > written out separately and it is up to the FS cache to ensure that reading
>> > back via the memory map is consistent, correct?
>>
>> That's the general idea. As the LMDB design paper states, LMDB
>> requires the OS to use a unified buffer cache - so that mmap pages
>> and FS cache pages are the same.
>>
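For illustration, the difference can be sketched in plain POSIX rather than with LMDB itself. With a writable shared mapping, the store and the later read both go through mmap pages, so nothing depends on write() and mmap seeing the same cache. The file path and the name write_through_map() are made up, and this only approximates what MDB_WRITEMAP changes; it is not LMDB's code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical scratch file, for illustration only. */
static const char *map_path = "/tmp/writemap_sketch_testfile";

/* Store 1 through a writable MAP_SHARED mapping and read it back through a
 * second, read-only mapping of the same file.
 * Returns the value read, or -1 on any error. */
long write_through_map(void)
{
    long result = -1;
    size_t len = (size_t)getpagesize();

    unlink(map_path);
    int fd = open(map_path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* The file must cover the page before the mapping is touched. */
    if (ftruncate(fd, (off_t)len) == 0) {
        unsigned long *w = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        volatile unsigned long *r = mmap(NULL, len, PROT_READ,
                                         MAP_SHARED, fd, 0);

        if (w != MAP_FAILED && r != MAP_FAILED) {
            w[0] = 1;                          /* the "write" is a plain store */
            if (msync(w, len, MS_SYNC) == 0)   /* push the page to the file */
                result = (long)r[0];
        }
        if (w != MAP_FAILED)
            munmap(w, len);
        if (r != MAP_FAILED)
            munmap((void *)r, len);
    }
    close(fd);
    unlink(map_path);
    return result;
}
```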
>> > I'll try and dig through the OpenWRT kernel configuration, they must have
>> > changed something that triggers this behaviour.
>>
>> Frankly it seems unlikely that they could have changed something so
>> fundamental to the VM subsystem of the kernel. It's also possible that we're
>> seeing *CPU* cache inconsistencies, and that adding a few
>> MIPS-specific memory barrier instructions here and there may fix
>> things up.
>
> I did some more investigating:
>
> 1) Tried adding calls to sync_file_range() (Linux-specific syscall) and
> in desperation even sync(2) to mdb_txn_commit() just after mdb_page_flush()
> et al. No change.
>
> 2) Compiled the below test program on various platforms. This tries (rather
> unscientifically) to test how "long" it takes for a mmap to become
> consistent after writing to the underlying file through a different fd
> opened with O_DSYNC (what mdb does).
>
> The results are interesting:
>
> x86_64 core i5m (2 cores, 4 threads), gcc -O2: consistently less than 1k iterations
> x86_64 core i5m (2 cores, 4 threads), gcc -O2 -DNOBARRIER: consistently around 10k iterations
> x86_64 dual 4-core xeon, gcc -O2: around 2k iterations
> x86_64 dual 4-core xeon, gcc -O2 -DNOBARRIER: 10-15k iterations
> MIPS target, musl gcc -O2 -mips32r2: varies, mostly 1; in every 10 runs at least one run completes in the high 100k's of iterations
> MIPS target, musl gcc -O2 -mips32r2 -DNOBARRIER: about the same as the previous, but when not 1 the result is subjectively higher (around 1m iterations)
> single CPU SPARCv9 solaris 10, Sun cc -fast -mt: always[*] 1
> single CPU SPARCv9 solaris 10, CSW gcc -O2, with or without -DNOBARRIER: always[*] 1
> ia64 dual Itanium 2, Linux gcc -O2: around 2k iterations
> ia64 dual Itanium 2, Linux gcc -O2 -DNOBARRIER: anywhere between 3-8k iterations
>
> [*] very rarely several million iterations
>
> Does this help in any way? It certainly seems to suggest that the MIPS
> target's fs cache is (eventually) consistent.
>
> Any pointers on how to proceed or what else to try/who else to ask will be
> much appreciated.
>
> Martin
>
> ----test program----
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <assert.h>
> #include <stdio.h>
> #include <pthread.h>
> #include <unistd.h>
>
> pthread_barrier_t b;
>
> static void *thread (void *arg)
> {
>     int fd;
>
>     pthread_barrier_wait (&b);
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT | O_DSYNC, 0600);
>     unsigned long v = 1;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     close (fd);
>     return NULL;
> }
>
> int main (int argc, char *argv[])
> {
>     int fd;
>     pthread_barrier_init (&b, NULL, 2);
>
>     unlink ("/tmp/testfile");
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT, 0600);
>     unsigned long v = 0;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
>                                       MAP_SHARED, fd, 0);
>     assert (p != MAP_FAILED);
>
>     int i = 0;
>     pthread_t thread_id = 0;
>     pthread_create (&thread_id, NULL, thread, NULL);
>
>     while (*p != 1) {
>         if (!i)
>             pthread_barrier_wait (&b);
>         i++;
> #if defined (__GNUC__) && !defined (NOBARRIER)
>         __sync_synchronize ();
> #endif
>     }
>     printf ("%d\n", i);
>
>     munmap ((void *)p, getpagesize ());
>     close (fd);
>     return 0;
> }