Reader Writer Locks possible wrong implementation (ITS#1865)



Full_Name: George Aprotosoaie
Version: 2.0.23 (and 2.1.1beta)
OS: Windows, but it may be OS independent
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (209.47.38.244)


Hi!


Platform: Windows 2000 (I expect the same behavior on NT4 and possibly other
OSs)
Version: 2.0.23 (2.1.1beta also has this problem)
Build type: built from source
Effect: server not responding to queries


The common understanding of a reader-writer lock is that after a writer
releases the lock, all readers waiting to acquire it must be able to do so.

A possible error in the existing implementation (in
'libraries\libldap_r\rdwr.c') is that only one reader is able to acquire the
lock after a writer releases it: each writer release enables only one waiting
reader to proceed. If more than one reader is waiting when the writer releases
the lock (as happens when more searches run concurrently than add or modify
operations), all the waiting readers except one remain blocked. The blocked
readers tie up threads from the pool and can starve it, using up all the
threads and locking up the LDAP server, with the server consuming no CPU. The
only way to release the blocked readers is to perform operations that take the
writer lock (one writer release for each blocked reader). New attempts to take
reader locks while no writer owns the lock still succeed.
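
For what it is worth, here is a minimal test sketch I would use to demonstrate
the effect (a hypothetical harness, not part of the ITS or the build; it
assumes the public API from 'ldap_pvt_thread.h' in an OpenLDAP source tree,
and uses sleep() for crude sequencing, Sleep() on Windows). With the current
rdwr.c on Windows only one reader is expected to get past rlock; with a true
broadcast all of them should:

/* Hypothetical test harness, not part of the ITS or the build.
 * Assumes the public API from <ldap_pvt_thread.h> (compiled inside
 * an OpenLDAP source tree).  sleep() is used for crude sequencing;
 * use Sleep( 1000 ) on Windows.
 */
#include <stdio.h>
#include <unistd.h>
#include <ldap_pvt_thread.h>

#define NREADERS 5

static ldap_pvt_thread_rdwr_t rw;
static ldap_pvt_thread_mutex_t count_mutex;
static int released = 0;

static void *reader( void *arg )
{
	ldap_pvt_thread_rdwr_rlock( &rw );	/* blocks: writer active */

	ldap_pvt_thread_mutex_lock( &count_mutex );
	released++;
	ldap_pvt_thread_mutex_unlock( &count_mutex );

	ldap_pvt_thread_rdwr_runlock( &rw );
	return NULL;
}

int main( void )
{
	ldap_pvt_thread_t tid[NREADERS];
	int i;

	ldap_pvt_thread_initialize();
	ldap_pvt_thread_rdwr_init( &rw );
	ldap_pvt_thread_mutex_init( &count_mutex );

	ldap_pvt_thread_rdwr_wlock( &rw );	/* force the readers to queue */

	for ( i = 0; i < NREADERS; i++ )
		ldap_pvt_thread_create( &tid[i], 0, reader, NULL );

	sleep( 1 );				/* let all the readers block */
	ldap_pvt_thread_rdwr_wunlock( &rw );	/* one writer release */
	sleep( 1 );				/* let the readers run */

	/* expected: 5 of 5; observed on Windows: 1 of 5 */
	printf( "readers released: %d of %d\n", released, NREADERS );
	return 0;
}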

Using the rdwr.c source: when a reader attempts to acquire the lock while a
writer is 'active', the reader blocks on 'rw->ltrw_read' until the writer
broadcasts the lock release by calling 'ldap_pvt_thread_cond_broadcast( &rw->
ltrw_read )'. The issue (which may be Windows specific) is that the
implementation of 'ldap_pvt_thread_cond_broadcast( &rw->ltrw_read )' releases
only one waiting thread, not all of them. As a result, if more than one reader
is waiting on 'rw->ltrw_read', only one is released and the others keep
waiting. On an OS where 'ldap_pvt_thread_cond_broadcast( &rw->ltrw_read )'
releases all the waiting threads, this is not an issue.
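
I have not confirmed the exact code in 'libraries\libldap_r\thr_nt.c', but the
observed behavior matches a condition variable emulated with a single
auto-reset Win32 event. The following sketch (illustration only, not the
actual OpenLDAP code) shows why such an emulation can never wake more than one
waiter per 'broadcast':

/* Illustration only - NOT the actual thr_nt.c code.  If the NT
 * condition variable is emulated with a single auto-reset event,
 * "broadcast" degenerates into a single wakeup: SetEvent() on an
 * auto-reset event releases at most one waiting thread, and the
 * event resets itself immediately afterwards.
 */
#include <windows.h>

static HANDLE cond;	/* the emulated condition variable */

void cond_init( void )
{
	/* bManualReset = FALSE: auto-reset event */
	cond = CreateEvent( NULL, FALSE, FALSE, NULL );
}

void cond_wait( HANDLE mutex )
{
	/* simplified: a correct emulation must release the mutex and
	 * begin waiting atomically (e.g. via SignalObjectAndWait) */
	ReleaseMutex( mutex );
	WaitForSingleObject( cond, INFINITE );
	WaitForSingleObject( mutex, INFINITE );
}

void cond_broadcast( void )
{
	/* wakes ONE waiter; the event then resets itself, so every
	 * other waiter stays blocked - the behavior described above */
	SetEvent( cond );
}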

The solution I propose (tested on Windows, where it works fine) is that when a
reader is released from waiting on 'rw->ltrw_read' (meaning there are no
'active' writers), the reader should check whether more readers are waiting.
If so, the reader should broadcast the lock release again using
'ldap_pvt_thread_cond_broadcast( &rw->ltrw_read )' (in the same way the writer
does when it releases the lock). This broadcast releases another waiting
reader, which again checks for waiting readers and repeats the cycle until
none remain. The new function looks like this (the change follows the comment
'the following two lines were added to release possible other waiting
readers'):

int ldap_pvt_thread_rdwr_rlock( ldap_pvt_thread_rdwr_t *rwlock )
{
	struct ldap_int_thread_rdwr_s *rw;

	assert( rwlock != NULL );
	rw = *rwlock;

	assert( rw != NULL );
	assert( rw->ltrw_valid == LDAP_PVT_THREAD_RDWR_VALID );

	if( rw->ltrw_valid != LDAP_PVT_THREAD_RDWR_VALID )
		return LDAP_PVT_THREAD_EINVAL;

	ldap_pvt_thread_mutex_lock( &rw->ltrw_mutex );

	assert( rw->ltrw_w_active >= 0 ); 
	assert( rw->ltrw_w_wait >= 0 ); 
	assert( rw->ltrw_r_active >= 0 ); 
	assert( rw->ltrw_r_wait >= 0 ); 

	if( rw->ltrw_w_active > 0 ) {
		/* writer is active */

		rw->ltrw_r_wait++;

		do {
			ldap_pvt_thread_cond_wait(
				&rw->ltrw_read, &rw->ltrw_mutex );
		} while( rw->ltrw_w_active > 0 );

		rw->ltrw_r_wait--;
		assert( rw->ltrw_r_wait >= 0 ); 
	}

	rw->ltrw_r_active++;

	/* the following two lines were added to release
	 * possible other waiting readers */
	if (rw->ltrw_r_wait > 0)
		ldap_pvt_thread_cond_broadcast( &rw->ltrw_read );

	ldap_pvt_thread_mutex_unlock( &rw->ltrw_mutex );

	return 0;
}

Implementation notes: the change reads 'rw->ltrw_r_wait', so it has to be
performed under the protection of the 'rw->ltrw_mutex' lock.
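
To make the note concrete, the following hypothetical misordering (not the
proposed patch) shows what goes wrong if the check is moved outside the
critical section:

	/* WRONG - illustration only: 'rw->ltrw_r_wait' is shared state,
	 * so reading it after dropping 'rw->ltrw_mutex' is a data race
	 * and the broadcast decision may be based on a stale value */
	ldap_pvt_thread_mutex_unlock( &rw->ltrw_mutex );
	if (rw->ltrw_r_wait > 0)
		ldap_pvt_thread_cond_broadcast( &rw->ltrw_read );

Keeping the check before the unlock, as in the function above, makes the
counter read and the broadcast consistent with the other lock operations.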

Note: this issue may be the cause of the Incoming/1764 issue.


Regards,
George