Improving thread synchronization in GlusterD (Daemon for Gluster) using Userspace RCU (Read-copy-update)

Project: Gluster

Gluster is an open source, scalable, distributed storage system which can run on commodity hardware. GlusterD is the daemon which manages the cluster configuration for Gluster.

For more information about Gluster please refer to http://www.gluster.org/

What is Big-lock in GlusterD:
------------------------------------

GlusterD was originally designed as a single threaded application which
could handle just one transaction at a time. It was made multi-threaded to
improve responsiveness and support handling multiple transactions at a
time. This was needed for newer features like volume snapshots which could
leave GlusterD unresponsive for some periods of time.

Making GlusterD multi-threaded required the creation of a thread
synchronization mechanism, to protect the shared data-structures (mainly
everything under the GlusterD configuration, glusterd_conf_t struct) from
concurrent access from multiple threads. This was accomplished using the
Big-lock.

The Big-lock is an exclusive lock, so any thread that needs to use the
protected data must first obtain the Big-lock and release it once
done.
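
The pattern can be sketched with a single pthread mutex. This is only an
illustration, not GlusterD code: `struct conf`, `shared_conf` and the
function names are hypothetical stand-ins for the state kept under
glusterd_conf_t.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shared state under glusterd_conf_t. */
struct conf {
    int volume_count;
};

static struct conf shared_conf = { 0 };
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every access, even to unrelated fields, goes through the same lock. */
void add_volume(void)
{
    pthread_mutex_lock(&big_lock);
    shared_conf.volume_count++;
    pthread_mutex_unlock(&big_lock);
}

int get_volume_count(void)
{
    pthread_mutex_lock(&big_lock);
    int n = shared_conf.volume_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        add_volume();
    return NULL;
}

/* Runs four workers concurrently and returns the final count. */
int run_workers(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return get_volume_count();
}
```

The lock keeps the counter correct, but only one of the four workers can
make progress at any moment.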

Problem with Big-lock
----------------------------

The Big-lock synchronization solution was added into the GlusterD code to
solve problems that arose when GlusterD was made multi-threaded. This was
supposed to be a quick solution, to allow GlusterD to be shipped.

The Big-lock, as the name suggests, is a coarse-grained lock. Because of
this coarseness, threads contend for it even when they are accessing
unrelated data, and this has led to some deadlocks.

One example of such a deadlock involves transactions and RPC. A thread
that holds the Big-lock and blocks on network I/O may deadlock, for
example when the remote endpoint is disconnected. The disconnect callback
is executed in the same thread that already holds the Big-lock, and all
network I/O handlers, including callbacks, are implemented to acquire the
Big-lock before executing. The thread therefore tries to acquire a lock
it already holds, and deadlocks.

To avoid this, we release the Big-lock whenever a thread could block on
network I/O. This comes at a price: it opens up a window of time in which
other threads can update the shared data structures, leading to
inconsistencies.
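
A small simulation of that window, under stated assumptions: `peer_count`
is a hypothetical shared field, and the "other thread" is simulated by
calling `concurrent_update` from the same thread while the lock is
released, so that the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int peer_count = 3;   /* hypothetical shared field */

/* Stand-in for what another thread may do while the lock is released. */
void concurrent_update(void)
{
    pthread_mutex_lock(&big_lock);
    peer_count--;
    pthread_mutex_unlock(&big_lock);
}

/* Takes a snapshot under the lock, drops the lock around the "blocking
 * I/O", then reports whether the snapshot still matches after relocking.
 * `io` simulates the blocking call and may be NULL. */
bool snapshot_survives_io(void (*io)(void))
{
    pthread_mutex_lock(&big_lock);
    int snapshot = peer_count;
    pthread_mutex_unlock(&big_lock);   /* window opens */

    if (io)
        io();

    pthread_mutex_lock(&big_lock);     /* window closes */
    bool still_valid = (snapshot == peer_count);
    pthread_mutex_unlock(&big_lock);
    return still_valid;
}
```

`snapshot_survives_io(NULL)` returns true, while
`snapshot_survives_io(concurrent_update)` returns false: anything the
thread decided before releasing the lock may no longer hold afterwards.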

The Big-lock, in its current state, doesn’t fully solve the problem it
set out to solve, and introduces further problems of its own. These
problems are only going to grow as new features and new code are added to
GlusterD.

Possible solutions
-----------------------

The most obvious solution would be to split up the Big-lock into more
fine-grained locks. We could go one step further and replace the mutex
locks (the Big-lock is a mutex) with readers-writer locks. This would
bring more flexibility and finer-grained control, at the cost of
additional overhead, mainly in the complexity of implementation.

As an alternative to readers-writer locks, we propose to use RCU as the
synchronization mechanism. RCU provides several advantages over
readers-writer locks while providing similar synchronization features.
These advantages make it preferable to readers-writer locks, even
though the implementation complexity remains nearly the same for both
approaches.

Read-copy-update (RCU)
--------------------------------

RCU, short for Read-Copy-Update, is a synchronization mechanism that can be
used as an alternative to reader-writer locks.

A good introduction to RCU can be found in this series of articles on LWN
[1] and [2]. The articles describe RCU as used in the Linux kernel, where
it is used heavily.

The advantages that make RCU preferable to readers-writer locks are the
following:

- Wait-free reads
RCU readers have no wait overhead. They can never be blocked by writers.
RCU readers need to announce when they enter and leave their critical
sections, but this notification is much lighter than taking a lock.

- Provides existence guarantees
RCU guarantees that RCU-protected data in a reader's critical section will
remain in existence till the end of the critical section. This is achieved
by having the writers work on a copy of the data, instead of using the
existing data.

- Concurrent readers and writers
Wait-free reads and the existence guarantee mean that it is possible to
have readers and writers in concurrent execution. Any readers already in
execution when a writer starts will continue working with the original
copy of the data. The writer will work on a copy, and will use RCU methods
to swap/replace the original data without affecting existing readers. Any
readers that start after the writer has published will see the new data.
This does mean that some readers will continue to work with stale data,
but this isn't too big a problem as the data at least remains consistent
till the reader finishes.

- Read-side deadlock immunity
RCU readers never block, so they always complete in deterministic time.
This means that they can never become a part of a deadlock.

- No writer starvation
As RCU readers don't block, writers can never starve.
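
The copy-and-publish behaviour described in the points above can be
sketched with plain C11 atomics. This is a simplified illustration and not
an RCU implementation: it assumes a single writer, it does not track
readers or wait for them before the old version is freed (a real RCU
implementation does that for you), and `struct conf` and all function
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical protected object. */
struct conf {
    int op_version;
    int peer_count;
};

/* Pointer to the current version; readers load it, the writer swaps it. */
static _Atomic(struct conf *) current_conf;

/* Installs the initial version. Returns 0 on success, -1 on failure. */
int conf_init(int op_version, int peer_count)
{
    struct conf *c = malloc(sizeof(*c));
    if (!c)
        return -1;
    c->op_version = op_version;
    c->peer_count = peer_count;
    atomic_store_explicit(&current_conf, c, memory_order_release);
    return 0;
}

/* Reader side: one atomic load, no lock, never blocked by the writer.
 * The reader keeps using the version it loaded, even if it goes stale. */
const struct conf *conf_read(void)
{
    return atomic_load_explicit(&current_conf, memory_order_acquire);
}

/* Writer side (single writer assumed): copy, modify the copy, publish.
 * Returns the replaced version, or NULL if nothing was published. The
 * caller may free it only once no reader can still be using it. */
struct conf *conf_update_peer_count(int new_count)
{
    struct conf *old =
        atomic_load_explicit(&current_conf, memory_order_acquire);
    if (!old)
        return NULL;

    struct conf *copy = malloc(sizeof(*copy));
    if (!copy)
        return NULL;

    *copy = *old;                  /* copy */
    copy->peer_count = new_count;  /* update the private copy */
    atomic_store_explicit(&current_conf, copy,
                          memory_order_release);  /* publish */
    return old;
}
```

A reader that called `conf_read()` before the update still sees the old,
internally consistent `peer_count`; a reader that calls it afterwards sees
the new one. Deciding when the returned old version can safely be freed is
exactly the part this sketch leaves out.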

Userspace RCU
--------------------

The kernel uses features provided by the processor to implement its RCU.
Userspace applications cannot make use of these features, but instead can
use the Userspace RCU library.

liburcu [3] provides a userspace implementation of RCU, which is
portable across multiple platforms and operating systems. liburcu also
provides some common data structures and RCU protected APIs to use them.

An introduction to URCU and its APIs can be found in this article on LWN
[4].

[1]: https://lwn.net/Articles/262464/
[2]: https://lwn.net/Articles/263130/
[3]: http://urcu.so/
[4]: https://lwn.net/Articles/573424/

Atin Mukherjee

Atin currently works as a Senior Software Engineer at Red Hat India and is based in the Bengaluru office. He has around 8 years of experience in various domains, including Telecom, BFSI and Storage. He currently works on Gluster, an open source, scalable, distributed storage system, and is responsible for GlusterD, the management daemon that handles configuration management for the cluster. He is also a FOSS enthusiast and loves to contribute to open source projects. His main area of interest is distributed systems.


Geelong 2016


linux.conf.au
