Monday, August 25, 2014

Java Concurrency Tutorial - Thread-safe designs

After reviewing what the main risks are when dealing with concurrent programs (like atomicity or visibility), we will go through some class designs that will help us prevent the aforementioned bugs. Some of these designs result in the construction of thread-safe objects, allowing us to share them safely between threads. As an example, we will consider immutable and stateless objects. Other designs will prevent different threads from modifying the same data, like thread-local variables.

You can see all the source code at github.


1   Immutable objects


Immutable objects have a state (have data which represent the object's state), but it is built upon construction, and once the object is instantiated, the state cannot be modified.

Although threads may interleave, the object has only one possible state. Since all fields are read-only, not a single thread will be able to change object's data. For this reason, an immutable object is inherently thread-safe.

Product shows an example of an immutable class. It builds all its data during construction and none of its fields are modifiable:

In some cases, it won't be sufficient to make a field final. For example, MutableProduct class is not immutable although all fields are final:

Why is the above class not immutable? The reason is we let a reference to escape from the scope of its class. The field 'categories' is a mutable reference, so after returning it, the client could modify it. In order to show this, consider the following program:

And the console output:

Product categories
A
B
C

Modified Product categories
B
C

Since categories field is mutable and it escaped the object's scope, the client has modified the categories list. The product, which was supposed to be immutable, has been modified, leading to a new state.

If you want to expose the content of the list, you could use an unmodifiable view of the list:


2   Stateless objects


Stateless objects are similar to immutable objects but in this case, they do not have a state, not even one. When an object is stateless it does not have to remember any data between invocations.

Since there is no state to modify, one thread will not be able to affect the result of another thread invoking the object's operations. For this reason, a stateless class is inherently thread-safe.

ProductHandler is an example of this type of objects. It contains several operations over Product objects and it does not store any data between invocations. The result of an operation does not depend on previous invocations or any stored data:

In its sumCart method, the ProductHandler converts the product list to an array since for-each loop uses an iterator internally to iterate through its elements. List iterators are not thread-safe and could throw a ConcurrentModificationException if modified during iteration. Depending on your needs, you might choose a different strategy.


3  Thread-local variables


Thread-local variables are those variables defined within the scope of a thread. No other threads will see nor modify them.

The first type is local variables. In the below example, the total variable is stored in the thread's stack:

Just take into account that if instead of a primitive you define a reference and return it, it will escape its scope. You may not know where the returned reference is stored. The code that calls sumCart method could store it in a static field and allow it being shared between different threads.

The second type is ThreadLocal class. This class provides a storage independent for each thread. Values stored into an instance of ThreadLocal are accessible from any code within the same thread.

The ClientRequestId class shows an example of ThreadLocal usage:

The ProductHandlerThreadLocal class uses ClientRequestId to return the same generated id within the same thread:

If you execute the main method, the console output will show different ids for each thread. As an example:

T1 - 23dccaa2-8f34-43ec-bbfa-01cec5df3258
T2 - 936d0d9d-b507-46c0-a264-4b51ac3f527d
T2 - 936d0d9d-b507-46c0-a264-4b51ac3f527d
T3 - 126b8359-3bcc-46b9-859a-d305aff22c7e
...

If you are going to use ThreadLocal, you should care about some of the risks of using it when threads are pooled (like in application servers). You could end up with memory leaks or information leaking between requests. I won't extend myself in this subject since the post How to shoot yourself in foot with ThreadLocals explains well how this can happen.


4   Using synchronization


Another way of providing thread-safe access to objects is through synchronization. If we synchronize all accesses to a reference, only a single thread will access it at a given time. We will discuss this on further posts.


5   Conclusion


We have seen several techniques that help us build simpler objects that can be shared safely between threads. It is much harder to prevent concurrent bugs if an object can have multiple states. On the other hand, if an object can have only one state or none, we won't have to worry about different threads accessing it at the same time.

This post is part of the Java Concurrency Tutorial series. Check here to read the rest of the tutorial.

I'm publishing my new posts on Google plus and Twitter. Follow me if you want to be updated with new content.

Thursday, August 14, 2014

Java Concurrency Tutorial - Visibility between threads

When sharing an object’s state between different threads, other issues besides atomicity come into play. One of them is visibility.

The key fact is that without synchronization, instructions are not guaranteed to be executed in the order in which they appear in your source code. This won’t affect the result in a single-threaded program but, in a multi-threaded program, it is possible that if one thread updates a value, another thread doesn’t see the update when it needs it or doesn’t see it at all.

In a multi-threaded environment, it is the program’s responsibility to identify when data is shared between different threads and act in consequence (using synchronization).

The example in NoVisibility consists in two threads that share a flag. The writer thread updates the flag and the reader thread waits until the flag is set:

This program might result in an infinite loop, since the reader thread may not see the updated flag and wait forever.


With synchronization we can guarantee that this reordering doesn’t take place, avoiding the infinite loop. To ensure visibility we have two options:
  • Locking: Guarantees visibility and atomicity (as long as it uses the same lock).
  • Volatile field: Guarantees visibility.

The volatile keyword acts like some sort of synchronized block. Each time the field is accessed, it will be like entering a synchronized block. The main difference is that it doesn’t use locks. For this reason, it may be suitable for examples like the above one (updating a shared flag) but not when using compound actions.

We will now modify the previous example by adding the volatile keyword to the ready field.

Visibility will not result in an infinite loop anymore. Updates made by the writer thread will be visible to the reader thread:

Writer thread - Changing flag...
Reader Thread - Flag change received. Finishing thread.


Conclusion


We learned about another risk when sharing data in multi-threaded programs. For a simple example like the one shown here, we can simply use a volatile field. Other situations will require us to use atomic variables or locking.

This post is part of the Java Concurrency Tutorial series. Check here to read the rest of the tutorial.

You can take a look at the source code at github.

I'm publishing my new posts on Google plus and Twitter. Follow me if you want to be updated with new content.

Monday, August 11, 2014

Java Concurrency Tutorial - Atomicity and race conditions

Atomicity is one of the key concepts in multi-threaded programs. We say a set of actions is atomic if they all execute as a single operation, in an indivisible manner. Taking for granted that a set of actions in a multi-threaded program will be executed serially may lead to incorrect results. The reason is due to thread interference, which means that if two threads execute several steps on the same data, they may overlap.

The following Interleaving example shows two threads executing several actions (prints in a loop) and how they are overlapped:

When executed, it will produce unpredictable results. As an example:

Thread 2 - Number: 0
Thread 2 - Number: 1
Thread 2 - Number: 2
Thread 1 - Number: 0
Thread 1 - Number: 1
Thread 1 - Number: 2
Thread 1 - Number: 3
Thread 1 - Number: 4
Thread 2 - Number: 3
Thread 2 - Number: 4

In this case, nothing wrong happens since they are just printing numbers. However, when you need to share the state of an object (its data) without synchronization, this leads to the presence of race conditions.


Race condition


Your code will have a race condition if there’s a possibility to produce incorrect results due to thread interleaving. This section describes two types of race conditions:

  1. Check-then-act
  2. Read-modify-write
To remove race conditions and enforce thread safety, we must make these actions atomic by using synchronization. Examples in the following sections will show what the effects of these race conditions are.


Check-then-act race condition


This race condition appears when you have a shared field and expect to serially execute the following steps:

  1. Get a value from a field.
  2. Do something based on the result of the previous check.


The problem here is that when the first thread is going to act after the previous check, another thread may have interleaved and changed the value of the field. Now, the first thread will act based on a value that is no longer valid. This is easier seen with an example.

UnsafeCheckThenAct is expected to change the field number once. Following calls to changeNumber method, should result in the execution of the else condition:

But since this code is not synchronized, it may (there's no guarantee) result in several modifications of the field:

T13 | Changed
T17 | Changed
T35 | Not changed
T10 | Changed
T48 | Not changed
T14 | Changed
T60 | Not changed
T6 | Changed
T5 | Changed
T63 | Not changed
T18 | Not changed

Another example of this race condition is lazy initialization.

A simple way to correct this is to use synchronization.

SafeCheckThenAct is thread-safe because it has removed the race condition by synchronizing all accesses to the shared field.

Now, executing this code will always produce the same expected result; only a single thread will change the field:

T0 | Changed
T54 | Not changed
T53 | Not changed
T62 | Not changed
T52 | Not changed
T51 | Not changed
...

In some cases, there will be other mechanisms which perform better than synchronizing the whole method but I won’t discuss them in this post.


Read-modify-write race condition


Here we have another type of race condition which appears when executing the following set of actions:

  1. Fetch a value from a field.
  2. Modify the value.
  3. Store the new value to the field.


In this case, there’s another dangerous possibility which consists in the loss of some updates to the field. One possible outcome is:

Field’s value is 1.
Thread 1 gets the value from the field (1).
Thread 1 modifies the value (5).
Thread 2 reads the value from the field (1).
Thread 2 modifies the value (7).
Thread 1 stores the value to the field (5).
Thread 2 stores the value to the field (7).

As you can see, update with the value 5 has been lost.

Let’s see a code sample. UnsafeReadModifyWrite shares a numeric field which is incremented each time:

Can you spot the compound action which causes the race condition?

I’m sure you did, but for completeness, I will explain it anyway. The problem is in the increment (number++). This may appear to be a single action but in fact, it is a sequence of three actions (get-increment-write).

When executing this code, we may see that we have lost some updates:

2014-08-08 09:59:18,859|UnsafeReadModifyWrite|Final number (should be 10_000): 9996

Depending on your computer it will be very difficult to reproduce this update loss, since there’s no guarantee on how threads will interleave. If you can’t reproduce the above example, try UnsafeReadModifyWriteWithLatch, which uses a CountDownLatch to synchronize thread’s start, and repeats the test a hundred times. You should probably see some invalid values among all the results:

Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 997
Final number (should be 1_000): 999
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000
Final number (should be 1_000): 1000

This example can be solved by making all three actions atomic.

SafeReadModifyWriteSynchronized uses synchronization in all accesses to the shared field:

Let’s see another example to remove this race condition. In this specific case, and since the field number is independent to other variables, we can make use of atomic variables. SafeReadModifyWriteAtomic uses atomic variables to store the value of the field:

Following posts will further explain mechanisms like locking or atomic variables.


Conclusion


This post explained some of the risks implied when executing compound actions in non-synchronized multi-threaded programs. To enforce atomicity and prevent thread interleaving, one must use some type of synchronization.

This post is part of the Java Concurrency Tutorial series. Check here to read the rest of the tutorial.

You can take a look at the source code at github.

I'm publishing my new posts on Google plus and Twitter. Follow me if you want to be updated with new content.