Tuesday, June 17, 2014

Compare and Swap In Java


Compare and swap (CAS)

The first processors that supported concurrency provided atomic test-and-set operations, which generally operated on a single bit. The most common approach taken by current processors, including Intel and SPARC processors, is to implement a primitive called compare-and-swap, or CAS. (On Intel processors, compare-and-swap is implemented by the cmpxchg family of instructions. PowerPC processors have a pair of instructions called "load and reserve" and "store conditional" that accomplish the same goal; similarly for MIPS, except the first is called "load linked.")

A CAS operation includes three operands - a memory location (V), the expected old value (A), and a new value (B). The processor will atomically update the location to the new value if the value that is there matches the expected old value, otherwise it will do nothing. In either case, it returns the value that was at that location prior to the CAS instruction. (Some flavors of CAS will instead simply return whether or not the CAS succeeded, rather than fetching the current value.) CAS effectively says "I think location V should have the value A; if it does, put B in it, otherwise, don't change it but tell me what value is there now."

The natural way to use CAS for synchronization is to read a value A from an address V, perform a multistep computation to derive a new value B, and then use CAS to change the value of V from A to B. The CAS succeeds if the value at V has not been changed in the meantime.

Instructions like CAS allow an algorithm to execute a read-modify-write sequence without fear of another thread modifying the variable in the meantime, because if another thread did modify the variable, the CAS would detect it (and fail) and the algorithm could retry the operation. Listing below illustrates the behavior (but not performance characteristics) of the CAS operation, but the value of CAS is that it is implemented in hardware and is extremely lightweight (on most processors):

=================================================
public class SimulatedCAS {

    private int value;

    public synchronized int getValue() {
        return value;
    }

    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;
        if (value == expectedValue) {
            value = newValue;
        }
        return oldValue;
    }
}
=================================================
Concurrent algorithms based on CAS are called lock-free, because threads do not ever have to wait for a lock (sometimes called a mutex or critical section, depending on the terminology of your threading platform). Either the CAS operation succeeds or it doesn't, but in either case, it completes in a predictable amount of time. If the CAS fails, the caller can retry the CAS operation or take other action as it sees fit. Listing below shows the counter class written to use CAS:

=================================================
public class CASCounter {

    private SimulatedCAS value = new SimulatedCAS(); // starts with 0

    public int getValue() {
        return value.getValue();
    }

    public int increment() {
        int oldValue = value.getValue();
        while (value.compareAndSwap(oldValue, oldValue + 1) != oldValue) {
            oldValue = value.getValue();
        }
        return oldValue + 1;
    }
}
=================================================
Lock-free and wait-free algorithms

An algorithm is said to be wait-free if every thread will continue to make progress in the face of arbitrary delay (or even failure) of other threads. By contrast, a lock-free algorithm requires only that some thread always make progress.

Nonblocking algorithms are used extensively at the operating system and JVM level for tasks such as thread and process scheduling. While they are more complicated to implement, they have a number of advantages over lock-based alternatives - hazards like priority inversion and deadlock are avoided, contention is less expensive, and coordination occurs at a finer level of granularity, enabling a higher degree of parallelism.

Until Java SE 5, it was not possible to write wait-free, lock-free algorithms in the Java language without using native code. With the addition of the atomic variable classes in the java.util.concurrent.atomic package, that has changed. The atomic variable classes all expose a compare-and-set primitive (similar to compare-and-swap), which is implemented using the fastest native construct available on the platform (compare-and-swap, load linked/store conditional, or, in the worst case, spin locks). Nine flavors of atomic variables are provided in the java.util.concurrent.atomic package: AtomicInteger, AtomicLong, AtomicReference, and AtomicBoolean; array forms of atomic integer, long, and reference; and the atomic marked reference and stamped reference classes, which atomically update a pair of values.

The atomic variable classes can be thought of as a generalization of volatile variables, extending the concept of volatile variables to support atomic conditional compare-and-set updates. Reads and writes of atomic variables have the same memory semantics as read and write access to volatile variables.

Operations on atomic variables get turned into the hardware primitives that the platform provides for concurrent access, such as compare-and-swap.
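To make this concrete, here is a sketch (not from the original article) of the earlier CASCounter rewritten against the real java.util.concurrent.atomic API. Note that AtomicInteger.compareAndSet returns a boolean success flag rather than the old value:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCASCounter {

    private final AtomicInteger value = new AtomicInteger(0);

    public int getValue() {
        return value.get();
    }

    public int increment() {
        // Classic CAS retry loop: re-read and retry until our update wins
        for (;;) {
            int oldValue = value.get();
            if (value.compareAndSet(oldValue, oldValue + 1)) {
                return oldValue + 1;
            }
        }
    }
}
```

In practice you would simply call incrementAndGet(), but writing the loop by hand shows the retry pattern that all CAS-based algorithms share.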

Java Atomic Operations

In Java the integer primitive increment operation is not an atomic operation. When an integer is incremented, the following logical steps are performed by the JVM:

1. Retrieve the value of the integer from memory

2. Increment the value

3. Assign the newly incremented value back to the appropriate memory location

4. Return the value to the caller

So while we write the increment operator in a single line of Java code, such as:

int n = i++;
Each one of the aforementioned steps occurs in the JVM. The danger is that if multiple threads all try to increment the same value, two or more of them may retrieve the same value (step 1), increment it (step 2), and then assign the new value back (step 3). If two threads each increment the number 5, you would expect to see 7, but instead both may read 5, increment it to 6, and assign 6 back to the integer's memory location, losing one of the updates.
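A quick demonstration of the hazard (class and variable names here are illustrative, not from the original text): two threads hammer an ordinary int and an AtomicInteger side by side. The atomic counter always ends at 200000, while the plain counter will usually end lower because of lost updates:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementDemo {

    static int plain = 0;                                   // unsynchronized counter
    static final AtomicInteger atomic = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                    // read-modify-write, NOT atomic
                atomic.incrementAndGet();   // a single atomic operation
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // atomic is always 200000; plain is typically less due to lost updates
        System.out.println("plain  = " + plain);
        System.out.println("atomic = " + atomic.get());
    }
}
```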

With the release of Java SE 5, Sun included a java.util.concurrent.atomic package that addresses this limitation. And specifically they added classes including the following:

AtomicBoolean - A boolean value that may be updated atomically. An AtomicBoolean is used in applications such as atomically updated flags, and cannot be used as a replacement for a Boolean.

AtomicInteger - An int value that may be updated atomically. An AtomicInteger is used in applications such as atomically incremented counters, and cannot be used as a replacement for an Integer. However, this class does extend Number to allow uniform access by tools and utilities that deal with numerically-based classes.

AtomicIntegerArray - An int array in which elements may be updated atomically.

AtomicLong - A long value that may be updated atomically. An AtomicLong is used in applications such as atomically incremented sequence numbers, and cannot be used as a replacement for a Long. However, this class does extend Number to allow uniform access by tools and utilities that deal with numerically-based classes.

AtomicLongArray - A long array in which elements may be updated atomically.

Each of these atomic classes provides methods to perform common operations, each guaranteed to execute as a single atomic operation. For example, rather than incrementing an integer using the standard increment operator, like the following:

int n = ++i;
You can ensure that the (1) get value, (2) increment value, (3) update memory, and (4) assign the new value to n steps are all accomplished without fear of another thread interrupting your operation by writing your code as follows:

AtomicInteger ai = new AtomicInteger(0);
int n = ai.incrementAndGet();
In addition, the AtomicInteger class provides the following operations:

int addAndGet(int delta) - Add delta to the integer and then return the new value.

int decrementAndGet() - Decrement the integer and return its value.

int getAndAdd(int delta) - Return the value and then add delta to it.

int getAndDecrement() - Return the value and then decrement.

int getAndIncrement() - Return the value and then increment it.

int getAndSet(int newValue) - Return the value and then set it to the newValue.
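The difference between the get-and-X and X-and-get flavors is just whether the old or the new value is returned, which a short session makes obvious:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIntegerDemo {
    public static void main(String[] args) {
        AtomicInteger ai = new AtomicInteger(10);

        System.out.println(ai.addAndGet(5));      // 15 (adds, then returns the new value)
        System.out.println(ai.getAndAdd(5));      // 15 (returns the old value, then adds -> 20)
        System.out.println(ai.getAndIncrement()); // 20 (returns the old value -> 21)
        System.out.println(ai.decrementAndGet()); // 20 (decrements, then returns the new value)
        System.out.println(ai.getAndSet(100));    // 20 (returns the old value, then sets 100)
        System.out.println(ai.get());             // 100
    }
}
```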


Ref:
>> http://howtodoinjava.com/2014/06/13/compare-and-swap-cas-algorithm/
>> http://java.boot.by/ocpjp7-upgrade/ch04s03.html

Singletons and the Double-Checked Locking Problem in Multithreading


Let's first go through a bit about singletons, and then the double-checked locking problem.

What is singleton?
>>A singleton is simply a class that is instantiated exactly once.

Objective of singleton?
>>Only one instance of the class should be allowed
>>There should be a global point of access to that single instance

Why singletons are required?
>>Singletons typically represent a system component that is intrinsically unique, such as the Java Runtime, a window manager, or a file system
>>Singletons are used to store data that is used/updated across multiple components. The data in one component is usually important to another component, so everything is managed in one central object.

Ways to implement
There are two ways to implement singletons. In both, we suppress the constructor (make it private) so that no other class can create an instance, and we declare a static member to hold the sole instance of our singleton class.

Early Initialization
The instance is created at the time the class is loaded. The JVM guarantees that the instance will be created before any thread accesses the static member. getInstance() simply returns the instance that was already created.

If your application always creates and uses the singleton instance, or the overhead of its creation and runtime aspects is not onerous, this early initialization is preferred.

==============================================
public class SingletonEarly {

    private static final SingletonEarly instance = new SingletonEarly();

    private SingletonEarly() {
        // Suppressing creating new instances
    }

    public static SingletonEarly getInstance() {
        return instance;
    }
}
==============================================

Lazy Initialization

The instance is created only on the first call to getInstance(). Note that this plain version is not thread-safe: two threads may both see a null instance and each create one.

==============================================
public class SingletonLazy {

    private static SingletonLazy instance;

    private SingletonLazy() {
        // Suppressing creating new instances
    }

    // Not thread-safe: the check-then-act below can interleave across threads
    public static SingletonLazy getInstance() {
        if (instance == null) {
            instance = new SingletonLazy();
        }
        return instance;
    }
}

==============================================



Double-Checked Locking Problem

In earlier times (prior to JDK 1.6), even a simple uncontended synchronized block was expensive, which led many people to use double-checked locking when writing lazy initialization code. The double-checked locking idiom tries to improve performance by avoiding synchronization over the common code path after the helper is allocated. But double-checked locking never worked, because of the limitations of the previous JMM.

This is now fixed by new JMM (JDK 1.5 onwards) using volatile keyword.

==============================================
public class SingletonLazy {

    private static SingletonLazy _instance;

    private SingletonLazy() {
        // Suppressing creating new instances
    }

    public static SingletonLazy getInstance() {
        if (_instance == null) {
            synchronized (SingletonLazy.class) {
                if (_instance == null) {
                    _instance = new SingletonLazy();
                }
            }
        }
        return _instance;
    }
}
==============================================

Why is the above code idiom broken, even in the current JMM?

Double-checked locking relies on the unsynchronized read of the _instance field. This appears harmless, but it is not. Suppose thread A is inside the synchronized block, creating a new SingletonLazy instance and assigning it to the _instance variable, while thread B is just entering the getInstance() method. Consider the effect on memory of this initialization: memory for the new object will be allocated; the constructor will be called, initializing the member fields of the new object; and the _instance field will be assigned a reference to the newly created object. There could be two scenarios now:


>> Suppose thread A has completed initialization of _instance and exits the synchronized block as thread B enters getInstance(). By this time, _instance is fully initialized and thread A has flushed its local memory to main memory (a write barrier). The singleton's member fields may refer to other objects stored in memory, which will also be flushed out. But while thread B may see a valid reference to the newly created _instance, because it did not perform a read barrier it could still see stale values of _instance's member fields.

>> Since thread B is not executing inside a synchronized block, it may see these memory operations in a different order than the one thread A executes. It could be the case that B sees these events in the following order (and the compiler is also free to reorder the instructions like this): allocate memory, assign the reference to _instance, call the constructor. Suppose thread B comes along after the memory has been allocated and the _instance field is set, but before the constructor has run. It sees that _instance is not null, skips the synchronized block, and returns a reference to a partially constructed SingletonLazy! Needless to say, the result is neither expected nor desired.


Fixed double-checked locking using volatile in the new JMM (multi-threaded singleton pattern, JDK 1.5 onwards)

The following code makes the instance field volatile so as to stop the instruction reordering. This code works with JDK 1.5 onwards only.

=================================================
public class SingletonLazy {

    private static volatile SingletonLazy _instance;

    private SingletonLazy() {
        // Suppressing creating new instances
    }

    public static SingletonLazy getInstance() {
        if (_instance == null) {
            synchronized (SingletonLazy.class) {
                if (_instance == null) {
                    _instance = new SingletonLazy();
                }
            }
        }
        return _instance;
    }
}
=================================================
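As an aside (not covered in the referenced articles), the initialization-on-demand holder idiom achieves the same lazy, thread-safe construction without volatile or synchronized, by piggybacking on the JVM's class-initialization guarantees:

```java
public class SingletonHolder {

    private SingletonHolder() {
        // Suppressing creating new instances
    }

    // The nested class is not initialized until getInstance() first touches it;
    // the JVM guarantees its static initializer runs exactly once, safely
    // published to all threads.
    private static class Holder {
        static final SingletonHolder INSTANCE = new SingletonHolder();
    }

    public static SingletonHolder getInstance() {
        return Holder.INSTANCE;
    }
}
```

This stays lazy, needs no locking on the read path, and avoids the reordering hazard entirely.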

Hope this gives an insight into the double-checked locking problem. Happy Learning :)