Monday, March 31, 2014

Understanding Basics of Tim Sort, The hybrid sorting algorithm in Java 7

Timsort is a hybrid, an efficient combination of a number of other algorithms. The algorithm is based on the idea that in the real world the sorted data array contains ordered (no matter how: non-descending or descending) sub-arrays. And often it really is so. With such data Timsort goes ahead of all other algorithms.The mechanism of the algorithm can be briefly explained as follows:

  1. A particular algorithm is used to split the input array into sub-arrays.
  2. Each sub-array is sorted with a simple Insertion Sort.
  3. The sorted sub-arrays are merged into one array with the use of the Merge Sort.



The Algorithm

Definitions

  1. N: the length of the input array.
  2. run: an ordered sub-array in the input array. At the same time, the order is non-descending or strictly descending, i.e. “a0 ≤ a1 ≤ a2 ≤ ...» or «a0 > a1 > a2 > ...”
  3. minrun: as it was said above, in the first step of the algorithm, the input array is split into runs. minrun is a minimum length of such run. This number is calculated by a certain logics from the N number.



Step 0. Calculation of the Minrun.

The minrun number is determined based on the N, by the following principles:

It shouldn’t be too long as the minrun will later be subjected to insertion sort, which is effective for short runs.
It shouldn’t be too short, as, the shorter the run, the more runs will have to be merged in the next step.
It would be good if N \ minrun was a power of 2 (or close to it). This requirement results from the fact that the merge sort performs best on the runs of about the same length.
Here, the author refers to his own experiments, showing that with minrun > 256, principle 1 is not satisfied, and with minrun < 8, principle 2 is not satisfied, and the most performing lengths range from 32 to 65. Exception: if N < 64, then minrun = N and Timsort turns into a simple merge sort. At this point, the minrun calculation algorithm is as simple as a piece of cake: take the six most significant bits out of N and add one, if the remaining least significant bits contain at least one off bit.



Step 1. Splitting into Runs and Their Sorting.

So, at this stage, there is an input array with N size and a calculated minrun. The algorithm for this step:

The base address of the current element is set at the beginning of the input array.

  • Starting with the current array, search for the run (an ordered array) in the input array. By definition, this run will definitely include the current element and the one that follows, but further on it’s pure luck. If the final array is descending, the elements are ordered in a non-descending order (it’s a simple serial algorithm, it's just that we move from both sides to the centre, changing the places of the elements).
  • If the current run is smaller than the minrun, take the number of elements that follow the found run equal to the minrun — size(run). Thus, the final run will be the size of the minrun or more, a part of which (ideally all of it) is ordered.
  • Then this run is insertion sorted. As this run is small and partially ordered, the sorting is fast and highly performing.
  • The base address of the current element is set at the element following this run.
  • If the end of the input array has not been reached, go to point 2, otherwise it is the end of this step.


Step 2. Merge

At this stage, there is an input array split into runs. If the data in the input array were close to random, the run size is close to minrun, and if the data had ordered ranges (the algorithm application requirement lets us hope for it), the run size exceeds minrun. Now, these runs need to be combined to receive the final and completely ordered array, and in the course of it, two requirements have to be satisfied:

  • There should be combined runs of relatively equal size (this way it will be more efficient).
  • The algorithm stability should be maintained, i.e. no useless shifts (e.g. two consecutive equal numbers should not exchange places).
  • This is achieved in the following way:
  • Create an empty pair stack <run base address>-<run size>. Take the first run.
  • Add a few data to the stack <base address>-<size> to the current run.
  • Assess whether the current run should be merged with the preceding one. To do this, check where the two requirements are satisfied (say, X, Y and Z are three top runs in the stack): X > Y + Z Y > Z
  • If one of the requirements is not satisfied, array Y is merged with the smallest of arrays X and Z. This step is performed until both requirements are satisfied or until all data is ordered.
  • For any unconsidered runs, take the next and go point 2. Otherwise, it is the end.



Runs Merging

As you remember, in step two of the algorithm, we were merging two runs into an ordered one. The two consecutive runs are always merged. To merge them an additional memory is used.

  • A temporary run is created with a size of the smallest of the merged runs.
  • Copy the shortest run into the temporary one.
  • Mark the current position with the first elements of the large and temporary arrays.
  • In each following step compare the elements in the large and temporary arrays and move the smaller into a new sorted array. Move the base address of the element in the array, where the element was taken from.
  • Repeat step 4 until one of the arrays runs out.
  • Add all elements of the remaining run to the end of the new one.


Modifications to the Merging Sort

All seems perfect in the above merge sort. Except for one thing: imagine the merge of such two arrays:

A = {1, 2, 3,..., 9999, 10000}

B = { 20000, 20001, ...., 29999, 30000}

The above procedure will work for them too, but at each step four, one comparison and one moving should be performed to give 10000 comparisons and 10000 moves. Timsort offers here a modification called galloping. It means the following:

Start the merge sort as described above.
At each moving of an element from the temporary or large run to the final one, the run where the element was moved from is recorded.
If a number of elements (in this representation of the algorithm this number equals 7) was moved from one and the same run, it can be assumed that the next element will also come from the same run. To prove it, the galloping mode is switched, i.e. go through the run that is expected to supply the next set of data with a binary search (remember that the array is ordered and we have all rights for binary search) to find the current element from the other merged run. The binary search is more efficient than the linear one, thus the number of searches will be much smaller.
Finally, when the data in the supplying run do not fit (or having reached the run end), this data can be moved in a bulk (which can be more efficient than moving separate elements).
The explanation can be a bit vague, so let’s have a look at the example: A = {1, 2, 3,..., 9999, 10000} B = { 20000, 20001, ...., 29999, 30000}

 In the first seven iterations, numbers 1, 2, 3, 4, 5, 6 and 7 from run A are compared with number 20000 and, after 20000 is found to be greater, they are moved from array A to the final one.
Starting with the next iteration, the galloping mode is switched on: number 20000 is compared in sequence with numbers 8, 10, 14, 22, 38, n+2^i, ..., 10000 from run A. As you can see, the number of such comparisons will be far less than 10000.
When run A is empty, it is known that it is smaller than run B (we could also have stopped somewhere in the middle). The data from run A is moved to the final one, and the procedure is repeated.


Complexity

Best: n
Average: nlogn
worst: nlogn


Please check the java code @ http://cr.openjdk.java.net/~martin/webrevs/openjdk7/timsort/raw_files/new/src/share/classes/java/util/TimSort.java

Sunday, March 23, 2014

Implement a DoublyLinkedList in Java

Please go through the below code to understand the implementation

package test.sachi.dsnalgos.doubleLinkedList;

public class Node {

protected Node next;
protected Node prev;
protected int data;

public Node(){

next=null;
prev=null;
data=0;
}


public Node(int d,Node n, Node p){

next=n;
prev=p;
data=d;
}

public void setNextlink(Node n){
next=n;

}
public void setPrevlink(Node p){
prev=p;

}
public Node getNextLink(){
return next;
}

public Node getPrevLink(){
return prev;
}
public void setData(int d)
    {
        data = d;
    }
public int getData()
    {
        return data;
    }

}


package test.sachi.dsnalgos.doubleLinkedList;

public class DoublyLinkedList {

protected Node start;
protected Node end;
int size;

public DoublyLinkedList(){
start=null;
end=null;
size=0;
}

public boolean isEmpty(){
if(start !=null){
return false;
}else{
return true;
}
}

public int getSize(){
return size;

}

public void insertAtStart(int val){
Node nodePtr=new Node(val,null,null);
if(start==null){
start=nodePtr;
end=start;

}else{
start.setPrevlink(nodePtr);
nodePtr.setNextlink(start);
start=nodePtr;

}
size++;

}

public void insertAtEnd(int val){
Node nodePtr=new Node(val,null,null);
if(start==null){
start=nodePtr;
end=start;

}else{
nodePtr.setPrevlink(end);
end.setNextlink(nodePtr);
end=nodePtr;

}
size++;

}

public void insertAtPosition(int val , int pos)
{
Node nptr = new Node(val, null, null);    
if (pos == 1)
{
insertAtStart(val);
return;
}            
Node ptr = start;
for (int i = 2; i <= size; i++)
{
if (i == pos)
{
Node tmp = ptr.getNextLink();
ptr.setNextlink(nptr);
nptr.setPrevlink(ptr);
nptr.setNextlink(tmp);
tmp.setPrevlink(nptr);
}
ptr = ptr.getNextLink();            
}
size++ ;
}

public void deleteAtPosition(int pos){
if(pos==1){

if(size==1){
start=null;
end=null;
size=0;
return;

}
start=start.getNextLink();
start.setPrevlink(null);
size--;
return;

}
if(pos==size) {
end=end.getPrevLink();
end.setNextlink(null);
size--;
}
Node tmpPtr=start.getNextLink();
for(int i=2;i<=size;i++){
if(i==pos){
Node p=tmpPtr.getPrevLink();
Node n=tmpPtr.getNextLink();
p.setNextlink(n);
n.setPrevlink(p);
size--;
return;


}
tmpPtr=tmpPtr.getNextLink();
}



}

public void displayDL(){
System.out.print("\nDoubly Linked List = ");
if (size == 0) {
System.out.print("empty\n");
return;
}
if(start.getNextLink()==null){
System.out.println(start.getData() );
return;
}
Node nodePtr=start;
System.out.print(start.getData()+ " <-> ");
while(nodePtr.getNextLink() !=null){
System.out.print(nodePtr.getData()+ " <-> ");
nodePtr = nodePtr.getNextLink();
        }
System.out.print(nodePtr.getData()+ "\n");
}


}




Implement Queue using two Stacks in Java

Please go through the below program to understand the code.

package test.sachi.dsnalgos;

import java.util.NoSuchElementException;
import java.util.Stack;
/*
 Create two stacks : 's' and 'tmps' as in the program given below
For insert operation :
    if size of s = 0 then
        push value into s
    else
        push all popped elements from s to tmps
        push value into s
        push all popped elements from tmps to s

For remove operation :
    if size of s = 0 then
        throw 'underflow' exception
    else 
        return pop element from s


 */


public class QueueUsingTwoStacks {
Stack <Integer> s;
Stack <Integer> tmps;

public QueueUsingTwoStacks(){
s=new Stack<Integer>();
tmps=new Stack<Integer>();
}
public void insert(int data){
if(s.size()==0){
s.push(data);
}else{
int l=s.size();
for (int i = 0; i < l; i++){
tmps.push(s.pop());
}
s.push(data);
for (int i = 0; i < l; i++){
s.push(tmps.pop());
}
}
}
public int remove() throws Exception
    {
        if (s.size() == 0)
            throw new NoSuchElementException("Underflow Exception");            
        return s.pop();
    }    
public int peek() throws Exception
    {
        if (s.size() == 0)
            throw new NoSuchElementException("Underflow Exception");            
        return s.peek();
    }        
public boolean isEmpty()
    {
        return s.size() == 0 ;
    }    
public int getSize()
    {
        return s.size();
    }    
public void display()
    {
        System.out.print("\nQueue = ");
        int l = getSize();
        if (l == 0)
            System.out.print("Empty\n");
        else
        {
                        
            for (int i = l - 1; i >= 0; i--)
                System.out.print(s.get(i)+" ");                
            System.out.println();
        }
    }
}

Implement Stack using two Queues in Java

Please go through the below program to understand the code 



package test.sachi.dsnalgos;

import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.Queue;

/*
 * Create two queues : 'q' and 'tmpq' as in the program given below
 For push operation :
 if size of q = 0 then
 enqueue value into q
 else
 dequeue all elements from q to tmpq
 enqueue value into q
 dequeue all elements from tmpq to q

 For pop operation :
 if size of q = 0 then
 throw 'underflow' exception
 else 
 return front element of q

 */

public class StackUsingTwoQueues {

Queue<Integer> q;
Queue<Integer> tempq;

// contructor
public StackUsingTwoQueues() {
q = new LinkedList<Integer>();
tempq = new LinkedList<Integer>();
}

// Method to push the element to the Stack

public void push(int data) {

if (q.size() == 0) {
q.add(data);
} else {
int l = q.size();
for (int i = 0; i < l; i++) {
tempq.add(q.remove());
}
q.add(data);
for (int i = 0; i < l; i++) {
q.add(tempq.remove());
}
}

}

public int pop() throws Exception {
if (q.size() == 0) {
throw new NoSuchElementException("Underflow Exception");
}

return q.remove();
}

public int peek() throws Exception {

if (q.size() == 0) {
throw new NoSuchElementException("Underflow Exception");
}

return q.peek();
}
public boolean isEmpty(){
if(q.size()==0){
return true;
}else{
return false;
}
}
public int getSize(){
return q.size();
}
public void display(){
System.out.print("\nStack = ");
        int l = getSize();
        if (l == 0)
            System.out.print("Empty\n");
        else
        {
            Iterator<Integer> it = q.iterator();
            while (it.hasNext())
                System.out.print(it.next()+" ");
            System.out.println();
        }
}


}