Branch Prediction is Real

Branch predictions exist, and they do have a major impact on performance. Here I will show some useless code just to demonstrate how do they effect your performance. Two “algorithms” scanning the entire array O(n) and do some counting, however, due to branch prediction the nature of elements in the array will make a huge difference on performance.

  1. Sorted Array
  2. int initialSize = 10_000;
    long before,after;
    for(int j=1;j<1000;j+=100) {
      int size = initialSize*j;
      int[] array = IntStream.range(0, size).toArray();
    
       int count = 0;
       before = System.nanoTime();
       for (int i = 0; i < array.length - 1; i++) {
          if (array[i] < array[i + 1]) {
            count++;
          }
        }
        after = System.nanoTime();
        long elapsed = after - before;
        System.out.println(elapsed+","+size+","+count);
    }
    
    256933,10000,9999
    6509775,1010000,1009999
    2504190,2010000,2009999
    2378569,3010000,3009999
    3104675,4010000,4009999
    3909848,5010000,5009999
    5470804,6010000,6009999
    5878946,7010000,7009999
    6360480,8010000,8009999
    6848432,9010000,9009999
    
  3. Randomised Array
  4. Random random = new Random();
    int initialSize = 10_000;
    long before,after;
    for(int j=1;j<1000;j+=100) {
       int size = initialSize*j;
       int[] array =  random.ints(size).toArray();
       int count = 0;
       before = System.nanoTime();
       for (int i = 0; i < array.length - 1; i++) {
          if (array[i] < array[i + 1]) {
            count++;
          }
        }
        after = System.nanoTime();
        long elapsed = after - before;
        System.out.println(elapsed+","+size+","+count);
    }
    

    output on my machine:

    298587  ,10000,4970
    11325883,1010000,505161
    9414624 ,2010000,1004060
    13196647,3010000,1504505
    17669974,4010000,2005382
    22051148,5010000,2504787
    26793498,6010000,3004422
    30835706,7010000,3504727
    36048498,8010000,4004003
    45621428,9010000,4505365
    

I do need to print count otherwise the compiler will play it smart and never execute the loop. Remember this.
Here’s a graph that shows the difference more clearly.
time_plotting

Now. Both programs scan the entire array and do the same check, so theoretically they have the same asymptotic growth. However, in the case of the sorted array, the processor have noticed that the condition array[i] < array[i + 1] was true in the previous instruction/s and optimised accordingly.

Advertisements

2 Comments

I really like this blogpost – what is the behaviour, if, after 40m iterations, you reverse the list order? Will HotSpot reoptimise the alternative branch some future time?

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title="" rel=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: