Module 0060: Algorithms to search for an item in an array

About this module

Sorted arrays

The linear search algorithm in module 0054 is the only kind of search algorithm that works on unsorted arrays. In the worst case, all elements in the array must be examined to confirm that a value is not in the array.

However, we can make some improvements with sorted arrays.

Assume that array a is sorted, and assume n is the number of elements in array a. Furthermore, let us assume that the array is sorted in a non-decreasing order. This implies that we can have duplicate values in the array. The mathematical way to say an array is sorted in a non-decreasing order is as follows:

a[i] <= a[i+1] for i ranging inclusively from 0 to n-2. We can also spell this out as a[0] <= a[1] <= a[2]... a[n-1].

The following algorithm results in a value of true in variable sorted when it completes if and only if array a is sorted:

i=0;                                 // line 1
sorted=true;                         // line 2
while (sorted && i<n-1) {            // line 3
  sorted = sorted && (a[i]<=a[i+1]); // line 4
  i=i+1;                             // line 5
}                                    // line 6

As an exercise, try tracing this algorithm using a sorted array and an unsorted array.

Linear search

We can improve our linear search algorithm to work more efficiently. The original algorithm is described as the following algorithm.

// Linear search algorithm that works for sorted and unsorted arrays
// Assume v contains the value to be search in an array a that has n elements
// Assume array a elements have specific values
i=0;
while ((i<n) && (a[i] != v))
  i = i + 1;
if (i=n)
  // conclude no element in a has a value of v
else
  // conclude a[i] is the first element with a value of v

Note that if array a is sorted in a non-decreasing way, and we find that v < a[i], then we already know that v < a[i] <= a[i+1] <= a[i+2]... a[n-1]. This also means that v != a[i+1]$, v != a[i+2], all the way up to v != a[n-1].

This property of a sorted array means that we can exit the loop earlier when we discover that v < a[i]. This is great news! On the other hand, we need to modify the conditional statement because now we can exit with i < n and still conclude that value $v$ is not anywhere in the array. The condition that confirms that value v is not in the array should be (i == n) || (a[i] != v).

As a result, the algorithm is modified as in the following algorithm.

// Improved linear search algorithm that works only for non-decreasingly sorted arrays.
i=0;
while ((i<n) && (a[i]<v))
  i=i+1;
if ((i==n) || (a[i] != v))
  // conclude no element in a has a value of v
else
  // conclude a[i] is the first element in a that has a value of v

With this modification, the number of elements to examine is variable when the value $v$ is not in the array. In the worst case, when v > a[n-1], we still need to examine all n items in the array.

Binary search

Concept

Binary search works only if an array is sorted. The basic idea is to compare the value to find with an element in the middle of the array. Depending on the outcome, we can rule out one-half of the candidates or confirm that the value does exist.

Let us assume that we compare v with a[m], the first candidate has an index of b, and the last candidate has an index of e. Then, there are three possible outcomes:

If we pick $m$ to be half way between $b$ and $e$, then we can throw away at least one half of the candidates after one comparison.

Algorithm

The binary search algorithm is described in the following algorithm. This algorithm assumes that $a$ is an array with at least one item, and $v$ is the value that we are searching for.

// The binary search algorithm, assume array a has n items non-decreasingly sorted
// k is the value to search for
b=0;                               // line 1
e=n-1;                             // line 2
do {                               // line 3
  m = (b+e)/2;                     // line 4
  if (a[m] < v)                    // line 5
    b = m + 1;                     // line 6
  else                             // line 7
    if (v < a[m])                  // line 8
      e = m - 1;                   // line 9
} while ((b <= e) && (a[m] != v)); // line 10
if (b>e)                           // line 11
  // line 12: conclude v cannot be found in a
else                               // line 13
  // line 14: conclude v is found in a

Let us dissect this algorithm.

Now, that’s efficient!

The binary search algorithm is very efficient because each comparison rules out at least one half of the remaining candidates. This means that if we start with 511 candidates, we’ll end up with 255 after one comparison, 127 after two, 63 after three, etc. It’ll take 9 comparisons that confirm a value v is not in an array of 511 elements.

Let n be the number of elements in a and q be the number of comparisons needed to confirm v is not in a. Then $q = \lceil \log{(n)} / \log{(2)} \rceil$. The symbol $\lceil x \rceil$ is the ceiling of $x$, which is the smallest integer that is larger than or equal to $x$.

Using this formula, to look up a name in a phonebook with 8 billion entries will only take up to 33 comparisons!