Binary Search

A binary search is a fast search algorithm suitable for data sets of any reasonable size encountered. Unlike a linear search, it is suitable even for very large data sets because it eliminates large numbers of comparisons. A binary search differs from a linear search in that it requires sorted data to operate successfully.

We can explore how a binary search operates by considering how it would find an element in the following vector.

Figure 1 A vector containing nine elements

Given the vector pictured in Figure 1, we can use a binary search to find the element that contains the value 9. All binary searches begin by comparing the item being sought against the item in the middle of the data set. In this example, the middle element contains the value 21.

Figure 2 Selecting the middle element

The value we are seeking is 9. Since 9 is less than 21, we know that 9 must be stored in an element somewhere to the left of the middle element. With this in mind, we can safely ignore the right half of the vector and continue our search, only considering the left half of the vector. Figure 3 demonstrates this partitioning of the vector.

Figure 2 Partitioning the vector

After we partition the vector, the middle element changes. We then compare the value stored here to the value we seek. Since 5 is less than 9, we know that 9 must be stored in the right half of this partition. We can then ignore the left half of this partition, and compare against the midpoint of the right half.

Figure 4 Partitioning the vector, again

Figure 4 highlights the midpoint of the current vector partition in green. The value stored here equals the value we seek so our search is complete. For this vector, which contains nine elements, only three comparisons were needed to find the element that stored the value 9. Starting from the left side of the vector, a linear search needs only three comparisons also to find the element that stores the value 9. The real advantage of binary search appears when we consider larger data sets. The following table lists the maximum number of comparisons a binary search algorithm has to make to find any element in vectors of various sizes.

Size of vector	Max comparisons


1,000
10,000
100,000
1,000,000
10,000,000
100,000,000
1,000,000,000

Table 1 Maximum comparisons of binary search

Even for a vector that contains a billion elements, a binary search algorithm needs to make at most thirty comparisons to find any element. An implementation of a binary search algorithm appears in Listing 2.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26:

// Finding an element in a vector using binary search template <typename T> int binary_search(const vector<T>& v, const T& item) { int low = 0; int high = v.size() - 1; int mid; while (low <= high) { // find the midpoint mid = (low + high) / 2; if (v[mid] < item) { low = mid + 1; // look in upper portion } else if (v[mid] > item) { high = mid - 1; // look in lower portion } else { return mid; // found it! } } return -1; // item not found }

Listing 2Finding an element in a vector using a binary search

The only disadvantage of a binary search is that it requires sorted data. Without sorted data, the midpoint comparisons are meaningless and the search would incorrectly report that an element is not present in the data set. Given a vector of unsorted items, how can we sort the items so that we can use a binary search? There are many approaches that we can take. We introduce the various approaches in 4.1.2 Basic Sorting Algorithms.

Instead of relying on their own binary search implementations, C++ programmers can use the STL binary_search function. The STL binary_search interface resembles the find function interface. The only difference is that function binary_search returns a bool value indicating the success of the search. An example use of the STL binary_search appears in wordsearch.cpp. This program stores the words contained in a text file into a vector. The user then enters a word via the console. The program then performs a binary search to see if the text file contains the word. A sample text file containing the full text of Nathaniel Hawthorne's The House of the Seven Gables can be found here.