== Performance ==
[[File:Binary search example tree.svg|thumb|upright|A [[Tree (data structure)|tree]] representing binary search. The array being searched here is <math>[20, 30, 40, 50, 80, 90, 100]</math>, and the target value is <math>40</math>.]]
[[File:Binary search complexity.svg|thumb|upright=1.6|The worst case is reached when the search reaches the deepest level of the tree, while the best case is reached when the target value is the middle element.]]

In terms of the number of comparisons, the performance of binary search can be analyzed by viewing the run of the procedure on a binary tree. The root node of the tree is the middle element of the array. The middle element of the lower half is the left child node of the root, and the middle element of the upper half is the right child node of the root. The rest of the tree is built in a similar fashion. Starting from the root node, the left or right subtrees are traversed depending on whether the target value is less or more than the node under consideration.<ref name="FloresMadpis1971" />{{sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}

In the worst case, binary search makes <math display="inline">\lfloor \log_2 (n) + 1 \rfloor</math> iterations of the comparison loop, where the <math display="inline">\lfloor\cdot \rfloor</math> notation denotes the [[floor function]] that yields the greatest integer less than or equal to the argument, and <math display="inline">\log_2</math> is the [[binary logarithm]]. This is because the worst case is reached when the search reaches the deepest level of the tree, and there are always <math display="inline">\lfloor \log_2 (n) + 1 \rfloor</math> levels in the tree for any binary search. The worst case may also be reached when the target element is not in the array. If <math display="inline">n</math> is one less than a power of two, then this is always the case. Otherwise, the search may perform <math display="inline">\lfloor \log_2 (n) + 1 \rfloor</math> iterations if the search reaches the deepest level of the tree. However, it may make <math display="inline">\lfloor \log_2 (n) \rfloor</math> iterations, which is one less than the worst case, if the search ends at the second-deepest level of the tree.{{sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), "Theorem B"}}

On average, assuming that each element is equally likely to be searched, binary search makes <math>\lfloor \log_2 (n) \rfloor + 1 - (2^{\lfloor \log_2 (n) \rfloor + 1} - \lfloor \log_2 (n) \rfloor - 2)/n</math> iterations when the target element is in the array. This is approximately equal to <math>\log_2(n) - 1</math> iterations. When the target element is not in the array, binary search makes <math>\lfloor \log_2 (n) \rfloor + 2 - 2^{\lfloor \log_2 (n) \rfloor + 1}/(n + 1)</math> iterations on average, assuming that the range between and outside elements is equally likely to be searched.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}} In the best case, where the target value is the middle element of the array, its position is returned after one iteration.{{Sfn|Chang|2003|p=169}}
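These iteration counts can be checked empirically. The following Python sketch (an illustration only; the helper <code>search_iterations</code>, the test arrays, and the bound of 1000 are arbitrary choices rather than part of the cited analysis) counts comparison-loop iterations of a standard binary search and compares the totals with the worst-case and average-case expressions above:

<syntaxhighlight lang="python">
def search_iterations(arr, target):
    """Count the comparison-loop iterations of a standard binary search."""
    lo, hi, count = 0, len(arr) - 1, 0
    while lo <= hi:
        count += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            break
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count

for n in range(1, 1000):
    arr = [2 * i for i in range(n)]                        # sorted: 0, 2, 4, ...
    hits = [search_iterations(arr, x) for x in arr]        # each element searched once
    gaps = [search_iterations(arr, x) for x in range(-1, 2 * n, 2)]  # the n + 1 gaps
    k = n.bit_length() - 1                                 # k = floor(log2(n)), computed exactly

    assert max(hits) == k + 1                              # worst case: floor(log2(n)) + 1
    # successful average: sum(hits)/n == k + 1 - (2**(k+1) - k - 2)/n
    assert sum(hits) == n * (k + 1) - (2 ** (k + 1) - k - 2)
    # unsuccessful average: sum(gaps)/(n+1) == k + 2 - 2**(k+1)/(n + 1)
    assert sum(gaps) == (n + 1) * (k + 2) - 2 ** (k + 1)
</syntaxhighlight>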
In terms of iterations, no search algorithm that works only by comparing elements can exhibit better average and worst-case performance than binary search. The comparison tree representing binary search has the fewest levels possible as every level above the lowest level of the tree is filled completely.{{Efn|Any search algorithm based solely on comparisons can be represented using a binary comparison tree. An ''internal path'' is any path from the root to an existing node. Let <math>I</math> be the ''internal path length'', the sum of the lengths of all internal paths. If each element is equally likely to be searched, the average case is <math>1 + \frac{I}{n}</math> or simply one plus the average of all the internal path lengths of the tree. This is because internal paths represent the elements that the search algorithm compares to the target. The lengths of these internal paths represent the number of iterations ''after'' the root node. Adding the average of these lengths to the one iteration at the root yields the average case. Therefore, to minimize the average number of comparisons, the internal path length <math>I</math> must be minimized. It turns out that the tree for binary search minimizes the internal path length. {{Harvnb|Knuth|1998}} proved that the ''external path'' length (the path length over all nodes where both children are present for each already-existing node) is minimized when the external nodes (the nodes with no children) lie within two consecutive levels of the tree. This also applies to internal paths as internal path length <math>I</math> is linearly related to external path length <math>E</math>. For any tree of <math>n</math> nodes, <math>I = E - 2n</math>. When each subtree has a similar number of nodes, or equivalently the array is divided into halves in each iteration, the external nodes as well as their interior parent nodes lie within two levels. It follows that binary search minimizes the number of average comparisons as its comparison tree has the lowest possible internal path length.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}}} Otherwise, the search algorithm can eliminate few elements in an iteration, increasing the number of iterations required in the average and worst case. This is the case for other search algorithms based on comparisons, as while they may work faster on some target values, the average performance over ''all'' elements is worse than that of binary search. By dividing the array in half, binary search ensures that the sizes of both subarrays are as similar as possible.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}

=== Space complexity ===
Binary search requires three pointers to elements, which may be array indices or pointers to memory locations, regardless of the size of the array. Therefore, the space complexity of binary search is <math>O(1)</math> in the [[word RAM]] model of computation.
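A minimal iterative implementation in Python (one possible sketch; the function and variable names are illustrative) shows that only these three indices need to be kept, no matter how large the array is:

<syntaxhighlight lang="python">
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if it is absent.

    Only three indices (low, high and mid) are stored, so the extra space
    used is constant regardless of len(arr).
    """
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2      # the third index, recomputed each iteration
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
</syntaxhighlight>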
=== Derivation of average case ===
The average number of iterations performed by binary search depends on the probability of each element being searched. The average case is different for successful searches and unsuccessful searches. It will be assumed that each element is equally likely to be searched for successful searches. For unsuccessful searches, it will be assumed that the [[Interval (mathematics)|intervals]] between and outside elements are equally likely to be searched. The average case for successful searches is the number of iterations required to search every element exactly once, divided by <math>n</math>, the number of elements. The average case for unsuccessful searches is the number of iterations required to search an element within every interval exactly once, divided by the <math>n + 1</math> intervals.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}

==== Successful searches ====
<!-- E stands for "expected" -->
In the binary tree representation, a successful search can be represented by a path from the root to the target node, called an ''internal path''. The length of a path is the number of edges (connections between nodes) that the path passes through. The number of iterations performed by a search, given that the corresponding path has length {{mvar|l}}, is <math>l + 1</math>, counting the initial iteration. The ''internal path length'' is the sum of the lengths of all unique internal paths. Since there is only one path from the root to any single node, each internal path represents a search for a specific element. If there are {{mvar|n}} elements, which is a positive integer, and the internal path length is <math>I(n)</math>, then the average number of iterations for a successful search is <math>T(n) = 1 + \frac{I(n)}{n}</math>, with the one iteration added to count the initial iteration.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}

Since binary search is the optimal algorithm for searching with comparisons, this problem is reduced to calculating the minimum internal path length of all binary trees with {{mvar|n}} nodes, which is equal to:{{Sfn|Knuth|1997|loc=§2.3.4.5 ("Path length")}}
<math display="block"> I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor </math>
For example, in a 7-element array, the root requires one iteration, the two elements below the root require two iterations, and the four elements below those require three iterations. In this case, the internal path length is:{{Sfn|Knuth|1997|loc=§2.3.4.5 ("Path length")}}
<math display="block"> \sum_{k=1}^7 \left \lfloor \log_2(k) \right \rfloor = 0 + 2(1) + 4(2) = 2 + 8 = 10 </math>
The average number of iterations would be <math>1 + \frac{10}{7} = 2 \frac{3}{7}</math> based on the equation for the average case. The sum for <math>I(n)</math> can be simplified to:{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}
<math display="block"> I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor = (n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2 </math>
Substituting the equation for <math>I(n)</math> into the equation for <math>T(n)</math>:{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}
<math display="block"> T(n) = 1 + \frac{(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2}{n} = \lfloor \log_2 (n) \rfloor + 1 - (2^{\lfloor \log_2 (n) \rfloor + 1} - \lfloor \log_2 (n) \rfloor - 2)/n </math>
For integer {{mvar|n}}, this is equivalent to the equation for the average case on a successful search specified above.
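As an illustrative check (not part of the cited derivation; the function names and the range tested are arbitrary), the following Python sketch evaluates the sum for <math>I(n)</math> directly, compares it with the closed form, and reproduces the 7-element example:

<syntaxhighlight lang="python">
def internal_path_length(n):
    """I(n) as the direct sum of floor(log2(k)) for k = 1 .. n."""
    return sum(k.bit_length() - 1 for k in range(1, n + 1))   # k.bit_length() - 1 == floor(log2(k))

def internal_path_length_closed(n):
    """The simplified closed form for I(n) given above."""
    k = (n + 1).bit_length() - 1                 # floor(log2(n + 1))
    return (n + 1) * k - 2 ** (k + 1) + 2

assert internal_path_length(7) == 10             # the 7-element example
assert all(internal_path_length(n) == internal_path_length_closed(n)
           for n in range(1, 1000))

def average_successful(n):
    """T(n) = 1 + I(n)/n, the average number of iterations of a successful search."""
    return 1 + internal_path_length(n) / n

print(average_successful(7))                     # 2.428571... (= 2 + 3/7)
</syntaxhighlight>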
==== Unsuccessful searches ====
Unsuccessful searches can be represented by augmenting the tree with ''external nodes'', which forms an ''extended binary tree''. If an internal node, or a node present in the tree, has fewer than two child nodes, then additional child nodes, called external nodes, are added so that each internal node has two children. By doing so, an unsuccessful search can be represented as a path to an external node, whose parent is the single element that remains during the last iteration. An ''external path'' is a path from the root to an external node. The ''external path length'' is the sum of the lengths of all unique external paths. If there are <math>n</math> elements, which is a positive integer, and the external path length is <math>E(n)</math>, then the average number of iterations for an unsuccessful search is <math>T'(n)=\frac{E(n)}{n+1}</math>; no extra iteration needs to be added here because every external path already includes the edge traversed by the initial iteration at the root. The external path length is divided by <math>n+1</math> instead of <math>n</math> because there are <math>n+1</math> external paths, representing the intervals between and outside the elements of the array.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}

This problem can similarly be reduced to determining the minimum external path length of all binary trees with <math>n</math> nodes. For all binary trees, the external path length is equal to the internal path length plus <math>2n</math>.{{Sfn|Knuth|1997|loc=§2.3.4.5 ("Path length")}} Substituting the equation for <math>I(n)</math>:{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}
<math display="block"> E(n) = I(n) + 2n = \left[(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2\right] + 2n = (n + 1) (\lfloor \log_2 (n) \rfloor + 2) - 2^{\lfloor \log_2 (n) \rfloor + 1} </math>
Substituting the equation for <math>E(n)</math> into the equation for <math>T'(n)</math>, the average case for unsuccessful searches can be determined:{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search"}}
<math display="block"> T'(n) = \frac{(n + 1) (\lfloor \log_2 (n) \rfloor + 2) - 2^{\lfloor \log_2 (n) \rfloor + 1}}{(n+1)} = \lfloor \log_2 (n) \rfloor + 2 - 2^{\lfloor \log_2 (n) \rfloor + 1}/(n + 1) </math>
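Continuing the illustrative check from the successful case (again not part of the cited derivation; the names and the tested range are arbitrary), a short Python sketch confirms the identity <math>E(n) = I(n) + 2n</math> against the simplified expression and evaluates <math>T'(n)</math>:

<syntaxhighlight lang="python">
def external_path_length(n):
    """E(n) = I(n) + 2n, using the closed form for I(n) from the previous subsection."""
    k = (n + 1).bit_length() - 1                     # floor(log2(n + 1))
    i_n = (n + 1) * k - 2 ** (k + 1) + 2             # I(n)
    return i_n + 2 * n

def average_unsuccessful(n):
    """T'(n) = E(n)/(n + 1), the average iterations of an unsuccessful search."""
    return external_path_length(n) / (n + 1)

# Cross-check E(n) against the simplified form (n+1)(floor(log2 n) + 2) - 2**(floor(log2 n) + 1).
for n in range(1, 1000):
    k = n.bit_length() - 1                           # floor(log2(n))
    assert external_path_length(n) == (n + 1) * (k + 2) - 2 ** (k + 1)

print(average_unsuccessful(7))                       # 3.0
</syntaxhighlight>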
==== Performance of alternative procedure ====
Each iteration of the binary search procedure defined above makes one or two comparisons, checking whether the middle element is equal to the target. Assuming that each element is equally likely to be searched, each iteration makes 1.5 comparisons on average. A variation of the algorithm checks whether the middle element is equal to the target only at the end of the search. On average, this eliminates half a comparison from each iteration. This slightly cuts the time taken per iteration on most computers. However, it guarantees that the search takes the maximum number of iterations, on average adding one iteration to the search. Because the comparison loop is performed only <math display="inline">\lfloor \log_2 (n) + 1 \rfloor</math> times in the worst case, the slight increase in efficiency per iteration does not compensate for the extra iteration for all but very large <math display="inline">n</math>.{{Efn|{{Harvnb|Knuth|1998}} showed on his [[MIX (abstract machine)|MIX]] computer model, which Knuth designed as a representation of an ordinary computer, that the average running time of this variation for a successful search is <math display="inline">17.5 \log_2 n + 17</math> units of time compared to <math display="inline">18 \log_2 n - 16</math> units for regular binary search. The time complexity for this variation grows slightly more slowly, but at the cost of higher initial complexity.{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Exercise 23"}}}}{{Sfn|Knuth|1998|loc=§6.2.1 ("Searching an ordered table"), subsection "Exercise 23"}}<ref>{{cite journal|last1=Rolfe|first1=Timothy J.|s2cid=23752485|title=Analytic derivation of comparisons in binary search|journal=ACM SIGNUM Newsletter|date=1997|volume=32|issue=4|pages=15–19|doi=10.1145/289251.289255|doi-access=free}}</ref>

=== Running time and cache use ===
In analyzing the performance of binary search, another consideration is the time required to compare two elements. For integers and strings, the time required increases linearly as the encoding length (usually the number of [[bit]]s) of the elements increases. For example, comparing a pair of 64-bit unsigned integers would require comparing up to twice as many bits as comparing a pair of 32-bit unsigned integers. The worst case is achieved when the integers are equal. This can be significant when the encoding lengths of the elements are large, such as with large integer types or long strings, which makes comparing elements expensive. Furthermore, comparing [[Floating-point arithmetic|floating-point]] values (the most common digital representation of [[real number]]s) is often more expensive than comparing integers or short strings.

On most computer architectures, the [[Central processing unit|processor]] has a hardware [[Cache (computing)|cache]] separate from [[Random-access memory|RAM]]. Since they are located within the processor itself, caches are much faster to access but usually store much less data than RAM. Therefore, most processors store memory locations that have been accessed recently, along with memory locations close to them. For example, when an array element is accessed, the element itself may be stored along with the elements that are stored close to it in RAM, making it faster to sequentially access array elements that are close in index to each other ([[locality of reference]]). On a sorted array, binary search can jump to distant memory locations if the array is large, unlike algorithms (such as [[linear search]] and [[linear probing]] in [[hash tables]]) which access elements in sequence. This adds slightly to the running time of binary search for large arrays on most systems.<ref>{{cite journal|last1=Khuong|first1=Paul-Virak|last2=Morin|first2=Pat|author2-link=Pat Morin|title=Array Layouts for Comparison-Based Searching|journal=Journal of Experimental Algorithmics|year=2017|volume=22|at=Article 1.3|doi=10.1145/3053370|arxiv=1509.05053|s2cid=23752485}}</ref>
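The access pattern behind this effect can be illustrated with a short Python sketch (the array size and target are arbitrary choices, and the helper <code>probe_sequence</code> is not taken from the cited study). It records the indices probed by a binary search over a large sorted array containing the integers <math>0, 1, \ldots, n-1</math>, showing that consecutive probes start out hundreds of thousands of elements apart, whereas a sequential scan touches neighbouring elements that share cache lines:

<syntaxhighlight lang="python">
def probe_sequence(n, target):
    """Indices probed by binary search on the sorted array [0, 1, ..., n - 1]."""
    probes, lo, hi = [], 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if mid == target:            # each array value equals its index here
            break
        elif mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes

print(probe_sequence(1_000_000, 3)[:5])   # [499999, 249999, 124999, 62499, 31249]
</syntaxhighlight>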