==Computational issues==

===Bounds on complexity and operation counts===
{{unsolved|computer science|What is the lower bound on the complexity of fast Fourier transform algorithms? Can they be faster than <math>O(N\log N)</math>?}}

A fundamental question of longstanding theoretical interest is to prove lower bounds on the [[computational complexity theory|complexity]] and exact operation counts of fast Fourier transforms, and many open problems remain. It has not been rigorously proved whether DFTs truly require <math display="inline">\Omega(n \log n)</math> (i.e., order <math>n \log n</math> or greater) operations, even for the simple case of [[power of two]] sizes, although no algorithms with lower complexity are known. In particular, the count of arithmetic operations is usually the focus of such questions, although actual performance on modern-day computers is determined by many other factors such as [[Cache (computing)|cache]] or [[CPU pipeline]] optimization.

Following work by [[Shmuel Winograd]] (1978),<ref name="Winograd_1978"/> a tight <math>\Theta(n)</math> lower bound is known for the number of real multiplications required by an FFT. It can be shown that only <math display="inline">4n - 2\log_2^2(n) - 2\log_2(n) - 4</math> irrational real multiplications are required to compute a DFT of power-of-two length <math>n = 2^m</math>. Moreover, explicit algorithms that achieve this count are known (Heideman & [[Charles Sidney Burrus|Burrus]], 1986;<ref name="Heideman_Burrus_1986"/> Duhamel, 1990<ref name="Duhamel_1990"/>). However, these algorithms require too many additions to be practical, at least on modern computers with hardware multipliers (Duhamel, 1990;<ref name="Duhamel_1990"/> Frigo & [[Steven G. Johnson|Johnson]], 2005).<ref name="Frigo_Johnson_2005"/>

A tight lower bound is not known on the number of required additions, although lower bounds have been proved under some restrictive assumptions on the algorithms. In 1973, Morgenstern<ref name="Morgenstern_1973"/> proved an <math>\Omega(n \log n)</math> lower bound on the addition count for algorithms where the multiplicative constants have bounded magnitudes (which is true for most but not all FFT algorithms). [[Victor Pan|Pan]] (1986)<ref name="Pan_1986"/> proved an <math>\Omega(n \log n)</math> lower bound assuming a bound on a measure of the FFT algorithm's ''asynchronicity'', but the generality of this assumption is unclear. For the case of power-of-two {{mvar|n}}, [[Christos Papadimitriou|Papadimitriou]] (1979)<ref name="Papadimitriou_1979"/> argued that the number <math display="inline">n \log_2 n</math> of complex-number additions achieved by Cooley–Tukey algorithms is ''optimal'' under certain assumptions on the [[Graph (discrete mathematics)|graph]] of the algorithm (his assumptions imply, among other things, that no additive identities in the roots of unity are exploited). (This argument would imply that at least <math display="inline">2n \log_2 n</math> real additions are required, although this is not a tight bound because extra additions are required as part of complex-number multiplications.) Thus far, no published FFT algorithm has achieved fewer than <math display="inline">n \log_2 n</math> complex-number additions (or their equivalent) for power-of-two {{mvar|n}}.

A third problem is to minimize the ''total'' number of real multiplications and additions, sometimes called the ''arithmetic complexity'' (although in this context it is the exact count and not the asymptotic complexity that is being considered). Again, no tight lower bound has been proven. Since 1968, however, the lowest published count for power-of-two {{mvar|n}} was long achieved by the [[split-radix FFT algorithm]], which requires <math display="inline">4n\log_2(n) - 6n + 8</math> real multiplications and additions for {{math|''n'' > 1}}. This was recently reduced to <math display="inline">\sim \frac{34}{9} n \log_2 n</math> (Johnson and Frigo, 2007;<ref name="Frigo_Johnson_2007"/> Lundy and Van Buskirk, 2007<ref name="Lundy_Buskirk_2007"/>). A slightly larger count (but still better than split radix for {{math|''n'' ≥ 256}}) was shown to be provably optimal for {{math|''n'' ≤ 512}} under additional restrictions on the possible algorithms (split-radix-like flowgraphs with unit-modulus multiplicative factors), by reduction to a [[satisfiability modulo theories]] problem solvable by [[Proof by exhaustion|brute force]] (Haynal & Haynal, 2011).<ref name="Haynal_2011"/>

Most of the attempts to lower or prove the complexity of FFT algorithms have focused on the ordinary complex-data case, because it is the simplest. However, complex-data FFTs are so closely related to algorithms for related problems such as real-data FFTs, [[discrete cosine transform]]s, [[discrete Hartley transform]]s, and so on, that any improvement in one of these would immediately lead to improvements in the others (Duhamel & Vetterli, 1990).<ref name="Duhamel_Vetterli_1990"/>
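To make these exact counts concrete, the short Python sketch below tabulates the closed-form expressions quoted above for small power-of-two sizes; the function names are invented for this illustration only.

<syntaxhighlight lang="python">
# Sketch: evaluate the exact operation-count formulas quoted above for
# power-of-two sizes n = 2^m.  Names are illustrative, not from any library.
import math

def min_real_mults(n):
    """Minimal irrational real multiplications (Winograd; Duhamel 1990)."""
    lg = math.log2(n)
    return 4 * n - 2 * lg**2 - 2 * lg - 4

def split_radix_flops(n):
    """Total real additions + multiplications for split radix (n > 1)."""
    return 4 * n * math.log2(n) - 6 * n + 8

def improved_2007_asymptotic(n):
    """Leading term ~(34/9) n log2 n of the 2007 improved count."""
    return 34 / 9 * n * math.log2(n)

for m in range(2, 11):
    n = 2 ** m
    print(f"n={n:5d}  min mults={min_real_mults(n):8.0f}  "
          f"split radix={split_radix_flops(n):9.0f}  "
          f"~(34/9) n log2 n={improved_2007_asymptotic(n):9.0f}")
</syntaxhighlight>

The tabulation makes the tension described above visible: the minimal multiplication count grows only linearly in {{mvar|n}} (it is 0 for {{math|''n'' {{=}} 4}} and 4 for {{math|''n'' {{=}} 8}}), while the total operation count of practical algorithms grows as {{math|''n'' log ''n''}}, which is why multiplication-optimal algorithms pay for their savings with many extra additions.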
===Approximations===
All of the FFT algorithms discussed above compute the DFT exactly (i.e., neglecting [[floating-point]] errors). A few FFT algorithms have been proposed, however, that compute the DFT ''approximately'', with an error that can be made arbitrarily small at the expense of increased computations. Such algorithms trade the approximation error for increased speed or other properties. For example, an approximate FFT algorithm by Edelman et al. (1999)<ref name="Edelman_McCorquodale_Toledo_1999"/> achieves lower communication requirements for [[parallel computing]] with the help of a [[fast multipole method]]. A [[wavelet]]-based approximate FFT by Guo and Burrus (1996)<ref name="Guo_Burrus_1996"/> takes sparse inputs/outputs (time/frequency localization) into account more efficiently than is possible with an exact FFT. Another algorithm for approximate computation of a subset of the DFT outputs is due to Shentov et al. (1995).<ref name="Shentov_Mitra_Heute_Hossen_1995"/> The Edelman algorithm works equally well for sparse and non-sparse data, since it is based on the compressibility (rank deficiency) of the Fourier matrix itself rather than the compressibility (sparsity) of the data. Conversely, if the data are sparse—that is, if only {{mvar|k}} out of {{mvar|n}} Fourier coefficients are nonzero—then the complexity can be reduced to <math>O(k \log n \log(n/k))</math>, and this has been demonstrated to lead to practical speedups compared to an ordinary FFT for {{math|''n''/''k'' > 32}} in a large-{{mvar|n}} example ({{math|''n'' {{=}} 2{{sup|22}}}}) using a probabilistic approximate algorithm (which estimates the largest {{mvar|k}} coefficients to several decimal places).<ref name="Hassanieh_2012"/>

===Accuracy===
FFT algorithms have errors when finite-precision floating-point arithmetic is used, but these errors are typically quite small; most FFT algorithms, e.g. Cooley–Tukey, have excellent numerical properties as a consequence of the [[pairwise summation]] structure of the algorithms. The upper bound on the [[relative error]] for the Cooley–Tukey algorithm is <math display="inline">O(\varepsilon \log n)</math>, compared to <math display="inline">O(\varepsilon n^{3/2})</math> for the naïve DFT formula,<ref name="Gentleman_Sande_1966"/> where {{mvar|ε}} is the machine floating-point relative precision. In fact, the [[root mean square]] (rms) errors are much better than these upper bounds, being only <math display="inline">O(\varepsilon \sqrt{\log n})</math> for Cooley–Tukey and <math display="inline">O(\varepsilon \sqrt{n})</math> for the naïve DFT (Schatzman, 1996).<ref name="Schatzman_1996"/> These results, however, are very sensitive to the accuracy of the twiddle factors used in the FFT (i.e. the [[trigonometric function]] values), and it is not unusual for incautious FFT implementations to have much worse accuracy, e.g. if they use inaccurate [[generating trigonometric tables|trigonometric recurrence]] formulas. Some FFTs other than Cooley–Tukey, such as the Rader–Brenner algorithm, are intrinsically less stable.
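The error-growth comparison can be observed directly. The following Python sketch (a toy experiment under stated assumptions, not a reference implementation) measures the rms relative error of a single-precision naïve DFT and a single-precision radix-2 Cooley–Tukey FFT against a double-precision result; the helper names are invented for this illustration.

<syntaxhighlight lang="python">
# Toy experiment: compare single-precision naive-DFT and radix-2 FFT errors
# against a double-precision reference.  Illustrative only.
import numpy as np

def fft_radix2_c64(x):
    """Recursive radix-2 Cooley-Tukey FFT, kept in complex64 throughout."""
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2_c64(x[0::2])
    odd = fft_radix2_c64(x[1::2])
    # Twiddle factors, rounded to single precision like the data.
    t = np.exp(-2j * np.pi * np.arange(n // 2) / n).astype(np.complex64) * odd
    return np.concatenate([even + t, even - t])

def naive_dft_c64(x):
    """Naive O(n^2) DFT as a single-precision matrix-vector product."""
    n = len(x)
    k = np.arange(n)
    w = np.exp(-2j * np.pi * np.outer(k, k) / n).astype(np.complex64)
    return w @ x

rng = np.random.default_rng(0)
n = 2048
x64 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x32 = x64.astype(np.complex64)
ref = np.fft.fft(x64)  # double-precision reference transform

for name, y in [("naive DFT", naive_dft_c64(x32)),
                ("radix-2 FFT", fft_radix2_c64(x32))]:
    rms = np.linalg.norm(y - ref) / np.linalg.norm(ref)
    print(f"{name:12s} rms relative error = {rms:.2e}")
</syntaxhighlight>

At this size one would expect the naïve transform's error to be roughly an order of magnitude larger than the FFT's, consistent with the <math display="inline">O(\varepsilon \sqrt{n})</math> versus <math display="inline">O(\varepsilon \sqrt{\log n})</math> scaling above; the exact figures depend on the random input and rounding details.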
In [[fixed-point arithmetic]], the finite-precision errors accumulated by FFT algorithms are worse, with rms errors growing as <math display="inline">O(\sqrt{n})</math> for the Cooley–Tukey algorithm (Welch, 1969).<ref name="Welch_1969"/> Achieving this accuracy requires careful attention to scaling to minimize loss of precision, and fixed-point FFT algorithms involve rescaling at each intermediate stage of decompositions like Cooley–Tukey.

To verify the correctness of an FFT implementation, rigorous guarantees can be obtained in <math display="inline">O(n \log n)</math> time by a simple procedure checking the linearity, impulse-response, and time-shift properties of the transform on random inputs (Ergün, 1995).<ref name="Ergün_1995"/> The values for intermediate frequencies may be obtained by various averaging methods.
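In the spirit of the self-test just described, the following Python sketch probes the linearity, impulse-response, and time-shift identities of a DFT routine on random inputs. It is a minimal illustration of the idea, not Ergün's exact procedure; the tolerance, trial count, and helper names are chosen arbitrarily for this example.

<syntaxhighlight lang="python">
# Minimal self-test in the spirit of Ergün (1995): probe an FFT routine's
# linearity, impulse response, and time-shift behavior on random inputs.
import numpy as np

def check_fft(fft, n, trials=10, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    k = np.arange(n)
    for _ in range(trials):
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        a, b = rng.standard_normal(2)
        # Linearity: FFT(a*x + b*y) = a*FFT(x) + b*FFT(y).
        assert np.allclose(fft(a * x + b * y),
                           a * fft(x) + b * fft(y), atol=tol)
        # Time shift: delaying x by one sample multiplies output bin k
        # by exp(-2*pi*i*k/n), for the e^{-2*pi*i*k*m/n} sign convention.
        assert np.allclose(fft(np.roll(x, 1)),
                           np.exp(-2j * np.pi * k / n) * fft(x), atol=tol)
    # Impulse response: a unit impulse at index 0 transforms to all ones.
    delta = np.zeros(n, dtype=complex)
    delta[0] = 1.0
    assert np.allclose(fft(delta), np.ones(n), atol=tol)
    print(f"all checks passed for n={n}")

check_fft(np.fft.fft, 1024)
</syntaxhighlight>

Because each trial costs only <math display="inline">O(n \log n)</math> operations (a constant number of calls to the transform under test), this kind of randomized check scales to large transform sizes.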