===Iterative methods===
{{Main|Iterative method}}
The [[iterative methods]] used to solve problems of [[nonlinear programming]] differ according to whether they [[subroutine|evaluate]] [[Hessian matrix|Hessians]], gradients, or only function values. While evaluating Hessians (H) and gradients (G) improves the rate of convergence for functions for which these quantities exist and vary sufficiently smoothly, such evaluations increase the [[Computational complexity theory|computational complexity]] (or computational cost) of each iteration. In some cases, the computational complexity may be excessively high.

One major criterion for optimizers is the number of required function evaluations, as these often dominate the overall computational effort, usually far exceeding the work done within the optimizer itself, which mainly has to operate over the N variables. Derivatives provide detailed information for such optimizers but are even harder to calculate; for example, approximating the gradient takes at least N+1 function evaluations. Approximations of the second derivatives (collected in the Hessian matrix) take a number of function evaluations on the order of N². Newton's method requires the second-order derivatives, so for each iteration the number of function calls is on the order of N², but for a simpler pure gradient optimizer it is only N. However, gradient optimizers usually need more iterations than Newton's algorithm. Which one is best with respect to the number of function calls depends on the problem itself (see the illustrative sketches following the list below).

* Methods that evaluate Hessians (or approximate Hessians, using [[finite difference]]s):
** [[Newton's method in optimization|Newton's method]]
** [[Sequential quadratic programming]]: A Newton-based method for small- to medium-scale ''constrained'' problems. Some versions can handle large-dimensional problems.
** [[Interior point methods]]: This is a large class of methods for constrained optimization, some of which use only (sub)gradient information and others of which require the evaluation of Hessians.
* Methods that evaluate gradients, or approximate gradients in some way (or even subgradients):
** [[Coordinate descent]] methods: Algorithms which update a single coordinate in each iteration
** [[Conjugate gradient method]]s: [[Iterative method]]s for large problems. (In theory, these methods terminate in a finite number of steps with quadratic objective functions, but this finite termination is not observed in practice on finite-precision computers.)
** [[Gradient descent]] (alternatively, "steepest descent" or "steepest ascent"): A (slow) method of historical and theoretical interest, which has had renewed interest for finding approximate solutions of enormous problems.
** [[Subgradient method]]s: An iterative method for large [[Rademacher's theorem|locally]] [[Lipschitz continuity|Lipschitz functions]] using [[subgradient|generalized gradients]]. Following Boris T. Polyak, subgradient-projection methods are similar to conjugate-gradient methods.
** Bundle method of descent: An iterative method for small- to medium-sized problems with locally Lipschitz functions, particularly for [[convex optimization|convex minimization]] problems (similar to conjugate gradient methods).
** [[Ellipsoid method]]: An iterative method for small problems with [[quasiconvex function|quasiconvex]] objective functions and of great theoretical interest, particularly in establishing the polynomial-time complexity of some combinatorial optimization problems. It has similarities with Quasi-Newton methods.
** [[Frank–Wolfe algorithm|Conditional gradient method (Frank–Wolfe)]] for approximate minimization of specially structured problems with [[linear constraints]], especially with traffic networks. For general unconstrained problems, this method reduces to the gradient method, which is regarded as obsolete (for almost all problems).
** [[Quasi-Newton method]]s: Iterative methods for medium-large problems (e.g. N<1000).
** [[Simultaneous perturbation stochastic approximation]] (SPSA) method for stochastic optimization; uses random (efficient) gradient approximation.
* Methods that evaluate only function values: If a problem is continuously differentiable, then gradients can be approximated using finite differences, in which case a gradient-based method can be used.
** [[Interpolation]] methods
** [[Pattern search (optimization)|Pattern search]] methods, which have better convergence properties than the [[Nelder–Mead method|Nelder–Mead heuristic (with simplices)]], which is listed below.
** [[Mirror descent]]
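
The evaluation counts discussed above can be made concrete with a minimal sketch (illustrative only; the helper names <code>approx_gradient</code> and <code>approx_hessian</code> are not from any particular library): a forward-difference gradient of an objective of N variables uses N+1 evaluations of the function, while a finite-difference Hessian uses on the order of N² evaluations.

<syntaxhighlight lang="python">
import numpy as np

def approx_gradient(f, x, h=1e-6):
    """Forward-difference gradient: uses N+1 evaluations of f."""
    n = len(x)
    fx = f(x)                          # 1 evaluation at the base point
    grad = np.empty(n)
    for i in range(n):                 # N further evaluations, one per coordinate
        e = np.zeros(n)
        e[i] = h
        grad[i] = (f(x + e) - fx) / h
    return grad

def approx_hessian(f, x, h=1e-4):
    """Finite-difference Hessian: uses on the order of N^2 evaluations of f."""
    n = len(x)
    fx = f(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + fx) / h**2
    return 0.5 * (H + H.T)             # symmetrize to reduce rounding noise
</syntaxhighlight>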
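A second sketch (again illustrative, using an arbitrary convex quadratic rather than any particular application) shows the trade-off between the two families: a single Newton iteration reaches the minimizer of a quadratic, whereas gradient descent needs many cheaper iterations to reach comparable accuracy.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative convex quadratic f(x) = 1/2 x^T A x, minimized at the origin.
A = np.array([[4.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x                 # exact gradient
hess = lambda x: A                     # exact (constant) Hessian

x0 = np.array([3.0, -2.0])

# Newton's method: one iteration solves a quadratic problem exactly.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))
print(x_newton)                        # essentially [0, 0], the minimizer

# Gradient descent: each iteration avoids the Hessian and is cheaper,
# but many more iterations are needed.
x, steps = x0.copy(), 0
while np.linalg.norm(grad(x)) > 1e-8:
    x = x - 0.05 * grad(x)
    steps += 1
print(steps, x)                        # on the order of a few hundred steps
</syntaxhighlight>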