Editing Reinforcement learning (section)

== Theory ==
Both the asymptotic and finite-sample behaviors of most algorithms are well understood. Algorithms with provably good online performance (addressing the exploration issue) are known.

Efficient exploration of Markov decision processes is given in Burnetas and Katehakis (1997).<ref name="Optimal adaptive policies for Marko" /> Finite-time performance bounds have also appeared for many algorithms, but these bounds are expected to be rather loose and thus more work is needed to better understand the relative advantages and limitations.

For incremental algorithms, asymptotic convergence issues have been settled.{{Clarify|reason=What are the issues that have been settled?|date=January 2020}} Temporal-difference-based algorithms converge under a wider set of conditions than was previously possible (for example, when used with arbitrary, smooth function approximation).