== Research ==
{{More citations needed section|date=October 2022}}
Research topics include:
* actor-critic architecture<ref>{{Cite journal |last1=Grondman |first1=Ivo |last2=Vaandrager |first2=Maarten |last3=Busoniu |first3=Lucian |last4=Babuska |first4=Robert |last5=Schuitema |first5=Erik |date=2012-06-01 |title=Efficient Model Learning Methods for Actor–Critic Control |url=https://dl.acm.org/doi/10.1109/TSMCB.2011.2170565 |journal=IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) |volume=42 |issue=3 |pages=591–602 |doi=10.1109/TSMCB.2011.2170565 |pmid=22156998 |issn=1083-4419}}</ref>
* actor-critic-scenery architecture<ref name="Li-2023" />
* adaptive methods that work with fewer (or no) parameters under a large number of conditions
* bug detection in software projects<ref>{{Cite web |title=On the Use of Reinforcement Learning for Testing Game Mechanics : ACM - Computers in Entertainment |url=https://cie.acm.org/articles/use-reinforcements-learning-testing-game-mechanics/ |access-date=2018-11-27 |website=cie.acm.org |language=en}}</ref>
* continuous learning
* combinations with logic-based frameworks<ref>{{Cite journal|last1=Riveret|first1=Regis|last2=Gao|first2=Yang|date=2019|title=A probabilistic argumentation framework for reinforcement learning agents|journal=Autonomous Agents and Multi-Agent Systems|language=en|volume=33|issue=1–2|pages=216–274|doi=10.1007/s10458-019-09404-2|s2cid=71147890}}</ref>
* exploration in large Markov decision processes
* entity-based reinforcement learning<ref>{{cite arXiv |title=Entity-Centric Reinforcement Learning for Object Manipulation from Pixels |author1=Haramati, Dan |author2=Daniel, Tal |author3=Tamar, Aviv |eprint=2404.01220 |year=2024 |class=cs.RO}}</ref><ref>{{cite conference |last1=Thompson |first1=Isaac Symes |last2=Caron |first2=Alberto |last3=Hicks |first3=Chris |last4=Mavroudis |first4=Vasilios |title=Entity-based Reinforcement Learning for Autonomous Cyber Defence |book-title=Proceedings of the Workshop on Autonomous Cybersecurity (AutonomousCyber '24) |pages=56–67 |date=2024-11-07 |doi=10.1145/3689933.3690835 |publisher=ACM |arxiv=2410.17647}}</ref><ref>{{cite web |last=Winter |first=Clemens |title=Entity-Based Reinforcement Learning |url=https://clemenswinter.com/2023/04/14/entity-based-reinforcement-learning/ |date=2023-04-14 |website=Clemens Winter's Blog}}</ref>
* [[reinforcement learning from human feedback|human feedback]]<ref>{{cite arXiv |last1=Yamagata |first1=Taku |last2=McConville |first2=Ryan |last3=Santos-Rodriguez |first3=Raul |date=2021-11-16 |title=Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills |class=cs.LG |eprint=2111.08596}}</ref>
* interaction between implicit and explicit learning in skill acquisition
* [[Intrinsic motivation (artificial intelligence)|intrinsic motivation]], which differentiates information-seeking, curiosity-type behaviours from task-dependent, goal-directed behaviours
* large-scale empirical evaluations
* large (or continuous) action spaces
* modular and hierarchical reinforcement learning<ref>{{Cite journal|last1=Kulkarni|first1=Tejas D.|last2=Narasimhan|first2=Karthik R.|last3=Saeedi|first3=Ardavan|last4=Tenenbaum|first4=Joshua B.|date=2016|title=Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation|url=http://dl.acm.org/citation.cfm?id=3157382.3157509|journal=Proceedings of the 30th International Conference on Neural Information Processing Systems|series=NIPS'16|location=USA|publisher=Curran Associates Inc.|pages=3682–3690|isbn=978-1-5108-3881-9|bibcode=2016arXiv160406057K|arxiv=1604.06057}}</ref>
* multiagent/distributed reinforcement learning, a topic of interest whose applications are expanding<ref>{{Cite web |title=Reinforcement Learning / Successes of Reinforcement Learning |url=http://umichrl.pbworks.com/Successes-of-Reinforcement-Learning/ |access-date=2017-08-06 |website=umichrl.pbworks.com}}</ref>
* occupant-centric control
* optimization of computing resources<ref>{{Cite book |last1=Dey |first1=Somdip |last2=Singh |first2=Amit Kumar |last3=Wang |first3=Xiaohang |last4=McDonald-Maier |first4=Klaus |title=2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) |chapter=User Interaction Aware Reinforcement Learning for Power and Thermal Efficiency of CPU-GPU Mobile MPSoCs |date=March 2020 |chapter-url=https://ieeexplore.ieee.org/document/9116294 |pages=1728–1733 |doi=10.23919/DATE48585.2020.9116294 |isbn=978-3-9819263-4-7 |s2cid=219858480 |url=http://repository.essex.ac.uk/27546/1/User%20Interaction%20Aware%20Reinforcement%20Learning.pdf}}</ref><ref>{{Cite web |last=Quested |first=Tony |title=Smartphones get smarter with Essex innovation |work=Business Weekly |url=https://www.businessweekly.co.uk/news/academia-research/smartphones-get-smarter-essex-innovation |access-date=2021-06-17}}</ref><ref>{{Cite web |last=Williams |first=Rhiannon |date=2020-07-21 |title=Future smartphones 'will prolong their own battery life by monitoring owners' behaviour' |url=https://inews.co.uk/news/technology/future-smartphones-prolong-battery-life-monitoring-behaviour-558689 |access-date=2021-06-17 |website=[[i (British newspaper)|i]] |language=en}}</ref>
* [[Partially observable Markov decision process|partial information]] (e.g., using [[predictive state representation]])
* reward functions based on maximising novel information<ref name="kaplan2004">{{cite book |last1=Kaplan |first1=F. |last2=Oudeyer |first2=P. |title=Embodied Artificial Intelligence |chapter=Maximizing Learning Progress: An Internal Reward System for Development |publisher=Springer |year=2004 |isbn=978-3-540-22484-6 |editor-last=Iida |editor-first=F. |series=Lecture Notes in Computer Science |volume=3139 |location=Berlin; Heidelberg |pages=259–270 |doi=10.1007/978-3-540-27833-7_19 |s2cid=9781221 |editor2-last=Pfeifer |editor2-first=R. |editor3-last=Steels |editor3-first=L. |editor4-last=Kuniyoshi |editor4-first=Y.}}</ref><ref name="klyubin2008">{{cite journal |last1=Klyubin |first1=A. |last2=Polani |first2=D. |last3=Nehaniv |first3=C. |year=2008 |title=Keep your options open: an information-based driving principle for sensorimotor systems |journal=PLOS ONE |volume=3 |issue=12 |pages=e4018 |bibcode=2008PLoSO...3.4018K |doi=10.1371/journal.pone.0004018 |pmc=2607028 |pmid=19107219 |doi-access=free}}</ref><ref name="barto2013">{{cite book |last=Barto |first=A. G. |url=https://people.cs.umass.edu/~barto/IMCleVer-chapter-totypeset2.pdf |title=Intrinsically Motivated Learning in Natural and Artificial Systems |publisher=Springer |year=2013 |location=Berlin; Heidelberg |pages=17–47 |chapter=Intrinsic motivation and reinforcement learning}}</ref>
* sample-based planning (e.g., based on [[Monte Carlo tree search]])
* securities trading<ref>{{cite journal |last1=Dabérius |first1=Kevin |last2=Granat |first2=Elvin |last3=Karlsson |first3=Patrik |date=2020 |title=Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks |ssrn=3374766 |journal=The Journal of Machine Learning in Finance |volume=1}}</ref>
* [[transfer learning]]<ref>{{Cite journal|last1=George Karimpanal|first1=Thommen|last2=Bouffanais|first2=Roland|date=2019|title=Self-organizing maps for storage and transfer of knowledge in reinforcement learning|journal=Adaptive Behavior|language=en|volume=27|issue=2|pages=111–126|doi=10.1177/1059712318818568|issn=1059-7123|arxiv=1811.08318|s2cid=53774629}}</ref>
* TD learning as a model of [[dopamine]]-based learning in the brain: [[dopaminergic]] projections from the [[substantia nigra]] to the [[basal ganglia]] function as the prediction error
* value-function and policy search methods
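The temporal-difference prediction error mentioned in the TD learning bullet above can be illustrated with a minimal sketch. This is an illustrative example, not from the article: plain-Python TD(0) value estimation on a hypothetical three-state chain (state 2 terminal, reward 1 on reaching it), where the quantity `delta` is the prediction-error signal that dopaminergic activity is hypothesised to resemble.

```python
def td0_value_estimate(episodes, alpha=0.1, gamma=0.9):
    """TD(0) value estimation on a toy 3-state chain 0 -> 1 -> 2.

    State 2 is terminal; the only reward is 1.0 on entering it.
    The TD error delta = r + gamma * V(s') - V(s) is the
    "prediction error" referred to in the dopamine bullet.
    """
    V = {0: 0.0, 1: 0.0}          # value estimates for non-terminal states
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1                       # deterministic transition
            r = 1.0 if s_next == 2 else 0.0      # reward only at the terminal state
            v_next = 0.0 if s_next == 2 else V[s_next]
            delta = r + gamma * v_next - V[s]    # TD (prediction) error
            V[s] += alpha * delta                # move estimate toward TD target
            s = s_next
    return V
```

With enough episodes the estimates converge to the true discounted values, V(1) = 1.0 and V(0) = gamma * V(1) = 0.9; early in learning `delta` is large (reward is surprising) and it shrinks toward zero as the reward becomes predicted, mirroring the reported behaviour of dopaminergic responses.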