== Research ==
{{More citations needed section|date=October 2022}}
Research topics include:
* actor-critic architecture<ref>{{Cite journal |last1=Grondman |first1=Ivo |last2=Vaandrager |first2=Maarten |last3=Busoniu |first3=Lucian |last4=Babuska |first4=Robert |last5=Schuitema |first5=Erik |date=2012-06-01 |title=Efficient Model Learning Methods for Actor–Critic Control |url=https://dl.acm.org/doi/10.1109/TSMCB.2011.2170565 |journal=IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) |volume=42 |issue=3 |pages=591–602 |doi=10.1109/TSMCB.2011.2170565 |pmid=22156998 |issn=1083-4419}}</ref>
* actor-critic-scenery architecture<ref name="Li-2023" />
* adaptive methods that work with fewer (or no) parameters under a large number of conditions
* bug detection in software projects<ref>{{Cite web |title=On the Use of Reinforcement Learning for Testing Game Mechanics : ACM - Computers in Entertainment |url=https://cie.acm.org/articles/use-reinforcements-learning-testing-game-mechanics/ |access-date=2018-11-27 |website=cie.acm.org |language=en}}</ref>
* continuous learning
* combinations with logic-based frameworks<ref>{{Cite journal|last1=Riveret|first1=Regis|last2=Gao|first2=Yang|date=2019|title=A probabilistic argumentation framework for reinforcement learning agents|journal=Autonomous Agents and Multi-Agent Systems|language=en|volume=33|issue=1–2|pages=216–274|doi=10.1007/s10458-019-09404-2|s2cid=71147890}}</ref>
* exploration in large Markov decision processes
* entity-based reinforcement learning<ref>{{cite arXiv |title=Entity-Centric Reinforcement Learning for Object Manipulation from Pixels |author1=Haramati, Dan |author2=Daniel, Tal |author3=Tamar, Aviv |eprint=2404.01220 |year=2024 |class=cs.RO}}</ref><ref>{{cite conference |last1=Thompson |first1=Isaac Symes |last2=Caron |first2=Alberto |last3=Hicks |first3=Chris |last4=Mavroudis |first4=Vasilios |title=Entity-based Reinforcement Learning for Autonomous Cyber Defence |book-title=Proceedings of the Workshop on Autonomous Cybersecurity (AutonomousCyber '24) |pages=56–67 |date=2024-11-07 |doi=10.1145/3689933.3690835 |publisher=ACM |arxiv=2410.17647}}</ref><ref>{{cite web |last=Winter |first=Clemens |title=Entity-Based Reinforcement Learning |url=https://clemenswinter.com/2023/04/14/entity-based-reinforcement-learning/ |date=2023-04-14 |website=Clemens Winter's Blog}}</ref>
* [[reinforcement learning from human feedback|human feedback]]<ref>{{cite arXiv |last1=Yamagata |first1=Taku |last2=McConville |first2=Ryan |last3=Santos-Rodriguez |first3=Raul |date=2021-11-16 |title=Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills |class=cs.LG |eprint=2111.08596}}</ref>
* interaction between implicit and explicit learning in skill acquisition
* [[Intrinsic motivation (artificial intelligence)|intrinsic motivation]], which differentiates information-seeking, curiosity-type behaviours from task-dependent, goal-directed behaviours
* large-scale empirical evaluations
* large (or continuous) action spaces
* modular and hierarchical reinforcement learning<ref>{{Cite journal|last1=Kulkarni|first1=Tejas D.|last2=Narasimhan|first2=Karthik R.|last3=Saeedi|first3=Ardavan|last4=Tenenbaum|first4=Joshua B.|date=2016|title=Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation|url=http://dl.acm.org/citation.cfm?id=3157382.3157509|journal=Proceedings of the 30th International Conference on Neural Information Processing Systems|series=NIPS'16|location=USA|publisher=Curran Associates Inc.|pages=3682–3690|isbn=978-1-5108-3881-9|bibcode=2016arXiv160406057K|arxiv=1604.06057}}</ref>
* multiagent/distributed reinforcement learning, a topic of interest whose applications are expanding<ref>{{Cite web |title=Reinforcement Learning / Successes of Reinforcement Learning |url=http://umichrl.pbworks.com/Successes-of-Reinforcement-Learning/ |access-date=2017-08-06 |website=umichrl.pbworks.com}}</ref>
* occupant-centric control
* optimization of computing resources<ref>{{Cite book |last1=Dey |first1=Somdip |last2=Singh |first2=Amit Kumar |last3=Wang |first3=Xiaohang |last4=McDonald-Maier |first4=Klaus |title=2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) |chapter=User Interaction Aware Reinforcement Learning for Power and Thermal Efficiency of CPU-GPU Mobile MPSoCs |date=March 2020 |chapter-url=https://ieeexplore.ieee.org/document/9116294 |pages=1728–1733 |doi=10.23919/DATE48585.2020.9116294 |isbn=978-3-9819263-4-7 |s2cid=219858480 |url=http://repository.essex.ac.uk/27546/1/User%20Interaction%20Aware%20Reinforcement%20Learning.pdf}}</ref><ref>{{Cite web |last=Quested |first=Tony |title=Smartphones get smarter with Essex innovation |work=Business Weekly |url=https://www.businessweekly.co.uk/news/academia-research/smartphones-get-smarter-essex-innovation |access-date=2021-06-17}}</ref><ref>{{Cite web |last=Williams |first=Rhiannon |date=2020-07-21 |title=Future smartphones 'will prolong their own battery life by monitoring owners' behaviour' |url=https://inews.co.uk/news/technology/future-smartphones-prolong-battery-life-monitoring-behaviour-558689 |access-date=2021-06-17 |website=[[i (British newspaper)|i]] |language=en}}</ref>
* [[Partially observable Markov decision process|partial information]] (e.g., using [[predictive state representation]])
* reward functions based on maximising novel information<ref name="kaplan2004">{{cite book |last1=Kaplan |first1=F. |last2=Oudeyer |first2=P. |title=Embodied Artificial Intelligence |chapter=Maximizing Learning Progress: An Internal Reward System for Development |publisher=Springer |year=2004 |isbn=978-3-540-22484-6 |editor-last=Iida |editor-first=F. |series=Lecture Notes in Computer Science |volume=3139 |location=Berlin; Heidelberg |pages=259–270 |doi=10.1007/978-3-540-27833-7_19 |s2cid=9781221 |editor2-last=Pfeifer |editor2-first=R. |editor3-last=Steels |editor3-first=L. |editor4-last=Kuniyoshi |editor4-first=Y.}}</ref><ref name="klyubin2008">{{cite journal |last1=Klyubin |first1=A. |last2=Polani |first2=D. |last3=Nehaniv |first3=C. |year=2008 |title=Keep your options open: an information-based driving principle for sensorimotor systems |journal=PLOS ONE |volume=3 |issue=12 |pages=e4018 |bibcode=2008PLoSO...3.4018K |doi=10.1371/journal.pone.0004018 |pmc=2607028 |pmid=19107219 |doi-access=free}}</ref><ref name="barto2013">{{cite book |last=Barto |first=A. G. |url=https://people.cs.umass.edu/~barto/IMCleVer-chapter-totypeset2.pdf |title=Intrinsically Motivated Learning in Natural and Artificial Systems |publisher=Springer |year=2013 |location=Berlin; Heidelberg |pages=17–47 |chapter=Intrinsic motivation and reinforcement learning}}</ref>
* sample-based planning (e.g., based on [[Monte Carlo tree search]])
* securities trading<ref>{{cite journal |last1=Dabérius |first1=Kevin |last2=Granat |first2=Elvin |last3=Karlsson |first3=Patrik |date=2020 |title=Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks |ssrn=3374766 |journal=The Journal of Machine Learning in Finance |volume=1}}</ref>
* [[transfer learning]]<ref>{{Cite journal|last1=George Karimpanal|first1=Thommen|last2=Bouffanais|first2=Roland|date=2019|title=Self-organizing maps for storage and transfer of knowledge in reinforcement learning|journal=Adaptive Behavior|language=en|volume=27|issue=2|pages=111–126|doi=10.1177/1059712318818568|issn=1059-7123|arxiv=1811.08318|s2cid=53774629}}</ref>
* TD learning as a model of [[dopamine]]-based learning in the brain: [[dopaminergic]] projections from the [[substantia nigra]] to the [[basal ganglia]] function as the prediction error
* value-function and policy search methods
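The temporal-difference prediction error mentioned in the TD learning bullet above can be illustrated with a minimal sketch. This is an illustrative example, not from the article: plain-Python TD(0) value estimation on a hypothetical three-state chain (state 2 terminal, reward 1 on reaching it), where the quantity `delta` is the prediction-error signal that dopaminergic activity is hypothesised to resemble.

```python
def td0_value_estimate(episodes, alpha=0.1, gamma=0.9):
    """TD(0) value estimation on a toy 3-state chain 0 -> 1 -> 2.

    State 2 is terminal; the only reward is 1.0 on entering it.
    The TD error delta = r + gamma * V(s') - V(s) is the
    "prediction error" referred to in the dopamine bullet.
    """
    V = {0: 0.0, 1: 0.0}          # value estimates for non-terminal states
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1                       # deterministic transition
            r = 1.0 if s_next == 2 else 0.0      # reward only at the terminal state
            v_next = 0.0 if s_next == 2 else V[s_next]
            delta = r + gamma * v_next - V[s]    # TD (prediction) error
            V[s] += alpha * delta                # move estimate toward TD target
            s = s_next
    return V
```

With enough episodes the estimates converge to the true discounted values, V(1) = 1.0 and V(0) = gamma * V(1) = 0.9; early in learning `delta` is large (reward is surprising) and it shrinks toward zero as the reward becomes predicted, mirroring the reported behaviour of dopaminergic responses.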