=== Inverse reinforcement learning ===
In inverse reinforcement learning (IRL), no reward function is given. Instead, the reward function is inferred from the observed behavior of an expert. The idea is to mimic the observed behavior, which is often optimal or close to optimal.<ref>{{cite book |last1=Ng |first1=A. Y. |last2=Russell |first2=S. J. |year=2000 |chapter=Algorithms for Inverse Reinforcement Learning |title=Proceedings of the Seventeenth International Conference on Machine Learning (ICML '00) |pages=663–670 |publisher=Morgan Kaufmann Publishers |isbn=1-55860-707-2 |chapter-url=https://ai.stanford.edu/~ang/papers/icml00-irl.pdf }}</ref> One popular IRL paradigm is maximum entropy inverse reinforcement learning (MaxEnt IRL).<ref>{{Cite journal |last1=Ziebart |first1=Brian D. |last2=Maas |first2=Andrew |last3=Bagnell |first3=J. Andrew |last4=Dey |first4=Anind K. |date=2008-07-13 |title=Maximum entropy inverse reinforcement learning |url=https://dl.acm.org/doi/10.5555/1620270.1620297 |journal=Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3 |series=AAAI'08 |location=Chicago, Illinois |publisher=AAAI Press |pages=1433–1438 |isbn=978-1-57735-368-3 |s2cid=336219}}</ref> MaxEnt IRL estimates the parameters of a linear model of the reward function by maximizing the entropy of the probability distribution over observed trajectories, subject to constraints that match the model's expected feature counts to those of the expert.

It has been shown that MaxEnt IRL is a particular case of a more general framework named random utility inverse reinforcement learning (RU-IRL).<ref>{{Cite journal |last1=Pitombeira-Neto |first1=Anselmo R. |last2=Santos |first2=Helano P. |last3=Coelho da Silva |first3=Ticiana L. |last4=de Macedo |first4=José Antonio F. |date=March 2024 |title=Trajectory modeling via random utility inverse reinforcement learning |url=https://doi.org/10.1016/j.ins.2024.120128 |journal=Information Sciences |volume=660 |pages=120128 |doi=10.1016/j.ins.2024.120128 |issn=0020-0255 |s2cid=235187141 |arxiv=2105.12092 }}</ref> RU-IRL is based on [[Random utility model|random utility theory]] and Markov decision processes. Whereas prior IRL approaches assume that the apparently random behavior of an observed agent arises from its following a random policy, RU-IRL assumes that the agent follows a deterministic policy and that the apparent randomness arises because the observer has only partial access to the features the agent uses in decision making. The utility function is modeled as a random variable to account for the observer's uncertainty about which features the agent actually considers.
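The entropy-maximization problem has a convenient dual form: the MaxEnt model assigns each trajectory <math>\tau</math> a probability proportional to <math>\exp(\theta^\top f(\tau))</math>, where <math>f(\tau)</math> is the trajectory's feature-count vector, and the gradient of the resulting log-likelihood is simply the expert's expected feature counts minus the model's. The following sketch illustrates this gradient ascent over a small, enumerable trajectory set; it is illustrative only, and all names (such as <code>features</code> and <code>expert_f</code>) are hypothetical rather than taken from a specific library.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 enumerable trajectories, each summarized by a
# 4-dimensional feature-count vector f(tau).
features = rng.normal(size=(50, 4))

# Simulate an "expert" whose trajectory distribution is itself a MaxEnt
# model with a hidden parameter vector, so recovery can be checked.
theta_true = np.array([1.0, -0.5, 0.0, 2.0])
p_expert = np.exp(features @ theta_true)
p_expert /= p_expert.sum()
expert_f = p_expert @ features        # expert's expected feature counts

theta = np.zeros(4)
for _ in range(5000):
    # Model distribution: P(tau) proportional to exp(theta . f(tau)).
    logits = features @ theta
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    p /= p.sum()
    model_f = p @ features
    # Log-likelihood gradient = expert feature counts - model feature counts.
    theta += 0.05 * (expert_f - model_f)

# theta approaches theta_true as the feature-matching constraints are met.
print(np.round(theta, 2))
</syntaxhighlight>

In realistic settings the trajectory set is far too large to enumerate, so Ziebart et al. compute the model's expected feature counts by dynamic programming over state visitation frequencies rather than by the explicit sum used above.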