=== Safe reinforcement learning ===
Safe reinforcement learning (SRL) can be defined as the process of learning policies that maximize the expected return in problems where it is important to ensure reasonable system performance and/or respect safety constraints during learning and/or deployment.<ref>{{cite journal |last1=García |first1=Javier |last2=Fernández |first2=Fernando |title=A comprehensive survey on safe reinforcement learning |url=https://jmlr.org/papers/volume16/garcia15a/garcia15a.pdf |journal=The Journal of Machine Learning Research |date=1 January 2015 |volume=16 |issue=1 |pages=1437–1480 }}</ref> An alternative approach is risk-averse reinforcement learning, in which, instead of the ''expected'' return, a ''risk measure'' of the return is optimized, such as the [[Expected shortfall|conditional value at risk]] (CVaR).<ref>{{Cite journal |last1=Dabney |first1=Will |last2=Ostrovski |first2=Georg |last3=Silver |first3=David |last4=Munos |first4=Remi |date=2018-07-03 |title=Implicit Quantile Networks for Distributional Reinforcement Learning |url=https://proceedings.mlr.press/v80/dabney18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=1096–1105|arxiv=1806.06923 }}</ref> In addition to mitigating risk, the CVaR objective increases robustness to model uncertainties.<ref>{{Cite journal |last1=Chow |first1=Yinlam |last2=Tamar |first2=Aviv |last3=Mannor |first3=Shie |last4=Pavone |first4=Marco |date=2015 |title=Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach |url=https://proceedings.neurips.cc/paper/2015/hash/64223ccf70bbb65a3a4aceac37e21016-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=28|arxiv=1506.02188 }}</ref><ref>{{Cite web |title=Train Hard, Fight Easy: Robust Meta Reinforcement Learning |url=https://scholar.google.com/citations?view_op=view_citation&hl=en&user=LnwyFkkAAAAJ&citation_for_view=LnwyFkkAAAAJ:eQOLeE2rZwMC |access-date=2024-06-21 |website=scholar.google.com}}</ref> However, CVaR optimization in risk-averse RL requires special care to prevent gradient bias<ref>{{Cite journal |last1=Tamar |first1=Aviv |last2=Glassner |first2=Yonatan |last3=Mannor |first3=Shie |date=2015-02-21 |title=Optimizing the CVaR via Sampling |url=https://ojs.aaai.org/index.php/AAAI/article/view/9561 |journal=Proceedings of the AAAI Conference on Artificial Intelligence |language=en |volume=29 |issue=1 |doi=10.1609/aaai.v29i1.9561 |issn=2374-3468|arxiv=1404.3862 }}</ref> and blindness to success.<ref>{{Cite journal |last1=Greenberg |first1=Ido |last2=Chow |first2=Yinlam |last3=Ghavamzadeh |first3=Mohammad |last4=Mannor |first4=Shie |date=2022-12-06 |title=Efficient Risk-Averse Reinforcement Learning |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/d2511dfb731fa336739782ba825cd98c-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=35 |pages=32639–32652|arxiv=2205.05138 }}</ref>
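To make the distinction concrete, the following is a minimal illustrative sketch (not from any of the cited works) of how the CVaR objective differs from the ordinary expected return: given a batch of sampled episode returns, CVaR at level α is estimated as the mean of the worst α-fraction of the samples, which is the quantity a risk-averse agent would seek to maximize.

```python
import math


def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the lowest alpha-fraction of returns.

    A risk-averse RL agent maximizes this tail average instead of the
    plain mean, penalizing policies with rare but catastrophic outcomes.
    """
    ordered = sorted(returns)
    # Number of samples in the alpha-tail (at least one).
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Hypothetical sampled episode returns for illustration.
sampled = [12.0, 9.5, 11.0, -30.0, 10.5, 10.0, 9.0, 11.5, 10.2, 9.8]
print(sum(sampled) / len(sampled))   # expected return, skewed by one disaster
print(cvar(sampled, alpha=0.1))      # CVaR_0.1: focuses on the worst outcome
```

Two policies with the same mean return can differ sharply under this objective: the CVaR of a policy with occasional catastrophic episodes is far lower than that of a consistently mediocre one, which is exactly the behavior the risk-averse formulation is meant to capture.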