# First Pass
**Category:**
Method
**Context:**
Making RL agents safe via 'shielding': specification checking is baked into the learning loop, so whenever the agent outputs an unsafe control (even while the system is live) the shield blocks it and that correction is fed back to the network. A minimal sketch of the idea is below.
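
A minimal sketch of the shield-as-action-filter idea, assuming a toy setting I made up (a line of states with one hazard cell and left/right actions); the names `is_safe`, `shielded_action`, and the toy spec are illustrative, not from the paper:

```python
import random

# Toy setting (illustrative only): states 0..9 on a line, actions move
# left/right, and the safety specification is "never enter state 0".
HAZARD = 0
ACTIONS = {"left": -1, "right": +1}

def next_state(state: int, action: str) -> int:
    return max(0, min(9, state + ACTIONS[action]))

def is_safe(state: int, action: str) -> bool:
    # In the paper the shield is synthesized from a temporal-logic spec;
    # here we simply forbid any action that reaches the hazard state.
    return next_state(state, action) != HAZARD

def shielded_action(state: int, proposed: str) -> str:
    """Pass the agent's proposal through if it is safe, otherwise
    substitute an action the shield certifies as safe."""
    if is_safe(state, proposed):
        return proposed
    safe = [a for a in ACTIONS if is_safe(state, a)]
    return random.choice(safe)

# Example: an agent at state 1 proposing "left" gets overridden to "right".
print(shielded_action(1, "left"))
```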
**Correctness:**
Cited 1000+ times. I think it's correct.
**Contributions:**
Shielding: a way to apply RL in high-assurance systems.
**Clarity:**
Well written and easy to understand.
# Second Pass
**What is the main thrust?**
**What is the supporting evidence?**
**What are the key findings?**
# Third Pass
**Recreation Notes:**
**Hidden Findings:**
**Weak Points? Strong Points?**