Obsidian/Zettelkasten/Permanent Notes/Literature Notes/to_process/Safe Reinforcement Learning via Shielding-Note.md

# First Pass
**Category:**
Method

**Context:**
Making RL agents safe by 'shielding': a shield built from a formal safety specification checks the agent's actions, during training and while the system is live, and overrides any action that would produce an unsafe control. The correction can also be fed back to the learner as a penalty.
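
To make the idea concrete for myself, here is a minimal sketch of a shield that sits between the agent and the environment and replaces unsafe actions. Everything in it is my own assumption rather than the paper's setup: the 1-D track, the `UNSAFE_POSITIONS` set, and the hand-written `is_safe` check (a real shield would be derived from the safety specification, not coded by hand).

```python
import random

# Hypothetical toy world: a 1-D track of cells 0..9 whose two end cells are unsafe.
UNSAFE_POSITIONS = {0, 9}
ACTIONS = (-1, 0, 1)  # move left, stay, move right


def is_safe(position, action):
    """Hand-written safety check standing in for a specification-derived shield."""
    return (position + action) not in UNSAFE_POSITIONS


def shield(position, proposed_action):
    """Pass safe actions through unchanged; replace unsafe ones with a safe action."""
    if is_safe(position, proposed_action):
        return proposed_action
    # From any safe cell, staying put (action 0) is always safe, so this list is non-empty.
    safe_actions = [a for a in ACTIONS if is_safe(position, a)]
    return safe_actions[0]


def random_policy(position):
    """Stand-in for the learning agent: picks an action uniformly at random."""
    return random.choice(ACTIONS)


def run_episode(policy, start=5, steps=100):
    """Run the policy with the shield in the loop and return the visited positions."""
    position = start
    visited = [position]
    for _ in range(steps):
        action = shield(position, policy(position))
        position += action
        visited.append(position)
    return visited


if __name__ == "__main__":
    trajectory = run_episode(random_policy)
    print("entered an unsafe cell:", any(p in UNSAFE_POSITIONS for p in trajectory))
```

The point of the sketch is that the policy is left untouched: safety comes from the wrapper, so the same shield works while the agent is still exploring and after it is deployed.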

**Correctness:**
Cited 1000+ times. I think it's correct.

**Contributions:**
Shielding: a way to use RL in high-assurance systems by enforcing a safety specification on the agent's actions.

**Clarity:**
Well written and easy to understand.

# Second Pass
**What is the main thrust?**

**What is the supporting evidence?**

**What are the key findings?**

# Third Pass
**Recreation Notes:**

**Hidden Findings:**

**Weak Points? Strong Points?**