# First Pass
**Category:** 
Method

**Context:** 
Making RL agents safe by 'shielding': building specification checking into training, so that the network receives feedback whenever one of its outputs would produce an unsafe control, including while the system is live.
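
To make the idea concrete for myself, here is a minimal sketch of a shield as I understand it from the description above, not the paper's implementation. All names (`shield`, `stays_in_bounds`, `MOVES`, the `penalty` value) are hypothetical, and the specification is assumed to be a simple hand-written predicate over a toy 1-D state.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int


def shield(
    state: State,
    proposed: Action,
    is_safe: Callable[[State, Action], bool],
    actions: Sequence[Action],
    penalty: float = -1.0,  # assumed feedback value, chosen for illustration
) -> Tuple[Action, float]:
    """Return the action to execute and a feedback signal for the learner."""
    if is_safe(state, proposed):
        # The proposed action satisfies the specification: pass it through.
        return proposed, 0.0
    # Otherwise substitute the first action that satisfies the specification
    # and report a penalty so the learner can be told its output was unsafe.
    for candidate in actions:
        if is_safe(state, candidate):
            return candidate, penalty
    raise RuntimeError("no safe action available in this state")


# Toy specification (hypothetical): the position must stay within 0..4.
def stays_in_bounds(position: State, move: Action) -> bool:
    return 0 <= position + move <= 4


MOVES = (-1, 0, 1)

if __name__ == "__main__":
    print(shield(2, 1, stays_in_bounds, MOVES))  # safe action, passed through
    print(shield(4, 1, stays_in_bounds, MOVES))  # unsafe action, replaced
```

The same check works during training and while the system is live, since it only looks at the current state and the proposed output.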

**Correctness:** 
Cited 1000+ times. I think it's correct.

**Contributions:** 
Shielding, a way to use RL in high-assurance systems.

**Clarity:** 
Well written and easy to understand.

# Second Pass
**What is the main thrust?**

**What is the supporting evidence?**

**What are the key findings?**

# Third Pass
**Recreation Notes:**

**Hidden Findings:**

**Weak Points? Strong Points?**