# First Pass

**Category:** Method

**Context:** Making RL agents safe by 'shielding': building specification checking into training by giving the network feedback whenever an output would produce an unsafe control action, even while the system is live.

**Correctness:** Cited 1000+ times. I think it's correct.

**Contributions:** Shielding, a way to use RL in high-assurance systems.

**Clarity:** Well written and easy to understand.

# Second Pass

**What is the main thrust?**

**What is the supporting evidence?**

**What are the key findings?**

# Third Pass

**Recreation Notes:**

**Hidden Findings:**

**Weak Points? Strong Points?**