# First Pass
**Category:**
Method

**Context:**
Making RL agents safe by 'shielding': building specification checking into training, so that the network receives feedback whenever one of its outputs would produce an unsafe control action, including while the system is live.

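
To make the idea concrete for myself, here is a minimal sketch of a shield as described above: it checks each proposed action against a safety specification, substitutes a safe action when the check fails, and returns a penalty as feedback for the learner. Everything here is an assumption for illustration rather than the paper's construction: the names `make_shield`, `step`, `is_unsafe`, the one-step lookahead check, the toy one-dimensional world, and the penalty value are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int  # -1 = move left, 0 = stay, +1 = move right


def make_shield(
    is_unsafe: Callable[[State], bool],
    step: Callable[[State, Action], State],
    actions: Sequence[Action],
    penalty: float = -1.0,
) -> Callable[[State, Action], Tuple[Action, float]]:
    """Build a shield from a safety check (hypothetical helper).

    The shield looks one step ahead: if the proposed action leads to an
    unsafe state, it substitutes the first safe action in `actions` and
    returns `penalty` as feedback for the learner.
    """

    def shield(state: State, proposed: Action) -> Tuple[Action, float]:
        # Safe proposals pass through untouched, with no penalty.
        if not is_unsafe(step(state, proposed)):
            return proposed, 0.0
        # Otherwise override with the first action that stays safe.
        for candidate in actions:
            if not is_unsafe(step(state, candidate)):
                return candidate, penalty
        raise RuntimeError(f"no safe action available in state {state!r}")

    return shield


# Toy specification (assumed): positions 0..4 are safe, anything else is not.
CLIFF = 5


def step(state: State, action: Action) -> State:
    return state + action


def is_unsafe(state: State) -> bool:
    return state < 0 or state >= CLIFF


# Preference order for overrides: stay put, then left, then right.
shield = make_shield(is_unsafe, step, actions=(0, -1, 1))

if __name__ == "__main__":
    print(shield(2, 1))  # safe proposal, passed through
    print(shield(4, 1))  # would step off the cliff, overridden
```

During training, the returned penalty would be added to the environment reward; once the system is live, the same `shield` call could sit between the policy and the actuator. The paper's shield may be built differently, so this is only a reading aid.
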

**Correctness:**
Cited 1000+ times. I think it's correct.

**Contributions:**
Shielding, a way to use RL in high-assurance systems.

**Clarity:**
Well written and easy to understand.

# Second Pass
**What is the main thrust?**

**What is the supporting evidence?**

**What are the key findings?**

# Third Pass
**Recreation Notes:**

**Hidden Findings:**

**Weak Points? Strong Points?**