Obsidian/.archive/Literature Notes/Notes on Papers/Safe Reinforcement Learning via Shielding-Note.md
Dane Sabo 8824324149 Auto sync: 2025-08-26 11:21:28 (90 files changed)
2025-08-26 11:21:28 -04:00

First Pass

Category: Method

Context: Making RL agents safe by "shielding": building specification checking into training, so that the network receives feedback whenever one of its outputs would produce an unsafe control action. The check keeps running even while the system is live.
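To make the idea concrete for myself, here is a minimal sketch of a shield sitting between a policy and the environment. Everything in it is an assumption for illustration, not the paper's construction: the 1-D track, the `UNSAFE` set, and the names `step`, `is_safe`, `shield`, and `run_episode` are all hypothetical, and the safety check is a hand-written one-step lookahead rather than a shield synthesized from a formal specification.

```python
import random

# Hypothetical toy world: positions 0..4 on a line, position 4 is unsafe.
UNSAFE = {4}
ACTIONS = (-1, 0, 1)  # move left, stay, move right


def step(state, action):
    """Deterministic toy dynamics, clamped to the ends of the track."""
    return max(0, min(4, state + action))


def is_safe(state, action):
    """Stand-in safety specification: the next state must not be unsafe."""
    return step(state, action) not in UNSAFE


def shield(state, proposed):
    """Pass safe actions through; otherwise substitute a safe one.

    Returns (action_to_execute, was_overridden). The override flag is the
    feedback signal that could be handed back to the learner.
    """
    if is_safe(state, proposed):
        return proposed, False
    safe = [a for a in ACTIONS if is_safe(state, a)]
    return safe[0], True  # assumes at least one safe action exists


def run_episode(policy, steps=20, seed=0):
    """Run a policy with the shield in the loop and count overrides."""
    rng = random.Random(seed)
    state, overrides, visited = 0, 0, []
    for _ in range(steps):
        proposed = policy(state, rng)
        action, overridden = shield(state, proposed)
        overrides += overridden
        state = step(state, action)
        visited.append(state)
    return visited, overrides


def always_right(state, rng):
    """A deliberately bad policy that always tries to move right."""
    return 1


visited, overrides = run_episode(always_right)
```

The policy never has to know about the specification: the shield only intervenes when the proposed action would violate it, and the same wrapper can stay in place during training and after deployment.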

Correctness: Cited 1000+ times. I think it's correct.

Contributions: Shielding, a way to use RL in high-assurance systems.

Clarity: Well written and easy to understand.

Second Pass

What is the main thrust?

What is the supporting evidence?

What are the key findings?

Third Pass

Recreation Notes:

Hidden Findings:

Weak Points? Strong Points?