Obsidian/Writing/ERLM/2-state-of-the-art/v7.tex

\section{State of the Art and Limits of Current Practice}

The principal aim of this research is to create autonomous reactor control
systems that are tractably safe. To understand what is being automated, we must
first understand how nuclear reactors are operated today. This section examines
reactor operators and the operating procedures we aim to leverage, then
investigates limitations of human-based operation, and concludes with current
formal methods approaches to reactor control systems.

\subsection{Current Reactor Procedures and Operation}

Nuclear plant procedures exist in a hierarchy: normal operating procedures for
routine operations, abnormal operating procedures for off-normal conditions,
Emergency Operating Procedures (EOPs) for design-basis accidents, Severe
Accident Management Guidelines (SAMGs) for beyond-design-basis events, and
Extensive Damage Mitigation Guidelines (EDMGs) for catastrophic damage
scenarios. These procedures must comply with 10 CFR 50.34(b)(6)(ii) and are
developed using guidance from NUREG-0900~\cite{NUREG-0899, 10CFR50.34}, but their
development relies fundamentally on expert judgment and simulator validation
rather than formal verification. Procedures undergo technical evaluation,
simulator validation testing, and biennial review as part of operator
requalification under 10 CFR 55.59~\cite{10CFR55.59}. Despite this rigor,
procedures fundamentally lack formal verification of key safety properties. No
mathematical proof exists that procedures cover all possible plant states, that
required actions can be completed within available timeframes, or that
transitions between procedure sets maintain safety invariants.

\textbf{LIMITATION:} \textit{Procedures lack formal verification of correctness
and completeness.} Current procedure development relies on expert judgment and
simulator validation. No mathematical proof exists that procedures cover all
possible plant states, that required actions can be completed within available
timeframes, or that transitions between procedure sets maintain safety
invariants. Paper-based procedures cannot ensure correct application, and even
computer-based procedure systems lack the formal guarantees that automated
reasoning could provide.

Nuclear plants operate with multiple control modes: automatic control, where the
reactor control system maintains target parameters through continuous reactivity
adjustment; manual control, where operators directly manipulate the reactor; and
various intermediate modes. In typical pressurized water reactor operation, the
reactor control system automatically maintains a floating average temperature
and compensates for power demand changes through reactivity feedback loops
alone. Safety systems, by contrast, operate with implemented automation. Reactor
Protection Systems trip automatically on safety signals with millisecond
response times, and engineered safety features actuate automatically on accident
signals without operator action required.

The division between automated and human-controlled functions reveals the
fundamental challenge of hybrid control. Highly automated systems handle reactor
protection---automatic trips on safety parameters, emergency core cooling
actuation, containment isolation, and basic process
control~\cite{WRPS.Description, gentillon_westinghouse_1999}. Human operators,
however, retain control of strategic decision-making: power level changes,
startup/shutdown sequences, mode transitions, and procedure implementation.

\subsection{Human Factors in Nuclear Accidents}

Current-generation nuclear power plants employ over 3,600 active NRC-licensed
reactor operators in the United States~\cite{operator_statistics}. These
operators divide into Reactor Operators (ROs), who manipulate reactor controls,
and Senior Reactor Operators (SROs), who direct plant operations and serve as
shift supervisors~\cite{10CFR55}. Staffing typically requires at least two ROs
and one SRO for current-generation units~\cite{10CFR50.54}. Becoming a reactor
operator requires several years of training.

The persistent role of human error in nuclear safety incidents---despite decades
of improvements in training and procedures---provides the most compelling
motivation for formal automated control with mathematical safety guarantees.
Operators hold legal authority under 10 CFR Part 55 to make critical decisions,
including departing from normal regulations during emergencies. The Three Mile
Island (TMI) accident demonstrated how a combination of personnel error, design
deficiencies, and component failures led to partial meltdown when operators
misread confusing and contradictory readings and shut off the emergency water
system~\cite{Kemeny1979}. The President's Commission on TMI identified a
fundamental ambiguity: placing responsibility for safe power plant operations on
the licensee without formal verification that operators can fulfill this
responsibility does not guarantee safety. This tension between operational
flexibility and safety assurance remains unresolved: the person responsible for
reactor safety is often the root cause of failures.

Multiple independent analyses converge on a striking statistic: 70--80\% of
nuclear power plant events are attributed to human error, versus approximately
20\% to equipment failures~\cite{WNA2020}. More significantly, the root cause of
all severe accidents at nuclear power plants---Three Mile Island, Chernobyl, and
Fukushima Daiichi---has been identified as poor safety management and safety
culture: primarily human factors~\cite{hogberg_root_2013}. A detailed analysis
of 190 events at Chinese nuclear power plants from
2007--2020~\cite{zhang_analysis_2025} found that 53\% of events involved active
errors, while 92\% were associated with latent errors---organizational and
systemic weaknesses that create conditions for failure.


\textbf{LIMITATION:} \textit{Human factors impose fundamental reliability limits
that cannot be overcome through training alone.} The persistent human
error contribution despite four decades of improvements demonstrates that these
limitations are fundamental rather than a remediable part of human-driven control.

\subsection{HARDENS and Formal Methods}

The High Assurance Rigorous Digital Engineering for Nuclear Safety (HARDENS)
project represents the most advanced application of formal methods to nuclear
reactor control systems to date~\cite{Kiniry2024}.

HARDENS aimed to address a fundamental dilemma: existing U.S. nuclear control
rooms rely on analog technologies from the 1950s--60s. This technology is
obsolete compared to modern control systems and incurs significant risk and
cost. The NRC contracted Galois, a formal methods firm, to demonstrate that
Model-Based Systems Engineering and formal methods could design, verify, and
implement a complex protection system meeting regulatory criteria at a fraction
of typical cost. The project delivered a Reactor Trip System (RTS)
implementation with full traceability from NRC Request for Proposals and IEEE
standards through formal architecture specifications to verified software.

HARDENS employed formal methods tools and techniques across the verification
hierarchy. High-level specifications used Lando, SysMLv2, and FRET (NASA Formal
Requirements Elicitation Tool) to capture stakeholder requirements, domain
engineering, certification requirements, and safety requirements. Requirements
were analyzed for consistency, completeness, and realizability using SAT and SMT
solvers. Executable formal models used Cryptol to create a behavioral model of
the entire RTS, including all subsystems, components, and limited digital twin
models of sensors, actuators, and compute infrastructure. Automatic code
synthesis generated verifiable C implementations and SystemVerilog hardware
implementations directly from Cryptol models---eliminating the traditional gap
between specification and implementation where errors commonly arise.

Despite its accomplishments, HARDENS has a fundamental limitation directly
relevant to hybrid control synthesis: the project addressed only discrete
digital control logic without modeling or verifying continuous reactor dynamics.
The Reactor Trip System specification and verification covered discrete state
transitions (trip/no-trip decisions), digital sensor input processing through
discrete logic, and discrete actuation outputs (reactor trip commands). The
project did not address continuous dynamics of nuclear reactor physics. Real
reactor safety depends on the interaction between continuous
processes---temperature, pressure, neutron flux---evolving in response to
discrete control decisions. HARDENS verified the discrete controller in
isolation but not the closed-loop hybrid system behavior.

\textbf{LIMITATION:} \textit{HARDENS addressed discrete control logic without
continuous dynamics or hybrid system verification.} Verifying discrete control
logic alone provides no guarantee that the closed-loop system exhibits desired
continuous behavior such as stability, convergence to setpoints, or maintained
safety margins.

HARDENS produced a demonstrator system at Technology Readiness Level 2--3
(analytical proof of concept with laboratory breadboard validation) rather than
a deployment-ready system validated through extended operational testing. The
NRC Final Report explicitly notes~\cite{Kiniry2024} that all material is
considered in development, not a finalized product, and that ``The demonstration
of its technical soundness was to be at a level consistent with satisfaction of
the current regulatory criteria, although with no explicit demonstration of how
regulatory requirements are met.'' The project did not include deployment in
actual nuclear facilities, testing with real reactor systems under operational
conditions, side-by-side validation with operational analog RTS systems,
systematic failure mode testing (radiation effects, electromagnetic
interference, temperature extremes), NRC licensing review, or human factors
validation with licensed operators in realistic control room scenarios.

\textbf{LIMITATION:} \textit{HARDENS achieved TRL 2--3 without experimental
validation.} While formal verification provides mathematical correctness
guarantees for the implemented discrete logic, the gap between formal
verification and actual system deployment involves myriad practical
considerations: integration with legacy systems, long-term reliability
under harsh environments, human-system interaction in realistic
operational contexts, and regulatory acceptance of formal methods as
primary assurance evidence.