From 232c39185be71e543362a1470bb58b0583745315 Mon Sep 17 00:00:00 2001 From: Dane Sabo Date: Wed, 30 Jul 2025 10:14:56 -0400 Subject: [PATCH] vault backup: 2025-07-30 10:14:56 --- .obsidian/plugins/colored-tags/data.json | 4 +- .../alshiekhSafeReinforcementLearning2018.md | 86 +++++++++ ...poulouFormalRequirementsElicitation2020.md | 49 +++++ ...nhalghEffectivenessEfficiencySearch2005.md | 61 +++++++ .../lazarusRuntimeSafetyAssurance2020.md | 167 ++++++++++++++++++ .../musauUsingRealTimeReachability2022.md | 110 ++++++++++++ .../tjengEvaluatingRobustnessNeural2019.md | 89 ++++++++++ .../woodAutonomousControlFramework2017.md | 122 +++++++++++++ .../zacchialunStateArtCyberphysical2019.md | 58 ++++++ 9 files changed, 745 insertions(+), 1 deletion(-) create mode 100644 Zettelkasten/Literature Notes/alshiekhSafeReinforcementLearning2018.md create mode 100644 Zettelkasten/Literature Notes/giannakopoulouFormalRequirementsElicitation2020.md create mode 100644 Zettelkasten/Literature Notes/greenhalghEffectivenessEfficiencySearch2005.md create mode 100644 Zettelkasten/Literature Notes/lazarusRuntimeSafetyAssurance2020.md create mode 100644 Zettelkasten/Literature Notes/musauUsingRealTimeReachability2022.md create mode 100644 Zettelkasten/Literature Notes/tjengEvaluatingRobustnessNeural2019.md create mode 100644 Zettelkasten/Literature Notes/woodAutonomousControlFramework2017.md create mode 100644 Zettelkasten/Literature Notes/zacchialunStateArtCyberphysical2019.md diff --git a/.obsidian/plugins/colored-tags/data.json b/.obsidian/plugins/colored-tags/data.json index 4531e236..c6a28b70 100755 --- a/.obsidian/plugins/colored-tags/data.json +++ b/.obsidian/plugins/colored-tags/data.json @@ -316,7 +316,9 @@ "Assurance-arguments": 304, "Cyber-physical-Systems": 305, "machine-learning": 306, - "Remote-operation": 307 + "Remote-operation": 307, + "Security": 308, + "Systematic-mapping-study": 309 }, "_version": 3 } \ No newline at end of file diff --git a/Zettelkasten/Literature 
Notes/alshiekhSafeReinforcementLearning2018.md b/Zettelkasten/Literature Notes/alshiekhSafeReinforcementLearning2018.md new file mode 100644 index 00000000..46b07d8e --- /dev/null +++ b/Zettelkasten/Literature Notes/alshiekhSafeReinforcementLearning2018.md @@ -0,0 +1,86 @@ +--- +authors: + + - "Alshiekh, Mohammed" + + - "Bloem, Roderick" + + - "Ehlers, Rüdiger" + + - "Könighofer, Bettina" + + - "Niekum, Scott" + + - "Topcu, Ufuk" + +citekey: "alshiekhSafeReinforcementLearning2018" +alias: "alshiekhSafeReinforcementLearning2018" +publish_date: 2018-04-29 +journal: "Proceedings of the AAAI Conference on Artificial Intelligence" +volume: 32 +issue: 1 +last_import: 2025-07-30 +--- + +# Safe Reinforcement Learning via Shielding + +## Indexing Information +Published: 2018-04 + +**DOI** +[10.1609/aaai.v32i1.11797](https://doi.org/10.1609/aaai.v32i1.11797) +#Formal-Methods + + +#InSecondPass + + + +>[!Abstract] +>Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios. +>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/Safe Reinforcement Learning via Shielding-notes.md]] + +### Highlights From Zotero + + + + +>[!highlight] Highlight +> . 
Increasing use of learning-based controllers in physical systems in the proximity of humans strengthens the concern of whether these systems will operate safely. +> 2025-05-13 4:28 pm +> + +>[!tip] Brilliant +> In this paper, we introduce shielded learning, a framework that allows applying machine learning to control systems in a way that the correctness of the system’s execution against a given specification is assured during the learning and controller execution phases, regardless of how fast the learning process converges. The shield monitors the actions selected by the learning agent and corrects them if and only if the chosen action is unsafe. +> 2025-05-13 4:29 pm +> + +>[!done] Important +> Last but not least, the shielding framework is compatible with mechanisms such as function approximation, employed by learning algorithms in order to improve their scalability. +> 2025-05-13 4:31 pm +> + + + + +>[!tip] Brilliant +> A specification that is however satisfied along all traces of the system is G(G(r_a ∧ ¬r_b) → FG g_a), which can be read as “If from some point onwards, request r_a is always set to true while request proposition r_b is not, then eventually, a grant is given to process a for eternity.” +> 2025-06-06 2:05 pm +> + +>[!tip] Brilliant +> A specification is called a safety specification if every trace σ that is not in the language represented by the specification has a prefix such that all words starting with the prefix are also not in the language. Intuitively, a safety specification states that “something bad should never happen”. Safety specifications can be simple invariance properties (such as “the level of a water tank should never fall below 1 liter”), but can also be more complex (such as “whenever a valve is opened, it stays open for at least three seconds”). +> 2025-06-06 2:05 pm +> + + +### Follow-Ups + +>[!example] + >(Bloem et al. 
2015) proposed the idea to synthesize a shield that is attached to a system to enforce safety properties at run time. + >- [ ] #Follow-Up + diff --git a/Zettelkasten/Literature Notes/giannakopoulouFormalRequirementsElicitation2020.md b/Zettelkasten/Literature Notes/giannakopoulouFormalRequirementsElicitation2020.md new file mode 100644 index 00000000..f8a42f9c --- /dev/null +++ b/Zettelkasten/Literature Notes/giannakopoulouFormalRequirementsElicitation2020.md @@ -0,0 +1,49 @@ +--- +authors: + + - "Giannakopoulou, Dimitra" + + - "Mavridou, Anastasia" + + - "Rhein, Julian" + + - "Pressburger, Thomas" + + - "Schumann, Johann" + + - "Shi, Nija" + +citekey: "giannakopoulouFormalRequirementsElicitation2020" +alias: "giannakopoulouFormalRequirementsElicitation2020" +publish_date: 2020-01-01 +last_import: 2025-07-30 +--- + +# Formal requirements elicitation with FRET + +## Indexing Information +Published: 2020-01 + + + + +#InSecondPass + + + +>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/Formal requirements elicitation with FRET-notes.md]] + +### Highlights From Zotero +>[!tip] Brilliant +> A fretish requirement description is automatically parsed into six sequential fields, with the fret editor dynamically coloring the text corresponding to the fields as the requirement is typed in (Figure 2): scope, condition, component, shall, timing, and response. 
+> 2025-07-15 11:32 pm +> + + +### Follow-Ups + diff --git a/Zettelkasten/Literature Notes/greenhalghEffectivenessEfficiencySearch2005.md b/Zettelkasten/Literature Notes/greenhalghEffectivenessEfficiencySearch2005.md new file mode 100644 index 00000000..dd258ea6 --- /dev/null +++ b/Zettelkasten/Literature Notes/greenhalghEffectivenessEfficiencySearch2005.md @@ -0,0 +1,61 @@ +--- +authors: + + - "Greenhalgh, Trisha" + + - "Peacock, Richard" + +citekey: "greenhalghEffectivenessEfficiencySearch2005" +alias: "greenhalghEffectivenessEfficiencySearch2005" +publish_date: 2005-11-03 +journal: "BMJ" +volume: 331 +issue: 7524 +pages: 1064-1065 +last_import: 2025-07-30 +--- + +# Effectiveness and efficiency of search methods in systematic reviews of complex evidence: audit of primary sources + +## Indexing Information +Published: 2005-11 + +**DOI** +[10.1136/bmj.38636.593461.68](https://doi.org/10.1136/bmj.38636.593461.68) + + + +#InSecondPass + + + +>[!Abstract] +>Objective To describe where papers come from in a systematic review of complex evidence. 
Method Audit of how the 495 primary sources for the review were originally identified. 
Results Only 30% of sources were obtained from the protocol defined at the outset of the study (that is, from the database and hand searches). Fifty one per cent were identified by “snowballing” (such as pursuing references of references), and 24% by personal knowledge or personal contacts. 
Conclusion Systematic reviews of complex evidence cannot rely solely on protocol-driven search strategies. +>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/Effectiveness and efficiency of search methods in systematic reviews of complex evidence- audit of primary sources-notes.md]] + +### Highlights From Zotero +>[!tip] Brilliant +> Results Only 30% of sources were obtained from the protocol defined at the outset of the study (that is, from the database and hand searches). 
Fifty one per cent were identified by “snowballing” (such as pursuing references of references), and 24% by personal knowledge or personal contacts. +> 2025-05-11 9:33 am +> + +>[!done] Important +> Citation tracking: using special citation tracking databases (Science Citation Index, Social Science Citation Index, and Arts and Humanities Citation Index), we forward tracked selected key papers published more than three years previously, thereby identifying articles in mainstream journals that had subsequently cited those papers +> 2025-05-11 9:39 am +> + +>[!done] Important +> Overall, the greatest yield was from pursuing selected references of references +> 2025-05-11 9:40 am +> + + +### Follow-Ups + diff --git a/Zettelkasten/Literature Notes/lazarusRuntimeSafetyAssurance2020.md b/Zettelkasten/Literature Notes/lazarusRuntimeSafetyAssurance2020.md new file mode 100644 index 00000000..a586676a --- /dev/null +++ b/Zettelkasten/Literature Notes/lazarusRuntimeSafetyAssurance2020.md @@ -0,0 +1,167 @@ +--- +authors: + + - "Lazarus, Christopher" + + - "Lopez, James G." + + - "Kochenderfer, Mykel J." + +citekey: "lazarusRuntimeSafetyAssurance2020" +alias: "lazarusRuntimeSafetyAssurance2020" +publish_date: 2020-10-01 +pages: 1-9 +last_import: 2025-07-30 +--- + +# Runtime Safety Assurance Using Reinforcement Learning + +## Indexing Information +Published: 2020-10 + +**DOI** +[10.1109/DASC50938.2020.9256446](https://doi.org/10.1109/DASC50938.2020.9256446) +#Control-systems, #Reinforcement-learning, #reinforcement-learning, #Safety, #Switches, #Aerospace-control, #Aircraft, #Atmospheric-modeling, #runtime-safety-assurance, #Unmanned-Aerial-Systems-UAS + + +#InSecondPass + + + +>[!Abstract] +>The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost to formally do so can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as a mechanism to ensure safety. 
RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. High dimensional and non-linear dynamics in which modern controllers are deployed along with the black-box nature of the nominal controllers make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our baseline, human engineered approach. +>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/Runtime Safety Assurance Using Reinforcement Learning-notes.md]] + +### Highlights From Zotero +>[!tip] Brilliant +> We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as a mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. +> 2025-07-08 9:37 am +> + +>[!tip] Brilliant +> Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. 
+> 2025-07-08 9:37 am +> + +>[!highlight] Highlight +> Unfortunately, the cost to formally verify a non-pedigreed or black-box autopilot for a variety of vehicle types and use cases is generally prohibitive. +> 2025-07-08 9:44 am +> + +>[!highlight] Highlight +> In order for this mechanism to work, the system needs to be able to distinguish between safe scenarios under which the operation should remain controlled by π_n and scenarios that would likely lead to unsafe conditions in which the control should be switched to π_r. We assume that a recovery controller π_r is given and this work does not focus on its design or implementation. +> 2025-07-08 9:46 am +> + +>[!highlight] Highlight +> The problem that we address in this work is determining how to decide when to switch from the nominal controller π_n to the recovery controller π_r while balancing the trade-off between safety and efficiency. +> 2025-07-08 9:46 am +> + +>[!warning] Dubious +> We postulate that the task of navigating an aircraft from an origin to a destination by following a pre-planned path is far more complex than the task of predicting whether the aircraft is operating safely within a short time horizon of a given point. +> 2025-07-08 9:47 am +> + +>[!done] Important +> The goal of RTSA systems is to guarantee the safe operation of a system despite having black-box components as a part of its controller. Safe operation is specified by an envelope E ⊂ S which corresponds to a subset of the state space S within which the system is expected to operate ideally. RTSA continuously monitors the state of the system and switches to the recovery controller if and only if not doing so would lead the system to exit the safety envelope. +> 2025-07-08 9:50 am +> + +>[!warning] Dubious +> It must switch to the recovery control π_r whenever the aircraft leaves the envelope. 
+> 2025-07-08 9:51 am +> +> *I mean really there is an envelope E_r \subset E \subset S that is the region within the safety envelope that is recoverable... no?* + + +>[!done] Important +> • Its implementation must be easily verifiable. This means it must avoid black box models that are hard to verify such as deep neural networks (DNNs)[3]. +> 2025-07-08 9:53 am +> + +>[!done] Important +> We model the evolution of the flight of an aircraft equipped with an RTSA system by defining the following MDP: M = (S, A, T, R) where the elements are defined below. • State space S ∈ R^p: a vector representing the state of the environment and the vehicle. • Action space A ∈ {deploy, continue}: whether to deploy the recovery system or let the nominal controller remain in control. • Transition T(s, a): a function that specifies the transition probability of the next state s′ given that action a was taken at step s, in our case this will be sampled from a simulator by querying f(s, a). • Reward R(s, a, s′), the reward collected corresponding to the transition. This will be designed to induce the desired behavior. +> 2025-07-08 9:56 am +> + +>[!done] Important +> In this model, the RTSA system is considered the agent while the states correspond to the position, velocity and other relevant information about the aircraft with respect to the envelope and the actions correspond to deploying the recovery controller or not. The agent receives a large negative reward for abandoning the envelope and a smaller negative reward for deploying the recovery controller in situations where it was not required. This reward structure is designed to heavily penalize situations in which the aircraft exits the safety envelope and simultaneously disincentivize unnecessary deployments of the recovery controller. The rewards at each step are weighted by a discount factor γ < 1 such that present rewards are worth more than future ones. 
+> 2025-07-08 9:59 am +> + +>[!done] Important +> Additionally, we consider the nominal controller to be a black box. All of these conditions combined lead us to operate under the assumption that the transition function is unknown and we do not have access to it. We do, however, have access to simulators from which we can query experience tuples (s, a, r, s′) by providing a state s and an action a and fetching the next state s′ and associated reward r. In this setting we can learn a policy from experience with RL. +> 2025-07-08 10:00 am +> + +>[!fail] This ain't right +> RL has successfully been applied to many fields [8]–[10]. +> 2025-07-08 10:00 am +> +> *Citation stash lmao* + + +>[!done] Important +> Q-learning as described above enables the estimation of tabular Q-functions which are useful for small discrete problems. However, cyber-physical systems often operate in contexts which are better described with a continuous state space. The problem with applying tabular Q-learning to these larger state spaces is not only that they would require a large state-action table but also that a vast amount of experience would be required to accurately estimate the values. An alternative approach to handle continuous state spaces is to use Q-function approximation where the state-action value function is approximated which enables the agent to generalize from limited experience to states that have not been visited before and additionally avoids storing a gigantic table. Policies based on value function approximation have been successfully demonstrated in the aerospace domain before [12]. +> 2025-07-08 11:14 am +> + +>[!warning] Dubious +> Instead, we will restrict our attention to linear value function approximation, which involves defining a set of features Φ(s, a) ∈ R^m that captures relevant information about the state and then use these features to estimate Q by linearly combining them with weights θ ∈ R^{m×|A|} to estimate the value of each action. 
In this context, the value function is represented as follows: Q(s, a) = ∑_{i=1}^{m} θ_i φ_i(s, a) = θ^T Φ(s, a) (Eq. 9). Our learning problem is therefore reduced to estimating the parameters θ and selecting our features. Typically, domain knowledge will be leveraged to craft meaningful features Φ(s, a) = (φ_1(s, a), φ_2(s, a), ..., φ_m(s, a)), and ideally they would capture some of the geometric information relevant for the problem, e.g. in our setting, heading, velocity and distance to the geofence. The ideas behind the Q-learning algorithm can be extended to the linear value function approximation setting. Here, we initialize our parameters θ and update them at each transition to reduce the error between the predicted value and the observed reward. The algorithm is outlined below and it forms the basis of the learning procedure used in the experiments in Section III. +> 2025-07-08 11:18 am +> +> *This sounds like a constantly updating principal component analysis with some best fitting going on. Interesting, but if the reward is nonlinear (which seems likely?) isn't this problematic?* + +>[!highlight] Highlight +> We restrict our function family to linear functions which can be easily understood and verified. A major drawback, however, is that linear functions are less expressive than DNNs, which makes their training more difficult and requires careful crafting of features. +> 2025-07-08 11:25 am +> + +>[!done] Important +> Despite the relative simplicity of linear value function approximators when compared to DNNs, we observed that they are able to capture relevant information about the environment and are well suited for this task. +> 2025-07-08 11:29 am +> + +>[!highlight] Highlight +> One approach to address this problem is to avoid or reduce the chance of random exploration. We dramatically increase the likelihood of observing episodes where the mission is completed successfully without exiting the envelope, but we also bias the learning process towards exploitation. 
It is well known that in order for a policy to converge under Q-learning, exploration must proceed indefinitely [15]. Additionally, in the limit of the number of steps, the learning policy has to be greedy with respect to the Q-function [15]. Accordingly, avoiding or dramatically reducing random exploration can negatively affect the learning process and should be avoided. +> 2025-07-08 11:33 am +> + +>[!highlight] Highlight +> Instead of randomly initializing the parameters of the Q-function approximation and then manually biasing the weights to decrease the chance of randomly deploying the recovery controller, we can use a baseline policy to generate episodes in which the RTSA system exhibited a somewhat acceptable performance. From these episodes, we learn the parameters in an offline approach known as batch reinforcement learning. It is only after we learn a good initialization of our parameters that we then start the training process of our policy π_RTSA. For this purpose and to have a benchmark to compare our approach to, we define a baseline policy that consists of shrinking the safety envelope by specifying a distance threshold δ > 0. When the vehicle reaches a state that is less than δ distance away from exiting the envelope, the recovery controller is deployed. This naive approach serves both as a baseline for our experiments and also provides us with experience to initialize the weights of our policy before we do on-policy learning. +> 2025-07-08 11:36 am +> + +>[!highlight] Highlight +> We used a configuration composed of a hexarotor simulator that models lift rotor aircraft features and includes a three dimensional Bezier curve trajectory definition module, nonlinear multi-rotor dynamics, input/output linearization, nested saturation, a cascade PID flight control model, and an extended Kalman filter estimation model. An illustration of the simulator environment in three dimensions is included in Figure 2 [17]. 
+> 2025-07-08 11:46 am +> +> *Notably, not really anything special with the controller setup. Shouldn't we be able to do math using the PID controller and set bounds? or perhaps it's a nonlinear issue? Probably the second thing.* + + +>[!fail] This ain't right +> deploying a parachute which was modeled using a simplified model that introduces a drag coefficient that only affects the z coordinates in the simulation. +> 2025-07-08 11:47 am +> + +>[!fail] This ain't right +> The state space in our simulation is comprised of more than 250 variables. Some correspond to simulation parameters such as sampling time, simulation time and physical constants. [Fig. 3. Example of environment configuration and episode data with wind.] Another set of variables represent physical magnitudes such as velocity scaling factors, the mass of components of the hexarotor, distribution of the physical components of the hexarotor, moments of inertia and drag coefficients. Other variables represent maximum and minimum roll, pitch and yaw rates, rotor speed and thrust. The sensor readings, their biases, frequencies and other characteristics are also represented by other variables. Other variables represent the state of the controller and the actions it prescribes for the different actuators in the vehicle. And a few of them correspond to the position and velocity of the hexarotor during the simulation. Figure 4 shows the evolution of the position and velocity variables for a simulation episode corresponding to the example environment configuration. All of these variables are needed to completely specify the state of the world in the simulation and illustrate the high dimensional requirements of a somewhat accurate representation of flight dynamics in a simulator. 
In principle, all these variables would be needed to specify a policy for our RTSA system, but as discussed in Section II we can rely on value function approximation by crafting a set of informative features which significantly reduces the dimensionality of our problem. +> 2025-07-08 11:50 am +> +> *This has to be wrong. There's no way they need 250 states when many of these phenomena must be coupled. Drag coefficients for example are completely dependent on velocity and constants. That is a redundant state, less for the issue of it changing for the recovery controller. I'd still say that's a hybrid system thing.* + + +>[!highlight] Highlight +> In this work we restricted our attention to terminal recovery controllers. +> 2025-07-08 9:43 am +> + + +### Follow-Ups + diff --git a/Zettelkasten/Literature Notes/musauUsingRealTimeReachability2022.md b/Zettelkasten/Literature Notes/musauUsingRealTimeReachability2022.md new file mode 100644 index 00000000..2fd43077 --- /dev/null +++ b/Zettelkasten/Literature Notes/musauUsingRealTimeReachability2022.md @@ -0,0 +1,110 @@ +--- +authors: + + - "Musau, Patrick" + + - "Hamilton, Nathaniel" + + - "Lopez, Diego Manzanas" + + - "Robinette, Preston" + + - "Johnson, Taylor T." 
+ +citekey: "musauUsingRealTimeReachability2022" +alias: "musauUsingRealTimeReachability2022" +publish_date: 2022-03-01 +pages: 1-10 +last_import: 2025-07-30 +--- + +# On Using Real-Time Reachability for the Safety Assurance of Machine Learning Controllers + +## Indexing Information +Published: 2022-03 + +**DOI** +[10.1109/ICAA52185.2022.00010](https://doi.org/10.1109/ICAA52185.2022.00010) +#formal-verification, #Real-time-systems, #Reinforcement-learning, #Safety, #deep-reinforcement-learning, #hybrid-automaton, #imitation-learning, #reachability-analysis, #Runtime, #Sensors, #Switches, #Vehicle-dynamics + + +#InSecondPass + + + +>[!Abstract] +>Over the last decade, advances in machine learning and sensing technology have paved the way for the belief that safe, accessible, and convenient autonomous vehicles may be realized in the near future. Despite the prolific competencies of machine learning models for learning the nuances of sensing, actuation, and control, they are notoriously difficult to assure. The challenge here is that some models, such as neural networks, are “black box” in nature, making verification and validation difficult, and sometimes infeasible. Moreover, these models are often tasked with operating in uncertain and dynamic environments where design time assurance may only be partially transferable. Thus, it is critical to monitor these components at runtime. One approach for providing runtime assurance of systems with unverified components is the simplex architecture, where an unverified component is wrapped with a safety controller and a switching logic designed to prevent dangerous behavior. In this paper, we propose the use of a real-time reachability algorithm for the implementation of such an architecture for the safety assurance of a 1/10 scale open source autonomous vehicle platform known as F1/10. The reachability algorithm (a) provides provable guarantees of safety, and (b) is used to detect potentially unsafe scenarios. 
In our approach, the need to analyze the underlying controller is abstracted away, instead focusing on the effects of the controller’s decisions on the system’s future states. We demonstrate the efficacy of our architecture through experiments conducted both in simulation and on an embedded hardware platform. +>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/On Using Real-Time Reachability for the Safety Assurance of Machine Learning Controllers-notes.md]] + +### Highlights From Zotero +>[!done] Important +> One approach for providing runtime assurance of systems with unverified components is the simplex architecture, where an unverified component is wrapped with a safety controller and a switching logic designed to prevent dangerous behavior. In this paper, we propose the use of a real-time reachability algorithm for the implementation of such an architecture for the safety assurance of a 1/10 scale open source autonomous vehicle platform known as F1/10. The reachability algorithm (a) provides provable guarantees of safety, and (b) is used to detect potentially unsafe scenarios. In our approach, the need to analyze the underlying controller is abstracted away, instead focusing on the effects of the controller’s decisions on the system’s future states. We demonstrate the efficacy of our architecture through experiments conducted both in simulation and on an embedded hardware platform. +> 2025-07-03 12:31 pm +> + +>[!done] Important +> In summary, the contributions of this paper are: (1) We modify the real-time reachability algorithm presented in [8] to handle static obstacles in a sound and real-time manner. (2) We implement a simplex control architecture that uses real-time reachability for online collision avoidance. (3) We show our method working with multiple machine learning controllers. 
(4) We demonstrate success using our method to safely navigate through obstacles the trained controllers have no prior experience with. (5) We evaluate the safety of machine learning components transferred to real-world hardware without additional training. +> 2025-07-03 12:40 pm +> + +>[!warning] Dubious +> Using MATLAB’s Grey-Box System Identification toolbox, we obtained the following parameters for the simulation model: ca = 1.9569, cm = 0.0342, ch = −37.1967, lf = 0.225, lr = 0.225. +> 2025-07-03 11:00 pm +> + +>[!done] Important +> Despite the growing success of DRL approaches in many contexts, these methods are mainly leveraged within simulation due to challenges with ensuring safe training in real-world systems, designing reward functions that deal with noisy and uncertain state information, and ensuring trained controllers are able to generalize beyond fixed scenarios [21]. While training a controller in simulation and moving it into the real world is possible, a process known as sim2real transfer, it often results in undesired, poor, and/or dangerous behavior [21]. +> 2025-07-04 8:47 am +> + +>[!warning] Dubious +> ARS focuses on the use of a linear policy, i.e. a weight matrix. Instead of passing the input through multiple layers with non-linear activation functions, the input is multiplied by a single weight matrix to generate the control output. +> 2025-07-04 8:51 am +> +> *Um is this just state space?* + + +>[!highlight] Highlight +> The real-time reachability algorithm always returns an overapproximation of the reachable set of states. The over-approximation error decreases with successive iterations, provided that there is enough runtime for re-computations. +> 2025-07-04 8:54 am +> + +>[!highlight] Highlight +> Given a fixed runtime, T_runtime, we compute the reachable set R_[0, T_reach]. If there is remaining runtime, we restart the reachability computation with a smaller step size. 
In both this work and [8], the step-size is halved in each successive iteration, leading to more accurate determinations of the reachable set. +> 2025-07-04 8:56 am +> + +>[!highlight] Highlight +> Ideally, to ensure there were no missed deadlines, we would build our system in a Real-Time Operating System (RTOS), which allows for the specification of task priorities, executing them within established time frames. However, our implementation does not make use of an RTOS, and instead depends on native Linux and ROS to handle task management. 
+To combat this shortcoming and reduce the number of missed deadlines, we estimate the time required to compute the next reachability loop. If our estimate exceeds the remaining allotted time, the process terminates. There is an inherent tradeoff between the conservativeness of our runtime estimates and the conservativeness of the resulting reachable set. In this work, we chose to maximize the number of iterations used in constructing the reachable set at the risk of occasionally missing deadlines. Our experiments demonstrate that we were successful in minimizing the number of missed deadlines during operation. +> 2025-07-04 9:02 am +> + +>[!highlight] Highlight +> In our design, we decouple the control of the car’s steering and throttle control. The steering control, δ, is governed by the ML controller, and the throttle control is designed to maintain a constant speed, u, when the learning-based controller is in use. +> 2025-07-04 9:06 am +> + +>[!highlight] Highlight +> In the traditional simplex architecture, both the decision module and the safety controller must be verified for the system to be verified as correct [8]. While this is straightforward for relatively simple controllers, it is significantly more challenging for many classes of controllers, especially when real-time execution is considered [18]. 
However, the main focus of this work is evaluating the use of the reachability algorithm as a switching logic for the simplex architecture. +> Thus, we opted not to develop a “formally verified” safety controller. Instead, we selected a controller based on a gap-following algorithm optimized to avoid collisions with obstacles. A detailed description of the gap-following algorithm can be found in the following report [25]. It was primarily selected due to its robust collision avoidance ability and simplicity. +> 2025-07-04 9:07 am +> + +>[!highlight] Highlight +> While the reachability algorithm presented in this work possesses provable guarantees, our architecture does not. Obtaining these guarantees requires developing a formally verified safety controller and switching logic, which was outside the scope of the work presented herein. Therefore, it is possible to enter a state in our framework in which all future trajectories will result in a collision. These states are known as inevitable collision states and have been well studied within the motion-planning literature [27]. In future work, we hope to address this limitation by leveraging approaches such as viability kernels, and dynamic safety envelopes that allow for the synthesis of provably safe control regimes [28]. +> 2025-07-04 9:19 am +> + + + +### Follow-Ups + +>[!example] + >[11] D. Seto, E. Ferriera, and T. Marz, “Case study: Development of a baseline controller for automatic landing of an F-16 aircraft using linear matrix inequalities (LMIs),” Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU/SEI-99-TR-020, 2000. 
+ >- [ ] #Follow-Up + diff --git a/Zettelkasten/Literature Notes/tjengEvaluatingRobustnessNeural2019.md b/Zettelkasten/Literature Notes/tjengEvaluatingRobustnessNeural2019.md new file mode 100644 index 00000000..cdaaf478 --- /dev/null +++ b/Zettelkasten/Literature Notes/tjengEvaluatingRobustnessNeural2019.md @@ -0,0 +1,89 @@ +--- +authors: + + - "Tjeng, Vincent" + + - "Xiao, Kai" + + - "Tedrake, Russ" + +citekey: "tjengEvaluatingRobustnessNeural2019" +alias: "tjengEvaluatingRobustnessNeural2019" +publish_date: 2019-02-18 +last_import: 2025-07-30 +--- + +# Evaluating Robustness of Neural Networks with Mixed Integer Programming + +## Indexing Information +Published: 2019-02 + +**DOI** +[10.48550/arXiv.1711.07356](https://doi.org/10.48550/arXiv.1711.07356) +#Computer-Science---Cryptography-and-Security, #Computer-Science---Machine-Learning, #Computer-Science---Computer-Vision-and-Pattern-Recognition + + +#InSecondPass + + + +>[!Abstract] +>Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, networks trained only to optimize for training accuracy can often be fooled by adversarial examples - slightly perturbed inputs that are misclassified with high confidence. Verification of networks enables us to gauge their vulnerability to such adversarial examples. We formulate verification of piecewise-linear neural networks as a mixed integer program. On a representative task of finding minimum adversarial distortions, our verifier is two to three orders of magnitude quicker than the state-of-the-art. We achieve this computational speedup via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. The computational speedup allows us to verify properties on convolutional networks with an order of magnitude more ReLUs than networks previously verified by any complete verifier. 
In particular, we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded $l_\infty$ norm $\epsilon=0.1$: for this classifier, we find an adversarial example for 4.38% of samples, and a certificate of robustness (to perturbations with bounded norm) for the remainder. Across all robust training procedures and network architectures considered, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack.>[!note] Markdown Notes +>Comment: Accepted as a conference paper at ICLR 2019>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/Evaluating Robustness of Neural Networks with Mixed Integer Programming-notes.md]] + +### Highlights From Zotero +>[!tip] Brilliant +> In particular, we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded l∞ norm ε = 0.1: for this classifier, we find an adversarial example for 4.38% of samples, and a certificate of robustness to norm-bounded perturbations for the remainder. Across all robust training procedures and network architectures considered, and for both the MNIST and CIFAR-10 datasets, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack. +> 2025-07-09 7:56 am +> + +>[!tip] Brilliant +> Second, since the predicted label is determined by the unit in the final layer with the maximum activation, proving that a unit never has the maximum activation over all bounded perturbations eliminates it from consideration. We exploit both phenomena, reducing the overall number of non-linearities considered. +> 2025-07-09 9:09 am +> +> *Can this be used to say that safety controller has no false positives for a region??* + +>[!highlight] Highlight +> Verification as solving an MILP. 
The general problem of verification is to determine whether some property P on the output of a neural network holds for all input in a bounded input domain C ⊆ R^m. For the verification problem to be expressible as solving an MILP, P must be expressible as the conjunction or disjunction of linear properties P_{i,j} over some set of polyhedra C_i, where C = ∪C_i. +> 2025-07-09 9:16 am +> + +>[!highlight] Highlight +> Let G(x) denote the region in the input domain corresponding to all allowable perturbations of a particular input x. +> 2025-07-14 9:01 pm +> + +>[!highlight] Highlight +> As in Madry et al. (2018), we say that a neural network is robust to perturbations on x if the predicted probability of the true label λ(x) exceeds that of every other label for all perturbations: +> 2025-07-09 9:19 am +> + +>[!highlight] Highlight +> As long as G(x) ∩ X_valid can be expressed as the union of a set of polyhedra, the feasibility problem can be expressed as an MILP. +> 2025-07-14 9:01 pm +> + +>[!highlight] Highlight +> Let d(·, ·) denote a distance metric that measures the perceptual similarity between two input images +> 2025-07-14 9:01 pm +> + +>[!tip] Brilliant +> Determining tight bounds is critical for problem tractability: tight bounds strengthen the problem formulation and thus improve solve times (Vielma, 2015). For instance, if we can prove that the phase of a ReLU is stable, we can avoid introducing a binary variable. More generally, loose bounds on input to some unit will propagate downstream, leading to units in later layers having looser bounds. +> 2025-07-14 9:23 pm +> + +>[!tip] Brilliant +> The key observation is that, for piecewise-linear non-linearities, there are thresholds beyond which further refining a bound will not improve the problem formulation. 
With this in mind, we adopt a progressive bounds tightening approach: we begin by determining coarse bounds using fast procedures and only spend time refining bounds using procedures with higher computational complexity if doing so could provide additional information to improve the problem formulation. +> 2025-07-14 9:28 pm +> + + +### Follow-Ups + diff --git a/Zettelkasten/Literature Notes/woodAutonomousControlFramework2017.md b/Zettelkasten/Literature Notes/woodAutonomousControlFramework2017.md new file mode 100644 index 00000000..66c0caa3 --- /dev/null +++ b/Zettelkasten/Literature Notes/woodAutonomousControlFramework2017.md @@ -0,0 +1,122 @@ +--- +authors: + + - "Wood, Richard T." + + - "Upadhyaya, Belle R." + + - "Floyd, Dan C." + +citekey: "woodAutonomousControlFramework2017" +alias: "woodAutonomousControlFramework2017" +publish_date: 2017-08-01 +journal: "Nuclear Engineering and Technology" +volume: 49 +issue: 5 +pages: 896-904 +last_import: 2025-07-30 +--- + +# An autonomous control framework for advanced reactors + +## Indexing Information +Published: 2017-08 + +**DOI** +[10.1016/j.net.2017.07.001](https://doi.org/10.1016/j.net.2017.07.001) +#Autonomous-Control, #Instrumentation-and-Control-System, #Small-Modular-Reactor + + +#InSecondPass + + + +>[!Abstract] +>Several Generation IV nuclear reactor concepts have goals for optimizing investment recovery through phased introduction of multiple units on a common site with shared facilities and/or reconfigurable energy conversion systems. Additionally, small modular reactors are suitable for remote deployment to support highly localized microgrids in isolated, underdeveloped regions. The long-term economic viability of these advanced reactor plants depends on significant reductions in plant operations and maintenance costs. To accomplish these goals, intelligent control and diagnostic capabilities are needed to provide nearly autonomous operations with anticipatory maintenance. 
A nearly autonomous control system should enable automatic operation of a nuclear power plant while adapting to equipment faults and other upsets. It needs to have many intelligent capabilities, such as diagnosis, simulation, analysis, planning, reconfigurability, self-validation, and decision. These capabilities have been the subject of research for many years, but an autonomous control system for nuclear power generation remains as-yet an unrealized goal. This article describes a functional framework for intelligent, autonomous control that can facilitate the integration of control, diagnostic, and decision-making capabilities to satisfy the operational and performance goals of power plants based on multimodular advanced reactors.>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/An autonomous control framework for advanced reactors-notes.md]] + +### Highlights From Zotero +>[!tip] Brilliant +> The long-term economic viability of these advanced reactor plants depends on significant reductions in plant operations and maintenance costs. To accomplish these goals, intelligent control and diagnostic capabilities are needed to provide nearly autonomous operations with anticipatory maintenance. +> 2025-07-06 12:44 pm +> + +>[!done] Important +> An SMR is generally characterized by: (1) an electrical generating capacity of less than 300 MWe (megawatt electric), (2) a primary system that is entirely or substantially fabricated within a factory, and (3) a primary system that can be transported by truck or rail to the plant site. +> 2025-07-06 12:47 pm +> + +>[!tip] Brilliant +> The current US nuclear industry average for O&M staff is roughly one person per every 2 megawatts of generated power. +> 2025-07-06 12:48 pm +> + +>[!done] Important +> Current automated control technologies for nuclear power plants are reasonably mature, and highly automated control for an SMR is clearly feasible under optimum circumstances. 
Autonomous control is primarily intended to account for the nonoptimum circumstances when degradation, failure, and other off-normal events challenge the performance of the reactor, and the capability for immediate human intervention is constrained. There are clear gaps in the development and demonstration of autonomous control capabilities for the specific domain of nuclear power operations. +> 2025-07-06 1:04 pm +> + + +>[!done] Important +> Autonomy extends the scope of primary control functions. Such capabilities can consist of automated control during all operating modes, process performance optimization (e.g., self-tuning), continuous monitoring, and diagnosis of performance indicators as well as trends for operational and safety-related parameters, diagnosis of component health, flexible control to address both anticipated and unanticipated events and to provide protection of life-limited components (such as batteries and actuators), adaptation to changing or degrading conditions, and validation and maintenance of control system performance. Key characteristics of autonomy include intelligence, robustness, optimization, flexibility, and adaptability. Intelligence facilitates minimal or no reliance on human intervention and can accommodate an integrated, whole system approach to control. It implies embedded decision-making and management/planning authority. Intelligence in control provides for anticipatory action based on system knowledge and event prediction. +> 2025-07-06 6:03 pm +> + +>[!highlight] Highlight +> As a minimum requirement of autonomy, the SMR plant control system must be able to switch between normal operational modes automatically (i.e., automatic control). Additionally, reactor protective action must be available if the desired operational conditions cannot be achieved. 
+> 2025-07-06 6:07 pm +> + +>[!done] Important +> Unlike conventional reactor operational concepts, in which the primary defense against potentially adverse conditions resulting from off-normal events is to scram the reactor, the objective of autonomous control is to limit the progression of off-normal events and minimize the need for shutdown. This is especially true in situations where the nuclear power plant is the stabilizing generation source on a small electric grid. +> 2025-07-06 6:10 pm +> + +>[!tip] Brilliant +> To illustrate the autonomous functionality that can be provided for the SMR plant control system, two fault management scenarios are considered in which detection and response are described. The first scenario relates to fault adaptation in the case of sensor failure. The indicators from surveillance and diagnostic functions that the plant control system can employ include divergence of redundant measurements, conflict between predicted (based on analytical or relational estimation) and measured values, and detection and isolation of a confirmed fault. The prospective response can include substitution of a redundant measurement or utilization of a diverse measurement. An example of the latter would be using neutron flux instead of temperature (i.e., core thermal power) as a power measurement. Switching to an alternate control algorithm may prove necessary for faulted or suspect measurements. The second scenario relates to fault avoidance in the case of a degrading actuator. The indicators of an incipient failure can be prediction of actuator failure based on prognostic modeling (e.g., fault forecasting) or detection of sluggish response to commands. The prospective response can be to switch to an alternate control strategy to avoid incipient failure by reducing stress on the suspect component. 
An example would be utilizing manipulation of core heat removal (e.g., coolant density change) instead of direct reactivity insertion (e.g., control element movement) to control reactor power. +> 2025-07-06 6:15 pm +> + +>[!done] Important +> Although having a highly reliable plant control system is important, that fact is of limited value if the control system cannot accommodate plant degradation without immediate human intervention or scram. In such a case, the result is a highly reliable control system that becomes ineffective because the plant has changed. +> 2025-07-06 6:16 pm +> + +>[!highlight] Highlight +> A three-level hierarchy is typical for robotic applications [8,30,31]. The three layers in top-to-bottom hierarchical order are the planner layer, the executive layer, and the functional layer. The general concept of the hierarchy is that commands are issued by higher levels to lower levels, and response data flows from lower levels to higher levels in the multi-tiered framework. Intelligence increases with increasing level within the hierarchy. Each of the three interacting tiers has a principal role. Basically, the functional layer provides direct control, the executive layer provides sequencing of action, and the planner layer provides deliberative planning. +> 2025-07-06 6:19 pm +> +> *Kinda mimics the Purdue model? 
Application layer, SCADA layer, then what would be the enterprise layer?* + + +>[!tip] Brilliant +> Key characteristics that are feasible through autonomous control include: Intelligence, to confirm system performance and detect degraded or failed conditions; Optimization, to minimize stress on SMR components and efficiently react to operational events without compromising system integrity; Robustness, to accommodate uncertainties and changing conditions; and Flexibility and adaptability, to accommodate failures through reconfiguration among available control system elements or adjustment of control system strategies, algorithms, or parameters. +> 2025-07-06 12:51 pm +> + +>[!done] Important +> Given anticipated operational imperatives to utilize technology with demonstrated (or at least high probability) readiness, it is not practical to strive for the high-end extreme of autonomy in first-generation SMRs. Instead, modest advancement beyond fully automatic control to allow extended fault tolerance for anticipated events or degraded conditions and some predefined reconfigurability is the most realistic goal for an initial application of SMR plant autonomous control. +> 2025-07-06 1:00 pm +> + +>[!tip] Brilliant +> The primary technical gap relates to decision capabilities (e.g., strategic, interpretive, adaptive, predictive). Technology development and demonstration activities are needed to provide the desired technical readiness for implementation of an SMR autonomous control system. In particular, the capabilities to monitor, trend, detect, diagnose, decide, and self-adjust must be established within an integrated functional architecture to enable control system autonomy. +> 2025-07-06 1:00 pm +> + + +### Follow-Ups + +>[!example] + >One of the most fully digital plants currently in operation in the United States is the Oconee Nuclear Station [14]. The three units at Oconee have digital reactor protection systems and a digital integrated control system (ICS). 
The digital ICS coordinates the main control actions of multiple control loops through an integrated master controller that establishes feedforward control demands based on desired overall core thermal power. The ICS also has provisions for supplementary support actions among control loops to facilitate optimized performance. + >- [ ] #Follow-Up + +>[!example] + >There is an architectural approach for nearly autonomous control systems that has been developed through simulated nuclear power applications (see Fig. 2). As part of research into advanced multimodular nuclear reactor concepts, such as the ALMR, the International Reactor Innovative and Secure (IRIS), and representative advanced SMR concepts, a supervisory control system architecture was devised [24–26]. This approach provides a framework for autonomous control while supporting a high-level interface with operations staff, who can act as plant supervisors. The final authority for decisions and goal setting remains with the human, but the control system assumes expanded responsibilities for normal control action, abnormal event response, and system fault tolerance. 
+ >- [ ] #Follow-Up + diff --git a/Zettelkasten/Literature Notes/zacchialunStateArtCyberphysical2019.md b/Zettelkasten/Literature Notes/zacchialunStateArtCyberphysical2019.md new file mode 100644 index 00000000..5dfbfd5b --- /dev/null +++ b/Zettelkasten/Literature Notes/zacchialunStateArtCyberphysical2019.md @@ -0,0 +1,58 @@ +--- +authors: + + - "Zacchia Lun, Yuriy" + + - "D’Innocenzo, Alessandro" + + - "Smarra, Francesco" + + - "Malavolta, Ivano" + + - "Di Benedetto, Maria Domenica" + +citekey: "zacchialunStateArtCyberphysical2019" +alias: "zacchialunStateArtCyberphysical2019" +publish_date: 2019-03-01 +journal: "Journal of Systems and Software" +volume: 149 +pages: 174-216 +last_import: 2025-07-30 +--- + +# State of the art of cyber-physical systems security: An automatic control perspective + +## Indexing Information +Published: 2019-03 + +**DOI** +[10.1016/j.jss.2018.12.006](https://doi.org/10.1016/j.jss.2018.12.006) +#Cyber-physical-systems, #Security, #Systematic-mapping-study + + +#InSecondPass + + + +>[!Abstract] +>Cyber-physical systems are integrations of computation, networking, and physical processes. Due to the tight cyber-physical coupling and to the potentially disrupting consequences of failures, security here is one of the primary concerns. Our systematic mapping study sheds light on how security is actually addressed when dealing with cyber-physical systems from an automatic control perspective. The provided map of 138 selected studies is defined empirically and is based on, for instance, application fields, various system components, related algorithms and models, attacks characteristics and defense strategies. 
It presents a powerful comparison framework for existing and future research on this hot topic, important for both industry and academia.>[!seealso] Related Papers +> + +## Annotations +### Notes +![[Zettelkasten/Literature Notes/Notes on Papers/State of the art of cyber-physical systems security- An automatic control perspective-notes.md]] + +### Highlights From Zotero +>[!highlight] Highlight +> Characteristics and focus: the bulk of the works on CPS security is focused on power grids, while somehow surprisingly, we have not found any work on the cyber-physical security of medical CPS, and only a small part of selected papers is within the application field of secure control of (unmanned) ground vehicles and aerial systems, and of heating, ventilation, and air-conditioning in large functional buildings. All the works considered in this mapping study deal with attacks, in order to either implement or to counteract them: putting together all these studies gives us the possibility to categorize the existing (cyber-physical) attack models. +> 2025-05-01 10:55 am +> + +>[!tip] Brilliant +> Even if the repeatability process, capturing how a third party may reproduce the validation results of the method or technique, is recognized as a good scientific practice, we found no studies providing a replication package. +> 2025-05-01 10:56 am +> + + +### Follow-Ups +