vault backup: 2025-07-21 10:01:20

This commit is contained in:
Dane Sabo 2025-07-21 10:01:20 -04:00
parent 747280637c
commit f058441755
10 changed files with 488 additions and 7 deletions

View File

@ -281,7 +281,26 @@
"Vehicle-dynamics": 269,
"Autonomous-Control": 270,
"Instrumentation-and-Control-System": 271,
"Small-Modular-Reactor": 272
"Small-Modular-Reactor": 272,
"Safety": 273,
"Aerospace-control": 274,
"Aircraft": 275,
"Atmospheric-modeling": 276,
"runtime-safety-assurance": 277,
"Unmanned-Aerial-Systems-UAS": 278,
"Autonomous-vehicles": 279,
"Autonomous-vehicle": 280,
"Bayesian-optimization": 281,
"Decision-making": 282,
"drivers": 283,
"Hidden-Markov-models": 284,
"lane-change-decision-making": 285,
"support-vector-machine": 286,
"Support-vector-machines": 287,
"Economics": 288,
"Finance": 289,
"Modularisation": 290,
"Modularity": 291
},
"_version": 3
}

View File

@ -0,0 +1,44 @@
---
authors:
- "Liu, Yonggang"
- "Wang, Xiao"
- "Li, Liang"
- "Cheng, Shuo"
- "Chen, Zheng"
citekey: "liuNovelLaneChange2019"
publish_date: 2019-01-01
journal: "IEEE Access"
volume: 7
pages: 26543-26550
last_import: 2025-07-21
---
# Indexing Information
Published: 2019-01
**DOI**
[10.1109/ACCESS.2019.2900416](https://doi.org/10.1109/ACCESS.2019.2900416)
#Trajectory, #Autonomous-vehicles, #Safety, #Autonomous-vehicle, #Bayesian-optimization, #Decision-making, #drivers-habits, #Hidden-Markov-models, #lane-change-decision-making, #support-vector-machine, #Support-vector-machines
#ToRead
>[!Abstract]
>Autonomous driving is a crucial issue of the automobile industry, and research on lane change is its significant part. Previous works on the autonomous vehicle lane change mainly focused on lane change path planning and path tracking, but autonomous vehicle lane change decision making is rarely mentioned. Therefore, this paper establishes an autonomous lane change decision-making model based on benefit, safety, and tolerance by analyzing the factors of the autonomous vehicle lane change. Then, because of the multi-parameter and non-linearity of the autonomous lane change decision-making process, a support vector machine (SVM) algorithm with the Bayesian parameters optimization is adopted to solve this problem. Finally, we compare a lane change model based on rules with the proposed SVM model in the test set, and results illustrate that the SVM model performs better than the rule-based lane change model. Moreover, the real car experiment is carried out to verify the effectiveness of the decision model.
>[!seealso] Related Papers
>
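As a quick reference, here is a generic sketch (not the paper's code) of the kind of pipeline the abstract describes: an RBF-kernel SVM lane-change classifier whose hyperparameters are tuned by Bayesian optimization. The feature layout and labels below are invented placeholders for the benefit/safety/tolerance factors.

```python
# Generic sketch of SVM + Bayesian hyperparameter optimization (scikit-optimize);
# the features and labels are made-up stand-ins, not the paper's data.
import numpy as np
from sklearn.svm import SVC
from skopt import BayesSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # e.g. [benefit, safety margin, tolerance]
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)    # 1 = change lane, 0 = keep lane

search = BayesSearchCV(
    SVC(kernel="rbf"),
    {"C": (1e-2, 1e3, "log-uniform"), "gamma": (1e-4, 1e1, "log-uniform")},
    n_iter=25,
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```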
# Annotations
## Notes
![[Notes on Papers/A Novel Lane Change Decision-Making Model of Autonomous Vehicle Based on Support Vector Machine.md]]
## Highlights From Zotero
## Follow-Ups

View File

@ -9,7 +9,7 @@ journal: "IEEE Control Systems Magazine"
volume: 17
issue: 2
pages: 75-93
last_import: 2025-05-12
last_import: 2025-07-21
---
# Indexing Information
@ -20,7 +20,8 @@ Published: 1997-04
#Control-systems, #Stability-analysis, #Computer-networks, #Concurrent-computing, #Convergence-of-numerical-methods, #Electrical-equipment-industry, #Industrial-control, #Neural-networks, #Proposals, #Taxonomy
#ToRead
#InFirstPass
>[!Abstract]
@ -29,17 +30,19 @@ Published: 1997-04
# Annotations
## Notes
![[Paper Notes/A systematic classification of neural-network-based control.md]]
![[Notes on Papers/A systematic classification of neural-network-based control.md]]
## Highlights From Zotero
>[!highlight] Highlight
> rigorous comparisons neural-network controllers have fared better than well-established conventional options when the plant characteristics are poorly known [2-6].
> 2025-04-15 9:18 am
>
>[!done] Important
> In order to illustrate the unavoidable basic terminology for the unfamiliar reader, a neural network can be regarded simply as a generic
> 2025-04-15 9:22 am
>
@ -47,14 +50,17 @@ Published: 1997-04
>[!highlight] Highlight
> mapping,
> 2025-04-17 4:06 pm
>
>[!highlight] Highlight
> and also for classification and optimization tasks. An overview of the proposed classification is shown in Fig. 1. The relatively limited option of using neural networks to merely aid a non-neural controller is further classified in the following section.
> 2025-04-17 4:06 pm
>
>[!highlight] Highlight
> Of the schemes in which the controller itself is a neural network, the section “Train Based on U” classifies the alternative where control-input signals U are available for training the neural controller and the section “Train Based on Goal” classifies the option where the network devises the needed control strategy on its own, based on the ultimate control objective.
> 2025-04-17 1:02 pm
>

View File

@ -0,0 +1,80 @@
---
authors:
- "Mignacca, B."
- "Locatelli, G."
citekey: "mignaccaEconomicsFinanceSmall2020"
publish_date: 2020-02-01
journal: "Renewable and Sustainable Energy Reviews"
volume: 118
pages: 109519
last_import: 2025-07-21
---
# Indexing Information
Published: 2020-02
**DOI**
[10.1016/j.rser.2019.109519](https://doi.org/10.1016/j.rser.2019.109519)
#Economics, #Finance, #Modularisation, #Modularity, #Small-modular-reactor, #SMR
#ToRead
>[!Abstract]
>The interest toward Small Modular nuclear Reactors (SMRs) is growing, and the economic competitiveness of SMRs versus large reactors is a key topic. Leveraging a systematic literature review, this paper firstly provides an overview of “what we know” and “what we do not know” about the economics and finance of SMRs. Secondly, the paper develops a research agenda. Several documents discuss the economics of SMRs, highlighting how the size is not the only factor to consider in the comparison; remarkably, other factors (co-siting economies, modularisation, modularity, construction time, etc.) are relevant. The vast majority of the literature focuses on economic and financial performance indicators (e.g. Levelized Cost of Electricity, Net Present Value, and Internal Rate of Return) and SMR capital cost. Remarkably, very few documents deal with operating and decommissioning costs or take a programme (and its financing) rather than a “single project/plant/site” perspective. Furthermore, there is a gap in knowledge about the cost-benefit analysis of the “modular construction” and SMR decommissioning.
>[!seealso] Related Papers
>
# Annotations
## Notes
![[Notes on Papers/Economics and finance of Small Modular Reactors- A systematic review and research agenda.md]]
## Highlights From Zotero
>[!tip] Brilliant
> Remarkably, very few documents deal with operating and decommissioning costs or take a programme (and its financing) rather than a “single project/plant/site” perspective. Furthermore, there is a gap in knowledge about the cost-benefit analysis of the “modular construction” and SMR decommissioning.
> 2025-07-16 12:27 pm
>
>[!highlight] Highlight
> The International Atomic Energy Agency [1] defines Small Modular Reactors (SMRs) as “newer generation [nuclear] reactors designed to generate electric power up to 300 MW, whose components and systems can be shop fabricated and then transported as modules to the sites for installation as demand arises” (Page 1).
> 2025-07-16 12:27 pm
>
>[!highlight] Highlight
> Economics and finance are two sides of the same coin, and the appraisal of a certain technology needs to consider both. Consequently, both economic and financial studies are reviewed in this paper.
> 2025-07-16 12:29 pm
>
>[!highlight] Highlight
> SMR fuel cost is a relatively small percentage of the total cost [19, 25], and given the same technology, it is not differentiable between large and small reactors. Therefore, studies about the fuel cost are excluded from the analysis.
> 2025-07-16 12:31 pm
>
>[!tip] Brilliant
> The distribution of the final retrieved documents is:
> - SMR Economics and finance: 46 documents;
> - SMR Construction: 14 documents;
> - SMR O&M: 3 documents;
> - SMR Decommissioning: 2 documents.
> 2025-07-16 12:33 pm
>
> *Wow! There's barely any O&M Papers*
>[!highlight] Highlight
> 3.1.1.2. Operation and maintenance costs. Operation and maintenance (O&M) costs are the costs needed for the operation and maintenance of an NPP [37]. O&M costs include “all non-fuel costs, such as costs of plant staffing, consumable operating materials (worn parts) and equipment, repair and interim replacements, purchased services, and nuclear insurance. They also include taxes and fees, decommissioning allowances, and miscellaneous costs” [10] (Page 33).
> 2025-07-16 12:32 pm
>
>[!tip] Brilliant
> Furthermore, cost saving in O&M costs can be achieved through the shared control of multi-module reactors determining a reduction of the staffing cost [33]. However, [31] points out an expected SMR staff cost per MWe 40% higher with respect to LRs.
> 2025-07-16 12:35 pm
>
>[!done] Important
> Modularisation has several implications: working in a better-controlled environment, standardisation and design simplification, reduction of the construction time, logistical challenges. Modularity allows having a favourable cash flow profile, taking advantage of the co-siting economies, cogeneration for the load following of NPPs, a higher and faster industrial learning, and better adaptability to market conditions. Furthermore, the interest in SMRs is growing because of the different applications: electrical, heat, hydrogen production, and seawater desalination.
> 2025-07-16 12:34 pm
>
## Follow-Ups

View File

@ -9,7 +9,7 @@ authors:
citekey: "tjengEvaluatingRobustnessNeural2019"
publish_date: 2019-02-18
last_import: 2025-05-12
last_import: 2025-07-21
---
# Indexing Information
@ -20,7 +20,8 @@ Published: 2019-02
#Computer-Science---Cryptography-and-Security, #Computer-Science---Machine-Learning, #Computer-Science---Computer-Vision-and-Pattern-Recognition
#ToRead
#InSecondPass
>[!Abstract]
@ -30,9 +31,56 @@ Published: 2019-02
# Annotations
## Notes
![[Paper Notes/Evaluating Robustness of Neural Networks with Mixed Integer Programming.md]]
![[Notes on Papers/Evaluating Robustness of Neural Networks with Mixed Integer Programming.md]]
## Highlights From Zotero
>[!tip] Brilliant
> In particular, we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded l∞ norm ε = 0.1: for this classifier, we find an adversarial example for 4.38% of samples, and a certificate of robustness to norm-bounded perturbations for the remainder. Across all robust training procedures and network architectures considered, and for both the MNIST and CIFAR-10 datasets, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack.
> 2025-07-09 7:56 am
>
>[!tip] Brilliant
> Second, since the predicted label is determined by the unit in the final layer with the maximum activation, proving that a unit never has the maximum activation over all bounded perturbations eliminates it from consideration. We exploit both phenomena, reducing the overall number of non-linearities considered.
> 2025-07-09 9:09 am
>
> *Can this be used to say that a safety controller has no false positives for a region??*
>[!highlight] Highlight
> Verification as solving an MILP. The general problem of verification is to determine whether some property P on the output of a neural network holds for all input in a bounded input domain C ⊆ Rm. For the verification problem to be expressible as solving an MILP, P must be expressible as the conjunction or disjunction of linear properties Pi,j over some set of polyhedra Ci, where C = ∪Ci.
> 2025-07-09 9:16 am
>
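For reference, here is a minimal sketch of how a single ReLU y = max(0, x) can be encoded exactly in an MILP with the standard big-M constraints, assuming known pre-activation bounds l ≤ x ≤ u. The bounds are toy numbers and PuLP is used only for illustration; it is not the solver from the paper.

```python
# Big-M MILP encoding of one ReLU unit y = max(0, x), given bounds l <= x <= u.
import pulp

l, u = -2.0, 3.0                          # assumed pre-activation bounds
prob = pulp.LpProblem("relu_encoding", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=l, upBound=u)
y = pulp.LpVariable("y", lowBound=0)
z = pulp.LpVariable("z", cat="Binary")    # phase indicator: 1 if x >= 0

prob += y >= x                            # y is at least x
prob += y <= x - l * (1 - z)              # if z == 1, force y <= x
prob += y <= u * z                        # if z == 0, force y <= 0
prob += y                                 # toy objective: maximize y
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(x), pulp.value(y))       # expect x = u, y = u
```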
>[!highlight] Highlight
> Let G(x) denote the region in the input domain corresponding to all allowable perturbations of a particular input x.
> 2025-07-14 9:01 pm
>
>[!highlight] Highlight
> As in Madry et al. (2018), we say that a neural network is robust to perturbations on x if the predicted probability of the true label λ(x) exceeds that of every other label for all perturbations:
> 2025-07-09 9:19 am
>
>[!highlight] Highlight
> As long as G(x) ∩ Xvalid can be expressed as the union of a set of polyhedra, the feasibility problem can be expressed as an MILP.
> 2025-07-14 9:01 pm
>
>[!highlight] Highlight
> Let d(·, ·) denote a distance metric that measures the perceptual similarity between two input images
> 2025-07-14 9:01 pm
>
>[!tip] Brilliant
> Determining tight bounds is critical for problem tractability: tight bounds strengthen the problem formulation and thus improve solve times (Vielma, 2015). For instance, if we can prove that the phase of a ReLU is stable, we can avoid introducing a binary variable. More generally, loose bounds on input to some unit will propagate downstream, leading to units in later layers having looser bounds.
> 2025-07-14 9:23 pm
>
>[!tip] Brilliant
> The key observation is that, for piecewise-linear non-linearities, there are thresholds beyond which further refining a bound will not improve the problem formulation. With this in mind, we adopt a progressive bounds tightening approach: we begin by determining coarse bounds using fast procedures and only spend time refining bounds using procedures with higher computational complexity if doing so could provide additional information to improve the problem formulation.
> 2025-07-14 9:28 pm
>
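A minimal sketch of the coarse bounding step this refers to: interval bounds propagated through one affine+ReLU layer (made-up weights), which already classify some units as stably active or inactive, so only the remaining unstable units would need the more expensive (e.g. LP-based) refinement.

```python
# Coarse interval bound propagation for one affine layer followed by ReLU.
import numpy as np

def affine_bounds(W, b, lo, hi):
    """Interval-arithmetic bounds on W @ x + b for x in [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

W = np.array([[1.0, -2.0], [0.5, 1.0]])   # toy weights
b = np.array([0.0, -1.0])
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])

pre_lo, pre_hi = affine_bounds(W, b, lo, hi)
for i, (l, u) in enumerate(zip(pre_lo, pre_hi)):
    if l >= 0:
        print(f"unit {i}: stably active, no binary variable needed")
    elif u <= 0:
        print(f"unit {i}: stably inactive, no binary variable needed")
    else:
        print(f"unit {i}: unstable, worth refining bounds further")
```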
## Follow-Ups

View File

@ -0,0 +1,46 @@
---
authors:
- "Giannakopoulou, Dimitra"
- "Mavridou, Anastasia"
- "Rhein, Julian"
- "Pressburger, Thomas"
- "Schumann, Johann"
- "Shi, Nija"
citekey: "giannakopoulouFormalRequirementsElicitation2020"
publish_date: 2020-01-01
last_import: 2025-07-21
---
# Indexing Information
Published: 2020-01
#InSecondPass
>[!seealso] Related Papers
>
# Annotations
## Notes
![[Notes on Papers/Formal requirements elicitation with FRET.md]]
## Highlights From Zotero
>[!tip] Brilliant
> A fretish requirement description is automatically parsed into six sequential fields, with the fret editordynamically coloring the text corresponding to the fields as the requirement is typed in (Figure 2): scope, condition, component, shall, timing, and response.
> 2025-07-15 11:32 pm
>
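A hedged illustration of those six fields on an invented requirement (not one from the paper):

```python
# Invented FRETish-style requirement split into the six sequential fields the
# FRET editor colors: scope, condition, component, shall, timing, response.
requirement = ("In landing mode, upon airspeed < V_ref, the autothrottle shall "
               "within 2 seconds satisfy thrust_cmd >= spool_up_min")

fields = {
    "scope":     "In landing mode",
    "condition": "upon airspeed < V_ref",
    "component": "the autothrottle",
    "shall":     "shall",
    "timing":    "within 2 seconds",
    "response":  "satisfy thrust_cmd >= spool_up_min",
}

for name, text in fields.items():
    print(f"{name:>9}: {text}")
```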
## Follow-Ups

View File

@ -0,0 +1,164 @@
---
authors:
- "Lazarus, Christopher"
- "Lopez, James G."
- "Kochenderfer, Mykel J."
citekey: "lazarusRuntimeSafetyAssurance2020"
publish_date: 2020-10-01
pages: 1-9
last_import: 2025-07-21
---
# Indexing Information
Published: 2020-10
**DOI**
[10.1109/DASC50938.2020.9256446](https://doi.org/10.1109/DASC50938.2020.9256446)
#Control-systems, #Reinforcement-learning, #reinforcement-learning, #Safety, #Switches, #Aerospace-control, #Aircraft, #Atmospheric-modeling, #runtime-safety-assurance, #Unmanned-Aerial-Systems-UAS
#InSecondPass
>[!Abstract]
>The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost to formally do so can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. High dimensional and non-linear dynamics in which modern controllers are deployed along with the black-box nature of the nominal controllers make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our baseline, human engineered approach.
>[!seealso] Related Papers
>
# Annotations
## Notes
![[Notes on Papers/Runtime Safety Assurance Using Reinforcement Learning.md]]
## Highlights From Zotero
>[!tip] Brilliant
> We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as mechanism to ensure safety. RTSA consists of a metacontroller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed.
> 2025-07-08 9:37 am
>
>[!tip] Brilliant
> Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency.
> 2025-07-08 9:37 am
>
>[!highlight] Highlight
> Unfortunately, the cost to formally verify a nonpedigreed or black-box autopilot for a variety of vehicle types and use cases is generally prohibitive.
> 2025-07-08 9:44 am
>
>[!highlight] Highlight
> In order for this mechanism to work, the system needs to be able to distinguish between safe scenarios under which the operation should remain controlled by πn and scenarios that would likely lead to unsafe conditions in which the control should be switched to πr. We assume that a recovery controller πr is given and this work does not focus on its design or implementation.
> 2025-07-08 9:46 am
>
>[!highlight] Highlight
> The problem that we address in this work is determining how to decide when to switch from the nominal controller πn to the recovery controller πr while balancing the trade-off between safety and efficiency.
> 2025-07-08 9:46 am
>
>[!warning] Dubious
> We postulate that the task of navigating an aircraft from an origin to a destination by following a pre-planned path is far more complex than the task of predicting whether the aircraft is operating safely within a short time horizon of a given point.
> 2025-07-08 9:47 am
>
>[!done] Important
> The goal of RTSA systems is to guarantee the safe operation of a system despite having black-box components as a part of its controller. Safe operation is specified by an envelope E ⊂ S which corresponds to a subset of the state space S within which the system is expected to operate ideally. RTSA continuously monitors the state of the system and switches to the recovery controller if and only if when not doing so, would lead the system to exit the safety envelope.
> 2025-07-08 9:50 am
>
>[!warning] Dubious
> It must switch to the recovery control πr whenever the aircraft leaves the envelope.
> 2025-07-08 9:51 am
>
> *I mean really there is an envelope E_r \subset E \subset S that is the region within the safety envelope that is recoverable... no?*
>[!done] Important
> • Its implementation must be easily verifiable. This means it must avoid black box models that are hard to verify such as deep neural networks (DNNs)[3].
> 2025-07-08 9:53 am
>
>[!done] Important
> We model the evolution of the flight of an aircraft equipped with an RTSA system by defining the following MDP: M = (S, A, T, R) where the elements are defined below.
> - State space S ∈ Rp: a vector representing the state of the environment and the vehicle.
> - Action space A ∈ {deploy, continue}: whether to deploy the recovery system or let the nominal controller remain in control.
> - Transition T(s, a): a function that specifies the transition probability of the next state s′ given that action a was taken in state s; in our case this will be sampled from a simulator by querying f(s, a).
> - Reward R(s, a, s′): the reward collected corresponding to the transition. This will be designed to induce the desired behavior.
> 2025-07-08 9:56 am
>
>[!done] Important
> In this model, the RTSA system is considered the agent while the states correspond to the position, velocity and other relevant information about the aircraft with respect to the envelope and the actions correspond to deploying the recovery controller or not. The agent receives a large negative reward for abandoning the envelope and a smaller negative reward for deploying the recovery controller in situations where it was not required. This reward structure is designed to heavily penalize situations in which the aircraft exits the safety envelope and simultaneously disincentivize unnecessary deployments of the recovery controller. The rewards at each step are weighted by a discount factor γ < 1 such that present rewards are worth more than future ones.
> 2025-07-08 9:59 am
>
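A minimal sketch of that reward shaping, with placeholder penalty values (not the paper's numbers):

```python
# Reward structure described above; penalty values and envelope test are assumed.
LEAVE_ENVELOPE_PENALTY = -100.0   # large penalty for exiting the safety envelope
UNNEEDED_DEPLOY_PENALTY = -1.0    # smaller penalty for an unnecessary deployment

def reward(next_state_in_envelope: bool, deployed: bool, deploy_was_needed: bool) -> float:
    """Reward for one transition of the RTSA meta-controller MDP."""
    r = 0.0
    if not next_state_in_envelope:
        r += LEAVE_ENVELOPE_PENALTY
    if deployed and not deploy_was_needed:
        r += UNNEEDED_DEPLOY_PENALTY
    return r
```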
>[!done] Important
> Additionally, we consider the nominal controller to be a black box. All of these conditions combined lead us to operate under the assumption that the transition function is unknown and we do not have access to it. We do, however, have access to simulators from which we can query experience tuples (s, a, r, s′) by providing a state s and an action a and fetching the next state s′ and associated reward r. In this setting we can learn a policy from experience with RL.
> 2025-07-08 10:00 am
>
>[!fail] This ain't right
> RL has successfully been applied to many fields [8][10].
> 2025-07-08 10:00 am
>
> *Citation stash lmao*
>[!done] Important
> Q-learning as described above enables the estimation of tabular Q-functions which are useful for small discrete problems. However, cyber-physical systems often operate in contexts which are better described with a continuous state space. The problem with applying tabular Q-learning to these larger state spaces is not only that they would require a large state-action table but also that a vast amount of experience would be required to accurately estimate the values. An alternative approach to handle continuous state spaces is to use Q-function approximation where the state-action value function is approximated which enables the agent to generalize from limited experience to states that have not been visited before and additionally avoids storing a gigantic table. Policies based on value function approximation have been successfully demonstrated in the aerospace domain before [12].
> 2025-07-08 11:14 am
>
>[!warning] Dubious
> Instead, we will restrict our attention to linear value function approximation, which involves defining a set of features Φ(s, a) ∈ Rm that captures relevant information about the state and then use these features to estimate Q by linearly combining them with weights θ ∈ Rm×|A| to estimate the value of each action. In this context, the value function is represented as follows: $Q(s, a) = \sum_{i=1}^{m} \theta_i \phi_i(s, a) = \theta^T \Phi(s, a)$ (9). Our learning problem is therefore reduced to estimating the parameters θ and selecting our features. Typically, domain knowledge will be leveraged to craft meaningful features Φ(s, a) = (φ1(s, a), φ2(s, a), ..., φm(s, a)), and ideally they would capture some of the geometric information relevant for the problem, e.g. in our setting, heading, velocity and distance to the geofence. The ideas behind the Q-learning algorithm can be extended to the linear value function approximation setting. Here, we initialize our parameters θ and update them at each transition to reduce the error between the predicted value and the observed reward. The algorithm is outlined below and it forms the basis of the learning procedure used in the experiments in Section III.
> 2025-07-08 11:18 am
>
> *This sounds like a constantly updating principal component analysis with some best fitting going on. Interesting, but if the reward is nonlinear (which seems likely?) isn't this problematic?*
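A minimal sketch of one Q-learning update under linear value function approximation as in Eq. (9); the feature map, learning rate, and discount factor are assumptions, not values from the paper.

```python
# One TD update for Q(s, a) = theta[:, a] @ phi(s) with hand-crafted features.
import numpy as np

ACTIONS = (0, 1)           # 0 = continue, 1 = deploy
ALPHA, GAMMA = 0.05, 0.99  # assumed learning rate and discount factor

def phi(state) -> np.ndarray:
    """Hand-crafted features, e.g. distance to geofence, speed, heading."""
    return np.asarray(state, dtype=float)

def q_value(theta: np.ndarray, state, action: int) -> float:
    return float(theta[:, action] @ phi(state))

def q_update(theta, s, a, r, s_next):
    """Move theta[:, a] toward the TD target r + gamma * max_a' Q(s', a')."""
    target = r + GAMMA * max(q_value(theta, s_next, b) for b in ACTIONS)
    td_error = target - q_value(theta, s, a)
    theta[:, a] += ALPHA * td_error * phi(s)
    return theta

theta = np.zeros((3, len(ACTIONS)))                     # 3 features, 2 actions
theta = q_update(theta, [120.0, 4.0, 0.3], 0, -1.0, [110.0, 4.0, 0.3])
```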
>[!highlight] Highlight
> We restrict our function family to linear functions which can be easily understood and verified. A major drawback, however, is that linear functions are less expressive than DNNs, which makes their training more difficult and requires careful crafting of features.
> 2025-07-08 11:25 am
>
>[!done] Important
> Despite the relative simplicity of linear value function approximators when compared to DNNs, we observed that they are able to capture relevant information about the environment and are well suited for this task.
> 2025-07-08 11:29 am
>
>[!highlight] Highlight
> One approach to address this problem is to avoid or reduce the chance of random exploration. We dramatically increase the likelihood of observing episodes where the mission is completed successfully without exiting the envelope, but we also bias the learning process towards exploitation. It is well known that in order for a policy to converge under Q-learning, exploration must proceed indefinitely [15]. Additionally, in the limit of the number of steps, the learning policy has to be greedy with respect to the Q-function [15]. Accordingly, avoiding or dramatically reducing random exploration can negatively affect the learning process and should be avoided.
> 2025-07-08 11:33 am
>
>[!highlight] Highlight
> Instead of randomly initializing the parameters of the Q-function approximation and then manually biasing the weights to decrease the chance of randomly deploying the recovery controller, we can use a baseline policy to generate episodes in which the RTSA system exhibited a somewhat acceptable performance. From these episodes, we learn the parameters in an offline approach known as batch reinforcement learning. It is only after we learn a good initialization of our parameters that we then start the training process of our policy πRTSA. For this purpose and to have a benchmark to compare our approach to, we define a baseline policy that consists of shrinking the safety envelope by specifying a distance threshold δ > 0. When the vehicle reaches a state that is less than δ distance away from exiting the envelope, the recovery controller is deployed. This naive approach serves both as a baseline for our experiments and also provides us with experience to initialize the weights of our policy before we do on-policy learning.
> 2025-07-08 11:36 am
>
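A sketch of that baseline, assuming a rectangular geofence and a made-up margin δ:

```python
# Baseline policy: deploy the recovery controller once the vehicle is within
# delta of the envelope boundary. Geofence corners and margin are assumptions.
import numpy as np

GEOFENCE_MIN = np.array([0.0, 0.0])
GEOFENCE_MAX = np.array([500.0, 500.0])
DELTA = 25.0  # assumed margin in meters

def distance_to_boundary(position: np.ndarray) -> float:
    """Distance from a position inside the rectangle to its nearest edge."""
    return float(min(np.min(position - GEOFENCE_MIN),
                     np.min(GEOFENCE_MAX - position)))

def baseline_policy(position: np.ndarray) -> str:
    """Deploy inside the delta-wide rim of the shrunken envelope, else continue."""
    return "deploy" if distance_to_boundary(position) < DELTA else "continue"

print(baseline_policy(np.array([250.0, 490.0])))  # within delta of the top edge
```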
>[!highlight] Highlight
> We used configuration composed of a hexarotor simulator that models lift rotor aircraft features and includes a three dimensional Bezier curve trajectory definition module, nonlinear multi-rotor dynamics, input/output linearization, nested saturation, a cascade PID flight control model, and an extended Kalman filter estimation model. An illustration of the simulator environment in three dimensions is included in Figure 2 [17].
> 2025-07-08 11:46 am
>
> *Notably, not really anything special with the controller setup. Shouldn't we be able to do math using the PID controller and set bounds? or perhaps it's a nonlinear issue? Probably the second thing.*
>[!fail] This ain't right
> deploying a parachute which was modeled using a simplified model that introduces a drag coefficient that only affects the z coordinates in the simulation.
> 2025-07-08 11:47 am
>
>[!fail] This ain't right
> The state space in our simulation is comprised of more than 250 variables. Some correspond to simulation parameters such as sampling time, simulation time and physical constants. Another set of variables represent physical magnitudes such as velocity scaling factors, the mass of components of the hexarotor, distribution of the physical components of the hexarotor, moments of inertia and drag coefficients. Other variables represent maximum and minimum roll, pitch and yaw rates, rotor speed and thrust. The sensor readings, their biases, frequencies and other characteristics are also represented by other variables. Other variables represent the state of the controller and the actions it prescribes for the different actuators in the vehicle. And a few of them correspond to the position and velocity of the hexarotor during the simulation. Figure 4 shows the evolution of the position and velocity variables for a simulation episode corresponding to the example environment configuration. All of these variables are needed to completely specify the state of the world in the simulation and illustrate the high dimensional requirements of a somewhat accurate representation of flight dynamics in a simulator. In principle, all these variables would be needed to specify a policy for our RTSA system, but as discussed in Section II we can rely on value function approximation by crafting a set of informative features which significantly reduces the dimensionality of our problem.
> [Fig. 3. Example of environment configuration and episode data with wind.]
> 2025-07-08 11:50 am
>
> *This has to be wrong. There's no way they need 250 states when many of these phenomena must be coupled. Drag coefficients for example are completely dependent on velocity and constants. That is a redundant state, less for the issue of it changing for the recovery controller. I'd still say that's a hybrid system thing.*
>[!highlight] Highlight
> In this work we restricted our attention to terminal recovery controllers.
> 2025-07-08 9:43 am
>
## Follow-Ups

View File

@ -0,0 +1,32 @@
# First Pass
**Category:**
This paper is a literature review
**Context:**
Mukul says that there are a whole bunch of thrusts out there about neural
network based controllers, and that there needs to be a better way to
parse what research is what
**Correctness:**
Looks good to me I suppose.
**Contributions:**
Here's the most important figure in the whole thing:
![[Pasted image 20250721095146.png]]
**Clarity:**
Well written.
# Second Pass
**What is the main thrust?**
**What is the supporting evidence?**
**What are the key findings?**
# Third Pass
**Recreation Notes:**
**Hidden Findings:**
**Weak Points? Strong Points?**

View File

@ -0,0 +1,42 @@
# First Pass
**Category:**
This is a methods paper.
**Context:**
This paper proposes a way of using mixed integer linear programming (MILP)
to evaluate properties of neural networks.
**Correctness:**
Formal
**Contributions:**
They do nifty things with bounds tightening and presolving that makes their
solver very fast compared to the state of the art or Reluplex. They also talk
about stable and unstable neurons.
**Clarity:**
They have a really good explanation of what a MILP problem is and how one
might encode a neural network as one.
# Second Pass
**What is the main thrust?**
The main thrust is their new solving method of MILPs for neural networks.
With their method, neural networks can have their neurons analyzed to
prove whether or not the network is robust to input perturbations. This is
especially important for classifiers, which need to know if there are sneaky
nonlinearities that can be harmful to a built system (like a glitch). This
method of bounds tightening and MILP usage makes their solver much faster
and thus more capable of handling large networks.
**What is the supporting evidence?**
They have a whole bunch of experimental results.
**What are the key findings?**
MILPs and bound tightening are very good!
# Third Pass
**Recreation Notes:**
**Hidden Findings:**
**Weak Points? Strong Points?**

Binary file not shown.
