| authors | citekey | publish_date | last_import |
|---|---|---|---|
|  | urbanReviewFormalMethods2021 | 2021-04-21 | 2025-07-07 |
Indexing Information
Published: 2021-04
DOI: 10.48550/arXiv.2104.02466
#Computer-Science---Machine-Learning #Computer-Science---Logic-in-Computer-Science #Computer-Science---Programming-Languages #ToRead
[!Abstract] We review state-of-the-art formal methods applied to the emerging field of the verification of machine learning systems. Formal methods can provide rigorous correctness guarantees on hardware and software systems. Thanks to the availability of mature tools, their use is well established in the industry, and in particular to check safety-critical applications as they undergo a stringent certification process. As machine learning is becoming more popular, machine-learned components are now considered for inclusion in critical systems. This raises the question of their safety and their verification. Yet, established formal methods are limited to classic, i.e. non machine-learned software. Applying formal methods to verify systems that include machine learning has only been considered recently and poses novel challenges in soundness, precision, and scalability. We first recall established formal methods and their current use in an exemplar safety-critical field, avionic software, with a focus on abstract interpretation based techniques as they provide a high level of scalability. This provides a golden standard and sets high expectations for machine learning verification. We then provide a comprehensive and detailed review of the formal methods developed so far for machine learning, highlighting their strengths and limitations. The large majority of them verify trained neural networks and employ either SMT, optimization, or abstract interpretation techniques. We also discuss methods for support vector machines and decision tree ensembles, as well as methods targeting training and data preparation, which are critical but often neglected aspects of machine learning. Finally, we offer perspectives for future research directions towards the formal verification of machine learning systems.

[!seealso] Related Papers
Annotations
Notes
![[Paper Notes/A Review of Formal Methods applied to Machine Learning.md]]
Highlights From Zotero
[!tip] Brilliant Recent advances in artificial intelligence and machine learning and the availability of vast amounts of data allow us to develop computer software that efficiently and autonomously perform complex tasks that are difficult or even impossible to design using traditional explicit programming (e.g., image classification, speech recognition, etc.). This makes machine-learned software particularly attractive even in safety-critical applications, as it enables performing a whole world of new functions that could not be envisioned before 2025-05-06 5:03 pm
[!warning] Dubious Safety-critical applications require an extremely high level of insurance that the software systems behave correctly and reliably. 2025-05-06 5:04 pm
[!quote] Other Highlight For instance, they are used at an industrial level in avionics [146], where the development processes of aircraft software systems have very stringent assurance and verification requirements mandated by international standards (i.e., DO-178C). 2025-05-12 11:25 am
[!highlight] Highlight In contrast, research in formal methods for the development and assurance of machine learning systems remains today extremely limited. It is however rapidly gaining interest and popularity, mostly driven by the growing needs of the automotive and avionics industries. 2025-05-06 5:07 pm
[!highlight] Highlight Specifically, in the following, we give an introduction to formal methods (Section 2) with a particular focus on static analysis by abstract interpretation [37], as it offers a unifying theory for reasoning about disparate formal verification approaches (Section 2.3). 2025-05-06 5:07 pm
[!done] Important Formal methods are an array of techniques that employ logic and mathematics in order to provide rigorous guarantees about the correctness of computer software. 2025-06-23 2:23 pm
[!done] Important Developing and checking by hand the very large proofs that are required to verify real-sized applications in realistic languages would be intractable. 2025-06-23 2:24 pm
[!tip] Brilliant However, a requirement of all formal methods is that they should be sound, that is, any correctness property established by the tool is indeed true of the actual executions of the program. 2025-06-23 2:38 pm
A good definition of soundness
[!highlight] Highlight Deductive Verification. Deductive methods stem directly from the work of Floyd, Hoare, and Naur [68, 83, 121]. The user provides a program and a formal specification expressed in a logic language. The specification is then propagated through the program source code, using symbolic execution as proposed by Burstall [23] or weakest preconditions as proposed by Dijkstra [51]. 2025-06-23 2:53 pm
[!done] Important One benefit of deductive verification is its ability to handle complex specifications using decidable logical theories (reasoning about, e.g., integers, floating-point numbers, pointers, arrays, recursive data-structures) and to perform modular verification by breaking down the specification into function contracts with preconditions and postconditions. 2025-06-23 2:53 pm
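The weakest-precondition propagation mentioned in the highlight above can be sketched concretely. This is a minimal illustration, not any tool's API: predicates are modeled as executable Python functions rather than logic formulas, and only the assignment rule `wp(x := e, Q) = Q[e/x]` is shown.

```python
# Dijkstra-style weakest precondition for an assignment, with predicates
# as Python functions over the program state (here, a single variable x):
# wp("x := e", Q) holds in a state iff executing the assignment from that
# state establishes the postcondition Q.

def wp_assign(e, Q):
    """wp of the assignment x := e(x) against postcondition Q(x)."""
    return lambda x: Q(e(x))

# Program: x := x + 2; postcondition Q: x > 5.
pre = wp_assign(lambda x: x + 2, lambda x: x > 5)
print(pre(4))   # True:  4 + 2 > 5, so Q holds after execution
print(pre(3))   # False: 3 + 2 = 5, which is not > 5
```

A deductive verifier chains such rules backwards through the whole program and discharges the resulting formula with a theorem prover, instead of evaluating it pointwise as done here.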
[!highlight] Highlight Proof Assistants. Interactive proof assistants are general-purpose tools able to produce reliable proofs of theorems, with application to mathematics, logic, and computer science. Their strength lies in the use of very expressive logic and the ability to check every proof with a small trusted core, reducing the possibilities of errors. Although they can offer some degree of automation (e.g., through tactics), they employ undecidable logics and essentially rely on a very tight interaction between the programmer and the tool, which can prove time consuming. 2025-06-23 2:56 pm
[!highlight] Highlight and other liveness properties), and the ability to check concurrent models. Model 2025-06-23 2:35 pm
[!tip] Brilliant Crafting a model that is both faithful and efficiently checked is a difficult task. Even then, the soundness and completeness properties only hold with respect to the model, not the original system. 2025-06-24 10:06 am
[!done] Important Bounded model checking [14] (e.g., the CBMC tool [35]) limits the exploration to execution traces of fixed length. The drawback of this method is that it loses soundness and completeness with respect to the full program; while it can uncover errors occurring at the beginning of the execution of the program, it cannot prove the absence of errors. 2025-06-24 10:07 am
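The trade-off in the highlight above (bugs found early, but no proof of absence) is easy to see in a toy sketch. The transition system and property here are hypothetical, chosen only to illustrate the bounded exploration:

```python
# Toy bounded model checking: explore all execution traces of a small
# transition system up to a fixed depth k, looking for a property violation.

def successors(state):
    """Transition relation: a counter that may either +1 or *2 each step."""
    return [state + 1, state * 2]

def bounded_check(init, bad, k):
    """Return a counterexample trace reaching a `bad` state within k steps,
    else None. Finding nothing does NOT prove safety beyond depth k --
    exactly the soundness loss the highlight describes."""
    frontier = [[init]]
    for _ in range(k):
        next_frontier = []
        for trace in frontier:
            for s in successors(trace[-1]):
                if bad(s):
                    return trace + [s]  # counterexample found
                next_frontier.append(trace + [s])
        frontier = next_frontier
    return None

# A shallow bug is found...
print(bounded_check(1, lambda s: s == 8, k=3))    # [1, 2, 4, 8]
# ...but the absence of a result at small k proves nothing about deeper traces.
print(bounded_check(1, lambda s: s == 100, k=3))  # None
```

Tools like CBMC do this symbolically, unrolling the program k times and handing one large SAT/SMT formula to a solver rather than enumerating traces explicitly.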
[!done] Important The benefit of static analysis by abstract interpretation is full automation, parameterized precision, and the ability to verify large programs in real languages at the source level (C, Java, etc.) or binary level. The analysis is however inherently incomplete, and can fail to verify a desired property due to over-approximations. Thus, it can output false alarms, that must be checked by other means. 2025-06-24 10:08 am
[!done] Important As a critical aircraft component, avionic software is subject to certification by certification authorities and must obey the DO-178 international standard, which notably describes the verification process that such software must follow. Traditional methods consist in massive test campaigns and intellectual reviews, which have difficulties scaling up and maintaining a reasonable cost (which already accounts for more than half of the cost of overall software development). A shift occurred in the 2010s with the new DO-178C revision (and its DO-333 supplement) which opens the door to certification by formal methods. One key aspect of the standard is the emphasis on employing sound formal methods. Another key aspect is that any verification technique (formal or not) employed at the source level cannot be considered valid at the binary level unless the compilation process is also certified, which is important in practice as some tools can only reason at the source level. 2025-06-24 10:11 am
[!highlight] Highlight This state semantics forgets about the history of computation (i.e., which state appears before which state), and so, cannot be used to verify temporal properties. 2025-06-24 10:18 am
[!warning] Dubious With this abstraction, we can no longer prove that X never takes the value 4. 2025-06-24 10:19 am
Huh?
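The "Huh?" above has a concrete answer: the interval abstraction keeps only the minimum and maximum of the set of values a variable can take, so any gaps inside the range are lost. A minimal sketch with made-up values:

```python
# Why an interval abstraction "can no longer prove that X never takes
# the value 4": the concrete set of values is over-approximated by the
# smallest enclosing interval, which may contain spurious values.

concrete_values = {1, 3, 5}          # X really only takes odd values

# Interval abstraction: keep only the min and max.
lo, hi = min(concrete_values), max(concrete_values)   # [1, 5]

# Concretely, X == 4 is impossible:
print(4 in concrete_values)          # False

# But the abstraction cannot rule it out, so a sound analyzer must
# raise a (false) alarm for the property "X != 4":
print(lo <= 4 <= hi)                 # True -> cannot prove X != 4
```

This is the precision/scalability trade-off of abstract interpretation in miniature: the interval is cheap to compute and sound, but the over-approximation produces the false alarms mentioned earlier.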
[!warning] Dubious As machine learning enters critical industries, it would be interesting to combine verification techniques for traditional software and for artificial intelligence using abstract interpretation as a common theory, and develop an analyzer adapted to this new generation of critical software. 2025-06-25 12:44 pm
This posits that artificial intelligence and software are the same thing, kind of.
[!done] Important The training phase is highly non-deterministic, which is in contrast with predictability and traceability requirements of traditional safety-critical software development processes 2025-06-25 12:45 pm
[!highlight] Highlight Most formal verification methods proposed so far in the literature apply to trained machine learning models (Section 3.1) while only few are dedicated to the data preparation and model training phases (Section 3.2). 2025-06-25 12:46 pm
[!fail] This ain't right they generally require several hours for neural networks with hundreds or thousands of neurons) 2025-06-25 12:51 pm
Relative to what???
[!highlight] Highlight Note, however, that soundness is not typically guaranteed with respect to floating-point arithmetic, but only with respect to computations on reals that ignore rounding errors and may thus differ from the actual computations [122, 105]. 2025-06-25 12:51 pm
[!done] Important SMT-based formal methods reduce the safety verification problem to a constraint satisfiability problem. Specifically, they encode a given neural network and (the negation of a) desired safety property as a set of constraints. If the constraints are satisfiable, the corresponding solution represents a counterexample to the safety property. Otherwise, the constraints are unsatisfiable, no counterexample can exist and thus the safety property holds. 2025-06-25 12:52 pm
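The reduction described above can be sketched by hand for a one-neuron network, without an SMT solver: encode the negated property and case-split on the ReLU, checking each resulting linear case for feasibility. This is a toy stand-in for what tools like Reluplex and Marabou do at scale; all weights and bounds below are made up, and the sketch assumes a single input with positive weight.

```python
# Verify "relu(w*x + b) <= threshold for all x in [lo, hi]" by searching
# for a model of the NEGATED property "relu(w*x + b) > threshold",
# case-splitting on the ReLU phase. SAT -> counterexample returned;
# both cases UNSAT -> the safety property holds. Assumes w > 0.

def verify_le(w, b, lo, hi, threshold):
    """Return a counterexample x, or None if the property holds."""
    # Case 1: ReLU active. Need w*x + b > max(threshold, 0) and x in [lo, hi].
    x = (max(threshold, 0.0) - b) / w + 1e-9   # smallest violating x (w > 0)
    x = max(x, lo)
    if x <= hi:
        return x                               # SAT: counterexample
    # Case 2: ReLU inactive (w*x + b <= 0, so y = 0). Violates only if 0 > threshold.
    x = min(hi, -b / w)                        # largest inactive x
    if threshold < 0 and x >= lo:
        return x
    return None                                # both cases UNSAT

print(verify_le(2.0, -1.0, 0.0, 1.0, 1.0))    # None -> y <= 1 holds on [0, 1]
print(verify_le(2.0, -1.0, 0.0, 1.0, 0.5))    # ~0.75 -> counterexample found
```

With n ReLUs there are up to 2^n such phase combinations, which is why the highlight further down notes that these constraints make the problem NP-complete.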
[!tip] Brilliant The approach is evaluated on a small vehicle collision avoidance use case and on a neural network with over a thousand neurons trained for digit recognition on the MNIST dataset [101]. 2025-06-25 1:00 pm
[!tip] Brilliant The approach is applied to the verification of the ACAS Xu neural networks [90], developed as early prototype for the next-generation airborne collision avoidance system for unmanned aircraft. The neural networks take sensor data as input (i.e., speed and present course of the current aircraft and of any nearby intruder) and issue appropriate navigation advisories. They consist of six hidden layers with 300 neurons each. 2025-06-25 1:02 pm
[!done] Important The experimental evaluation shows that Marabou (in divide-and-conquer mode running on four cores) generally outperforms both Planet [60] and Reluplex and is able to verify all ACAS Xu benchmarks within a one hour timeout per benchmark. 2025-06-25 1:03 pm
[!highlight] Highlight Other Complete Formal Methods. Ruan et al. [134] presented an approach based on global optimization for verifying Lipschitz continuous neural networks [148], such as neural network with piecewise linear layers (i.e., fully-connected, max- and averagepooling, and convolutional layers) and ReLU as well as sigmoid and tanh activations. Specifically, given inputs to the neural networks and a Lipschitz continuous functions over its outputs (e.g., a function determining the confidence of the neural network in classifying an input), their approach computes lower and upper bounds on the function values. The approach is implemented in a tool named DeepGO (https://github.com/trustAI/DeepGO). Compared to Reluplex [92] and Sherlock [58], DeepGO on average is 36 times faster than Sherlock and almost 100 times faster than Reluplex. Moreover, DeepGO is shown to be able to scale to state-of-the-art neural networks with millions of neurons trained on the MNIST dataset [101]. 2025-06-25 5:53 pm
[!highlight] Highlight On the ACAS Xu neural networks, Neurify is on average 20 times faster than ReluVal and 5000 times faster than Reluplex. The experimental evaluation also shows that Neurify is able to verify safety properties of neural networks with over ten thousands hidden neurons such as the self-driving car convolutional neural network Dave [15]. 2025-06-25 5:56 pm
[!highlight] Highlight Incomplete formal verification methods are generally able to scale to large neural network architectures (they generally require at most a few minutes for neural networks with thousands of neurons) and are sound (often also with respect to floating-point arithmetic [140, 141, 105]) but suffer from false positives. 2025-06-30 10:26 am
[!done] Important The experimental evaluation on 90 of the ACAS Xu benchmarks shows that Marabou [93] combined with their abstraction framework outperforms Marabou alone. Within a 20 hours timeout, it is able to verify 58 benchmarks, while Marabou alone can only verify 35 benchmarks. The abstraction-enhanced Marabou is also orders of magnitude faster than Marabou alone. 2025-06-30 10:44 am
[!done] Important Support vector machines [44] partition their input space in regions using hyperplanes that separate the training inputs according to their labels. The hyperplanes are chosen (i.e., learned during training) to maximize their distance (called margin) to their closest training inputs (called support vectors). 2025-06-30 1:59 pm
[!tip] Brilliant When the training inputs are not linearly separable in the input space, the input space is implicitly projected, using so-called kernel functions, into a much higher-dimensional space in which the training inputs become linearly separable. This allows a support vector machine to also learn non-linear decision boundaries. 2025-07-07 11:41 am
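The projection idea in the highlight above can be made concrete with the classic XOR example. This sketch uses an explicit feature map instead of an implicit kernel (the map and data are illustrative, not from the paper):

```python
# The "kernel trick" idea, with an explicit lift: XOR-labelled points are
# not linearly separable in 2-D, but adding the product feature x1*x2
# makes them separable by a plane in 3-D.

points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [+1, -1, -1, +1]            # XOR pattern: label = sign(x1 * x2)

def phi(x1, x2):
    """Explicit lift into 3-D; a kernel k(u, v) = <phi(u), phi(v)> would
    compute the same inner products without ever materializing phi."""
    return (x1, x2, x1 * x2)

# In the lifted space, the hyperplane z3 = 0 (normal vector (0, 0, 1))
# separates the two classes perfectly:
separated = all(label * phi(x1, x2)[2] > 0
                for (x1, x2), label in zip(points, labels))
print(separated)                     # True
```

Real kernels (polynomial, RBF) work the same way but lift into much higher, even infinite-dimensional, spaces, which is why they are only ever applied implicitly.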
[!done] Important Robust training aims at minimizing the worst-case loss for each given input, that is, the maximum loss (i.e., model error) over all possible perturbations of a given training input. However, since calculating the exact worst-case loss can be computationally costly in practice, robust training approaches typically minimize an estimate of the worst-case loss. 2025-06-30 3:14 pm
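For linear models the worst-case loss mentioned above is actually computable in closed form, which makes the highlight's point easy to check: a sampled estimate can only be optimistic. A sketch with made-up weights, using the margin (on which a hinge loss would be built) as the quantity of interest:

```python
import random

# Worst-case margin under an L-infinity perturbation of radius eps, for a
# linear score w . x with label y in {-1, +1}. The adversary shifts each
# coordinate by -y * eps * sign(w_i), lowering the margin by eps * ||w||_1.

def margin(w, x, y):
    return y * sum(wi * xi for wi, xi in zip(w, x))

def worst_case_margin(w, x, y, eps):
    """Exact minimum of margin(w, x + delta, y) over ||delta||_inf <= eps."""
    return margin(w, x, y) - eps * sum(abs(wi) for wi in w)

w, x, y, eps = [2.0, -1.0], [0.5, 0.2], +1, 0.1
exact = worst_case_margin(w, x, y, eps)          # 0.8 - 0.1 * 3 = 0.5

# Random sampling of perturbations can only over-estimate this minimum:
random.seed(0)
sampled = min(
    margin(w, [xi + random.uniform(-eps, eps) for xi in x], y)
    for _ in range(1000)
)
print(round(exact, 6), sampled >= exact)         # 0.5 True
```

For neural networks no such closed form exists, which is why robust training methods fall back on bounds or estimates of the worst-case loss, as the highlight notes.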
[!highlight] Highlight Another interesting possibility is to develop approaches to determine constraints on the training process that enforce a desired property on the trained model. For instance, one could enforce a specific behavior for a given input range of interest, or a particular structure of the trained model that facilitates its subsequent verification [171] (or its abstraction for verification [125, 145, 62]). 2025-05-06 5:10 pm
[!highlight] Highlight More generally, verification methods should allow verifying the behavior of trained models under all circumstances and not just over the expected safe input space. This is particularly useful in cases in which the trained model is part of a larger machine learning-based system [53] and is thus more susceptible to receive unexpected inputs, e.g., from malfunctioning sensors 2025-05-06 5:11 pm
[!highlight] Highlight In this case, it would be useful to design verification methods able to relate the behavior of the inexact model to the optimal expected behavior, e.g., by determining the worst-case approximation of the trained model. 2025-05-06 5:12 pm
[!highlight] Highlight Finally, there is a need for further approaches that support diverse model structures, such as recurrent neural networks [95, 175], and that can verify model implementations by taking floating-point rounding errors into account [140, 141, 105]. 2025-05-06 5:13 pm
Follow-Ups
[!example] The CompCert certified compiler [103] is used to compile the C code into binary.
- #Follow-Up
[!example] Kurd and Kelly [98] were among the first to propose a characterization of verification goals for neural networks used in safety-critical applications.
- #Follow-Up
[!example] Note that the presence of these non-linear ReLU constraints makes the problem NP-complete [92]
- What does 'NP Complete' mean? #Follow-Up
[!example] MILP-based Formal Methods.
- What the fuck is MILP #Follow-Up
[!example] Their framework supports the abstract domains of intervals [36], zonotopes [73], and polyhedra [38], as well as their bounded powersets.
- My brain hurty. #Follow-Up
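The simplest of the abstract domains listed above, intervals, is less painful than it sounds: bounds are just pushed through the network layer by layer, splitting each weight by sign. A sketch on one affine + ReLU layer with toy, made-up weights:

```python
# Interval bound propagation through one affine + ReLU layer: given
# elementwise input bounds lo <= x <= hi, bound each output neuron by
# pairing positive weights with one bound and negative weights with the other.

def affine_interval(W, b, lo, hi):
    """Interval bounds of W @ x + b for x with lo <= x <= hi (elementwise)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w > 0 else hi[j]) for j, w in enumerate(row))
        h = bias + sum(w * (hi[j] if w > 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it maps bounds to bounds directly."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

W = [[1.0, -1.0], [2.0, 1.0]]
b = [0.0, -3.0]
lo, hi = relu_interval(*affine_interval(W, b, [0.0, 0.0], [1.0, 1.0]))
print(lo, hi)   # [0.0, 0.0] [1.0, 0.0]
```

Here the analysis proves the second neuron is always inactive on the input box [0,1]². Zonotopes and polyhedra refine exactly this scheme by also tracking linear relations between variables, at higher cost.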
[!example] Finally, a couple of theoretical results on the abstract interpretation of neural networks can be found in the literature. Baader et al. [5] show that for any continuous function f there exists a feed-forward fully-connected neural network with ReLU activations whose abstract interpretation using the domains of intervals [36] from an input region B is an arbitrarily close approximation of f on B.
- #Follow-Up
[!example] [4] A. Atki, A. Brahmi, D. Delmas, H. Essoussi, T. Marie, and F. Randimbivololona. Formalise to Automate: Deployment of a Safe and Cost-Efficient Process for Avionics Software. In ERTS 2018
- #Follow-Up
[!example] [60] R. Ehlers. Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks. In ATVA 2017.
- #Follow-Up
[!example] [107] B. Livshits, M. Sridharan, Y. Smaragdakis, O. Lhoták, J. Nelson Amaral, B.-Y. E. Chang, S. Z. Guyer, U. P. Khedker, A. Møller, and D. Vardoulakis. In Defense of Soundiness: A Manifesto. 2015.
- #Follow-Up
[!example] [131] F. Ranzato, M. Zanella. Robustness Verification of Support Vector Machines. In SAS 2019.
- #Follow-Up