13 Jul

## Logic for Safety

Many people seem to misunderstand the nature of mathematical proof in discussions of AI and software correctness, security, and safety. In this post, I’ll describe some of the background and context for the role proof plays in those discussions.

Almost every aspect of logic and mathematical proof has its origins in human language, which emerged about 100,000 years ago. Some 2500 years ago, Aristotle and Euclid began the process of making the rules of natural language argument precise. Modern logic began in 1677 when Leibniz tried to create a “calculus ratiocinator” to mechanically check precise arguments. The job was finished by Frege, Cantor, Zermelo, and Fraenkel in 1922 when they created a precise logical system that is capable of representing every mathematical argument and that stands as the foundation for mathematics today. Church and Turing extended this system to computation in 1936. Every precise argument in every computational, engineering, economic, scientific, and social discipline can be precisely represented in this formalism and efficiently checked by computer.

Some had hoped that beyond checking arguments, every statement might also be proven true or false in a mechanical way. These hopes were dashed by Goedel in 1931 when he published his incompleteness theorem, showing that any consistent logical system rich enough to represent arithmetic on the natural numbers must contain statements which can be neither proven nor disproven within it. In 1936 Turing found a simple computational variant, now called “the halting problem”, which showed that no mechanical procedure can decide for every program whether it eventually halts. As a consequence, there are some properties of some programs which cannot be proven or disproven.
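Turing’s argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_contrarian` and `refute` are hypothetical, the "decider" is any deterministic function claiming to predict halting, and the contrarian program is analysed from its definition rather than actually run.

```python
from typing import Callable, Tuple

Program = Callable[[], None]
Decider = Callable[[Program], bool]  # True means "this program halts"


def build_contrarian(claimed_halts: Decider) -> Program:
    """Build a program that does the opposite of what the decider predicts for it."""
    def contrarian() -> None:
        if claimed_halts(contrarian):
            while True:  # predicted to halt, so loop forever
                pass
        # predicted to loop, so fall through and halt immediately
    return contrarian


def refute(claimed_halts: Decider) -> Tuple[str, str]:
    """Return (predicted, actual) behaviour of the contrarian program.

    The actual behaviour is read off from the contrarian's definition,
    assuming the decider is deterministic; the program is never executed.
    """
    program = build_contrarian(claimed_halts)
    says_halts = claimed_halts(program)
    predicted = "halts" if says_halts else "loops"
    actual = "loops" if says_halts else "halts"
    return predicted, actual


if __name__ == "__main__":
    # Two toy deciders; any other candidate fails in the same way.
    for name, decider in [("always yes", lambda p: True), ("always no", lambda p: False)]:
        print(name, refute(decider))
```

Whatever decider is supplied, its prediction about the contrarian program disagrees with what that program does, so no decider can be right about every program.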

But the Goedel statements and the uncomputable program properties are abstruse constructions that we never want to use in engineering! Engineering is about building devices we are confident will behave as we intend! Any decent programmer will have an argument as to why his program will work as intended. If he doesn’t have such an argument, he should be fired! If his argument is correct, it can be precisely represented in mathematical logic and checked by computer. The fact that this is not current standard practice is due not to limitations of logic or understanding but to sloppiness in the discipline and poor educational training. If engineers built bridges the way that we write programs, no one would dare drive over them. The abysmal state of today’s software correctness and security will likely be looked at with wonder and disgust by future generations.
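As a very small illustration of an argument "represented in mathematical logic and checked by computer", here is a sketch in Lean 4. The function `double` and the theorem name are hypothetical examples, not taken from any real system, and the proof assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- A tiny "program": double a natural number by adding it to itself.
def double (n : Nat) : Nat := n + n

-- The programmer's argument, stated as a theorem: double really computes 2 * n.
-- Lean checks every step; if the argument were wrong, this would not compile.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double  -- goal becomes: n + n = 2 * n
  omega          -- linear arithmetic over Nat closes the goal
```

Real programs need far larger specifications and proofs, but the shape is the same: a precise statement of intent, plus an argument the machine can check.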

Many countries are now building autonomous vehicles, autonomous drones, robot soldiers, autonomous trading systems, autonomous data gathering systems, etc., and the stakes are suddenly getting much bigger. If we want to be confident that these systems will not cause great harm, we need precise arguments to that effect. If you want to learn more about some of the extremely expensive and life-harming consequences that have already happened due to ridiculous sloppiness in the design of our technology, check out the beginning of a talk I gave at Stanford in 2007: https://www.youtube.com/watch?v=omsuTsOmvsc

11 Jul

## Formal Methods for AI Safety

Future intelligent systems could cause great harm to humanity. Because of the large risks, if we are to responsibly build intelligent systems, they must not only be safe but we must be very convinced that they are safe. For example, an AI which is taught human morality by reinforcement learning might be safe, but it’s hard to see how we could become sufficiently convinced to responsibly deploy it.

Before deploying an advanced intelligent system, we should have a very convincing argument for its safety. If that argument is rigorously correct, then it is a mathematical proof. This is not a trivial endeavor! It’s just the only path that appears to be open to us.

The mathematical foundations of computing have been known since Church and Turing’s work in 1936. Both created computational models which were simultaneously logical models about which theorems could be proved. Church created the lambda calculus, which has since become the foundation for programming languages, and Turing created the Turing machine, which is the fundamental model for the analysis of algorithms.

Many systems for formal verification of properties of hardware and software have been constructed. John McCarthy created the programming language Lisp very explicitly from the lambda calculus. I studied with him in 1977 and did many projects proving properties of programs. De Bruijn’s Automath system from 1967 was used to prove and verify many mathematical and computational properties.

There are now more than one hundred formal methods systems and they have been used to verify a wide variety of hardware systems, cryptographic protocols, compilers, and operating systems. After Intel had to write off \$475 million due to the Pentium P5 floating point division bug, they started verifying their hardware using formal methods.

While these advances have been impressive, the world’s current technological infrastructure is woefully buggy, insecure, and sloppy. Computer science should be the most mathematical of all engineering disciplines with a precise stack of verified hardware, software, operating systems, and networks. Instead we have seething messes at all levels.

Building precise foundations will not be easy! I have been hard at work on new programming languages, specification languages, verification principles, and principles for creating specifications. Other groups have been proceeding in similar directions. Fortunately, intelligent systems are likely to be very helpful in this enterprise if we can build a trust foundation on top of which we can safely use them.

I have proposed the “Safe-AI Scaffolding Strategy” as a sequence of incremental steps toward the development of more powerful and flexible intelligent systems in which we have provable confidence of safety at each step. The systems in the early steps are highly constrained and so the safety properties are simpler to specify: only run on specified hardware, do not use more than the allocated resources, do not self-improve in uncontrolled ways, do not autonomously replicate, etc. Specifying the safety properties of more advanced systems which directly engage with the world is more challenging. I will present approaches for dealing with those issues at a later time.
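To make the early-stage properties listed above a little more concrete, here is a minimal Python sketch of how they might be written down as explicit, checkable predicates. Everything in it is an assumption for illustration: `SafetySpec`, `Action`, and `violations` are hypothetical names, the fields are invented, and a runtime check like this is not a proof and is not the Safe-AI Scaffolding Strategy itself. It only shows that simple constraints of this kind can be stated precisely.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SafetySpec:
    """Hypothetical constraints for a highly constrained early-stage system."""
    allowed_hardware: FrozenSet[str]
    max_memory_bytes: int
    max_cpu_seconds: float
    allow_self_modification: bool = False
    allow_replication: bool = False


@dataclass(frozen=True)
class Action:
    """Hypothetical description of something the system proposes to do."""
    hardware_id: str
    memory_bytes: int
    cpu_seconds: float
    modifies_own_code: bool = False
    spawns_copy: bool = False


def violations(spec: SafetySpec, action: Action) -> List[str]:
    """Return the safety properties that the proposed action would violate."""
    found: List[str] = []
    if action.hardware_id not in spec.allowed_hardware:
        found.append("runs on unspecified hardware")
    if (action.memory_bytes > spec.max_memory_bytes
            or action.cpu_seconds > spec.max_cpu_seconds):
        found.append("exceeds allocated resources")
    if action.modifies_own_code and not spec.allow_self_modification:
        found.append("uncontrolled self-improvement")
    if action.spawns_copy and not spec.allow_replication:
        found.append("autonomous replication")
    return found


if __name__ == "__main__":
    spec = SafetySpec(frozenset({"node-a"}), max_memory_bytes=1024, max_cpu_seconds=1.0)
    print(violations(spec, Action("node-a", 512, 0.5)))
    print(violations(spec, Action("node-b", 2048, 0.5, spawns_copy=True)))
```

A verified system would go further and prove, ahead of time, that no reachable action can produce a non-empty violation list, rather than checking each action as it occurs.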