Formal Methods for AI Safety
Future intelligent systems could cause great harm to humanity. Because of the large risks, if we are to responsibly build intelligent systems, they must not only be safe but we must be very convinced that they are safe. For example, an AI which is taught human morality by reinforcement learning might be safe, but it’s hard to see how we could become sufficiently convinced to responsibly deploy it.
Before deploying an advanced intelligent system, we should have a very convincing argument for its safety. If that argument is rigorously correct, then it is a mathematical proof. This is not a trivial endeavor! It’s just the only path that appears to be open to us.
The mathematical foundations of computing have been known since Church and Turing‘s work in 1936. Both created computational models which were simultaneously logical models about which theorems could be proved. Church created the lambda calculus which has since become the foundation for programming languages and Turing created the Turing machine which is the fundamental model for the analysis of algorithms.
Many systems for formal verification of properties of hardware and software have been constructed. John McCarthy created the programming language Lisp very explicitly from the lambda calculus. I studied with him in 1977 and did many projects proving properties of programs. de Bruijn‘s Automath system from 1967 was used to prove and verify many mathematical and computational properties.
There are now more than one hundred formal methods systems and they have been used to verify a wide variety of hardware systems, cryptographic protocols, compilers, and operating systems. After Intel had to write off $475 million due to the Pentium P5 floating point division bug, they started verifying their hardware using formal methods.
While these advances have been impressive, the world’s current technological infrastructure is woefully buggy, insecure, and sloppy. Computer science should be the most mathematical of all engineering disciplines with a precise stack of verified hardware, software, operating systems, and networks. Instead we have seething messes at all levels.
Building precise foundations will not be easy! I have been hard at work on new programming languages, specification languages, verification principles, and principles for creating specifications. Other groups have been proceeding in similar directions. Fortunately, intelligent systems are likely to be very helpful in this enterprise if we can build a trust foundation on top of which we can safely use them.
I have proposed the “Safe-AI Scaffolding Strategy” as a sequence of incremental steps toward the development of more powerful and flexible intelligent systems in which we have provable confidence of safety at each step. The systems in the early steps are highly constrained and so the safety properties are simpler to specify: only run on specified hardware, do not use more than the allocated resources, do not self-improve in uncontrolled ways, do not autonomously replicate, etc. Specifying the safety properties of more advanced systems which directly engage with the world is more challenging. I will present approaches for dealing with those issues at a later time.