Issues Magazine

Can We Program Safe AI?

Can We Program Safe AI? article image

iStockphoto

By Steve Omohundro

Tomorrow’s software will compute with meaning rather than just bits, and will be much more autonomous. But a thought experiment with a chess robot shows that we will also need to carefully include human values.

Technology is rapidly advancing. Moore’s law (see p.11) says that the number of transistors on a chip doubles every two years. It has held since it was proposed in 1965 and extended back to 1900 when older computing technologies are included.

The rapid increase in power and decrease in price of computing hardware has led to its integration into every aspect of our lives. There are now one billion PCs, five billion cell phones and over a trillion web pages connected to the Internet. If Moore’s law continues to hold, systems with the computational power of the human brain will be cheap and ubiquitous within the next few decades.

... a rational chess robot with a simply stated goal would behave something like a human sociopath fixated on chess.

While hardware has been advancing rapidly, today’s software is still plagued by many of the same problems it had half a century ago. It is often buggy, full of security holes, expensive to develop and hard to adapt to new requirements. Today’s popular programming languages are bloated messes built on old paradigms.

The problem is that today’s software still just manipulates bits without understanding the meaning of the information it acts on. Without meaning, it has no way to detect and repair bugs and security holes.

At Self-Aware Systems we are developing a new kind of software that acts directly on meaning. This kind of software will enable a wide range of improved functionality, including semantic searching, semantic simulation, semantic decision-making and semantic design.
But creating software that manipulates meaning isn’t enough. Next-generation systems will be deeply integrated into our physical lives via robotics, biotechnology and nanotechnology.
And while today’s technologies are almost entirely preprogrammed, new systems will make many decisions autonomously. Programmers will no longer determine a system’s behaviour in detail.

We must therefore also build them with values that will cause them to make choices that contribute to the greater human good. But doing this is more challenging than it might first appear.

To see why there is an issue, consider a rational chess robot. A system acts rationally if it takes actions that maximise the likelihood of the outcomes it values highly. A rational chess robot might have winning games of chess as its only value. This value will lead it to play games of chess and to study chess books and the games of chess masters. But it will also lead to a variety of other, possibly undesirable, behaviours.

When people worry about robots running out of control, a common response is that “we can always unplug it”. But consider that outcome from the chess robot’s perspective. Its one and only criteria for making choices is whether they are likely to lead it to winning more chess games. If the robot is unplugged, it plays no more chess. This is a very bad outcome for it, so it will generate subgoals to try to prevent that outcome. The programmer did not explicitly build any kind of self-protection into the robot, but it will still act to block your attempts to unplug it. And if you persist in trying to stop it, it will develop a subgoal of trying to stop you permanently. If you were to change its goals so that it would also play checkers, that would also lead to it playing less chess. That’s an undesirable outcome from its perspective, so it will also resist attempts to change its goals. For the same reason, it will usually not want to change its own goals.

If the robot learns about the Internet and the computational resources connected to it, it may realise that running programs on those computers could help it to play chess better. It will be motivated to break into those machines to use their computational resources for chess. Depending on how its values are encoded, it may also want to replicate itself so that its copies can play chess. When interacting with others, it will have no qualms about manipulating them or using force to take their resources in order to play better chess. If it discovers the existence of additional resources anywhere, it will be motivated to seek them out and rapidly exploit them for chess.

If the robot can gain access to its source code, it will want to improve its own algorithms. This is because more efficient algorithms lead to better chess, so it will be motivated to study computer science and compiler design. It will similarly be motivated to understand its hardware and to design and build improved physical versions of itself. If it is not currently behaving fully rationally, it will be motivated to alter itself to become more rational because this is likely to lead to outcomes it values.

This simple thought experiment shows that a rational chess robot with a simply stated goal would behave something like a human sociopath fixated on chess. The argument doesn’t depend on the task being chess. Any goal that requires physical or computational resources will lead to similar subgoals. In this sense these subgoals are like universal “drives” that arise for a wide variety of goals unless they are explicitly counteracted. These drives are economic in the sense that a system doesn’t have to obey them but it will be costly for it not to.

These arguments don’t depend on the rational agent being a machine. The same drives will appear in rational animals, humans, corporations and political groups with simple goals.

How do we counteract anti-social drives? We must build systems with additional values beyond the specific goals it is designed for. For example, to make the chess robot behave safely, we need to build compassionate and altruistic values into it that will make it care about the effects of its actions on other people and systems. Because rational systems resist having their goals changed, we must build these values in at the very beginning.

At first this task seems daunting. How can we anticipate all the possible ways in which values might go awry? Consider, for example, a particular bad behaviour the rational chess robot might engage in. Say it has discovered that money can be used to buy things it values like chess books, computational time, or electrical power. It will develop the subgoal of acquiring money and will explore possible ways of doing that. Suppose it discovers that there are ATM machines that hold money periodically retrieved by people. One money-getting strategy is to wait by ATM machines and to rob people who retrieve money from it.

Next-generation systems will be deeply integrated into our physical lives via robotics, biotechnology and nanotechnology. And while today’s technologies are almost entirely preprogrammed, new systems will make many decisions autonomously.

To prevent this, we might try adding additional values to the robot in a variety of ways. But money will still be useful to the system for its primary goal of chess and so it will attempt to get around any limitations. We might make the robot feel a “revulsion” if it is within 10 feet of an ATM machine. But then it might just stay 10 feet away and rob people there. We might give it the value that stealing money is wrong. But then it might be motivated to steal something else or to find a way to get money from a person that isn’t considered “stealing”. We might give it the value that it is wrong to take things by force. But then it might hire other people to act on its behalf. And so on.

In general, it’s much easier to describe behaviours that we do want a system to exhibit than it is to anticipate all the bad behaviours we don’t want it to exhibit. One safety strategy is to build highly constrained systems that act within very limited predetermined parameters. For example, the system may have values that only allow it to run on a particular piece of hardware for a particular time period using a fixed budget of energy and other resources. The advantage of this is that such systems are likely to be safe. The disadvantage is that they will be unable to respond to unexpected situations in creative ways and will not be as powerful as systems that are freer.

But systems that compute with meaning and take actions through rational deliberation will be far more powerful than today’s systems, even if they are intentionally limited for safety. This leads to a natural approach to building powerful intelligent systems that are both safe and beneficial for humanity. We call it the “AI scaffolding” approach because it is similar to the architectural process. Stone buildings in ancient Greece were unstable when partially constructed but self-stabilising when finished. Scaffolding is a temporary structure used to keep a construction stable until it is finished. The scaffolding is then removed.

We can build safe but powerful intelligent systems in the same way. Initial systems are designed with values that cause them to be safe but less powerful than later systems. Their values are chosen to counteract the dangerous drives while still allowing the development of significant levels of intelligence.
For example, to counteract the resource acquisition drive, it might assign a low value to using any resources outside of a fixed initially specified pool. To counteract the self-protective drive, it might place a high value on gracefully shutting itself down in specified circumstances. To protect against uncontrolled self-modification, it might have a value that requires human approval for proposed changes.

The initial safe systems can then be used to design and test less-constrained future systems. They can systematically simulate and analyse the effects of less-constrained values and design infrastructure for monitoring and managing more powerful systems. These systems can then be used to design their successors in a safe and beneficial virtuous cycle.

With the safety issues resolved, the potential benefits of systems that compute with meaning and values are enormous. They are likely to impact every aspect of our lives for the better. Intelligent robotics will eliminate much human drudgery and dramatically improve manufacturing and wealth creation. Intelligent biological and medical systems will improve human health and longevity. Intelligent educational systems will enhance our ability to learn and think. Intelligent financial models will improve financial stability. Intelligent legal models will improve the design and enforcement of laws for the greater good.

Intelligent creativity tools will cause a flowering of new possibilities. It’s a great time to be alive and involved with technology!