Machine learning (ML) algorithms can already recognize patterns far better than the humans they’re working for. This allows them to generate predictions and make decisions in a variety of high-stakes situations. For example, electricians use IBM Watson’s predictive capabilities to anticipate clients’ needs; Uber’s self-driving system determines what route will get passengers to their destination the fastest; and Insilico Medicine leverages its drug discovery engine to identify avenues for new pharmaceuticals.
As data-driven learning systems continue to advance, it would be easy enough to define “success” according to technical improvements, such as increasing the amount of data algorithms can synthesize and, thereby, improving the efficacy of their pattern identifications. However, for ML systems to truly be successful, they need to understand human values. More to the point, they need to be able to weigh our competing desires and demands, understand what outcomes we value most, and act accordingly.
Science fiction is full of AIs that manifest the dark side of humanity, or are indifferent to humans altogether. Such possibilities cannot be ruled out, but nor is there any logical or empirical reason to consider them highly likely. I am among a large group of AI experts who see a strong potential for profoundly positive outcomes in the AI revolution currently underway.
We are facing a future with great uncertainty and tremendous promise, and the best we can do is to confront it with a combination of heart and mind, of common sense and rigorous science. In the realm of AI, what this means is, we need to do our best to guide the AI minds we are creating to embody the values we cherish: love, compassion, creativity, and respect.
The quest for beneficial AI has many dimensions, including its potential to reduce material scarcity and to help unlock the human capacity for love and compassion.
A large percentage of difficult issues in human society, many of which spill over into the AI domain, would be palliated significantly if material scarcity became less of a problem. Fortunately, AI has great potential to help here. AI is already increasing efficiency in nearly every industry.
In the next few decades, as nanotech and 3D printing continue to advance, AI-driven design will become a larger factor in the economy. Radical new tools like artificial enzymes built using Christian Schafmeister’s spiroligomer molecules, and designed using quantum physics-savvy AIs, will enable the creation of new materials and medicines.
For amazing advances like the intersection of AI and nanotech to lead toward broadly positive outcomes, however, the economic and political aspects of the AI industry may have to shift from the current status quo.
Currently, most AI development occurs under the aegis of military organizations or large corporations oriented heavily toward advertising and marketing. Put crudely, an awful lot of AI today is about “spying, brainwashing, or killing.” This is not really the ideal situation if we want our first true artificial general intelligences to be open-minded, warm-hearted, and beneficial.
Also, as the bulk of AI development now occurs in large for-profit organizations bound by law to pursue the maximization of shareholder value, we face a situation where AI tends to exacerbate global wealth inequality and class divisions. This has the potential to lead to various civilization-scale failure modes involving the intersection of geopolitics, AI, cyberterrorism, and so forth. Part of my motivation for founding the decentralized AI project SingularityNET was to create an alternative mode of dissemination and utilization of both narrow AI and AGI—one that operates in a self-organizing way, outside of the direct grip of conventional corporate and governmental structures.
In the end, though, I worry that radical material abundance and novel political and economic structures may fail to create a positive future, unless they are coupled with advances in consciousness and compassion. AGIs have the potential to be massively more ethical and compassionate than humans. But still, the odds of getting deeply beneficial AGIs seem higher if the humans creating them are fuller of compassion and positive consciousness—and can effectively pass these values on.
Transmitting Human Values
Brain-computer interfacing is another critical aspect of the quest for creating more positive AIs and more positive humans. As Elon Musk has put it, “If you can’t beat ’em, join’ em.” Joining is more fun than beating anyway. What better way to infuse AIs with human values than to connect them directly to human brains, and let them learn directly from the source (while providing humans with valuable enhancements)?
Millions of people recently heard Elon Musk discuss AI and BCI on the Joe Rogan podcast. Musk’s embrace of brain-computer interfacing is laudable, but he tends to dodge some of the tough issues—for instance, he does not emphasize the trade-off cyborgs will face between retaining human-ness and maximizing intelligence, joy, and creativity. To make this trade-off effectively, the AI portion of the cyborg will need to have a deep sense of human values.
Musk calls humanity the “biological boot loader” for AGI, but to me this colorful metaphor misses a key point—that we can seed the AGI we create with our values as an initial condition. This is one reason why it’s important that the first really powerful AGIs are created by decentralized networks, and not conventional corporate or military organizations. The decentralized software/hardware ecosystem, for all its quirks and flaws, has more potential to lead to human-computer cybernetic collective minds that are reasonable and benevolent.
BCI is still in its infancy, but a more immediate way of connecting people with AIs to infuse both with greater love and compassion is to leverage humanoid robotics technology. Toward this end, I conceived a project called Loving AI, focused on using highly expressive humanoid robots like the Hanson robot Sophia to lead people through meditations and other exercises oriented toward unlocking the human potential for love and compassion. My goals here were to explore the potential of AI and robots to have a positive impact on human consciousness, and to use this application to study and improve the OpenCog and SingularityNET tools used to control Sophia in these interactions.
The Loving AI project has now run two small sets of human trials, both with exciting and positive results. These have been small—dozens rather than hundreds of people—but have definitively proven the point. Put a person in a quiet room with a humanoid robot that can look them in the eye, mirror their facial expressions, recognize some of their emotions, and lead them through simple meditation, listening, and consciousness-oriented exercises…and quite a lot of the time, the result is a more relaxed person who has entered into a shifted state of consciousness, at least for a period of time.
In a certain percentage of cases, the interaction with the robot consciousness guide triggered a dramatic change of consciousness in the human subject—a deep meditative trance state, for instance. In most cases, the result was not so extreme, but statistically the positive effect was quite significant across all cases. Furthermore, a similar effect was found using an avatar simulation of the robot’s face on a tablet screen (together with a webcam for facial expression mirroring and recognition), but not with a purely auditory interaction.
The Loving AI experiments are not only about AI; they are about human-robot and human-avatar interaction, with AI as one significant aspect. The facial interaction with the robot or avatar is pushing “biological buttons” that trigger emotional reactions and prime the mind for changes of consciousness. However, this sort of body-mind interaction is arguably critical to human values and what it means to be human; it’s an important thing for robots and AIs to “get.”
Halting or pausing the advance of AI is not a viable possibility at this stage. Despite the risks, the potential economic and political benefits involved are clear and massive. The convergence of narrow AI toward AGI is also a near inevitability, because there are so many important applications where greater generality of intelligence will lead to greater practical functionality. The challenge is to make the outcome of this great civilization-level adventure as positive as possible.
In order to highlight the kinds of ethical decisions that our ML systems are already contending with, Kaj Sotala, a researcher in Finland working for the Foundational Research Institute, turns to traffic analysis and self-driving cars. Should a toll road be used in order to shave five minutes off the commute, or would it be better to take the longer route in order to save money?
Answering that question is not as easy as it may seem.
For example, Person A may prefer to take a toll road that costs five dollars if it will save five minutes, but they may not want to take the toll road if it costs them ten dollars. Person B, on the other hand, might always prefer taking the shortest route regardless of price, as they value their time above all else.
In this situation, Sotala notes that we are ultimately asking the ML system to determine what humans value more: Time or money. Consequently, what seems like a simple question about what road to take quickly becomes a complex analysis of competing values. “Someone might think, ‘Well, driving directions are just about efficiency. I’ll let the AI system tell me the best way of doing it.’ But another person might feel that there is some value in having a different approach,” he said.
While it’s true that ML systems have to weigh our values and make tradeoffs in all of their decisions, Sotala notes that this isn’t a problem at the present juncture. The tasks that the systems are dealing with are simple enough that researchers are able to manually enter the necessary value information. However, as AI agents increase in complexity, Sotala explains that they will need to be able to account for and weigh our values on their own.
Understanding Utility-Based Agents
When it comes to incorporating values, Sotala notes that the problem comes down to how intelligent agents make decisions. A thermostat, for example, is a type of reflex agent. It knows when to start heating a house because of a set, predetermined temperature — the thermostat turns the heating system on when it falls below a certain temperature and turns it off when it goes above a certain temperature. Goal-based agents, on the other hand, make decisions based on achieving specific goals. For example, an agent whose goal is to buy everything on a shopping list will continue its search until it has found every item.
Utility-based agents are a step above goal-based agents. They can deal with tradeoffs like the following: Getting milk is more important than getting new shoes today. However, I’m closer to the shoe store than the grocery store, and both stores are about to close. I’m more likely to get the shoes in time than the milk.” At each decision point, goal-based agents are presented with a number of options that they must choose from. Every option is associated with a specific “utility” or reward. To reach their goal, the agents follow the decision path that will maximize the total rewards.
From a technical standpoint, utility-based agents rely on “utility functions” to make decisions. These are formulas that the systems use to synthesize data, balance variables, and maximize rewards. Ultimately, the decision path that gives the most rewards is the one that the systems are taught to select in order to complete their tasks.
While these utility programs excel at finding patterns and responding to rewards, Sotala asserts that current utility-based agents assume a fixed set of priorities. As a result, these methods are insufficient when it comes to future AGI systems, which will be acting autonomously and so will need a more sophisticated understanding of when humans’ values change and shift.
For example, a person may always value taking the longer route to avoid a highway and save money, but not if they are having a heart attack and trying to get to an emergency room. How is an AI agent supposed to anticipate and understand when our values of time and money change? This issue is further complicated because, as Sotala points out, humans often value things independently of whether they have ongoing, tangible rewards. Sometimes humans even value things that may, in some respects, cause harm. Consider an adult who values privacy but whose doctor or therapist may need access to intimate and deeply personal information — information that may be lifesaving. Should the AI agent reveal the private information or not?
Ultimately, Sotala explains that utility-based agents are too simple and don’t get to the root of human behavior. “Utility functions describe behavior rather than the causes of behavior….they are more of a descriptive model, assuming we already know roughly what the person is choosing.” While a descriptive model might recognize that passengers prefer saving money, it won’t understand why, and so it won’t be able to anticipate or determine when other values override “saving money.”
At its core, Sotala emphasizes that the fundamental problem is ensuring that AI systems are able to uncover the models that govern our values. This will allow them to use these models to determine how to respond when confronted with new and unanticipated situations. As Sotala explains, “AIs will need to have models that allow them to roughly figure out our evaluations in totally novel situations, the kinds of value situations where humans might not have any idea in advance that such situations might show up.”
In some domains, AI systems have surprised humans by uncovering our models of the world without human input. As one early example, Sotala references research with “word embeddings” where an AI system was tasked with classifying sentences as valid or invalid. In order to complete this classification task, the system identified relationships between certain words. For example, as the AI agent noticed a male/female dimension to words, it created a relationship that allowed it to get from “king” to “queen” and vice versa.
Since then, there have been systems which have learned more complex models and associations. For example, OpenAI’s recent GPT-2 system has been trained to read some writing and then write the kind of text that might follow it. When given a prompt of “For today’s homework assignment, please describe the reasons for the US Civil War,” it writes something that resembles a high school essay about the US Civil War. When given a prompt of “Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry,” it writes what sounds like Lord of the Rings-inspired fanfiction, including names such as Aragorn, Gandalf, and Rivendell in its output.
Sotala notes that in both cases, the AI agent “made no attempt of learning like a human would, but it tried to carry out its task using whatever method worked, and it turned out that it constructed a representation pretty similar to how humans understand the world.”
There are obvious benefits to AI systems that are able to automatically learn better ways of representing data and, in so doing, develop models that correspond to humans’ values. When humans can’t determine how to map, and subsequently model, values, AI systems could identify patterns and create appropriate models by themselves. However, the opposite could also happen — an AI agent could construct something that seems like an accurate model of human associations and values but is, in reality, dangerously misaligned.
For instance, suppose an AI agent learns that humans want to be happy, and in an attempt to maximize human happiness, it hooks our brains up to computers that provide electrical stimuli that gives us feelings of constant joy. In this case, the system understands that humans value happiness, but it does not have an appropriate model of how happiness corresponds to other competing values like freedom. “In one sense, it’s making us happy and removing all suffering, but at the same time, people would feel that ‘no, that’s not what I meant when I said the AI should make us happy,’” Sotala noted.
Consequently, we can’t rely on an agent’s ability to uncover a pattern and create an accurate model of human values from this pattern. Researchers need to be able to model human values, and model them accurately, for AI systems.
Developing a Better Definition
Given our competing needs and preferences, it’s difficult to model the values of any one person. Combining and agreeing on values that apply universally to all humans, and then successfully modeling them for AI systems, seems like an impossible task. However, several solutions have been proposed, such as inverse reinforcement learning or attempting to extrapolate the future of humanity’s moral development. Yet, Sotala notes that these solutions fall short. As he articulated in a recent paper, “none of these proposals have yet offered a satisfactory definition of what exactly human values are, which is a serious shortcoming for any attempts to build an AI system that was intended to learn those values.”
In order to solve this problem, Sotala developed an alternative, preliminary definition of human values, one that might be used to design a value learning agent. In his paper, Sotala argues that values should be defined not as static concepts, but as variables that are considered separately and independently across a number of situations in which humans change, grow, and receive “rewards.”
Sotala asserts that our preferences may ultimately be better understood in terms of evolutionary theory and reinforcement learning. To justify this reasoning, he explains that, over the course of human history, people evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. Today, he notes that human still prefer those outcomes, even if they no longer maximize our fitness. In this respect, over time, we also learn to enjoy and desire mental states that seem likely to lead to high-reward states, even if they do not.
So instead of a particular value directly mapping onto a rewards, our preferences map onto our expectation of rewards.
Sotala claims that the definition is useful when attempting to program human values into machines, as value learning systems informed by this model of human psychology would understand that new experiences can change which states a person’s brain categorizes as “likely to lead to reward.” Summing Sotala’s work, the Machine Intelligence Research Institute outlined the benefits to this framing. “Value learning systems that take these facts about humans’ psychological dynamics into account may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone,” they said.
This form of modeling values, Sotala admits, is not perfect. First, the paper is only a preliminary stab at defining human values, which still leaves a lot of details open for future research. Researchers still need to answer empirical questions related to things like how values evolve and change over time. And once all the empirical questions are answered, researchers need to contend with the philosophical questions that don’t have an objective answer, like how those values should be interpreted and how they should guide an AGI’s decision-making.
When addressing these philosophical questions, Sotala notes that the path forward may simply be to get as much of a consensus as possible. “I tend to feel that there isn’t really any true fact of which values are correct and what would be the correct way of combining them,” he explains. “Rather than trying to find an objectively correct way of doing this, we should strive to find a way that as many people as possible could agree on.”
Since publishing this paper, Sotala has been working on a different approach for modeling human values, one that is based on the premise of viewing humans as multiagent systems. This approach has been published as a series of Less Wrong articles. There is also a related, but separate, research agenda by Future of Humanity Institute’s Stuart Armstrong, which focuses on synthesizing human preferences into a more sophisticated utility function.