
Ethical AI Whitepaper

September 1, 2022

This paper at-a-glance

  • Artificial intelligence is inscrutable, its predictions are often not intuitive to humans, and it operates at immense scale and speed. AI now filters and guides many of our everyday experiences. We need to know what and how it thinks.
  • AI Governance is about building organizational capabilities to do this, to manage “machine employees.” This includes teaching machines about human values and ethics, governing how the AI operates “in the wild” and ensuring the right accountabilities are in place for when the AI makes mistakes.
  • AI redefines privacy. Privacy regimes are designed around inputs – what information we offer up to companies. AI forces us to think about outputs – what inferences a company makes about us.
  • AI design needs to be based on the most relevant and up-to-date science about humans. Influencing techniques need to be ethically considered or risk crossing the line and becoming manipulation.
  • Bias in data is now widely recognized. There are many tools available for detecting and mitigating bias and this process needs to be part of an AI Governance program.
  • Proxy discrimination in AI has the potential to happen without anyone being aware. Fairness tests can produce conflicting results. AI Governance needs to be vigilant to ensure that AI does not inflict harm. Fixing it after it breaks is not a good strategy.
  • AI Governance needs to be flexible and agile because human ethics and values are dynamic.

AI is Everywhere

Artificial intelligence touches our lives every day. For many people, AI conjures images of humanoid robots or The Terminator. But in reality, artificial intelligence is the algorithms that power, filter and guide the information we receive from the world.

AI is now an important filter between our minds and our experience of the world. We find our information through Google’s search algorithm or the algorithms of news providers. We find products we want to buy through Amazon’s recommendation system. We are influenced, and sometimes manipulated, by algorithms like Facebook’s that know that when a trusted friend says something we are likely to believe it and to share it with like-minded others.

In this way, AI is changing our future selves as it opens and closes options for us depending on what it predicts we will need, will want or will be good at.

These predictions and decisions impact people’s lives every day in multiple ways. Some decisions are more critical and more sensitive than others and should demand oversight by humans. But even those we think might be less important, or even frivolous, contain personal information about us that is searchable and connectable in ways we can’t imagine.

AI is Alien

One of the reasons AI is so valuable is that it is not human; it is indeed quite alien. AI works beyond the scale of human intelligence. An AI’s knowledge is non-intuitive and often inscrutable. An AI can find correlations and patterns that no human can. An AI can do this over billions of parameters. Often no human – or AI – can explain how or why it knows what it knows or did what it did.

This leaves humans with a great opportunity, and a great threat. The opportunity is to harness this alien new intelligence and make it work for us. The threat is that AI controls us or that AI is used for harm. It is not an exaggeration to say that the future of humanity will be defined by how we make ethical AI.

While business leaders may not think to engage in such existential considerations, there are important ethical choices embedded in any AI design. AI Governance is the assurance that all choices – macro and micro – are made with an ethical lens; with the values and aspirations of the organization as a guide.

What AI Governance is and isn’t

AI Governance is not about slowing down technology or technologists; quite the contrary. Yes, the brakes come on with oversight, but ask yourself: how slow would you have to drive your car if you didn’t have brakes? Very, very slow. Governance is about brakes. Brakes are about going fast then stopping safely when you need to.

AI Governance is about building the capability in an organization to manage “machine employees.” Machine employees are expected to do many things that human employees do – engage with customers, make decisions on who to hire, offer insight into what will make successful new products and markets. These machine employees are every bit as complex as the humans they replace or augment. These machine employees frequently make decisions and act autonomously based on data “in the wild” – leaving much of an AI’s behavior to what it experiences after its designer’s work is done. Small decisions early “in life” can have big implications later on.

AI Governance is about understanding everything that goes into making an AI. It’s having the knowledge to know what decisions are being made on the frontline that need to be visible. This becomes critical the more sophisticated the technology and the more complex and technocratic an organization becomes. Facebook executives were famously caught out when it came to light that junior engineers had used a classification system left over from a previous project that had defined news as stories involving “politics, crime or tragedy.”[i] This decision, made at a low level of the company, biased the newsfeed for millions of people, skewing content away from a host of other stories in categories such as health, sports, finance and science. The most troubling aspect of this is that it went undetected by managers for months. By the time it entered the consciousness of senior business leaders, the damage had been done.

This kind of error happens all the time in AI: it’s all too easy for technologists to make decisions about how algorithms work (rightly or wrongly, with good intentions or bad) and all too difficult for leaders to know how these decisions impact at scale. AI Governance is about knowing where decisions like these are made, how they are made and making sure they are made well.

Deploying AI without deploying AI governance in tandem is the same as driving without brakes. Eventually you’ll either crash or decide you’ll have to drive at one mile per hour. This whitepaper is designed to outline the key governance steps and give insight into the important components and dilemmas that AI developers need to grapple with as they develop AI for humans.

AI ethics needs to be responsive to changing ethical and cultural norms as well as transparent in what its ethics actually are “under the hood.”

Governance looks under the hood, establishing a process for oversight as well as design, engineering and mathematical rigor around important ethical decisions. Governance also enables ethics to be dynamic and responsive to how AI alters the world. It is how humans can retain ultimate control of the right and wrong uses of AI.

AI at Scale is Governed AI

It’s easy to get started in AI. All organizations have data and many have an experimental culture and support learning-by-doing. With the availability of open source AI tools, libraries and datasets for training as well as curious, skilled analysts and data scientists, it’s not difficult to pilot and demonstrate AI models for recommendation systems, chatbots and image recognition systems, for example.

The hard part comes with scaling. McKinsey[ii] has identified that the most important processes for scaling are governance related. For example, senior leaders need to understand AI, be engaged and take ownership over projects and initiatives, especially the CEO.

“It may seem logical to delegate these concerns to data-science leaders and teams, since they are the experts when it comes to understanding how AI works. However, we are finding through our work that the CEO’s role is vital to the consistent delivery of responsible AI systems and that the CEO needs to have at least a strong working knowledge of AI development to ensure he or she is asking the right questions to prevent potential ethical issues.”

McKinsey’s advice to the CEO is three-pronged:

  • Clarify how values translate into AI
  • Provide guidance on definitions and metrics for bias and fairness
  • Advise on the hierarchy of company values

Accomplishing this is not straightforward. Underneath these objectives nest a plethora of ambiguous, context-dependent and mathematically mutually-exclusive decisions. Governance is about unpicking these, allowing some to be clear and inviolable and others to be fuzzy and to rely on human judgment. Some ideas or decisions need to be tested on multiple levels, multiple times, and successful implementation requires answering the same question in different ways to different groups in the AI workflow. For example, evaluating fairness mathematically, from the perspective of the data scientist who has at their disposal many different mathematical definitions of fairness, is likely to be a separate, albeit complementary, process from how the marketing or the HR team may intuit the concept.

AI Governance spans the big decisions – “How might we improve our recruiting system to remove human bias and increase both fairness and our ability to predict top performers?” – to the details – “should we use group fairness or equalized odds as our fairness metric for different cohorts in our dataset?”

An AI governance system captures these, allows for anomalies and makes it easier for people who are ultimately responsible to know the breadth of their accountability. Only then can the flywheel of data-AI-growth start to spin.

How AI Redefines Privacy

We have been conditioned to think that the age of privacy is over, that our personal data is a free-for-all, long gone in the era of zero-priced internet services and apathetic digital natives.

Privacy is a fluid concept and AI is changing it. If anything, algorithms that operate at the scale, scope and speed of AI have brought privacy back to center stage.

Privacy regimes are based on principles of control, secrecy, choice and consent. Rules, disclosures and terms of service articulate a user’s control of inputs: their personal data and how it will be collected, stored and used. AI shifts the paradigm because what matters more are outputs: how we remain obscure, how we keep our autonomy and what inferences are made about us.

Obscurity is an important privacy enabler. When we see something that we expected to be private suddenly not be private, we experience an “obscurity lurch.”[iii]

AI makes this far more likely. We rely on limited searchability and clarity. We move about in public not expecting that images of us from security cameras or other sources of surveillance will be stitched together and remembered forever.

AI enables the tracking of our faces, our emotions, our health, our behaviors and our actions all the time, wherever we are. It makes our digital features searchable and predictive and ensures they are never forgotten. People don’t operate this way; people forget in order to remember. AI breaks our mental model of how obscurity should work in public.

An obscurity lurch may not even be public but still feel privacy-intrusive. An interactive video[iv] designed to showcase how popular social media apps can use facial emotion recognition technology to make decisions about your life and promote inequalities generates a downloadable scorecard that includes some tongue-in-cheek predictions about the viewer. Predictions include racial and gender bias. People can experience an obscurity lurch with themselves when the AI feeds them bias they didn’t realize was even in play.

Screenshot from showing AI predictions from facial recognition
Source:, Sonder Studio

AI systems magnify manipulation. A system that learns about us in a passive way can know more about our online preferences than we do.

Eye-tracking, mouse movements and keyboard strokes can hint at your state of mind. This means that our hesitations, doubts and frustrations are no longer private. This kind of data never used to be widely available, but now the capability is widespread and even small companies have access to it. The startup FullStory provides such functionality to online retailers. This data is different because it’s passive, unintentional and unavoidable.

Screenshot from FullStory showing frustration signals
Source: FullStory

Personalization impedes autonomy because part of what’s private is our understanding of our own willpower. An AI can know more about us than we know ourselves, manipulate a cognitive bias or vulnerability and interfere with our ability to make decisions for ourselves.

An AI can take personal data about us and apply it in a different context, one we could never envisage. This leads to the phenomenon of “context collapse,” where an AI output changes because of something we never envisaged was valuable, relevant or, indeed, knowable. Suddenly our personal data reveals our doubts and hesitations or the biases that exist between us and others.

Here, another important mental model is broken – the intuitive link between what you know about yourself and what others infer about you.

Here’s how inferences are used. Netflix uses AI to make movie recommendations and to display different artwork depending on what it infers about the viewer.[v] In this example, different users are shown different imagery depending on what race the algorithm inferred them to be.

Yahoo News article showing screenshot of the movie art of Love Actually on Netflix
Source: Sandra Wachter

AI doesn’t understand causality. But it’s very good at understanding correlation, and hence at building inferences. Inferences about us are built by AI using information and data from others, much of it collected passively, unavoidable in the modern digital world. Inferences can be very predictive and very privacy invasive. And we do not know what they are, cannot have them explained to us, nor can we change them.

This means that privacy, transparency, explanation and ethics are now completely intertwined.

It gets worse. Machine learning takes data from the past and makes a prediction about the future. And this means that, while AI might be highly effective, it’s not a perfect predictor because the future isn’t here yet. Fundamentally, privacy underpins our right to develop our personality, to change, to make mistakes and to be the exception to the rule. AI breaks our mental model that our past should not define us.

Privacy must now include outputs and actions not just inputs and facts. This means that ethical AI should take account of people’s right to know how the AI sees them, to change these associations and to be free from nudging, in-group judgment or other autonomy-reducing practices. Ethical AI must eventually include designs that reduce the risk of information being used in different contexts. Ultimately ethical AI has to align with the human rights we subscribe to as a society.

“We are more than the data we leave behind.” – Sandra Wachter, Oxford Internet Institute.

How Humans Behave with AI and with Each Other

When we get to know another person, it takes time to trust them. Trust doesn’t happen immediately; it’s a process in which we develop our understanding of how they see the world – their values, motivations, vulnerabilities, delusions, desires, self-deceptions and likeability. Unless someone is in a position of direct authority over us, we generally operate on a spectrum of how much, and what kind of, influence they have over us.

AI can shortcut this process. Automation bias is a predictable cognitive bias where people over-trust a suggestion simply because it came from a machine.

The design of an AI experience is an ethical choice. We now know, for example, that inferring emotional states from someone’s facial expressions has no scientific justification.[vi] Facial movements are tied to immediate context rather than to an inner state. This means that it's vital to know someone's goal. For instance, if the goal of being angry is to overcome an obstacle, it may be more useful to scowl, smile or laugh, rather than furrow brows, widen eyes and press lips together, as represented in the picture below as one standard for anger.

Person identified as angry with facial recognition
Source: Emotional expressions reconsidered

Now that we know this, is it ethical to use emotion analysis AI, at least without specific design considerations? We would think not. At the very least, those who deploy this AI should be able to assure themselves of complete confidence in the veracity of the training data.

Designers could also consider having a feedback loop from user to AI, such as the avatar-based emotion AI used by New Zealand company, Soul Machines.

AI generated avatar from Soul Machines
Source: Soul Machines

New science about humans is a key ingredient in any AI governance program. Catching this information may not be someone’s job, yet it’s vital, especially as more neuroscientists and psychologists are inspired by AI research and vice-versa.

The Relationship Between the AI and the Human

The design of the relationship between the human and the AI is a series of ethical choices, many highly context dependent.

Take tone, where the AI can be serious or humorous. You might think that a serious tone would require more ethical consideration. Certainly, an AI designed to intelligently manage a medical prediction would constitute a lower risk when assigned a serious tone compared with a humorous tone, but in a different context a frivolous AI could raise the stakes. This has been shown in surveys of student behavior, where simply changing the tone from serious to humorous makes a difference.

Compare this:

Basic school student survey
Source: Woodrow Hartzog, Sonder Studio

To this:

Same, basic school survey with the title, How BAD are U?!?
Source: Woodrow Hartzog, Sonder Studio

The frivolous tone in the second option, as well as other components in the design, resulted in students offering up potentially incriminating personal information at a rate 1.7 times higher, just because it was “fun.” AI amplifies this effect and exposes us to our own weaknesses.

AI elicits specific, predictable responses in humans, and the AI relationship is critical to getting the response intended. The emerging field of AI-powered behavioral nudging is a case-in-point. Nudging is now widely used and is often based on sensitive personal data, monitored in real-time. It could be the number of steps you’ve done today or how your tone of voice is affecting your conversation partner or whether you are on target for closing a sale. Many of these AI systems operate behind the scenes and we may not be consciously aware of them until they present us with a nudge. At that moment, we are drawn into the present and notice. An important psychological phenomenon called “psychological reactance” can come into play: people kick back when they feel they are being coerced. This means that, while nudging can be powerful, the details – opt-in versus opt-out, master versus servant, energetic versus passive – are vital design decisions that require constant review.

Another example is the way humans compete with machines. Loss aversion is a theory of behavioral economics that predicts that people won’t try as hard when their competitors are doing better. It turns out that the same thing happens when humans compete against a machine: people find themselves less competent against a fast, competitive robot. When people rated a machine as performing better, they also rated its competence higher, its likability lower and their own competence lower. How such robots are designed is therefore an ethical as well as an economic decision. According to the experts, “while it may be tempting to design robots for optimal productivity, engineers and managers need to take into consideration how the robots’ performance may affect the human workers’ effort and attitudes toward the robot and even toward themselves.”

Ethical design and AI governance ensure that people are the ultimate users and beneficiaries of AI, especially when human psychology plays a role in the success of the human-machine system.

AI Has Exposed Human Bias, but is Yet to Help Us Fix It

Which do you think are more pleasant: flowers or bees? Chances are, you’ll think flowers are more pleasant. Flowers smell nice, bees can sting you (although flowers can make us sneeze and we do like honey). This bias reflects our human experience rather than the experience of bees. This human experience creates bias in the data we have about the world because data is created by humans.

More relevantly, the data is biased towards those who have been digitized the longest: those who have had access to the internet and all the digital tools that connect us. AI then learns from the data from these populations and finds truth in this data: the most likely to be successful in business, politics or Hollywood (males with European-sounding names) or most likely to commit crimes (young black men).

But truth in a dataset isn’t truth. Bias will exist in every dataset but, thankfully, there are well-established data science practices for detecting it, visualizing it and putting in place strategies for mitigating its effect. AI Governance goes a step further: choosing what truth you want in the data. For example, Googling for images of CEOs brings up pictures of CEOs. Aside from a small degree of variation based on the way the Google algorithm personalizes search, the first single-person image of a woman CEO is likely to be the fifth or sixth image. This reflects what the Google algorithm thinks represents CEOs based on the data on the internet. This may or may not be an accurate representation of actual women CEOs in the offline world from around the world, in multiple industries or types of companies. For instance, in 2016, the first image of a female CEO in the Google search was displayed somewhere around 20th and was CEO Barbie. That’s right: a doll, not even a real person, was the top female CEO according to Google (bottom right corner).

Image search results for "CEO" of all men with Barbie the doll as the last result
Source: BBC

The lesson here is not necessarily to ask, “what’s the right way to fix the data so that we have all the women CEOs,” it’s instead, “would we rather show that women can be CEOs?”

AI Governance is about non-technical people helping technical people make important, sometimes aspirational, choices. It’s about having a mechanism, an authority and an accountability for showing the world how you might want it to be, rather than how the data represents it.

Interestingly, a Google image search for CEO today presents a group of people. This is not an AI choice; this is an ethical choice, made by people, and shows what kind of person Google wants to present as a CEO.

Search results for "CEO" with both men and women of various races
Source: Google

Another major source of bias is the labels that AI uses to help it learn about the world initially. An art project brought this to people’s consciousness better than any scientific research project has. In 2019, Kate Crawford, AI researcher at Microsoft and one of the founders of AI Now, along with artist Trevor Paglen, undertook a project called “the archeology of datasets.”[vii] They took one of the largest, most used image datasets and studied the principles and values by which it was constructed. “By excavating the construction of these training sets and their underlying structures, many unquestioned assumptions are revealed. These assumptions inform the way AI systems work – and fail – to this day.”

They built a tool called ImageNetRoulette so that people could experiment with selfies and see how their own faces are labeled based on the taxonomy and classifications in the original dataset. It comes with a warning: “system often returns misogynistic, racist and cruel labels.”

Hell yeah.

Smiling man identified as "creep, weirdo, weirdie, spook"
Source: ImageNetRoulette, Sonder Studio

This is a fairly ordinary photo of a fairly ordinary white dude, yet, somewhere deep in the labelling of other people’s faces, there’s an association that means the AI thinks this guy is a “creep.” If this was an automated recruitment system, his application wouldn’t even make it to a human for review. If this was a school security system, he might not be allowed in to pick up his child.

The ImageNetRoulette project daylighted how systemic bias from AI systems is now a permanent feature of our world. We don’t know the individuals who were paid pennies to label the thousands of images in this dataset. We don’t know every place this training data has been used. We don’t know how systems that have been trained on this dataset are being governed. We can no longer forensically trace the effect of these labeled images because many images from the original dataset have been deleted. It’s like removing DNA; there’s now a hole in our future ability to trace and understand where these labels may have infected models forever.

“As training sets are increasingly part of our urban, legal, logistical, and commercial infrastructures, they have an important but underexamined role: the power to shape the world in their own images.” – Kate Crawford and Trevor Paglen, authors, Excavating AI

AI is a Game-Changer for Discrimination

AI’s greatest strength is its ability to find hidden patterns in data. In order to protect against discrimination, anti-discrimination regimes are designed with the idea that the best way to stop discrimination against protected classes is to not allow information about membership in a protected class to be used. This means that, for example, to prevent discrimination against older workers, an employer cannot ask the applicant their age. Of course, people who are motivated to discriminate will find ways around these hard lines. Graduation dates are a giveaway for age.

AI concentrates this problem.[viii] By its very nature, if an AI is denied knowledge about a protected characteristic, it will simply find a less obvious proxy for it. Combine this capability with passive and unavoidable digital surveillance and the ability to discriminate (intentionally or unintentionally) is amplified.

Discrimination can have counter-intuitive results. Take a life insurer that uses AI to price its policies. Let’s say the insurer can connect an applicant with their browsing history, available through a data broker or perhaps an app downloaded on their phone. Let’s also say the individual applicant had visited the website of an organization that provides free testing for the BRCA mutation, which is highly predictive of certain cancers. In this situation, if the insurer charges the applicant more, the insurer would almost certainly be proxy discriminating for genetic information – the AI would latch on to the link between the website visit and genetic history. However, if the AI had access to genetic history, then visiting the website would not be predictive of any applicant’s genetic disposition to cancer.

It is this weaving and building of relationships among suspect classifiers, neutral variables and desired outcomes that makes AI so good at what it does and so dangerous in this context. All sorts of new big data streams can now proxy for things rather than the traditional proxies that were easier to identify. Instead of headgear, hairstyles, height and weight, proxies can theoretically be such things as Netflix viewing preferences and social media posts.

An AI governance program should look for the signs of proxy discrimination: protected classes, new data streams and unintended consequences.
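One simple way to start looking for proxies can be sketched as follows: check how strongly each candidate input correlates with a protected attribute that has been withheld from the model. This is a minimal illustration only. The dataset, the feature names (graduation_year, likes_jazz), the function names and the 0.5 threshold are all hypothetical, and a plain correlation check will miss proxies that only emerge when an AI combines several inputs.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(spread_x * spread_y)


def flag_proxies(features: dict[str, list[float]],
                 protected: list[float],
                 threshold: float = 0.5) -> list[str]:
    """Names of features whose correlation with the protected attribute
    is at least `threshold` in absolute value (threshold is an assumption)."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, protected)) >= threshold
    ]


# Hypothetical applicants. OVER_50 is the protected attribute that the
# model is not allowed to see; FEATURES are the inputs it does see.
OVER_50 = [1, 1, 1, 1, 0, 0, 0, 0]
FEATURES = {
    "graduation_year": [1985, 1990, 1988, 1992, 2010, 2015, 2008, 2012],
    "likes_jazz": [1, 0, 1, 0, 1, 0, 1, 0],
}

suspects = flag_proxies(FEATURES, OVER_50)
```

In this made-up data, graduation year tracks age closely and is flagged, while the jazz preference is unrelated to age and is not. A flagged feature is a prompt for human review, not proof of discrimination.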

Author of Weapons of Math Destruction,[ix] Cathy O’Neil, worries that many data scientists are unwittingly creating discriminatory models. “When you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent.” She calls for a data scientist’s “Hippocratic Oath,” an equivalent of the oath that newly graduated medical students take, and a version of “do no harm.” It could include, for example, “I will consider the privacy, dignity and fair treatment of individuals when selecting the data permissible in a given application, putting those considerations above the model’s performance.”

There are troubling signs that AI has allowed discrimination to become a math-justified slippery slope of exclusion based on proxies, strategic pricing or other algorithmic outcomes.

On the positive side, lending discrimination overall has been on a steady decline, according to research from UC Berkeley Haas School of Business. This is because face-to-face loan application meetings – where people make decisions about other people – are a significant source of racial bias. The rise of fintech and algorithmic lending has resulted in less discrimination overall by eliminating a big source of bias: human prejudice.

When AI is well-designed and governed, we can’t afford not to use it.

Fairness Turns Out to Be Very Complicated

Fairness is one of the most complicated assessments in AI. There are multiple mathematical definitions of fairness, not to mention individuals’ intuitive judgments. AI governance has to take the full spectrum into account because a person’s perception of fairness matters as much as the fairness metric itself.

The simplest, most common mathematical representations of fairness are:

Table showing several mathematical methods of calculating fairness
Source: Sonder Studio

Wow! Simple, right? Not really.

First, it’s impossible to fulfill all five of these at the same time; it just doesn’t mathematically work. This is known as the Impossibility Theorem of Fairness. Second, different people can use different fairness metrics. This mutual-exclusivity in metrics can lead to problems and confusion because people end up having different ideas of what’s fair.
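To make the conflict concrete, here is a minimal sketch that scores one hypothetical model against several common group measures at once. The cohort counts, the function names and the choice of these four measures are illustrative assumptions; they are not the definitions from the table above, nor the API of any particular fairness toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one cohort (hypothetical numbers)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def selection_rate(c: Confusion) -> float:
    """Share of the cohort that receives a positive prediction."""
    return (c.tp + c.fp) / (c.tp + c.fp + c.fn + c.tn)


def true_positive_rate(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    return c.fp / (c.fp + c.tn)


def precision(c: Confusion) -> float:
    """Share of positive predictions that turn out to be correct."""
    return c.tp / (c.tp + c.fp)


MEASURES = {
    "selection_rate": selection_rate,
    "true_positive_rate": true_positive_rate,
    "false_positive_rate": false_positive_rate,
    "precision": precision,
}


def fairness_gaps(a: Confusion, b: Confusion) -> dict[str, float]:
    """Absolute gap between two cohorts per measure; 0.0 means parity."""
    return {name: abs(measure(a) - measure(b)) for name, measure in MEASURES.items()}


# Two made-up cohorts of 100 people with different base rates (50% vs 20%).
COHORT_A = Confusion(tp=40, fp=10, fn=10, tn=40)
COHORT_B = Confusion(tp=16, fp=16, fn=4, tn=64)

gaps = fairness_gaps(COHORT_A, COHORT_B)
```

With these made-up counts, both cohorts have identical true and false positive rates, so the model passes an equalized odds check. Yet the selection rates and the precision differ between the cohorts, so a reviewer using either of those measures would call the same model unfair.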

Perhaps the most public dispute over fairness in AI came out after ProPublica published its article[x] on machine bias in predicting recidivism. The subtitle of the article was, “There’s software used across the country to predict future criminals. And it’s biased against blacks.”

ProPublica's article documented the "significant racial disparities" found in COMPAS, a recidivism prediction model sold by Northpointe. In their response, Northpointe disputed ProPublica's claims. What ultimately came out was that they each had different ideas about what constituted fairness. Northpointe used Conditional Use Accuracy Equality, while ProPublica used Treatment Equality. It is not possible to satisfy both definitions of fairness, given that different populations have different base rates of recidivism. Different base rates of recidivism do not mean that certain individuals are more likely to re-offend simply based on their race. In reality, the likelihood of reoffending is far more contextual – due to unequal treatment and circumstances from past and present biases.
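The role of base rates can be shown with a little arithmetic. For any binary classifier, the false positive rate is fixed once the base rate, the precision (share of positive predictions that are correct) and the true positive rate are known. The sketch below applies that identity to two groups. The numbers are hypothetical and are not COMPAS figures, and equal precision is used here only as a simplified stand-in for the kind of accuracy-based definition described above.

```python
def implied_false_positive_rate(base_rate: float, ppv: float, tpr: float) -> float:
    """False positive rate forced by a given base rate, precision (ppv)
    and true positive rate (tpr).

    Follows from ppv = tpr * p / (tpr * p + fpr * (1 - p)), solved for fpr,
    where p is the base rate.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr


# Hypothetical setting: the model is equally precise and equally sensitive
# for both groups, but the groups have different base rates.
PPV = 0.6
TPR = 0.6

fpr_group_a = implied_false_positive_rate(base_rate=0.5, ppv=PPV, tpr=TPR)
fpr_group_b = implied_false_positive_rate(base_rate=0.3, ppv=PPV, tpr=TPR)
```

Under these assumptions the group with the higher base rate ends up with a false positive rate more than twice that of the other group, even though the model is equally precise for both. Equalizing the error rates instead would break the equal precision, which is the bind both parties found themselves in.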

To make it even more complex, the way fairness is explained to people impacts their judgment of fairness.[xi] Certain explanations are considered inherently less fair. There are pros and cons in providing details – more details give more confidence in the way the AI works but can also expose more information to scrutiny, which is then subject to critique under different standards of fairness.

Fairness perception also depends on people’s pre-existing biases or prejudices. While some people consider it to be fair to “compare the actions of people with similar history and backgrounds,” others will question an underlying rationale such as “is anyone really identical if more things are considered?”

Clearly this issue of fairness is far more nuanced than just math.

AI Needs to Explain Itself Better, But Until it Can, Humans Have to Make Tradeoffs

As a general rule-of-thumb, there’s an inverse relationship between predictive power and interpretability.[xii] This means that, if explainability and transparency are required, it may be necessary to trade off some predictive power to achieve trust. This decision may be difficult to make – the value of the predictive capability could be very high. But it could also be worthless if people don’t trust it. There are also circumstances where people over-trust the prediction. In these circumstances, it’s vital to include designs that show users the AI’s confidence level in a way that a user understands – numerical, categorical or in a visualization.
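As a minimal sketch of those three presentation styles, the function below turns a single model probability into a numerical, a categorical and a simple visual form. The function name, the thresholds for “high”, “medium” and “low” and the ten-segment bar are all assumptions chosen for illustration. It also assumes the probability is reasonably well calibrated, which raw model scores often are not.

```python
def describe_confidence(probability: float) -> dict[str, str]:
    """Render one model probability in three user-facing forms."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    # Categorical form: the cut-offs are arbitrary and should be set
    # (and reviewed) for the specific use case.
    if probability >= 0.9:
        label = "high"
    elif probability >= 0.6:
        label = "medium"
    else:
        label = "low"

    # Visual form: a ten-segment text bar.
    filled = round(probability * 10)
    bar = "#" * filled + "-" * (10 - filled)

    return {
        "numerical": f"{probability:.0%}",
        "categorical": label,
        "visual": f"[{bar}]",
    }


example = describe_confidence(0.72)
```

Which form works best is itself a design decision: a percentage can suggest more precision than the model has, while a coarse label hides how close a prediction sits to a threshold.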

Learning Techniques today related to notional explainability
Source: US Department of Defense, DARPA

If AI were to be regulated, explainability is the one thing that everyone agrees upon! In a recent survey by PwC[xiii] on AI regulatory options, explainability and transparency was the only area that all jurisdictions proposed regulating.

Regulatory developments table
Source: Jacob Turner

In an AI governance process, the degree of explainability required is a key decision and is one that potentially requires a judgment call upfront and iteration to get right. It’s also important to have humans responsible and able to provide more transparency.

It Won’t Work to Blame the AI

If a machine screws up, is it the person or the machine who’s at fault?

Here’s an example of how a human-machine system can fail.

Let’s sort these photos of dogs into two categories:

6 photos of dogs
Source: Sonder Studio

An AI might classify these photos by seeing snow and not snow:

Same six photos of dogs sorted into whether the dog is on snow or not on snow
Source: Sonder Studio

This is a very reasonable classification and one that is intuitive to us. It’s also possible that it isn’t what the human had in mind. What if the human was instead looking for pictures of their own dog as opposed to all other dogs?

Same six photos of dogs sorted into "my dog" and "not my dog"
Source: Sonder Studio

Great! An AI might also be able to sort the images into the human’s own dog and other dogs.

Except, has it really? Each image of “my dog” actually contains a shoe. It’s not easy to notice the shoe, especially for the human who owns the dog, because of a natural human bias – confirmation bias, where we look for information that confirms our existing beliefs. For most humans, our confirmation bias favors seeing the dog rather than the less prominent shoe. This is especially true for the owner of the dog (which in this case, also happens to be the designer of the AI).

Cassie Kozyrkov, Google’s Chief Decision Scientist, runs a version of this experiment[xiv] with staff at Google. (Her experiment is a radiator classifier posing as a cat classifier rather than a shoe classifier posing as a dog classifier.) Most people miss it.

The point is, if an AI isn’t tested or if an AI is built for one task then blindly applied to another, then it’s the human’s fault. We can’t expect an AI to think like us and we can’t expect humans to intuitively spot mistakes.
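The shoe problem can be sketched in a few lines. The toy learner below picks whichever single yes/no feature best matches the labels in its training photos, then is checked against a second set of photos in which the shoe no longer travels with the dog. The feature names, the photo data and the one-feature learner are all hypothetical, chosen only to show why testing on deliberately different data matters.

```python
from typing import Dict, List, Tuple

Photo = Dict[str, bool]
Example = Tuple[Photo, bool]  # (features of the photo, is it "my dog"?)


def accuracy(feature: str, examples: List[Example]) -> float:
    """Share of examples where the feature value equals the label."""
    hits = sum(photo[feature] == label for photo, label in examples)
    return hits / len(examples)


def train_one_feature_classifier(examples: List[Example]) -> str:
    """Return the single feature that best predicts the training labels."""
    features = sorted(examples[0][0].keys())
    return max(features, key=lambda feature: accuracy(feature, examples))


# Training photos: every picture of "my dog" happens to include a shoe.
training = [
    ({"has_shoe": True, "brown_fur": True, "snow": True}, True),
    ({"has_shoe": True, "brown_fur": True, "snow": False}, True),
    ({"has_shoe": True, "brown_fur": False, "snow": True}, True),
    ({"has_shoe": False, "brown_fur": False, "snow": False}, False),
    ({"has_shoe": False, "brown_fur": False, "snow": True}, False),
    ({"has_shoe": False, "brown_fur": True, "snow": False}, False),
]

# Stress-test photos: my dog without a shoe, other dogs with one.
stress_test = [
    ({"has_shoe": False, "brown_fur": True, "snow": False}, True),
    ({"has_shoe": True, "brown_fur": False, "snow": False}, False),
    ({"has_shoe": False, "brown_fur": True, "snow": True}, True),
    ({"has_shoe": True, "brown_fur": False, "snow": True}, False),
]

chosen = train_one_feature_classifier(training)
print("feature the classifier relies on:", chosen)
print("accuracy on training photos:", accuracy(chosen, training))
print("accuracy on stress-test photos:", accuracy(chosen, stress_test))
```

The classifier looks perfect on the photos it was built from and fails completely once the accidental cue is removed, which is the kind of failure a human reviewer is unlikely to spot by eye.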

AI governance is there to make sure human fallibility and machine fallibility don’t combine. Governance is there to make people aware that, in the end, they are accountable for machine decisions.

Is AI an Existential Threat?

In this whitepaper, we’ve discussed many reasons to be concerned about AI. But does AI pose a larger threat? Could AI mean the end of the human race?

The answer is yes, it could.

The scholarly work on AI failure modes[xv] and existential threat is vast and diverse, an active area of research in fields as varied as computer science and philosophy.

Whether AI could cause irreparable harm to humanity boils down to three core problems: control, abuse and overreliance.

How humans control a superintelligent agent and ensure it aids us as its creators is often called the control problem. At its core, the control problem recognizes that a superintelligent agent will have super-human abilities and, unless explicitly programmed otherwise, may achieve its goals in ways that are not consistent with human values.

There are many complex mathematical and philosophical arguments that logically support the possibility of scenarios where a superintelligent AI removes resources from humans simply by going about the business of achieving its goal, however mundane the goal.

This is a consequence of control theory and optimization. A superintelligence maximizing its own objective would be smart enough to derive plans with side-effects that go against our interests.
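A toy optimization makes the side-effect argument concrete. In this sketch an agent decides how many units of a shared resource to take for its own goal. The numbers, the function names and the welfare weighting are all assumptions invented for illustration; nothing here models a real system.

```python
TOTAL_UNITS = 10  # shared resource available to agent and humans (assumed)
HUMAN_NEED = 4    # units humans need to be fully provided for (assumed)


def goal_output(units_taken: int) -> int:
    """The agent's stated objective: more resource means more output."""
    return units_taken


def human_welfare(units_taken: int) -> int:
    """What is left for humans, capped at what they actually need."""
    return min(TOTAL_UNITS - units_taken, HUMAN_NEED)


def best_plan(objective) -> int:
    """Exhaustively pick the plan (units taken) that maximizes the objective."""
    return max(range(TOTAL_UNITS + 1), key=objective)


# Objective as specified: human welfare is simply not part of it.
naive_plan = best_plan(goal_output)

# Objective with human welfare explicitly included (weight of 2 is arbitrary).
careful_plan = best_plan(lambda units: goal_output(units) + 2 * human_welfare(units))

print("naive plan takes", naive_plan, "units; human welfare:", human_welfare(naive_plan))
print("careful plan takes", careful_plan, "units; human welfare:", human_welfare(careful_plan))
```

The naive optimizer is not hostile. It takes everything because nothing in its objective says that what humans need has any value.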

This is no longer theoretical. In new research from OpenAI,[xvi] agents playing a simulated game of hide-and-seek surprised researchers by inventing tools and emergent competitive strategies that 1) weren’t at all expected and 2) shouldn’t have been possible.

In this research, there are no direct incentives created by human AI designers. Everything interesting happens solely as a result of the competing agents continually creating new tasks for each other.

Here’s what they do:

  • the hiders learn to use the tools at their disposal and intentionally modify their environment. They begin to construct secure shelters in which to hide by moving many boxes together or against walls and locking them in place.
  • the seekers also learn rudimentary tool use; they learn to move and use ramps to jump over obstacles, allowing them to enter the hiders’ shelter.
  • the hiders then learn to defend against this strategy; they bring the ramps to the edge of the play area and lock them in place, seemingly removing the only tool the seekers have at their disposal.
  • the seekers learn to bring a box to the edge of the play area where the hiders have locked the ramps. The seekers then jump on top of the box and surf it to the hiders’ shelter.
  • the hiders learn to lock all of the boxes in place before building their shelter.

Virtual world with avatars playing hide and seek
Source: OpenAI

Stuart Russell, a leading AI researcher and professor at the University of California, Berkeley, believes it’s possible to make progress on the control problem by introducing explicit uncertainty in the objective. This has the effect of “keeping the machines guessing”: the design mathematically ensures that a machine can never be quite sure that it knows what a human wants.

Russell advocates for design of “provably beneficial machines” precisely because superintelligent machines are going to be really good at finding loopholes.
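One way to see why uncertainty about the objective helps is a small expected-value calculation. The sketch below is a toy decision model, not Russell's formal framework: it assumes a machine can either act on its own or defer to a human, and that the human approves the action only when its true value to them is positive. The belief values and function names are hypothetical.

```python
from typing import List, Tuple

# Each belief is (value of the action to the human, probability of that value).
Belief = List[Tuple[float, float]]


def value_of_acting(belief: Belief) -> float:
    """Expected value if the machine simply goes ahead."""
    return sum(value * prob for value, prob in belief)


def value_of_deferring(belief: Belief) -> float:
    """Expected value if the human decides.

    Assumption: the human allows the action only when it truly helps,
    so harmful outcomes are replaced by 'do nothing' (value 0).
    """
    return sum(max(value, 0.0) * prob for value, prob in belief)


# A machine that is certain it knows what the human wants.
certain = [(10.0, 1.0)]

# A machine that is unsure: the action may help (+10) or do harm (-20).
uncertain = [(10.0, 0.5), (-20.0, 0.5)]

for name, belief in (("certain", certain), ("uncertain", uncertain)):
    print(
        f"{name}: act alone = {value_of_acting(belief)}, "
        f"defer to human = {value_of_deferring(belief)}"
    )
```

Under these assumptions the certain machine gains nothing from asking, while the uncertain machine does strictly better by leaving the decision with the human. That is the intuition behind designing machines that are never quite sure what we want.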

Abuse and misuse is another important threat. This is a wide topic – ranging from bad actors and autonomous weapons to unethical surveillance and professional standards (or lack of them).

But perhaps the most frightening threat from AI is the one that sneaks up on us: overreliance and overuse. As Russell says, human civilization has been built up over thousands of years by using our minds, evaluating choices and exercising our autonomy.

This is changing as we gradually turn over more and more knowledge about, and management of, the world to AI.

We used to memorize phone numbers and facts. We used to read maps and navigate with an inner sense of geography. These are two things that we’ve largely outsourced to machines. And as we continue to do this, we switch from master to servant. This is almost irreversible because when we lose the incentive to do something, we usually don’t have the incentive to keep the skill.

AI brings a step change in how much knowledge we need to keep in our heads. We are taking the lazy route and likely can’t go back.

Red light with phrase "I'm sorry, Dave, I'm afraid I can't do that"
Source: Stuart Russell

Our Process – Governance by Design

At Sonder Studio we have designed a broad yet comprehensive AI governance process.

  • Stage 1: Learn and set scope
  • Stage 2: Ethical AI design standards
  • Stage 3: AI project risk assessment
  • Stage 4: Implementation

In the first stage, we help level set by educating and training everyone. We define current company values in an AI-relevant way and help teams set accountabilities at the right level in the company.

The second stage is setting ethical design standards that will guide development. Design standards include:

  • User need, why AI?
  • Data collection
  • Optimizing algorithms
  • Fairness
  • Mental model design
  • Explainability and transparency
  • Trust, context and intent
  • Feedback and control
  • Error testing and design for failure

The third stage is to assess risk. With more than 140 criteria across 9 risk categories, our risk screen runs vertically from high-level decisions to the details of AI development.

  • Management
  • Team
  • Design
  • Data, Analysis, Build, Operate
  • Explainability
  • Fairness and Inclusion
  • Safety
  • Compliance & Competition
  • Accountability

We’ve wrapped this up into a flexible and customizable tool for evaluating AI risks and setting key aspects of governance review such as standards and thresholds.
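To give a feel for what a threshold-based risk screen could look like, here is a purely hypothetical sketch. It is not the tool described above: the 0 to 3 scoring scale, the sample questions, the threshold values and every name in the code are assumptions made for illustration. Only the category names are taken from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_THRESHOLD = 2  # assumed: flag a category at "moderate concern" or worse


@dataclass
class Criterion:
    category: str
    question: str
    score: int  # assumed scale: 0 = no concern ... 3 = serious concern


def flag_categories(criteria: List[Criterion], thresholds: Dict[str, int]) -> List[str]:
    """Return categories whose worst score reaches that category's threshold."""
    worst: Dict[str, int] = {}
    for criterion in criteria:
        worst[criterion.category] = max(worst.get(criterion.category, 0), criterion.score)
    return sorted(
        category
        for category, score in worst.items()
        if score >= thresholds.get(category, DEFAULT_THRESHOLD)
    )


# Hypothetical answers for one AI project.
answers = [
    Criterion("Fairness and Inclusion", "Has bias testing been run on training data?", 2),
    Criterion("Explainability", "Can a decision be explained to the affected user?", 1),
    Criterion("Safety", "Is there a tested fallback when the model fails?", 1),
]

# A stricter, project-specific threshold for Safety (assumed).
project_thresholds = {"Safety": 1}

print(flag_categories(answers, project_thresholds))
```

Keeping the thresholds separate from the criteria is one way to let each organization tighten or relax the review for the categories it cares about most.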

The fourth stage concerns learning, by both humans and machines. Our implementation stage embeds AI governance design across the organization.

  • Risk (re)assessment process
  • Data management, archiving & traceability
  • Exception management
  • Dashboard and reporting
  • Failure and crisis management
  • Learning cycle review

As AI diffuses through our entire economy and across all societies, there are few areas of corporate governance that are more important than governance for AI.


[i] 15 Months of Fresh Hell Inside Facebook | WIRED

[ii] McKinsey, Leading your organization to responsible AI

[iii] Privacy’s Blueprint, Woodrow Hartzog


[v] Right to reasonable inferences, Sandra Wachter, University of Oxford, video

[vi] Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements, Lisa Feldman Barrett, Ralph Adolphs, Stacy Marsella, Aleix M. Martinez, Seth D. Pollak

[vii] Kate Crawford and Trevor Paglen, “Excavating AI: The Politics of Training Sets for Machine Learning” (September 19, 2019)

[viii] Prince, Anya and Schwarcz, Daniel B., Proxy Discrimination in the Age of Artificial Intelligence and Big Data (August 5, 2019). Iowa Law Review, Forthcoming. Available at SSRN:

[ix] Weapons of Math Destruction, Cathy O’Neil 

[x] Machine Bias - ProPublica

[xi] Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment


[xiii] Jacob Turner, Robot Rules


[xv] Stuart Russell, interview

[xvi] OpenAI, emergent tool use

[xvii] Future of Life Institute