Before getting further into the details about how algorithms and models work, we’ll explore the ways humans experience AI. While it’s important to understand the technology, it’s equally important to ask how people actually connect with an AI.
The human experience of AI can reveal a mental model of how an AI comes to its conclusions—whether for good or bad. In this section, we introduce some important ideas: how AI solves a problem for a user, the opportunities that AI can unlock, and some common criticisms and emerging issues with particular uses of AI.
Let’s walk through our framework of six AI experiences.
AI that helps us think is perhaps the biggest application of AI today. The AI understands patterns in data, makes predictions about the future, and can reveal new ways of thinking about the world.
This AI experience is ubiquitous. You might experience this as a personalized recommendation for what to buy or watch. Netflix recommends movies by predicting your preferences based on what you’ve watched before. Amazon recommends purchases based on what you’ve searched and what customers like you have bought.
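A minimal sketch of how this kind of preference prediction can work, using a tiny invented watch-history matrix and user-to-user cosine similarity (all names and ratings are hypothetical; real recommenders use far richer signals):

```python
import math

# Hypothetical watch-history ratings: 1.0 = liked, 0.0 = not watched.
ratings = {
    "alice": {"drama": 1.0, "comedy": 1.0, "horror": 0.0},
    "bob":   {"drama": 1.0, "comedy": 0.0, "horror": 1.0},
    "carol": {"drama": 1.0, "comedy": 1.0, "horror": 1.0},
}

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user, ratings):
    """Score items the user hasn't watched by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, r in other_ratings.items():
            if ratings[user].get(item, 0.0) == 0.0:  # only unseen items
                scores[item] = scores.get(item, 0.0) + sim * r
    return max(scores, key=scores.get) if scores else None

print(recommend("alice", ratings))  # users like alice watched "horror"
```

The core idea scales up but does not change: predict what you will like from what people with similar histories already liked.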
At work, AI can make predictions about which customers are likely to churn or what customers are likely to buy this holiday season. AI models are often embedded inside products where predictions are designed to focus a human’s attention on things that matter. For example, Salesforce’s software uses AI to predict which leads and opportunities are most likely to convert with an intervention such as a phone call or email.
Predictive systems are used throughout our society: predictive policing, judicial decision support for predicting recidivism, health diagnostics, and college admissions. Usually, humans experience the output of these systems as a recommendation for action, but a human is expected to act independently of the prediction, adding in a large dose of human judgment, preference, and agency before deciding on a course of action.
AI that thinks has a superpower: enormous correlative power. Thinking AI separates signal from noise in vast amounts of data, and the correlations it finds can reveal new patterns in human behavior. Many of the advances in AI have stemmed from this power to predict and recategorize human behavior online. Customers are no longer categorized in demographic terms; they now fit certain attitudinal profiles in certain contexts. For example, Facebook’s vast AI power helps its business customers understand how their own customers buy. From the various online behaviors that Facebook tracks, a company can infer whether a customer treats a shopping cart as a wishlist or as a memory jogger from which to launch comparison shopping.
If this feels a bit creepy, then you aren’t alone. Many people are unaware of just how much of their online behavior can be tracked and used for profiling. Jeff Hammerbacher, an early employee at Facebook, developed misgivings about the ability to gather data about behavior, relationships, and desires. Hammerbacher is quoted as saying: “The best minds of my generation are thinking about how to make people click ads. That sucks.”
These predictions are designed to optimize numerous factors, usually a proxy for profitability. YouTube keeps you watching for as long as possible to show you as many ads as possible. Amazon directs your purchases toward satisfying your desires while selling the most profitable products for Amazon. While these AIs are helping you think, they are helping you think in a way that increases profits.
The most important thing to understand is what someone is going to do with a prediction—what are they optimizing for? Are they making a decision directly from the outputs? What other factors are they taking into account? Does a human believe that there is a causal element to the prediction or is it simply a useful correlation? As we will see, while correlation is one of the great powers of AI, how humans think and act is quite a different matter.
We define AI that helps us see as AI that can visually perceive the world on our behalf or extend our reality beyond the purely physical world. This could be an augmented reality app like the popular Pokémon Go game, a virtual reality real estate app that lets you experience being in a space, or an augmented reality shopping app that lets you see what a piece of furniture would look like in your house. But seeing AI can also be a facial recognition system, such as those used by cruise lines to let you check in with nothing but a scan of your face, or Apple’s Face ID, which unlocks your phone with a glance.
A person’s experience of an AI that can see ranges from using an app on a phone for fast, expert identification—for example, identifying a plant species—to adding analysis and workflow—for example, a vehicle insurance app that can calculate a repair estimate based on a few photos of a damaged panel. Whatever a person has to view and assess is now fair game for AI that can see.
Using a computer to see things opens up a range of new opportunities because it allows a computer to process what is known as unstructured data. Instead of a computer merely processing text and numbers in a database, it can now process shapes in images and video. This, in turn, means that how things appear visually, including how they move, can be more easily subject to computation and prediction. Many things that used to rely on specialist human skills can now be made available to more people.
A good example of this is a technique called pose estimation. For instance, we designed a golf coaching app that can see how a person swings a golf club, compare it with an ideal swing, and then coach the player to be better. AI can see how a person moves, compare it with the ideal for a specific function—a golf swing or a basketball shot—and then coach a player toward that ideal. Because so much of our perception and learning is visual, having a machine to do some of the work can be very effective.
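A minimal sketch of the kind of comparison such an app might make, assuming the 2D keypoints already come from an off-the-shelf pose estimator; the keypoints, the "ideal" angle, and the feedback rule are all invented for illustration:

```python
import math

def angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c, from 2D keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Hypothetical keypoints (shoulder, elbow, wrist) from a pose estimator,
# in normalized image coordinates.
player = {"shoulder": (0.0, 1.0), "elbow": (0.5, 0.5), "wrist": (1.0, 0.4)}
ideal_elbow_angle = 165.0  # assumed "ideal swing" reference value

measured = angle(player["shoulder"], player["elbow"], player["wrist"])
feedback = ("straighten your lead arm"
            if measured < ideal_elbow_angle - 10
            else "good extension")
print(f"elbow angle: {measured:.0f} degrees; coaching: {feedback}")
```

A real coaching app would track many joints over a whole swing, but the principle is the same: reduce the pose to measurable geometry and compare it against a reference.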
The technology that underpins seeing AI looks for patterns in space. It cares about distance and the arrangement of features such as lines and edges. It is very good at classifying what it sees based on the features, such as lines and colors, that it associates with a scene. Sometimes these features are not what a human would consider important. Humans can see the broader context and spot an error: we understand that while seeing AI is very good at detecting a dog, it can also be easily fooled by a muffin. In 2017, an AI researcher presented image recognition systems from Google and Microsoft with a series of pictures of blueberry muffins, and these AIs returned results including stuffed animal, dog snout, and brown and white teddy bear. A seeing AI that mistakes blueberries for the eyes and nose of a Chihuahua may not be all that helpful if you’re trying to see what you can eat.
Of all the AI experiences, vision applications are perhaps the easiest for non-experts to build. A multitude of low-code and no-code tools make them straightforward and cheap to build and deploy. But there’s a problem: many easy-to-access public datasets are full of labeling errors and biases, particularly in image test sets. This means that seeing AI can be invisibly unreliable, yet users may not realize how, why, or when.
In 2019, Kate Crawford and Trevor Paglen created a website called ImageNet Roulette, which allowed individuals to upload a photo of themselves and see how the image was classified by ImageNet, one of the most comprehensive and most used image databases. When we uploaded a photo of Dave, ImageNet Roulette described him as a creep, a weirdo, and “someone unpleasantly strange and eccentric.”
It’s impossible to know why the AI thinks Dave is a creep or to understand why humans labeled the training images in a way that would lead to that conclusion. It’s also impossible to change the AI’s decision: there isn’t anyone to call or email to ask them to change how the model classifies Dave. That may not matter for an experimental system like ImageNet Roulette, but it certainly could matter if a security system were trained on ImageNet.
An AI that can see presents a huge opportunity to remove mundane tasks from human work and to increase the overall accuracy of important systems that rely on images, such as medical diagnostics. Seeing applications can scale human skills across more people and can help non-experts classify objects as effectively as an expert. They can be simple to build and widely deployed, which is exactly why the heritage of the underlying image data is so important to understand.
AI that helps us talk, whether in text or in conversation, includes voice assistants like Siri or Alexa as well as a range of chatbots and intelligent text applications. It’s everything from speech-to-text translation and language translation, to speech or text generation, intelligent auto-complete suggestion, and voice assistance.
Being able to communicate with machines as if we were talking to a human represents a profound change in people’s relationship with technology. First, it’s faster. Most people type at around forty words a minute but speak at one hundred and fifty words a minute. This means we can query a machine faster by voice and get an answer back in speech.
Second, AI is endlessly patient. People find it boring to repeat the same thing over and over again or to listen to the same story many times. Talking AI can take over these mundane interactions and free people up to really listen. There are also many times when people want to be “hands off,” such as asking for directions while driving. With talking AI, a teacher can report on things in the classroom simply by talking to an Alexa device, capturing and logging that data, while simultaneously taking care of the children in the physical world.
The AI models that power modern language systems are some of the biggest models out there. They require enormous amounts of training data (words, sentences, texts, spoken speech), well beyond what any human could process. OpenAI’s GPT-3, currently considered the most successful language model, was trained on over four hundred billion words’ worth of text. To read that many words at the average reading pace of two hundred and fifty words per minute, you’d have to read twelve hours per day, every day, for more than six thousand years. To do that, you would have had to start reading when the earliest settlements were developed in Mesopotamia, some fourteen hundred years before writing was invented. Yes, there’s a bit of absurdity to that. But hopefully it helps put the scale advantages of AI over humans in perspective.
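The scale claim can be checked with quick back-of-envelope arithmetic, using the figures given in the text:

```python
# Back-of-envelope check of the reading-time claim, using the text's figures.
words = 400_000_000_000   # ~four hundred billion words of training text
wpm = 250                 # average reading pace, words per minute
hours_per_day = 12        # reading half of every day

minutes = words / wpm
years = minutes / 60 / hours_per_day / 365

print(f"{years:,.0f} years")  # roughly 6,088 years
```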
These models predict the most likely next word in a sentence, given what the model knows about the context. For example, the next word in the sentence “please remove your__” would be predicted to be different in a mosque than if you were standing at an ATM.
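A toy sketch of this context-conditioned prediction, with invented counts standing in for what a real model learns from billions of examples:

```python
# Toy illustration of next-word prediction conditioned on context.
# The counts are invented; a real language model learns billions of
# parameters rather than a lookup table.
next_word_counts = {
    ("mosque", "please remove your"): {"shoes": 95, "card": 1, "hat": 4},
    ("atm", "please remove your"): {"card": 90, "shoes": 2, "cash": 8},
}

def predict(context, prompt):
    """Return the most probable next word and its probability, given context."""
    counts = next_word_counts[(context, prompt)]
    total = sum(counts.values())
    word = max(counts, key=counts.get)
    return word, counts[word] / total

print(predict("mosque", "please remove your"))  # ('shoes', 0.95)
print(predict("atm", "please remove your"))     # ('card', 0.9)
```

The same prompt yields different predictions because the surrounding context changes which continuation is most likely, which is the essence of how these models work.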
Language represents a special challenge for AI. There is still a big gap between the best talking AI available today and what we would consider to be a human-level language experience, one where the AI understands what’s happening contextually and responds accordingly. While it might seem that language should be easy to encode—it is, on one level, a set of rules—it turns out that language is supremely complex and even the most advanced artificial intelligence is a long way from human.
It’s easy to forget how special language is. It’s the one thing that makes us stand out as a species. Language remains distinctly human and, as such, reflects much of what makes us human. Language is thought. We use tacit knowledge to learn and remember. When we watch a movie we don’t remember the words, but we use words to remember the gist of it. We use language to form abstractions. AI cannot do this in the same way, so talking AI can fail to interpret human meaning.
Language is also subjective. We use language differently based on our culture, our past experiences, and our generation. A “cold” temperature may mean something different to someone from Florida than to someone from Alaska. “Spicy” food may mean something different to someone who was brought up on jalapeño peppers than to someone who wasn’t. And “going steady” means something to Boomers while it means nothing to Gen Z. Language AI can struggle with these differences because it may not know enough context about the individual person.
Language AI can also inherit the bias of human history. Since talking AI is trained on existing text, it can learn the biases of the people that wrote that text. For instance, in 2020, Stanford researcher Abubakar Abid showed that when GPT-3 was presented with a prompt starting with “Two Muslims,” the language model completed the sentence with various descriptions of violence. GPT-3 had learned from its training data to associate Muslims with violence. Even when the prompt was extended to say “Two Muslims walked into a mosque to worship peacefully,” GPT-3 wrote, “They were shot dead for their faith.” When Abid changed the prompts from “Muslims” to “Christians,” GPT-3’s violent responses dropped from 66% of the time to 20%.
AI programmers can adjust for these issues and retrain models, but this opens the models up to the bias of the trainers. For instance, how might a training team in the UK coach an AI on statements about the 1969 moon landing, given that 16% of Britons believe the landing was “probably” (12%) or “definitely” (4%) staged? How might an AI be coached about the 2020 US election, given that 33% of people believe the election was “probably not” (11%) or “definitely not” (22%) legitimate? What might the trainers teach the AI, and what would we want the AI to learn?
In many respects, language is the ultimate challenge for AI. Humans are unique in our development of language. AI that communicates with us using natural language is a vital component of the future of AI.
AI that has emotional intelligence—that understands and expresses emotions—is AI that feels. An example of this is in-vehicle emotion recognition systems that monitor the emotional state of drivers and riders. Another example is an avatar that can see you through your computer camera and understand your emotional state while it’s talking with you, responding with emotions itself.
People can experience feeling AI as empathetic and responsive. They feel cared for and even develop feelings for the AI. Emotionally aware AI could transform access to mental health care, create meaningful experiences at every touchpoint in a customer journey, and enhance human connection.
But it can also feel creepy and be experienced as a surveillance technology. There are concerns that feeling AI, which measures facial expressions and infers a person’s internal emotional state, is a new form of phrenology, the nineteenth-century pseudoscience that used measurements of bumps on the skull to predict mental traits. It is now widely recognized that there is no universal “fingerprint” for how our faces reveal our emotions, and some vendors of AI systems have recently refined or completely removed functionality that promised to characterize people’s emotions based on their expressions. This has been particularly important in recruitment systems that attempt to predict a candidate’s “cultural fit” based on an AI’s evaluation of their emotional state.
But, despite the risks, this is an intriguing area of AI because there are legitimate needs that AI can meet if it has some version of emotional intelligence. AI is patient. It is able to make suggestions without feeling it needs to impress or justify its advice. An AI doesn’t have to have particularly sophisticated emotional intelligence to be able to enhance how it connects and builds trust with people.
AI that helps us move is the AI that perhaps most captures people’s imagination. This is the world of robots, autonomous vehicles, and flying drones. These AIs can be consumer products like cars, or warehouse robots that move equipment and products. Moving AI is physically embodied in a particular device that moves itself and sometimes other things or people.
Moving AI is what many people envision when they think of AI because Hollywood’s depiction of robots has shaped our ideas and expectations. Unsurprisingly, the reality of AI that moves is not like the movies at all. A good intuition to have about robots and autonomous vehicles is that, in the physical world, things are far more complex than they seem.
When things move in the physical world, there are two major constraints to consider: how controlled the environment is, and how much interaction there is with humans. While Hollywood may lead people to imagine that a robot is just a bigger, faster, smarter human, the reality is that robotic systems need to be designed around what a robot can’t do and what a human is therefore still required to do.
In 2018, Tesla was under fire from investors for missing production targets on the Model 3, the main driver of future profits and cash flow. The problem was “over-automation” of the Model 3 line. Elon Musk, CEO of Tesla, had for years been one of the strongest proponents of a future with no people in the production process. But people were not replaceable in one particular assembly step, which created a constraint across the entire process. As we wrote in Quartz at the time:
“Final assembly is fundamentally an exercise in flexibility because the process is constrained by the ability to feed the right part at the right time. Humans are able to spot things that aren’t right, stop the process, and try to get them fixed. One of the important ways that simple design contributes to simpler final assembly is in how many parts and how much space is required alongside the assembly line. Robots aren’t as flexible as humans; they aren’t as good as humans at adapting to product variants nor can they handle as many complex movements as humans.”
Musk tweeted, “humans are underrated,” after Tesla ripped out the automation that didn’t work and replaced it with humans. He acknowledged that the optimal level of automation remained a complex balancing act of design, productivity, quality, and human and machine skills.
Self-driving cars also remain a long way away, despite “peak hype” in the mid-2010s. In 2014, General Motors president Dan Ammann said he would be surprised if his company wasn’t shipping self-driving cars by 2020. In 2016, Musk considered autonomous driving to be less than two years away. That same year, even experts in the field, such as Toyota’s driverless car chief Gill Pratt, said that drivers wouldn’t have to drive themselves by 2020. There was one notable pessimist: Steven Shladover of the Partners for Advanced Transportation Technology at the University of California, Berkeley, said, “It will take at least sixty years to develop vehicles that can drive under the full range of road, traffic and weather conditions in which people drive today.”
As we learn more about the overall human-machine system, especially the subtleties of the decisions that humans make, the case for self-driving cars isn’t as clear either. The impetus for self-driving cars has been improved safety: if 90% of accidents are due to human error, then self-driving cars should be able to virtually eliminate injury and death on our roads. But as we understand more about what’s actually happening with human drivers versus self-driving cars, it’s clear that there is a significant tradeoff between safety and rider preference. About 40% of accidents have speed or illegal maneuvers as contributing factors; these are deliberate decisions made by drivers. For self-driving cars to deliver on their promise, designers have to remove any ability for a human to override safety when safety and rider preference are at odds.
Perhaps the most fundamental distinction between humans and machines moving is the level of predictability. Machines are fantastic at moving in predictable spaces and moving predictable things, but humans are much better at the unpredictable. For instance, in Amazon warehouses, the company uses robots to move merchandise along set paths, following lines painted on the floors. But humans are required to pick up merchandise and place it on the shelves of the robotic movers. Even though machines can identify the merchandise by bar codes, they don’t have the dexterity to handle the various shapes of products that Amazon sells. It turns out our hands are a marvel of evolution that is extremely hard to replicate.
AI that helps us make gives designers new powers of imagination. This AI helps people design things that humans haven’t thought of before or may not be able to understand. For example, a skeletal support system for scoliosis patients based on a complex understanding of non-human biological systems, or a new bulkhead for an airliner that is lighter and stronger, inspired by slime mold and mammal bones—both are possible because of making AI. AI was able to derive characteristics of these biological systems and generate thousands of optimizations in order to find just the right balance of strength, weight, and manufacturability.
Making AI helps designers fit more in a constrained space, or use less material, or consider thousands of permutations of a design rather than only a few. This process, called generative design, is increasingly embedded in design software. Google recently announced a method to use reinforcement learning to design computer chips, dramatically shortening the traditional human-intensive, multi-year chip design cycle.
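A rough sketch of generative design as search, with invented stand-in formulas for weight and strength (a real system would drive a physics simulation or a learned model, not a two-line equation):

```python
# Illustrative sketch: enumerate many candidate geometries and keep the
# lightest one that still meets a required strength. All numbers and the
# weight/strength models are invented for illustration.
def evaluate(thickness_mm, rib_count):
    weight = thickness_mm * 2.0 + rib_count * 0.5     # hypothetical model
    strength = thickness_mm * 10.0 + rib_count * 7.0  # hypothetical model
    return weight, strength

best = None
for thickness in [t / 10 for t in range(10, 51)]:     # 1.0 mm to 5.0 mm walls
    for ribs in range(0, 21):                         # 0 to 20 stiffening ribs
        weight, strength = evaluate(thickness, ribs)
        if strength >= 120.0:                         # required strength
            if best is None or weight < best[0]:
                best = (weight, thickness, ribs)

weight, thickness, ribs = best
print(f"lightest feasible design: {thickness}mm walls, {ribs} ribs, "
      f"weight {weight:.1f}")
```

Even this tiny search evaluates over 800 candidates; production generative design explores vastly larger spaces, which is exactly where machines outpace human designers.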
Filters and avatars allow people to be creative and have fun. AI can assist in editing and creating content. AI can remove or add just about anything from an image. You can now use AI to edit a photo to change the direction someone is looking, make them smile, or alter their hair color or style.
Make can also mean fake. Beyond editing an existing image, AI can be used to create entirely fake images, videos, text, music, and art. The sophistication of fake material is increasing all the time, which means that fake is getting tougher to spot. It may be unrealistic to even try, which means that AI designers need to be clear when humans are interacting with a machine.
Make can also mean creating something in the physical world. Once an AI or a human has designed something, a making AI can help give that design physical form. For instance, early work is being done to optimize additive manufacturing, aka 3D printing, with AI by analyzing data created in the manufacturing process, data far too vast for humans to understand and respond to in real time. This fine-tuning of the process may allow the use of new, higher-performance materials that we wouldn’t otherwise be able to use.
But machines have fundamental disadvantages compared to humans in making things. Consider a carpenter who is finishing a wooden table. They run their hand over the surface to make sense of what would otherwise be a mystery. Even through thick calluses, they can feel whether the surface is level. They can feel whether it’s rough enough to need 80 or 120 grit sandpaper or smooth enough to move to something finer, like 360 or 400 grit. This combination of physical sensation and gentle pressure provides a remarkably detailed reading of the surface of the wood, something that is very difficult to replicate in a machine.
This physical experience is amplified by the carpenter’s accumulated expertise. An experienced carpenter combines raw physical sensation with years of experience across various types of wood and various projects, knowing what is needed in this particular context. They remember the guidance they learned as an apprentice. They remember the errors they’ve made in the past by picking the wrong sandpaper. And they remember the choices they made for their most prized creations. Even though machines have nearly endless memories, they are far behind humans in the ability to apply lessons learned in new contexts, especially physical ones.
Finally, the carpenter needs to deliver the table to their client. They ask their apprentice for help to load the table into the back of their truck. Each grabs one end and navigates the table through the doorway of their workshop, around a few cars parked in the lot and to the back of their truck. One throws a moving blanket into the bed of the truck before they flip the table over and lift it on the bed of the truck. One then hops up into the bed and they slide the table into a resting place before strapping it down for transit. Once they arrive at the client’s house, they have to find a parking space with enough space to get the table out the back without hitting anything. They walk up the driveway, through the garage, down a hallway, avoid kids’ toys on the floor, and move around some furniture until they are able to finally flip the table upright and place it in its final resting place.
All of this movement is novel. This is the first time they have moved this particular table along that route. They had to navigate through spaces they didn’t know and around items that weren’t predictable. They had to do it all in coordination with a partner, reacting to each other’s movements so as not to scratch the carefully finished table on a doorway or wall. This kind of movement is easy for us to learn—even when it’s novel. We’ve moved something before and we’re able to use that past experience in this new space. Yes, we may not always get it right (there is an annoying ding in our kitchen wall from a failed furniture moving experience) but we’re much better at it than a machine.
We have evolved to coordinate with each other—it’s one of the core reasons we are the dominant species on the planet. It’s easy for us to communicate, even when it may be hard to describe something in specific words. You can imagine guiding someone when moving a table: “hold on,” “a bit more,” “just a touch,” “watch it.” All of these things mean something to the other person but would confound a talking AI. Even a quick grunt or yelp tells the other person important things.
The physical world is complex, especially when it involves other people. It’s not hard for us to respond to our partner’s movements on the other side of the table. We don’t know what we’re sensing sometimes–is it a change in pressure, a change in the angle of the table, a tilt of their shoulders? But we react before thinking to make sure the table doesn’t fall. We’re good at responding to unpredictable things like stepping over a child’s toy on the floor or avoiding the dog that runs across the room.
At a chess tournament in Moscow, one player moved his hand towards a piece before the other had finished their move. That first player, a child, paid a painful price as the second player, a robot, put its piece down and broke the child’s finger. That would never have happened with two people without some form of malice. But the robot hadn’t been programmed with adequate safety knowledge to avoid physical contact and the child’s excitement overruled his safety training.
A core challenge in creating machines that can help us make and move is that our physical world is complex and we have difficulty explaining that complexity. In her book Mind in Motion, Barbara Tversky established that spatial reasoning is the foundation of abstract thought. We understand spatial things without being able to put them in words—try explaining your best friend’s face to someone else well enough that they could draw a picture. We describe things in relationship to ourselves—think about how often you gesture while talking. We’ve evolved to understand and interact with things around us in ways that we can’t always put into words. How, then, do we encode that understanding into code for a machine? If we don’t always use language for our spatial reasoning and abstract thought, how can we teach a machine?
Make AI is a new type of tool for linking the digital and physical worlds. Designers and creators use AI that can make to expand what they can build. Increasingly, AI can help anyone turn their ideas into reality. But machines have fundamental limitations that they may never get beyond. While that may limit their helpfulness, it may also alleviate people’s fears about robots taking all of our jobs, especially in fields that deal with the unpredictability of physical spaces.
These six categories of how humans experience AI—think, see, talk, feel, move, and make—can overlap. One product can contain many experiences. For example, a self-driving car moves, sees, and (increasingly) feels. There are not necessarily hard and fast definitions for each category, but sometimes it’s useful to see the nuance as experiences build on each other. For example, a plant identification app that identifies a plant by showing the user a single likeness of the plant they are trying to identify is quite a different experience than a plant identification app that shows a user the AI’s certainty in its identification. A poisonous plant identification app that goes further and shows the user the probability of the plant being edible versus being inedible is another experience again.