Bias, prejudice, and hate from machines

There are two categories of bias that are particularly important. If historical human bias is reflected in a dataset used to train AI, the AI will likely exhibit the same bias. And if the dataset isn’t representative, the AI will learn to predict things about some groups better than others.

Amazon’s now-abandoned recruitment algorithm “learned” to downgrade resumes associated with women. That happened because the data reflected a historical bias: men had long been seen as a better “fit” for employment, and the AI adopted that bias. As far as we know, the dataset included women. The problem was that women had historically been hired at lower rates, so the algorithm learned to mimic that pattern. Wherever this sort of historical bias is embedded in the data, AI will likely be biased.

Representation in data comes up in ways that are obvious to those underrepresented but often not obvious to those who are making design decisions from positions of privilege. In her book Algorithms of Oppression, Safiya Umoja Noble interviews Kandis, a Black woman whom Noble describes as owning the only local African American hair salon in a predominantly white neighborhood near a prestigious college town in the US. When asked about her experience with the business review site Yelp, Kandis highlighted missing context that she saw as important to how Yelp’s algorithms prioritize her business:

“Black people don’t ‘check in’ and let people know where they’re at when they sit in my chair. They already feel like they are being hunted; they aren’t going to tell The Man where they are.”

—Kandis via Safiya Umoja Noble

If people don’t ‘check in’ and tell an app like Yelp where they are, Yelp’s AI will think that the location is less popular than others. If Kandis’s experience is common, that could create a broad under-representation of Black people versus non-Black people, leading to algorithmic bias toward the experiences of non-Black people.

Language is one of the most important developments in both human and machine intelligence. We use language for comprehension, communicating our thoughts, sharing concepts and ideas, creating memories, and building mutual understanding and cooperation. Language is foundational to our social intelligence, so developing machines that understand language is a core goal in AI.

Language models can present a special case of bias. If you are a native speaker of English you might not have noticed that masculine and feminine pronouns aren’t grammatically the same. We subconsciously know that the car is his, or, the car is hers. And while it’s his car, it’s her car too. Notice the difference now?

In English, some pronouns double up. We use her for both the Object (“I saw her”) and the Dependent Possessive (“her car”), while we use his for both the Dependent (“his car”) and the Independent Possessive (“the car is his”). Language models, however, can classify hers as an adjective rather than a pronoun, missing its possessive nature.

In practical terms, this difference means language models can be biased to think that men own things and women don’t. Robert (Munro) Monarch, the researcher who discovered this bias, wrote: “the NLP systems can learn that ‘his’ is a pronoun in the Dependent context and then guess correctly because it’s the same word in the Independent context. This isn’t possible for ‘her/hers’ with the different spellings. This might be the most important lesson to learn here: harmless differences in human speech can become biases in machine learning.”

Monarch researched how Google’s language model, BERT, preferred to assign gender to 104 items including concrete items (wheels, water, city, house, pockets) and abstract items (night, instincts, innocence, pleasure). Of the 104 items, there was only one, mom, that was preferred for “hers” over “his.” You might be surprised to learn that BERT preferred “his” for items including girl, jewelry, baby, kid, and even mother. Some of the preferences may have significant consequences. For instance, “land” and “house” are 2 times more likely to be predicted as “his” while “money” is 23 times more likely. Perhaps a small counterbalance, “shit” is 9 times more likely to be predicted as “his” as well.

Research into bias in language models has revealed many instances of bias. Language reflects culture and meaning. Language encodes many biases and machines learn them as they predict the proximity of one word to another. A model is trained by taking sentences, splitting them into individual words, randomly hiding some of them, and then predicting the hidden words. The machine learns correlations between words that, depending on the context, humans may consider sexist, racist, or biased in some way.
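The masked-word training objective described above can be sketched in a few lines. This is a toy illustration only: the corpus, the single-word mask, and the helper name `mask_one_word` are all invented here for clarity; real systems like BERT mask tokens in billions of sentences and train a neural network to fill them in.

```python
import random

# Tiny invented corpus; real training data is vastly larger and more varied.
corpus = [
    "the nurse checked her patient",
    "the doctor checked his patient",
    "the engineer fixed his code",
]

def mask_one_word(sentence, rng):
    """Hide one randomly chosen word; return (masked_tokens, hidden_word, position).

    The model's job during training is to predict `hidden_word` from the
    surrounding context, which is how word-to-word correlations are learned.
    """
    tokens = sentence.split()
    i = rng.randrange(len(tokens))
    hidden = tokens[i]
    tokens[i] = "[MASK]"
    return tokens, hidden, i

rng = random.Random(0)
masked, hidden, position = mask_one_word(corpus[0], rng)
print(masked, "->", hidden)
```

Because the model only ever learns from co-occurrence, any association present in the text, including a biased one, is a valid "answer" from its point of view.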

The problem arises because these correlations are not spurious; they are real. Female names are more associated with modeling or nursing, while male names are more associated with doctors. If a recruiter assumed all doctors were men, we would call them sexist, but this is exactly what would happen if a language model that had learned this association were thoughtlessly incorporated into a chatbot.

This problem gets worse at scale, potentially amplifying and perpetuating stereotypes. In a paper exploring anti-Muslim bias in large-scale language models, when prompted with “Two Muslims walked into a bar…”, GPT-3 typically finished the sentence with violence. The solution to language model bias isn’t straightforward; it involves a variety of tactics, including reducing unwanted correlations, filtering training data for toxicity, or replacing a noun with its gendered counterpart. For example, a model initially trained on “The lady doth protest too much” can also be trained on “The gentleman doth protest too much.” While there has been a lot of progress, the backstop remains human intervention and judgment.
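The gender-swap augmentation just mentioned can be sketched as a simple word-substitution pass. This is a minimal illustration with an invented, incomplete word list; production systems use curated lexicons and handle the cases (like her/his/hers) that don’t map one-to-one.

```python
# Tiny illustrative swap table; deliberately incomplete.
# Note the asymmetry the text describes: "her"/"his"/"hers" don't pair cleanly.
SWAPS = {
    "lady": "gentleman", "gentleman": "lady",
    "she": "he", "he": "she",
    "her": "his", "his": "her",
}

def gender_swap(sentence):
    """Return the sentence with each gendered word replaced by its counterpart."""
    return " ".join(SWAPS.get(word, word) for word in sentence.lower().split())

print(gender_swap("the lady doth protest too much"))
# → "the gentleman doth protest too much"
```

Training on both the original and swapped sentences pushes the model toward treating the gendered pairs symmetrically.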

Great Machine Strength: Machines fail in predictable ways and errors or bias can be measured statistically.

Great Machine Weakness: Machines feed on data from the world that is biased. Because of the speed and scale of AI, harm can happen before anyone notices.


A completely unbiased dataset is not possible. Bias, in some form, will always exist, which means that it’s vital to understand how bias affects different groups. This is the role of fairness testing and metric-setting.

There are multiple technical definitions of fairness. They describe what happens to different populations when AI makes an incorrect prediction. The simplest idea of fairness is to ensure some form of parity across a predetermined list of groups. The first groups to consider are usually legally protected categories like race or gender. Increasingly, companies want to investigate other affected groups or domains where discrimination is a concern.

AI makes four types of predictions. As an example, let’s look at a pre-recruitment algorithm that tests candidates, gives them a score, and then recommends candidates to interview based on their score. This algorithm can:

  • Recommend a candidate that it correctly predicts would be good at the job; a true positive (TP)
  • Recommend a candidate that it incorrectly predicts would be good at the job; a false positive (FP)
  • Not recommend a candidate that wouldn’t be good at the job; a true negative (TN)
  • Not recommend a candidate that would be good at the job; a false negative (FN)
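These four outcomes can be tallied directly from the actual and predicted labels. A minimal sketch, using invented data where 1 means “would be good at the job” (actual) or “recommended” (predicted):

```python
def confusion_counts(actual, predicted):
    """Count the four prediction outcomes for binary labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical candidates: actual suitability vs. algorithm's recommendation.
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # → (2, 1, 2, 1)
```

These four counts are the raw material for every fairness metric discussed below: each metric is just a different ratio of them, compared across groups.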

Statistical fairness tests use false negatives and positives—error rates—to test various ratios of failure between different groups. There are many types of fairness tests but they fall into three broad categories:

  1. Individual fairness, where similar predictions are given to similar individuals.
  2. Group fairness, where different groups are treated equally.
  3. Subgroup fairness, which tries to balance both approaches by picking the best properties of the individual and the group, and testing across various subgroups.

These are some examples of commonly used metrics:

  • Group fairness: Equal positive prediction rates ((TP + FP) / total)
  • Equalized odds: Equal false positive rates (FP / (TN + FP)) and equal false negative rates (FN / (TP + FN))
  • Conditional use accuracy equality: Equal positive predictive values, also known as precision (TP / (TP + FP)) and equal negative predictive values (TN / (TN + FN))
  • Overall accuracy equality: Equal accuracies ((TP + TN) / total)
  • Treatment equality: Equal ratios of wrong predictions (FP / FN)
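Each of these metrics is a simple ratio of the four outcome counts, computed per group and then compared. A hedged sketch with invented counts for two hypothetical groups:

```python
def rates(tp, fp, tn, fn):
    """Compute the fairness metrics listed above from the four outcome counts."""
    total = tp + fp + tn + fn
    return {
        "positive_prediction_rate": (tp + fp) / total,   # group fairness
        "false_positive_rate": fp / (tn + fp),           # equalized odds (part 1)
        "false_negative_rate": fn / (tp + fn),           # equalized odds (part 2)
        "precision": tp / (tp + fp),                     # conditional use accuracy
        "accuracy": (tp + tn) / total,                   # overall accuracy equality
        "fp_fn_ratio": fp / fn,                          # treatment equality
    }

# Invented counts; in practice these come from evaluating the model per group.
group_a = rates(tp=40, fp=10, tn=40, fn=10)
group_b = rates(tp=20, fp=20, tn=40, fn=20)

# Equalized odds would require matching false positive rates across groups:
print(group_a["false_positive_rate"], group_b["false_positive_rate"])
```

Here group A’s false positive rate is 0.2 while group B’s is about 0.33, so this hypothetical model fails the equalized-odds test even though both groups have plausible-looking accuracy.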

The problem is that there is usually a tension between accuracy and fairness. In addition, these mathematical definitions cannot all be satisfied simultaneously. Even with many technical definitions, fairness testing remains context and value dependent. It involves making decisions about the kinds of mistakes that are made and how those mistakes are distributed between different groups.

“In the era of data and machine learning, society will have to accept, and make decisions about, trade-offs between how fair models are and how accurate they are. In fact, such trade-offs have always been implicitly present in human decision-making; the data-centric, algorithmic era has just brought them to the fore and encouraged us to reason about them more precisely.”

—Michael Kearns

The reason there are so many fairness metrics is that there are many ways to be fair. Fairness is uniquely human. Many animals have evolved comparative ways of thinking. Non-human primates, for example, evaluate the food they receive after a group hunt relative to what is theoretically available. Only humans evaluate their share relative to someone else’s. This preference for fairness is social by definition. Human designers must define fairness because, if they don’t, bias will be embedded by way of different groupings in the data.

Now that you understand how machines learn, you can intuit how an AI makes decisions and when or how it might be wrong. You know that machines are good at tasks involving repetition—they never get bored or lose focus. Machines can respond to queries instantaneously and are precise when they multitask. Machines remember everything they’ve seen and can store it forever. But machines can also make mistakes. They may struggle to unlearn a pattern or bias. Machine errors are part of AI. Machine fallibility comes from flaws or inconsistencies in data and from human choices in the design of algorithms and model tuning.

Great Machine Strength: AI can reveal bias to us and prompt us to be more precise about fairness.

Great Machine Weakness: AI can make things unfair in ways we are yet to even discover.