Machines

We’ve looked at four ways that machines use data to produce an output. Now we will look at four types of algorithms—the recipes to make models about the world.

Inverse deduction works backward from conclusions, a form of “learning by example.” The algorithm starts with some known or presumed premises and asks “what knowledge is missing?”

These algorithms can automatically construct rules to account for observations by finding general patterns and then inferring a rule from them. The algorithm selects the inputs that do the best job of dividing the dataset into groups of similar examples.
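One common way to pick the input that best divides a dataset is the information-gain criterion used by decision trees. The sketch below is only an illustration under that assumption; the toy animal data, the feature names, and the helper functions are all hypothetical and not taken from any particular system.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return the feature whose values divide the data into the purest groups."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy observations: which animals are mammals?
rows = [
    {"has_fur": True, "lays_eggs": False, "can_fly": False},
    {"has_fur": True, "lays_eggs": False, "can_fly": True},
    {"has_fur": True, "lays_eggs": True, "can_fly": False},
    {"has_fur": False, "lays_eggs": True, "can_fly": True},
    {"has_fur": False, "lays_eggs": True, "can_fly": False},
]
labels = ["mammal", "mammal", "mammal", "not mammal", "not mammal"]

chosen = best_split(rows, labels)
print(chosen)
```

In this toy data, splitting on `has_fur` separates the two classes perfectly, so it is the feature selected. The inferred rule would be "if it has fur, it is a mammal."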

In many ways, inverse deduction mirrors the scientific method as specific observations are linked to a more general rule by way of testing and iterating against a hypothesis.

Inverse deduction is to deduction as subtraction is to addition.

In machine learning, analogizers match bits of data using mathematical functions to ask “what things do I see that are most like things I’ve seen before?”

In picture y we can see examples of simple analogies: arrow is to bow as wheel is to car, apple is to tree as oar is to boat. Understanding analogies is something we humans do intuitively all the time. We recognize similarities between situations and use them to infer what is likely to be true in new ones.

Algorithms that analogize are extremely powerful because they can handle huge multi-dimensional spaces and efficiently classify large data sets. Analogizers are commonly used in recommendation systems where it’s valuable to cluster like with like. For instance, if you like Michael Jackson’s “Thriller” you will probably like “When Doves Cry” by Prince.
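A minimal sketch of the "most like things I've seen before" idea is a nearest-neighbor lookup using cosine similarity. The feature vectors below are invented for illustration, as are the third catalog entry and the function names; a real recommendation system would learn such vectors from listening data.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, catalog):
    """Return the catalog title whose feature vector is closest to the target's."""
    others = {title: vec for title, vec in catalog.items() if title != target}
    return max(others, key=lambda t: cosine_similarity(catalog[target], others[t]))


# Hypothetical feature vectors: (danceability, synth level, acoustic level)
catalog = {
    "Thriller": (0.9, 0.8, 0.1),
    "When Doves Cry": (0.8, 0.9, 0.1),
    "Hypothetical Folk Ballad": (0.2, 0.0, 0.9),
}

recommendation = most_similar("Thriller", catalog)
print(recommendation)
```

With these assumed numbers, the vector for "When Doves Cry" points in nearly the same direction as the one for "Thriller," so it is the item recommended.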

Humans reason using analogies all the time. Much of our abstract reasoning and pattern recognition is by analogy as our brains build higher abstractions and associations.

Neural networks are inspired by the structure and function of the brain. There are many varieties of neural networks, each utilizing different “neuron” designs, embedded statistical functions, and computational tricks.

A mathematical neuron receives inputs, and each input is assigned a weight. A higher weight is excitatory and a lower weight is inhibitory, relatively speaking. Each neuron can receive multiple inputs, each with a different weight. The weights represent the strength of the connections. A neuron’s level of activity is called its “activation,” and it is calculated from a weighted sum of the activations of all the neurons that feed into it: the activation function within the neuron takes this weighted sum and calculates the output activation. This output is used as input for other neurons farther up the network.
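The neuron described above can be sketched in a few lines. This version assumes a sigmoid activation function and an optional bias term, neither of which is specified in the text, and it uses negative weights to stand in for inhibitory connections. The input and weight values are arbitrary.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


# Arbitrary example values: two excitatory connections and one inhibitory one.
inputs = [1.0, 0.5, 1.0]
weights = [2.0, 0.5, -3.0]

activation = neuron_output(inputs, weights)
print(activation)
```

Here the strongly negative third weight outweighs the two positive ones, so the weighted sum is negative and the output activation falls below 0.5. In a full network, this value would become one of the inputs to neurons in the next layer.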

Neural networks have received more attention than any other form of algorithm over the past decade. While the concepts have been around for a long time, it was only when scientists had access to modern sources of big data (such as image libraries) and powerful computers that neural networks took off. This increased compute power has allowed neural networks to get really big and complex, with many layers stacked on top of each other, an approach frequently called deep learning. These networks now run to millions of neurons and billions of connections and are ubiquitous in image classification, speech and voice recognition, and many other applications. Next time you say “hey Siri” or “ask Alexa,” remember there is a deep neural network under the hood.

Neural networks build representations of concepts by using hidden layers. These layers compress input feature information so that the network has to form a new, more abstract, representation which allows the network to discover patterns. For example, in an image recognition network, lower levels will recognize lines while higher levels of the network will use the position of the lines to recognize a particular line as the edge of an eye. As the level of abstraction increases, networks can “conceptualize” by creating intermediate representations. For example, a network that classifies fruit contains internal representations in the network for fruit-like characteristics such as red, fuzzy or shiny. Humans aren’t big deep learning networks but there are parallels. We are pattern matchers too. Networks that weave associations and link abstract concepts help us understand how we can have a sense of *just knowing*.

There has been exciting progress in the performance of neural networks but they aren’t brains. Networks perform operations that humans do unconsciously in a fraction of a second. These operations are complex but are only one task in the chain of human perception. A neural network will recognize an image, categorize it, and access its meaning. But this meaning is limited compared with a human’s idea of meaning because our brains go so much further. We take the image and explore it consciously for many seconds. Then our brains formulate representations and theories about the object that we share with others. We go far beyond pattern recognition because our learning is integrated in a network of knowledge. We build abstract models of the world, not superficial associations with no link to generalizable concepts.

Most of the analytical processes we’ve talked about so far use values as inputs, such as the color of a pixel, the price of a house, or the body temperature of a patient. What if the input is a probability instead of a value? How do we use an input like “10% of happy people are rich” in an algorithm? For problems like this we can use Bayesian algorithms.

Bayesian learning says that the right way to make inferences is to use probabilities to extract as much information as possible from data. Even the most uncertain observations can be used to build knowledge. In practice, it is the process of continually updating the probabilities of inputs, which in turn updates the probability of the output.

Bayesian algorithms use probabilistic inference, a statistical procedure that estimates the parameters of an underlying or hidden distribution by combining observed data with a prior distribution. So, while in a neural network we apply weights to inputs, in a Bayesian algorithm we assign probabilities instead. Bayesian learning is useful because observations in the real world are not usually simply true or false, so Bayes gives us a way to include uncertainty in logical reasoning. It is a mathematically precise way of reasoning with probabilistic data.
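The idea of continually updating probabilities can be illustrated with a small sketch that applies Bayes’ rule once per observation. The coin scenario, the two hypotheses, and their probabilities are assumptions chosen only to keep the arithmetic simple.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation.

    prior:       {hypothesis: P(hypothesis)}
    likelihoods: {hypothesis: P(observation | hypothesis)}
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical setup: is a coin fair, or biased toward heads?
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}  # prior: no initial preference

for flip in "HHTHH":
    likelihoods = {h: p if flip == "H" else 1 - p for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihoods)  # yesterday's posterior is today's prior

print(belief)
```

Each flip shifts the belief a little: heads favors the biased hypothesis, tails favors the fair one. After four heads and one tail, the biased hypothesis is the more probable of the two, although the fair coin has not been ruled out.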

Bayesian statistics is often quite counter-intuitive because humans have a tendency to neglect base rates (the background rate of something being true). These figures illustrate the idea. When asked “is this person a farmer or a librarian?” after being told that someone wears glasses and is an introvert, most people think this person is a librarian. But, in the absence of any other information, the correct answer is that this person is most likely to be a farmer, simply because there are many, many more farmers in the world than librarians. In fact, there are around twenty male farmers for every male librarian in the USA! The base rate of *farmer given person* is much higher than the base rate of *librarian given person*.
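To see how strongly the base rate matters, the farmer-versus-librarian question can be worked through with Bayes’ rule. The twenty-to-one ratio comes from the text above; the two likelihoods (how often a librarian or a farmer fits the description) are assumed numbers chosen purely for illustration, and the function name is hypothetical.

```python
def posterior_librarian(farmers_per_librarian, p_desc_given_librarian, p_desc_given_farmer):
    """P(librarian | description) via Bayes' rule for a two-way choice."""
    prior_librarian = 1 / (1 + farmers_per_librarian)
    prior_farmer = 1 - prior_librarian
    joint_librarian = prior_librarian * p_desc_given_librarian
    joint_farmer = prior_farmer * p_desc_given_farmer
    return joint_librarian / (joint_librarian + joint_farmer)


# 20 farmers per librarian is the ratio quoted in the text; the two
# likelihoods below are assumed purely for illustration.
p = posterior_librarian(
    farmers_per_librarian=20,
    p_desc_given_librarian=0.40,
    p_desc_given_farmer=0.10,
)
print(round(p, 3))
```

Even if the description were four times more likely to fit a librarian than a farmer, as assumed here, the probability that the person is a librarian would be only about 1 in 6. The base rate outweighs the stereotype.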

Machines don’t get fooled about probability like we do—it’s a powerful skill.

Great Machine Strength: AI can be engineered to solve problems in many different ways, some of which do not come easily to humans.

Great Machine Weakness: Engineering AI is supremely complex. AI doesn’t behave like traditional software and remains, for now, a specialized skill.
