For years we’ve talked about how we live in a human-machine community. Our knowledge comes from our community, and that community increasingly includes machines. More and more information is readable only by machines, which can interfere with our sense of volition. We want our technology to make us better: more fair, more productive, and more creative.
ChatGPT’s breakout success matters because it tells us something about what really makes us better. ChatGPT makes us feel like we have a machine collaborator—one that feels synergistic with our own intelligence. But how would we quantify the synergy in this type of collaboration?
It’s now table stakes to talk about augmenting human abilities with AI. AI and human capabilities are different and complementary when designed in a human-centered way, and we now have reasonable intuitions for what a fruitful machine-human collaboration looks like. Machines can compute in multiple dimensions, at a scale and speed that humans can’t, while humans understand context and excel in unpredictable situations. When a human-machine system is designed well, its collective intelligence can exceed the sum of the individuals.
Backing up these intuitions with solid productivity data is surprisingly difficult, however. According to MIT researchers, no widely used test exists to compare how much better a human-machine system performs relative to humans alone, machines alone, or any other baselines.
In recent work, MIT researchers developed an analog of the Turing Test to systematically measure whether humans and machines together perform tasks better than either could alone, or better than some other relevant benchmark.
The researchers developed a ratio—ρ—which can compare productivity on different tasks. When ρ is less than 1, humans perform worse with AI than they do alone. Above 1, the human-machine system is better. It can also be thought of as a measure of the collective intelligence of the human-machine system.
There are two simple ideas behind the design. The first is that instead of viewing humans and computers as competitors in performing tasks, view them as collaborators. And, instead of viewing human performance as an upper bound, try to maximize the ratio of improvement of the human-computer system relative to some benchmark such as humans only, computers only, or current practice.
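The piece doesn’t give the researchers’ exact formula, but the description above suggests a simple interpretation: ρ is the performance of the human-machine system divided by the performance of a chosen baseline. A minimal sketch, under that assumption (the function name and interface are hypothetical, not from the paper):

```python
def synergy_ratio(human_machine_score: float, baseline_score: float) -> float:
    """Hypothetical sketch of the rho ratio described above: performance of
    the human-machine system relative to a chosen baseline (humans alone,
    machines alone, or current practice). A value above 1 means the
    combined system outperforms the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return human_machine_score / baseline_score

# Using the accuracy figures reported in the studies discussed below:
# question answering: human+AI 78% vs. humans alone at 57%
print(round(synergy_ratio(0.78, 0.57), 2))  # ~1.37, synergy
# food labeling: human+AI 33% vs. the algorithm alone at 75%
print(round(synergy_ratio(0.33, 0.75), 2))  # 0.44, worse together
```

The same ratio works against any baseline, which is the point of the design: swap in humans-only, machines-only, or current practice as the denominator and the comparison stays consistent.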
The researchers applied their ratio in evaluating recent papers on human-machine collaboration.
Surprisingly, they found that over half of the studies reported a decrease in performance when humans worked with machines. On tasks measured by accuracy, sensitivity, or specificity, human-machine systems often performed worse than machines alone. For example, in one food labeling task the AI algorithm achieved 75% accuracy on its own, while the human-AI system reached only 33%.
On the other hand, the strongest evidence of synergistic human-machine collaboration was in open-domain question-answering tasks. Humans working alone achieved an accuracy of 57%. The algorithm working alone achieved an accuracy of 50%. But when humans viewed the model’s prediction, confidence level, and a brief extractive explanation, they achieved an accuracy of 78% and an increase in productive output of 36%.
It’s notable that the human-machine systems that worked best in this study were language-based, and the tasks open-ended in nature. Humans like machines to support them in being creative and imaginative, not to tell them what to do in mundane tasks.
The meta-research did not include any powerful, massive, state-of-the-art AI systems such as GPT-3. So the researchers put machines and humans together to create a website using GPT-3 to write HTML code. The results surprised them.
When human programmers used GPT-3 to write code, they were 30% faster. This result (which the researchers called synergy) was at the top of the range for any of the non-state-of-the-art human-AI systems in the previous studies.
Even though non-programmers would not have been able to create a website in HTML on their own, they could with GPT-3—and in about the same time as programmers working alone. In other words, this human-machine combination is an example of extreme synergy, where humans and machines together can do something neither could do by itself.
Generative AI can be your coach, bullshitter, conscience, and now your route to “extreme synergy” with machines. Yes, these tools can make you faster at generating content. But the best personal use of tools like ChatGPT and Dall-E is being able to do something with them that you could never do without them, creating a collective intelligence that is greater than the sum of the parts.
Don’t miss out on more insights about ChatGPT and how humans and machines can work better together. Sign up for our newsletter here.
Interested in learning more about ChatGPT? Check out another Sonder Insight: Three reasons you should care about ChatGPT (if you don’t already).