Artificial intelligence favors white men under 40
“Insert the missing word: I closed the door to my ____.” It’s an exercise that many remember from their school days. Whereas some societal groups might fill in the blank with “holiday home”, others may be more likely to insert “dorm room” or “garage”. To a large extent, our word choice depends on our age, the part of the country we come from, and our social and cultural background.
However, the language models we rely on every day, whenever we use search engines or machine translation, chat with chatbots or give commands to Siri, speak the language of some groups better than others. This has now been demonstrated for the first time. The researchers studied whether language models favor the linguistic preferences of some demographic groups over others, known in the jargon as sociolectal biases. Their answer? Yes.
What the researchers say: “Across language models, we are able to observe systematic bias. Whereas white men under the age of 40 with less education are the group that language models align with best, the worst alignment is with language used by young, non-white men,” said the lead author of the study.
The analysis demonstrates that up to one in ten of the models’ predictions are significantly worse for young, non-white men compared to young white men.
“Any difference is problematic because differences creep into a wide range of technologies. Language models play an important part in our everyday lives, such as when we search for information online,” the lead author continued. “When the availability of information depends on how you express yourself and whether your language aligns with the language on which the models have been trained, it means that information available to others may not be available to you.”
The researchers add that even a slight bias in the models can have more serious consequences in contexts where precision is key.
“It could be in the insurance sector, where language models are used to group cases and perform customer risk assessments. It could also be in legal contexts, such as in public casework, where models are sometimes used to find similar cases. Under such circumstances, a minor difference can prove decisive.”
Language models are trained by feeding them enormous amounts of text, which teaches the AI the probability of words occurring in specific contexts. Just as in the school exercise above, the models must predict the missing words in a sequence. The texts come from what is available online, most of which has been downloaded from social media and Wikipedia.
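To make the idea of "the probability of words occurring in specific contexts" concrete, here is a minimal sketch that estimates fill-in-the-blank probabilities by simple counting. The four-sentence corpus and the function name `fill_in_probabilities` are invented for illustration; real language models use neural networks trained on vastly more text, so this shows the principle only, not how the studied models work.

```python
from collections import Counter

# Toy corpus, invented for illustration. A model only "knows" the
# completions that the people who wrote its training text tend to use.
corpus = [
    "i closed the door to my garage",
    "i closed the door to my garage",
    "i closed the door to my dorm room",
    "i locked the door to my holiday home",
]


def fill_in_probabilities(corpus, prompt):
    """Estimate P(completion | prompt) from how often each completion follows the prompt."""
    counts = Counter(
        sentence[len(prompt):].strip()
        for sentence in corpus
        if sentence.startswith(prompt + " ")
    )
    total = sum(counts.values())
    return {completion: n / total for completion, n in counts.items()}


probs = fill_in_probabilities(corpus, "i closed the door to my")
for completion, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{completion}: {p:.2f}")
```

Because "garage" appears most often after the prompt in this toy corpus, it receives the highest probability, while "holiday home" never follows this exact prompt and receives none. A speaker who would naturally say "holiday home" is therefore served worse by this toy model.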
“However, the data available on the web isn’t necessarily representative of us as tech users. Wikipedia is a good example in that its content is primarily written by young white men. This matters with regard to the type of language that models learn,” the researchers said.
The researchers remain uncertain as to exactly why the sociolectal characteristics of young white men are represented best by the language models. But they do have an educated guess: “It correlates with the fact that they are the group that has contributed most to the data that models are trained on. A preponderance of data originates from social media. And we know from other studies that it is this demographic that contributes most in writing in these types of open, public fora.”
“As computers become more efficient, with more data available, language models tend to grow and be trained on more and more data,” explained the lead author. “For the most prevalent type of language used now, it seems, without us knowing why, that the larger the models, the more biases they have. So, unless something is done, the gap between certain social groups will widen.”
Fortunately, something can be done to correct for the problem. “If we are to overcome the distortion, feeding machines with more data won’t do. Instead, an obvious solution is to train the models better. This can be done by changing the algorithms so that instead of treating all data as equally important, they are particularly careful with data that emerges from a more balanced population average,” conclude the researchers.
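The article does not describe the researchers' actual algorithm. As one common way of not "treating all data as equally important", the sketch below reweights training examples so that every demographic group contributes the same total weight to the training loss. The group labels, loss values and function names are hypothetical and chosen only to illustrate the idea.

```python
from collections import Counter


def balanced_weights(groups):
    """Weight each example so that every group carries the same total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Examples from over-represented groups get a weight below 1,
    # examples from under-represented groups get a weight above 1.
    return [total / (n_groups * counts[g]) for g in groups]


def weighted_mean_loss(losses, weights):
    """Average the per-example losses using the given weights."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Hypothetical data: group "A" wrote three of the four training texts,
# and the model fits group "B" worse (higher loss).
groups = ["A", "A", "A", "B"]
losses = [1.0, 1.0, 1.0, 3.0]

weights = balanced_weights(groups)
unweighted = sum(losses) / len(losses)
balanced = weighted_mean_loss(losses, weights)

print(f"unweighted loss: {unweighted:.2f}")
print(f"group-balanced loss: {balanced:.2f}")
```

In the unweighted average, the poorly served minority group is largely drowned out by the majority. With balanced weights its errors count as much as the majority's, so a training procedure that minimizes this loss has more incentive to fit that group's language.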
So, what? Every week there’s more research showing the harm that social media does. We as a society must be able to exert more control over the monster before it totally controls us. Uncontrolled AI is, as most Tribe members know, one of what I call the six horsemen of the modern apocalypse, the trends that pose an existential threat to us as a species. The others are (in no particular order) climate change, unregulated human genetic engineering, overpopulation, pandemics and inequality.
The biases described here are just one part of the problem. The cure suggested by the researchers would, I think, make it worse. To cure one bias by introducing others seems somewhat self-defeating, especially because as the population changes, so will the bias. Better to educate people so that they understand how they might be being short-changed and show them how to work around it.
Join our tribe
Subscribe to Dr. Bob Murray’s Today’s Research, a free weekly roundup of the latest research in a wide range of scientific disciplines. Explore leadership, strategy, culture, business and social trends, and executive health.