Artificial intelligence favors white men under 40

December 26, 2021

“Insert the missing word: I closed the door to my ____.” It’s an exercise that many remember from their school days. Whereas some societal groups might fill in the blank with “holiday home”, others may be more likely to insert “dorm room” or “garage”. To a large extent, our word choice depends on our age, which part of the country we come from, and our social and cultural background.

However, the language models we rely on every day, whether we are using search engines and machine translation, talking to chatbots or giving commands to Siri, speak the language of some groups better than others. This has now been demonstrated for the first time. The researchers studied whether language models favor the linguistic preferences of some demographic groups over others, referred to in the jargon as sociolectal biases. Their answer? Yes.

What the researchers say: “Across language models, we are able to observe systematic bias. Whereas white men under the age of 40 with less education are the group that language models align with best, the worst alignment is with language used by young, non-white men,” said the lead author of the study.

The analysis demonstrates that up to one in ten of the models’ predictions are significantly worse for young, non-white men compared to young white men.

“Any difference is problematic because differences creep into a wide range of technologies. Language models are used for important purposes in our everyday lives, such as searching for information online,” the lead author continued. “When the availability of information depends on how you express yourself and whether your language aligns with the language on which models have been trained, it means that information available to others may not be available to you.”

The researchers add that even a slight bias in the models can have more serious consequences in contexts where precision is key.

“It could be in the insurance sector, where language models are used to group cases and perform customer risk assessments. It could also be in legal contexts, such as in public casework, where models are sometimes used to find similar cases. Under such circumstances, a minor difference can prove decisive.”

Language models are trained by feeding enormous amounts of text into them to teach AI the probability of words occurring in specific contexts. Just as with the school exercise above, models must predict the missing words from a sequence. The texts come from what is available online, most of which have been downloaded from social media and Wikipedia.
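
To make the fill-in-the-blank idea concrete, here is a minimal sketch of word prediction by counting. It is a toy illustration only: the four-sentence corpus and the function names `train` and `predict` are invented for this example, and real language models are neural networks trained on vastly more text. The point it shows is that the model's "best guess" simply mirrors whoever wrote most of the training text.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration (not data from the study).
CORPUS = [
    "i closed the door to my garage",
    "i closed the door to my garage",
    "i closed the door to my dorm",
    "she opened the door to my office",
]


def train(sentences):
    """Count which word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(2, len(words)):
            counts[(words[i - 2], words[i - 1])][words[i]] += 1
    return counts


def predict(counts, context):
    """Return candidate words with estimated probabilities, most likely first."""
    seen = counts[tuple(context)]
    total = sum(seen.values())
    return [(word, n / total) for word, n in seen.most_common()]


model = train(CORPUS)
predictions = predict(model, ["to", "my"])
for word, probability in predictions:
    print(f"{word}: {probability:.2f}")
```

Because "garage" appears most often after "to my" in this toy corpus, it gets the highest probability. A writer who would naturally say "dorm" or "office" is served less well, which is the kind of mismatch the researchers describe.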

“However, the data available on the web isn’t necessarily representative of us as tech users. Wikipedia is a good example in that its content is primarily written by young white men. This matters with regard to the type of language that models learn,” the researchers said.

The researchers remain uncertain as to why precisely the sociolectal characteristics of young white men are represented best by the language models. But they do have an educated guess: “It correlates with the fact that they are the group that has contributed most to the data that models are trained on. A preponderance of data originates from social media. And we know from other studies that it is this demographic that contributes most in writing in these types of open, public fora.

“As computers become more efficient, with more data available, language models tend to grow and be trained on more and more data,” explained the lead author. “For the most prevalent type of language used now, it seems, without us knowing why, that the larger the models, the more biases they have. So, unless something is done, the gap between certain social groups will widen.”

Fortunately, something can be done to correct for the problem. “If we are to overcome the distortion, feeding machines with more data won’t do. Instead, an obvious solution is to train the models better. This can be done by changing the algorithms so that instead of treating all data as equally important, they are particularly careful with data that emerges from a more balanced population average,” conclude the researchers.
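
The article does not say exactly how the algorithms would be changed. One common way to stop treating all data as equally important is to reweight training examples so that each group contributes equally to the training objective. The sketch below assumes that approach; it is not the researchers' stated method, and the group labels, loss values and function names are hypothetical.

```python
from collections import Counter


def group_weights(groups):
    """Weight each example inversely to its group's size,
    so that every group carries the same total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    return [total / (n_groups * counts[g]) for g in groups]


def weighted_loss(losses, weights):
    """Weighted average of per-example losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Hypothetical data: group "a" is over-represented and well predicted,
# group "b" is under-represented and poorly predicted.
groups = ["a", "a", "a", "b"]
losses = [1.0, 1.0, 1.0, 3.0]

weights = group_weights(groups)
plain = sum(losses) / len(losses)
balanced = weighted_loss(losses, weights)
print(f"plain average loss: {plain:.2f}, group-balanced loss: {balanced:.2f}")
```

With the plain average, the poorly served minority group is drowned out by the majority. The balanced version gives each group half of the total weight, so a model trained to reduce it has more incentive to improve on the under-represented group.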

So, what? Every week there’s more research showing the harm that social media does. We as a society must be able to exert more control over the monster before it totally controls us. Uncontrolled AI is, as most Tribe members know, one of what I call the six horsemen of the modern apocalypse, the trends that pose an existential threat to us as a species. The others are (in no particular order) climate change, unregulated human genetic engineering, overpopulation, pandemics and inequality.

The biases described here are just one part of the problem. The cure suggested by the researchers would, I think, make it worse. To cure one bias by introducing others seems somewhat self-defeating, especially because as the population changes, so will the bias. Better to educate people so that they understand how they might be short-changed and show them how to work around it.


Dr Bob Murray

Bob Murray, MBA, PhD (Clinical Psychology), is an internationally recognised expert in strategy, leadership, influencing, human motivation and behavioural change.
