Artificial intelligence language models and the false fantasy of participatory language policies




Artificial intelligence neural language models learn from a corpus of online language data, often drawn directly from user-generated content through crowdsourcing or the gift economy, bypassing traditional keepers of language policy and planning (such as governments and institutions). Herein lies the dream that the languages of the digital world can bend towards individual needs and wants, and not the other way around. Through the participatory language work of users, linguistic diversity, accessibility, personalization, and inclusion can be increased. However, the promise of a more participatory, just, and emancipatory language policy emerging from neural language models is a false fantasy. I argue that neural language models represent a covert and oppressive form of language policy that benefits the privileged and harms the marginalized. Here, I examine the ideology underpinning neural language models and investigate the harms that result from these emerging, subversive regulatory bodies.


Ascher, D. (2017). The new yellow journalism: Examining the algorithmic turn in news organizations’ social media information practice through the lens of cultural time orientation. (Proquest ID: Ascher_ucla_0031D_16033) [Doctoral dissertation, University of California, Los Angeles]. eScholarship.

Awan, I., & Khan-Williams, R. (2020). Research briefing report 2020: Coronavirus, fear and how Islamophobia spreads on social media 2020. Anti-Muslim hatred working group.

Bender, E., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? FAccT ’21: Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 610–623. DOI:

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. 30th Conference on Neural Information Processing Systems, 1–9.

Bovet, A., & Makse, H. A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(7), 1–14. DOI:

Brownlee, J. (2019). What Is Natural Language Processing? Machine learning mastery.

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.

Cameron, D. (1995). Verbal Hygiene. Routledge.

Cao, S., Jiang, W., Yang, B., Zhang, A. L., & Robinson, J. M. (2020). How to talk when a machine is listening: Corporate disclosure in the age of AI (NBER Working Paper No. 27950). National Bureau of Economic Research. DOI:

CrashCourse. (2017, November 22). Natural language processing: Crash Course Computer Science #36 [Video]. YouTube.

Delirrad, M., & Mohammadi, A. B. (2020). New methanol poisoning outbreaks in Iran following COVID-19 pandemic. Alcohol and Alcoholism, 55(4), 347–348. DOI:

Dickey, M. R. (2021, February 19). Google fires top AI ethics researcher Margaret Mitchell. TechCrunch.

Erlewine, M. Y., & Kotek, H. (2016). A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory, 34(2), 481–495. DOI:

Google. (n.d.). AI for Social Good. Google AI.

Harwell, D. (2019, November 6). HireVue’s AI face-scanning algorithm increasingly decides whether you deserve the job. The Washington Post.

Hern, A. (2017, October 24). Facebook translates “good morning” into “attack them”, leading to arrest. The Guardian.

Internet World Stats. (2021). Internet world users by language: Top 10 languages. Internet World Stats: Usage and population statistics.

Ipeirotis, P. (2010). The new demographics of Mechanical Turk. A Computer Scientist in a Business School.

Kaadzi Ghansah, R. (2017, August 21). A most American terrorist: The making of Dylann Roof. GQ.

Kelly-Holmes, H. (2019). Multilingualism and technology: A review of developments in digital communication from monolingualism to idiolingualism. Annual Review of Applied Linguistics, 39, 24–39. DOI:

Klebnikov, S. (2020, August 28). U.S. tech stocks are now worth more than $9 trillion, eclipsing the entire European stock market. Forbes.

Kristiansen, T. (2003). Language attitudes and language politics in Denmark. International Journal of the Sociology of Language, 159(2003), 57–71. DOI:

La Monica, P. (2021, January 6). Proof Big Tech is way too big: It’s a quarter of your portfolio. CNN.

Manojlovic, D. (2021). Report to the Vancouver Police Board: Year-end 2020 Year-to-date key performance indicators report. Vancouver Police Department.

Martin, J. L. (2021). Spoken corpora data, automatic speech recognition, and bias against African American language: The case of habitual ‘be.’ FAccT ’21: Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 284. DOI:

McGuffie, K., & Newhouse, A. (2020). The radicalization risks of GPT-3 and advanced neural language models. arXiv.

Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal of Sociolinguistics, 5(4), 530–555. DOI:

Monteiro, M. (2019). Ruined by design: How designers destroyed the world, and what we can do to fix it. Mule Books.

Nakov, P., & Da San Martino, G. (2020, November 19). Fact-checking, fake news, propaganda, and media bias: Truth seeking in the post-truth era [Conference presentation]. EMNLP 2020 Conference. DOI:

Noble, S. (2018). Algorithms of oppression. NYU Press. DOI:

O’Neil, C. (2016). Weapons of math destruction. Crown Books.

Pew Research Center. (2021). Internet/broadband fact sheet. Pew research center.

Project 1907. (2020). Racism incident reporting centre: A community-based reporting tool to track incidents of racism. Project 1907.

Robertson, K., Khoo, C., & Song, Y. (2020). To surveil and predict: A human rights analysis of algorithmic policing in Canada. Citizen Lab and the International Human Rights Program.

Romer, D., & Jamieson, K. H. (2020). Conspiracy theories as barriers to controlling the spread of COVID-19 in the U.S. Social Science and Medicine, 263. DOI:

Schiffer, Z. (2021, March 5). Timnit Gebru was fired from Google - then the harassers arrived. The Verge.

Shohamy, E. (2008). Language policy and language assessment: The relationship. Current Issues in Language Planning, 9(3), 363–373. DOI:

van der Linden, S., Roozenbeek, J., & Compton, J. (2020). Inoculating against fake news about COVID-19. Frontiers in Psychology, 11. DOI:

W3Techs. (2021). Historical trends in the usage statistics of content languages for websites. Web technology surveys.

Wachter-Boettcher, S. (2017). Technically wrong: Sexist apps, biased algorithms, and other threats of toxic tech. W.W. Norton & Company.

Wigglesworth, R. (2020, December 5). Robo-surveillance shifts tone of CEO earnings calls. Financial Times.

World Health Organization. (2021). Infodemic. World Health Organization.




How to Cite

Mandy Lau. (2021). Artificial intelligence language models and the false fantasy of participatory language policies. Working Papers in Applied Linguistics and Linguistics at York, 1, 4–15.