Artificial intelligence language models and the false fantasy of participatory language policies

Authors

Mandy Lau

DOI:

https://doi.org/10.25071/2564-2855.5

Abstract

Artificial intelligence neural language models learn from corpora of online language data, often drawn directly from user-generated content through crowdsourcing or the gift economy, bypassing the traditional keepers of language policy and planning (such as governments and institutions). Herein lies the dream that the languages of the digital world can bend towards individual needs and wants, and not the other way around. Through the participatory language work of users, linguistic diversity, accessibility, personalization, and inclusion can be increased. However, the promise of a more participatory, just, and emancipatory language policy arising from neural language models is a false fantasy. I argue that neural language models represent a covert and oppressive form of language policy that benefits the privileged and harms the marginalized. Here, I examine the ideology underpinning neural language models and investigate the harms that result from these emerging, subversive regulatory bodies.

References

Ascher, D. (2017). The new yellow journalism: Examining the algorithmic turn in news organizations’ social media information practice through the lens of cultural time orientation. (Proquest ID: Ascher_ucla_0031D_16033) [Doctoral dissertation, University of California, Los Angeles]. eScholarship.

Awan, I., & Khan-Williams, R. (2020). Research briefing report 2020: Coronavirus, fear and how Islamophobia spreads on social media 2020. Anti-Muslim hatred working group. https://antimuslimhatredworkinggrouphome.files.wordpress.com/2020/04/research-briefing-report-7-1.pdf

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? FAccT ’21: Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. 30th Conference on Neural Information Processing Systems, 1–9. https://papers.nips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf

Bovet, A., & Makse, H. A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(7), 1–14. https://doi.org/10.1038/s41467-018-07761-2

Brownlee, J. (2019). What is natural language processing? Machine Learning Mastery. https://machinelearningmastery.com/natural-language-processing/

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15. http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf

Cameron, D. (1995). Verbal Hygiene. Routledge.

Cao, S., Jiang, W., Yang, B., Zhang, A. L., & Robinson, J. M. (2020). How to talk when a machine is listening: Corporate disclosure in the age of AI (NBER Working Paper No. 27950). National Bureau of Economic Research. https://doi.org/10.3386/w27950

CrashCourse. (2017, November 22). Natural language processing: Crash Course Computer Science #36 [Video]. YouTube. https://www.youtube.com/watch?v=fOvTtapxa9c

Delirrad, M., & Mohammadi, A. B. (2020). New methanol poisoning outbreaks in Iran following COVID-19 pandemic. Alcohol and Alcoholism, 55(4), 347–348. https://doi.org/10.1093/alcalc/agaa036

Dickey, M. R. (2021, February 19). Google fires top AI ethics researcher Margaret Mitchell. TechCrunch. https://techcrunch.com/2021/02/19/google-fires-top-ai-ethics-researcher-margaret-mitchell/

Erlewine, M. Y., & Kotek, H. (2016). A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory, 34(2), 481–495. https://doi.org/10.1007/s11049-015-9305-9

Google. (n.d.). AI for Social Good. Google AI. https://ai.google/social-good/

Harwell, D. (2019, November 6). HireVue’s AI face-scanning algorithm increasingly decides whether you deserve the job. The Washington Post. https://www.washingtonpost.com/technology/2019/10/22/ai-hiring-face-scanning-algorithm-increasingly-decides-whether-you-deserve-job/

Hern, A. (2017, October 24). Facebook translates “good morning” into “attack them”, leading to arrest. The Guardian. https://www.theguardian.com/technology/2017/oct/24/facebook-palestine-israel-translates-good-morning-attack-them-arrest

Internet World Stats. (2021). Internet world users by language: Top 10 languages. Internet World Stats: Usage and population statistics. https://www.internetworldstats.com/stats7.htm

Ipeirotis, P. (2010). The new demographics of Mechanical Turk. A Computer Scientist in a Business School. https://www.behind-the-enemy-lines.com/2010/03/new-demographics-of-mechanical-turk.html

Kaadzi Ghansah, R. (2017, August 21). A most American terrorist: The making of Dylann Roof. GQ. https://www.gq.com/story/dylann-roof-making-of-an-american-terrorist

Kelly-Holmes, H. (2019). Multilingualism and technology: A review of developments in digital communication from monolingualism to idiolingualism. Annual Review of Applied Linguistics, 39, 24–39. https://doi.org/10.1017/S0267190519000102

Klebnikov, S. (2020, August 28). U.S. tech stocks are now worth more than $9 trillion, eclipsing the entire European stock market. Forbes. https://www.forbes.com/sites/sergeiklebnikov/2020/08/28/us-tech-stocks-are-now-worth-more-than-9-trillion-eclipsing-the-entire-european-stock-market/

Kristiansen, T. (2003). Language attitudes and language politics in Denmark. International Journal of the Sociology of Language, 159(2003), 57–71. https://doi.org/10.1515/ijsl.2003.009

La Monica, P. (2021, January 6). Proof Big Tech is way too big: It’s a quarter of your portfolio. CNN. https://www.cnn.com/2021/01/06/investing/stocks-sp-500-tech/index.html

Manojlovic, D. (2021). Report to the Vancouver Police Board: Year-end 2020 Year-to-date key performance indicators report. Vancouver Police Department. https://vancouverpoliceboard.ca/police/policeboard/agenda/2021/0218/5-1-2102P01-Year-end-2020-KPI-Report.pdf

Martin, J. L. (2021). Spoken corpora data, automatic speech recognition, and bias against African American language: The case of habitual ‘be.’ FAccT ’21: Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 284. https://doi.org/10.1145/3442188.3445893

McGuffie, K., & Newhouse, A. (2020). The radicalization risks of GPT-3 and advanced neural language models. arXiv. http://arxiv.org/abs/2009.06807

Milroy, J. (2001). Language ideologies and the consequences of standardization. Journal of Sociolinguistics, 5(4), 530–555. https://doi.org/10.1111/1467-9481.00163

Monteiro, M. (2019). Ruined by design: How designers destroyed the world, and what we can do to fix it. Mule Books.

Nakov, P., & Da San Martino, G. (2020, November 19). Fact-checking, fake news, propaganda, and media bias: Truth seeking in the post-truth era [Conference presentation]. EMNLP 2020 Conference. https://doi.org/10.18653/v1/2020.emnlp-tutorials.2

Noble, S. (2018). Algorithms of oppression. NYU Press. https://doi.org/10.2307/j.ctt1pwt9w5

O’Neil, C. (2016). Weapons of math destruction. Crown Books.

Pew Research Center. (2021). Internet/broadband fact sheet. Pew Research Center. https://www.pewresearch.org/internet/fact-sheet/internet-broadband/

Project 1907. (2020). Racism incident reporting centre: A community-based reporting tool to track incidents of racism. Project 1907. https://www.project1907.org/reportingcentre

Robertson, K., Khoo, C., & Song, Y. (2020). To surveil and predict: A human rights analysis of algorithmic policing in Canada. Citizen Lab and the International Human Rights Program. https://citizenlab.ca/wp-content/uploads/2020/09/To-Surveil-and-Predict.pdf

Romer, D., & Jamieson, K. H. (2020). Conspiracy theories as barriers to controlling the spread of COVID-19 in the U.S. Social Science & Medicine, 263. https://doi.org/10.1016/j.socscimed.2020.113356

Schiffer, Z. (2021, March 5). Timnit Gebru was fired from Google - then the harassers arrived. The Verge. https://www.theverge.com/22309962/timnit-gebru-google-harassment-campaign-jeff-dean

Shohamy, E. (2008). Language policy and language assessment: The relationship. Current Issues in Language Planning, 9(3), 363–373. https://doi.org/10.1080/14664200802139604

van der Linden, S., Roozenbeek, J., & Compton, J. (2020). Inoculating against fake news about COVID-19. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.566790

W3Techs. (2021). Historical trends in the usage statistics of content languages for websites. Web technology surveys. https://w3techs.com/technologies/history_overview/content_language

Wachter-Boettcher, S. (2017). Technically wrong: Sexist apps, biased algorithms, and other threats of toxic tech. W.W. Norton & Company.

Wigglesworth, R. (2020, December 5). Robo-surveillance shifts tone of CEO earnings calls. Financial Times. https://www.ft.com/content/ca086139-8a0f-4d36-a39d-409339227832

World Health Organization. (2021). Infodemic. World Health Organization. https://www.who.int/health-topics/infodemic#tab=tab_1

Published

2021-09-13

How to Cite

Mandy Lau. (2021). Artificial intelligence language models and the false fantasy of participatory language policies. Working Papers in Applied Linguistics and Linguistics at York, 1, 4–15. https://doi.org/10.25071/2564-2855.5
