Artificial intelligence language models and the false fantasy of participatory language policies
DOI: https://doi.org/10.25071/2564-2855.5
Abstract
Artificial intelligence neural language models learn from a corpus of online language data, often drawn directly from user-generated content through crowdsourcing or the gift economy, bypassing traditional keepers of language policy and planning (such as governments and institutions). Here lies the dream that the languages of the digital world can bend towards individual needs and wants, rather than the other way around. On this view, the participatory language work of users can increase linguistic diversity, accessibility, personalization, and inclusion. However, the promise of a more participatory, just, and emancipatory language policy as a result of neural language models is a false fantasy. I argue that neural language models represent a covert and oppressive form of language policy that benefits the privileged and harms the marginalized. Here, I examine the ideology underpinning neural language models and investigate the harms that result from these emerging subversive regulatory bodies.
License
Copyright (c) 2021 Mandy Lau
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.