In 2021, around 7000 languages were still spoken in the world. At the same time, minority languages are going extinct at a rate that rivals the loss of biodiversity (Hammerström et al. 2021). Although the digitization of languages is often invoked as a means to halt this process, AI-based language technology is currently available for only 3% of the world's 7000 languages (Yoshi et al. 2020). In response to this digital language divide, efforts to extend the reach of large language models are mushrooming. But rather than collaborating with the communities whose languages they capture, many of these efforts adopt top-down approaches. A closely related problem is that almost all AI language technologies are trained first or, if multilingual, still primarily on English-language data. This leads to "language modeling bias", a specific form of bias in which language technologies favor certain languages, dialects, or sociolects over others, negatively affecting the communication opportunities of speakers of marginalized languages (Bella et al. 2024, Helm et al. 2024). How can we harness new socio-technical assemblages in the name of language diversity despite these problems, and how can we ensure that the benefits of these efforts accrue to marginalized speaker communities? This workshop will bring together interdisciplinary researchers from linguistics, computer science, ethics, and anthropology with Indigenous leaders from the Amazon to discuss these questions.
Note on Format:
The sessions below are informal presentations of our related work. Presentations should be short enough to leave room for discussion. We can interrupt each other with questions at any time and discuss whatever comes up as we go, to keep the joint thought process dynamic and interactive.
9.45 Introduction (Paula Helm, UvA)
10.00 Josias Sateré (Indigenous Leader of the Sateré-Mawé) with Adriano Da Silva (Linguist, University of the Amazon)
10.45 Roanne (Anthropology, UvA)
11.00 Coffee Break
11.30 Eva van Lier (Linguistics, UvA) & Kees Hengeveld (Linguistics, UvA)
12.00 Eline Visser (Linguistics, Oslo)
12.30 Gabor Bella (Computational Linguistics, École Polytechnique Atlantique)
13.00-14.30 Lunch Break
14.30 Paola Ricaurte (Media Studies, Harvard)
15.00 Joao Sateré (Indigenous Leader, Sateré-Mawé), Translator/Commentator: Beatrice Bonami (Media Studies, UvA)
16.00 Paula Helm (Media Studies/Data Ethics, UvA)
16.30 Get Together
To join, please email Paula Helm: p.m.helm@uva.nl