Agha Ali Raza

Research

Inclusive and equitable AI for the under-served: speech and language technologies, machine learning, and human-centered design that reach the people the digital revolution has so far left behind.

Using technology to alleviate inequitable access to information and connectivity in the world.

Roughly half the world's population, around 4.4 billion people, remains offline, excluded from the socio-economic benefits of the internet. These individuals are often low-literate, low-income, and technologically marginalized, and they lack access to timely information, online services, and digital participation. My research designs speech and language technologies for development (SLT4D) and AI for development (AI4D) that circumvent the traditional barriers of literacy, connectivity, and language resources, and so enable digital inclusion for those on the far side of the digital divide. I ground every project in real-world deployments across Pakistan, India, and parts of Africa, and pursue four interrelated themes.

1 · Speech & NLP for low-resource languages

Languages such as Urdu, spoken by tens of millions, have been historically underrepresented in NLP, leaving a dearth of data and tools. Early in my career I led the development of the first medium-vocabulary spontaneous speech-recognition system for Urdu, along with publicly released corpora and an automatic pronunciation-lexicon generator that have since become a backbone for Urdu ASR research. To overcome data scarcity, I pioneered methods to gather speech corpora through voice-based social networks: by instrumenting our telephone platforms for data collection, we crowdsourced 1,200 hours of spontaneous speech from 11,000 speakers across Urdu and nine regional languages, and showed that just 10 hours of this data was enough to train a highly accurate Urdu ASR model. Recent work integrates contemporary ML: self-supervised learning and efficient fine-tuning methods that select representative subsets of unlabeled audio, the unsupervised Urdu text simplifier SimplifyUR, and evaluations showing that specialist models fine-tuned on Urdu significantly outperform general-purpose LLMs, which underscores the need to localize foundation models for under-served languages.

2 · Voice-based HCI for under-served populations

For billions who lack smartphones, connectivity, or literacy, voice systems reachable through any telephone can be a lifeline. My work shows that simple IVR services can overcome the primary access barrier when they are designed around users' constraints and motivations. The key idea is viral entertainment as onboarding: rather than asking a rural, low-literate user to navigate a sterile menu, we embed services in playful, social experiences that teach themselves through use. Polly, a telephone voice chatbot, lets users record a message, apply amusing voice transformations, and forward it to friends. Seeded with just five users, it reached 165,000 within a year, at one point onboarding 1,000 new callers a day, and earned a Best Paper award at ACM CHI 2013. Building on this, Sawaal turns learning into a social, game-like quiz that also lets us measure knowledge gaps and retention in a completely voice-driven, illiteracy-proof way, and Karamad opens digital crowd work to anyone with a basic phone, paying workers in mobile airtime; in six months it drew 725 workers, including women, unemployed, visually impaired, and non-literate participants, who completed nearly 4,000 tasks.

3 · Democratizing services via viral IVR platforms

Designing a good voice service is only half the problem; the other half is scale and uptake. I build viral IVR platforms, telephone-based social networks that spread person to person and carry development content. In a large field experiment comparing seven advertising channels for a maternal-health hotline, the IVR entertainment service outperformed every other channel on nearly every acquisition metric: Polly achieved a 50% conversion rate, against under 0.1% for flyers or Facebook ads and about 25% for robocalls, a result that earned an Honorable Mention at CHI 2020. I have applied this paradigm to real services: an audio job portal inside Polly drew 34,000 job-seekers and 728 listings played over 386,000 times, which led to a one-million-euro GIZ grant; and Super Abbu, a maternal-health hotline aimed at fathers, reached 21,000 users in two months, 96% of them men, backed by UNICEF's Innovation Fund and now under a randomized controlled trial supported by the NIH. To sustain engagement, I built Baang, an inclusive audio social network where users from remote, minority, and visually impaired communities create, share, vote on, and moderate content, and where they have collectively raised their voices against harassment and hate speech.

4 · Countering misinformation & deepfakes

As speech and AI technologies spread, so do their harms. During COVID-19 I co-led a voice-based social platform for credible health information, a kind of voice Facebook that featured messages from doctors and nudged users toward verified content; over six months it received more than 500,000 calls from 12,000 mostly low-literate users, who posted 35,000-plus voice messages listened to millions of times, and many began recording their own myth-dispelling content (published at The Web Conference 2022). On audio deepfakes, my collaborators and I found that synthetic speech fails to reproduce certain fine-grained phonemes accurately: focusing a self-attention model on just the 16 most informative phonemes (fricatives like /s/ and nasals like /m/, /n/) improved detection to an equal error rate of 11.98% while yielding interpretable, forensic-grade explanations. Through CIPL and a national R&D grant, I am building core Urdu speech forensics (recognition, keyword spotting, speaker ID) for law enforcement. With the University of Washington, I also studied gender-based harassment on voice forums such as Baang and Sangeet Swara, documenting female participation often below 10% and the abuse that drives it (CHI 2019), a reminder that technical inclusion alone is not enough.
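
For readers unfamiliar with the metric, the equal error rate quoted above is the operating point at which a detector wrongly accepts spoofed audio as often as it wrongly rejects genuine audio. The following is a minimal, illustrative sketch of how such a figure can be computed from detector scores. It is not the evaluation code used in the work described here; the function name, the score convention (higher means more likely bona fide), and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the equal error rate (EER) from two lists of scores.

    Assumption: a higher score means the detector believes the clip is
    bona fide (genuine) speech. Every observed score is tried as a
    threshold, and the one where the two error rates are closest wins.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False acceptance: spoofed clips scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide clips scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(bonafide, spoof))
```

With these toy scores, one genuine clip and one spoofed clip fall on the wrong side of the best threshold, so both error rates equal one in four and the function returns 0.25. A lower value means the two classes are better separated.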

Future directions: AI4D

My work continues to straddle cutting-edge AI and on-the-ground development needs. I am building societally aware speech and language tools for health, education, agriculture, and financial inclusion, contextualized for local cultures and languages and grounded in rigorous field evaluation. I am pushing multilingual and culturally adaptive AI, including an ambitious effort to use voice social networks for language preservation, inviting speakers of endangered languages to share stories and folklore and so build digital archives while giving communities a voice. And I am developing lightweight, interpretable, and sustainable AI, through model compression, knowledge distillation, and data-efficient training, so that communities with limited computational resources can still benefit from modern AI in their own languages.

Funded over the years by the Gates Foundation, the NIH, the National Academies (NAKFI), Meta, Google, the UNICEF Innovation Fund, GIZ, and the HEC. See grants and publications for details.