The following talk summarizes the evolution of our voice-based services for developing regions focusing on underserved populations over the last decade (2011 - 2021).
If you are looking for a quick reference, please go through the following CACM article:
For more details, please refer to the following book chapter:
Next we will go over the demos of some of our services.
Polly was the first in our line of voice-based services, accessible over simple mobile phones and designed to disseminate useful, development-related information among offline and underserved populations. Polly is an IVR (interactive voice response) service: all interactions take place over a regular phone call, with the service playing audio instructions and users responding by pressing keys on the numeric keypad of their phones. Because the interaction takes place over regular phone calls, neither internet connectivity nor a dedicated smartphone app is required.
The objective of designing Polly was to overcome the hurdles of user training, trust, motivation, and advertisement among poorly connected target populations. Polly engages low-literate and non-tech-savvy users in light entertainment and introduces instrumental development-related information to them as they become more comfortable with the voice interface. Polly allows users to record their voice, modify the recordings using funny voice modifications, and send the modified recording to their friends. This simple entertainment appeals to the target users and allows Polly to spread virally via person-to-person sharing. It acts as a soft incentive for users to train themselves and overcomes the scalability hurdle of explicit user training. It also allows Polly to spread organically among the population through word of mouth as well as scheduled message deliveries from one user to another. As users become more comfortable with the interface, Polly introduces them to development-related services like job search and health information.
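The keypad-driven interaction described above can be illustrated with a minimal sketch. This is not Polly's actual implementation: the menu layout, key assignments, prompt names, and retry limit are all hypothetical, and the function only simulates which audio prompts a caller would hear for a given sequence of key presses.

```python
# Hypothetical menu: the keys and action names are illustrative assumptions,
# not Polly's real call flow.
MENU = {
    "1": "record_message",
    "2": "apply_voice_effect",
    "3": "forward_to_friend",
}


def handle_call(key_presses, menu=MENU, max_retries=2):
    """Simulate one call and return the list of audio prompts played."""
    prompts = ["welcome"]
    retries = 0
    for key in key_presses:
        if key in menu:
            # Valid key: play the matching prompt and reset the error count.
            prompts.append(menu[key])
            retries = 0
        else:
            # Invalid key: repeat the menu, or hang up after too many errors.
            retries += 1
            if retries > max_retries:
                prompts.append("goodbye")
                return prompts
            prompts.append("invalid_key_repeat_menu")
    prompts.append("goodbye")
    return prompts
```

For example, `handle_call(["1", "9", "3"])` plays the recording prompt, repeats the menu after the unrecognized key, and then plays the forwarding prompt before ending the call. Repeating the menu on an invalid key is one plausible way to keep the interface forgiving for first-time callers.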
Polly was pilot-tested in 2011 with 32 low-literate users who were given Polly's phone number and asked to explore its functionality. Within 3 weeks, Polly had organically reached over 2,000 users and logged more than 10,000 calls before it was shut down due to insufficient telephone capacity and unsustainable cellular airtime costs. In 2012, Polly was relaunched in Pakistan with increased telephone capacity. The system was seeded via automated phone calls to 5 users from the pilot launch. Within one week, the 30 phone lines were saturated and usage quotas had to be imposed. A job audio browser was also introduced in Polly that allowed users to listen to entry-level job ads from local newspapers in their local language. Callers could browse job opportunities and forward promising ones to friends. Polly remained online for a year and amassed 165,000 users who participated in 636,000 interactions, including over 200,000 forwarded voice messages and 22,000 forwarded job ads. At its peak, it was spreading to a thousand new users every day. The 728 job ads were listened to 386,000 times by 34,000 users. Polly was used primarily by low-educated young men for entertainment and other creative uses like voicemail, group messaging, and telemarketing. Its viral spread crossed gender and age boundaries and also attracted a large number of visually impaired users, but remained primarily within similar socio-economic strata.
Polly has been successfully used in three countries to rapidly spread useful information to underserved populations at a large scale. In 2014, at the peak of the Ebola crisis in West Africa, Polly-Santé (Polly-Health) was deployed as an emergency disaster-response service in Guinea to spread reliable information about prevention, symptoms, and cure of Ebola. The information originated from the US Centers for Disease Control (CDC) and the service was funded by the US Embassy in Conakry. A hurdle to scale information dissemination in the Guinean context was great linguistic diversity and the lack of a widely understood common language. This did not turn out to be a major impediment for voice forums and Polly-Santé was launched in eleven local languages and reached more than 7,000 users within a few months. In India, Polly was used by Babajob.com to advertise a voice directory of available jobs to thousands of low-literate job seekers, and by Jharkhand Mobile Vaani, a popular citizen-radio over-phone platform, to spread awareness about their platform using a "cross-selling" model of advertisement. These deployments highlighted the significance of committed local partners and showed that seeding via promos and advertisements has the potential to induce viral spread and that the content, mood, and tone of the promos play a vital role in influencing a user's understanding of the service and its capabilities.
Since 2016, Polly has been active in Pakistan as a gateway to maternal health information for under-connected expectant parents. Polly advertises a hotline called Super Abbu (Super Dad) that allows expectant parents to record health questions that are answered by volunteer doctors. Voice interfaces create a "curtain" of anonymity between the parents and the doctors. Such private and anonymous access to trained gynecologists allows parents to ask questions about pregnancy and childbirth, which are often considered sensitive and even taboo topics in the local context. The service specifically targets fathers to promote paternal participation and allows them to share their experiences with their peers. In its initial deployment, via Polly and cold calls, Super Abbu reached 21,000 users (96% men) in just two months, uncovering a pent-up demand for maternal health information and giving the target population the agency to anonymously access reliable, culturally sensitive, and potentially life-saving information.
The following papers summarize the various deployments and aspects of Polly:
Baang was our first voice-based social platform for underserved populations. Like a voice-based "Reddit", it allows users to record audio messages and listen to messages recorded by others. After listening to a message, users can press keys to like or dislike the message, to report it for abuse, to post audio comments on it, and to share it with friends by entering their phone numbers. Baang combines a sustainable lure (social connectivity) with a Polly-like spread, creating a platform with high user retention and engagement. Deployed in Pakistan in 2015, Baang organically reached 10,000 users (69% of them blind) within 8 months. These users participated in nearly 270,000 calls and contributed more than 44,000 voice messages that were played more than 2.8 million times and received 340,000 votes, 124,000 audio comments, and about 95,000 shares. Unlike users of Polly, who churn at a high rate as the novelty wears off, Baang's users showed a much lower churn rate and high engagement throughout its deployment. The differences between the two services became more pronounced after a week, when more than 20% of users returned to Baang, while Polly retained fewer than 5% of its users beyond the initial week of exposure. Up to 20% of Baang's users (compared to 1% to 3% in the case of Polly) kept returning after four weeks. Quantitative analysis of usage patterns revealed that user retention was highly associated with the act of posting audio comments. Interestingly, posting comments was found to be a better predictor of continued use than any other single action on the platform, including the posting of messages. The viral spread of Baang was found to occur largely through the message sharing feature: 60% of all new users were introduced to the platform through forwarded messages.
A thematic analysis of user interactions and the content posted by users on Baang showed that Baang created a community of users from diverse socio-economic and linguistic backgrounds including 69% blind people, 10% females, and mostly low-educated, unemployed, young men from all over Pakistan. Baang's open community included people from remote areas and linguistic minorities. Social network features like voting, content sharing, and voice comments led to viral and enthusiastic uptake of the service, high user engagement and retention, and true dialog among the community. Baang provided a window into the collective values of a community as they raised their voice against disability abuse, female harassment, foul language, hatred, terrorism and united for their rights and in support of the oppressed. Baang showed that orality-driven social platforms have the potential to provide under-connected and tech-naive individuals with a voice and social identity.
In 2020, we made COVID information available on Baang. Over the next six months, Baang received nearly half a million calls from over 12,000 users who recorded over 35,000 posts and 156,000 comments, contributed 322,000 votes, and listened to posts over 2.4 million times. Our seven COVID information posts were played 46,488 times by 4,233 users. About half of the users who listened to any user-generated post also listened to at least one COVID post. These posts were shared 8,629 times by 748 users with 2,951 recipients, liked 2,080 times, and disliked 397 times. Users also posted 1,425 audio comments on these posts. We found that 178 users recorded 390 posts related to COVID, of which 41 were found to contain misinformation (nearly 10% of all COVID posts and 0.19% of all tagged posts) and were immediately removed from the platform. The remaining 349 COVID posts were played 24,412 times by 1,111 users who shared them 1,499 times, liked them 2,168 times, disliked them 603 times, and posted 1,454 audio comments. The recorded posts spanned 274 hours of audio data, with 5.4 hours of COVID-related content and 21 minutes of content containing misinformation. Similarly, the audio comments spanned 935 hours, with nearly four hours of COVID-related comments and 30 minutes of comments containing misinformation.
The following papers summarize the various deployments and aspects of Baang:
Our next service, a voice-based quiz platform called Sawaal, combined community knowledge-gap discovery, information dissemination, and knowledge retention measurement into one service. The deployment of Sawaal shows that voice-based quizzes over simple mobile phones, consisting of multiple-choice questions, can be used to simultaneously measure existing knowledge gaps and disseminate targeted information. Rephrased versions of quiz questions are repeated at regular intervals to measure retention of the conveyed information. Long-term user engagement is encouraged by allowing users to contribute their own questions, and via social connectivity, gamification, and a spirit of competition that make the service engaging and fun for the target audience. Sawaal allows its open community of users to post and attempt multiple-choice questions and to vote and comment on them. Sawaal was designed to spread virally like Polly as users challenge friends via shared quizzes and compete for high scores. Admin-posted questions allow discovering knowledge gaps, spreading correct information, and measuring knowledge retention via rephrased, repeated questions. Community-contributed quiz content and the ability to play against friends for high scores lead to inclusion and ownership, active collaborative learning, and a spirit of competition among the users. Sawaal spread organically among the target audience, received an enthusiastic response, and successfully retained a significant fraction of the users for several weeks. Within 14 weeks and with no advertisement, Sawaal reached 3,400 users (120,000 calls) in Pakistan, who contributed 13,000 questions that were attempted over 450,000 times by 2,000 users. Knowledge retention remained significant for up to two weeks. Surveys revealed that 71% of the mostly low-literate, young, male users were blind.
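One way to picture how repeated, rephrased questions could yield a retention measure is the sketch below. It is an illustration under stated assumptions, not Sawaal's actual analysis: the record layout `(user, topic, day, correct)`, the function name `retention_rate`, and the rule of counting a user-topic pair as retained when the latest repeat is answered correctly are all hypothetical.

```python
from collections import defaultdict


def retention_rate(attempts):
    """Fraction of repeated (user, topic) pairs whose latest attempt is correct.

    attempts: iterable of (user, topic, day, correct) tuples, where rephrased
    versions of a question share the same topic (an assumed data layout).
    Returns None when no pair has a repeated attempt.
    """
    by_pair = defaultdict(list)
    for user, topic, day, correct in attempts:
        by_pair[(user, topic)].append((day, correct))

    repeated = 0
    retained = 0
    for history in by_pair.values():
        if len(history) < 2:
            continue  # no repeat, so nothing to say about retention
        history.sort()  # order attempts by day
        repeated += 1
        if history[-1][1]:
            retained += 1
    return retained / repeated if repeated else None
```

With attempts such as `("u1", "ebola", 0, False)` followed by `("u1", "ebola", 14, True)`, user `u1` counts as having retained the information after two weeks. Users who only ever saw a question once are excluded from the denominator.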
The following paper summarizes Sawaal:
Crowdsourcing enables the completion of large-scale and hard-to-automate tasks while allowing people to earn money. However, 3.6 billion people (a workforce comprising 46.4% of the world population) who could benefit most from this source of income lack the access and literacy to use computers, smartphones, and the internet. Our first voice-based crowdsourcing platform, Karamad, allows workers in low-resource regions to complete crowd work using low-end phones and receive payments as mobile airtime balance. Our work on Karamad explored the usefulness, scalability, and sustainability of a crowdsourcing platform in Pakistan through a 6-month deployment. Without any advertising, training, or airtime subsidies, Karamad organically engaged 725 workers who completed 3,939 tasks (involving 43,006 components) including translations, dataset generation, and surveys on demographics, accessibility, disability, health, employment, and literacy. Collectively, the workers, who included female, unemployed, non-literate, and blind users, produced a valuable service market for potential customers.
Despite the lack of any advertisement, airtime subsidy, and viral interface features, users discovered Karamad on their own and spread the word about it to others. Our interview participants informed us that their peers had also posted about Karamad on other voice forums. Over the 6 months of deployment, Karamad reached 958 unique users out of whom 725 attempted at least one task and 671 completed at least one task. These users called Karamad 16,355 times and contributed 3,939 survey responses (on average 213 responses for each of the 19 tasks). Through these responses, we gathered 43,006 answers to 267 questions, including 16.20 hours of speech data in response to open-ended questions, and 1,360 audio translations via translation tasks.
The following paper summarizes Karamad:
Urdu Speech Recognition (speech to text)
Urdu Text to Speech