• In the telemedicine project just mentioned, with the Aravind Eye Hospital in Tamil Nadu, India, our goal is to diagnose patients who are too far from the hospital to visit in person (www.aravind.org).
• In the village kiosks of the M.S. Swaminathan Research Foundation (MSSRF), also in southern India, our goal is to use speech recognition for semiliterate villagers (www.mssrf.org).
• In a rural networking project in Ghana, we're looking at low-cost wireless backhauling of IP traffic (that is, carrying aggregate Internet traffic for a group, such as an ISP).
• In an Asia Foundation project in rural Cambodia, we're focusing on improved email and content distribution applications over very poor networks (www.asiafoundation.org).
Recruiting. In Tamil Nadu, recording illiterate speakers saying digits (zero to 10) took approximately six times as long as it did for literate speakers. The discrepancy was due to difficulties explaining the task's purpose, the protocol constraints (no reading), the uneducated participants' inflexible and demanding occupations, and apprehension. In addition, illiterate speakers had limited access to quiet spaces and longer social protocols for foreign visitors.
In Ettimadai, for example, a small village with a large engineering school, our local contact and interpreter—a university student who had grown up in a nearby village—recruited (by word of mouth) and arranged short appointments with both literate and illiterate individuals. Participation was on a volunteer basis, as local contacts had suggested. On the first day, the villagers with little or no education all failed to show up for their appointments. The next day, we found that four people had missed the appointment due to work, one had to tend to a sick child, and one reported feeling afraid of what might happen to his voice. When we worked alongside trusted organizations that serve the rural poor (MSSRF and Aravind eye camps), we were much more successful in recruiting and recording villagers, especially illiterate volunteers.
Recording. Illiterate participants in speech studies require novel speech-collection techniques. How do you elicit a particular word from a speaker without saying the word yourself, thus influencing the speaker's word choice and pronunciation? For literate or semiliterate participants, we randomized bilingual flashcards with digits written in both numerical and orthographic form and showed them to speakers one at a time, recording each speaker saying the number aloud. If a participant couldn't read the flashcards, a researcher or interpreter translated the flashcards into a display of fingers (see figure 4). (A fist represented zero.) Unfortunately, we failed to anticipate the cognitive difference between reading numbers aloud and counting fingers, 4 which has significant effects in the speech domain, such as more disfluencies (filled pauses, false starts, and repetitions) and shouting, as in "um, thr … no! FOUR! four." Environmental conditions also kept the technique from being free of external linguistic influence: recordings often took place outside or in a crowded room, where guessing the number of displayed fingers was often irresistible to "helpful" bystanders.
We had anticipated limited infrastructure and unfamiliarity with technology among rural villagers, however, and built a custom microphone embedded in a telephone handset. The handset was easy for all speakers to use, even those who had seen but never used a phone. It captured quality speech recordings in various environmental conditions (average signal-to-noise ratio was 29 dB), avoided the need to fasten equipment to the speaker's clothing, and didn't require the use of a table.
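The 29 dB figure above is a signal-to-noise ratio, which is conventionally expressed as ten times the base-10 logarithm of the ratio of signal power to noise power. The article doesn't say how the ratio was measured, so the following is only a minimal sketch of one common way to estimate it, assuming we already have a list of samples from a stretch of speech and another from a stretch of background noise. The function name `snr_db` and both parameter names are hypothetical.

```python
import math


def snr_db(speech_samples, noise_samples):
    """Estimate a signal-to-noise ratio in decibels.

    Assumption (not from the article): the caller has already separated
    a speech segment from a noise-only segment of the same recording.
    The speech segment still contains background noise, so this is a
    rough estimate rather than a calibrated measurement.
    """
    if not speech_samples or not noise_samples:
        raise ValueError("both segments must contain at least one sample")

    def mean_power(samples):
        # Average of the squared sample values.
        return sum(s * s for s in samples) / len(samples)

    noise_power = mean_power(noise_samples)
    if noise_power == 0:
        raise ValueError("noise segment has zero power")

    return 10.0 * math.log10(mean_power(speech_samples) / noise_power)
```

Under this definition, 29 dB corresponds to speech power roughly 800 times the noise power, since 10^(29/10) is about 794.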
User testing. Another challenge we found in exploring speech technology for illiterate users was in user-testing a spoken dialog system that provides market prices of agricultural commodities and weather. We conducted a Wizard-of-Oz study, 5 which required one interpreter, one researcher acting as the speech recognizer, and another bilingual researcher observing participants and recording critical incidents. After a short demo by the bilingual researcher, we asked subjects to perform three tasks (for example, "find the weather for today"). The three illiterate participants who agreed to participate performed comparably to literate users. 6 However, they had more difficulty understanding the nature of a "task" and instead explored the system out of interest or correctly completed unassigned tasks.
In addition, illiteracy appears to affect a speaker's attention to linguistic form. One participant who had never been to school repeatedly used the formal form for "yes" ("amaanga"), which the system didn't recognize. Despite the system's explicit prompts for "amaam," the recognizable form, and multiple instructions from a bystander to say "amaam" instead, she continued to say "amaanga," until finally quitting the system. This experience, along with results from previous user interface studies 7 and cognitive studies that target illiteracy, 8 suggests that successful user interface solutions for illiterate users will rely on words and interactions that are meaningful and relevant to everyday language use and won't require illiterate users to memorize icons or command words. Our protocol also included a questionnaire intended to collect linguistic, educational, and regional information. The participants with very little education completed only one of the three tasks, then politely excused themselves. They didn't complete the questionnaires.
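One way to read the "amaanga" incident is that the dialog system should map every everyday variant of a response onto the same meaning instead of expecting one command word. The sketch below illustrates that idea only; it is not the system used in the study. It assumes the recognizer has already produced a text token, and the names `YES_VARIANTS` and `interpret` are hypothetical. Only the two forms quoted above are included.

```python
# Hypothetical variant table: both forms of "yes" mentioned in the text
# map to the same meaning. A real system would need variants gathered
# from actual users.
YES_VARIANTS = {"amaam", "amaanga"}


def interpret(utterance):
    """Return "yes" for any known variant, or None if unrecognized."""
    token = utterance.strip().lower()
    if token in YES_VARIANTS:
        return "yes"
    # Unrecognized input is reported as None so the caller can reprompt
    # or log the utterance as a candidate new variant.
    return None
```

With this table, `interpret("amaanga")` and `interpret("amaam")` both return `"yes"`, so the speaker isn't asked to change her wording.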
Overall, we learned that individuals with little education can navigate through a dialog system that primarily responds to local terms for "yes" and "no" with very little training. The lengthy (45 minutes) experimental protocol, however, favored literate speakers and left us unable to draw quantitative conclusions about differences between the two groups. We also found that traditional data-collection and user-testing techniques favor literate speakers. We're exploring simple, robust speech technologies that adapt to users, thus avoiding the time-consuming, artificial step of recording disjointed instances of speech, which is particularly ill-suited to an illiterate population. Instead, by integrating data collection into a spoken dialog system, we can meet the user's needs (gaining access to relevant information) and the system's needs (gathering speech instances to enable recognition) simultaneously. 9
• Plan hard, but be flexible. Our plans always change in the field because of equipment failures, power or staff problems, or even local crises, and thus agility is probably the most important goal. Nonetheless, detailed planning is worthwhile, exactly because there are so many details: shipping ahead, spare parts and tools, transportation, and time with local partners.
• Time dilation. Everything takes longer than expected, which is frustrating given the limited time students have in the field. Often our goals for a trip slip, and we must complete them from abroad or wait until we can return.
• Bulletproofing. We spend much more time than usual getting our systems to work reliably and to enable remote management when possible. The distance and lack of local IT staff place a premium on robustness of all kinds, including packaging, theft prevention, and power.
• Simple UIs. Similarly, for both end users and IT staff we need simple management UIs, especially for Linux. We try to put all the common management tasks into a simple Web-based UI.
• Local partners. We depend on strong, long-term relationships with local partners such as the Aravind Eye Hospital. Our partners provide design input, vendor selection, deployment help, long-term maintenance, and respect within the community. None of this work is possible without them, and they are generally both capable and excited.
Eric Brewer is a professor in the Computer Science Department of the University of California, Berkeley, as well as director of Intel Research Berkeley. In addition to technology for developing regions, he studies Internet systems and programming languages. He received his PhD in electrical engineering and computer science from MIT. He's a member of the IEEE and the ACM. Contact him at 623 Soda Hall, UC Berkeley, Berkeley, CA 94720-1776; firstname.lastname@example.org.
Michael Demmer is a doctoral candidate at UC Berkeley and an intern at Intel Research Berkeley. His research interests include delay- and disruption-tolerant networking, distributed systems for unusual or challenged network environments, and the application of technology in developing regions. He received his BS from Brown University. Contact him at 74 Chattanooga St., San Francisco, CA 94114; email@example.com.
Melissa Ho is a graduate student in UC Berkeley's School of Information, researching communications and healthcare infrastructure for developing countries. She received her MSc in data communications, networks, and distributed systems from University College London. Contact her at Intel Research Berkeley, 2150 Shattuck Ave., Ste. 1300, Berkeley, CA 94704; firstname.lastname@example.org.
R.J. Honicky is a doctoral student at UC Berkeley, researching environmental sensors embedded in location-aware cell phones. He received his master's in computer science from UC Santa Cruz. Contact him at 545S Cory Hall, UC Berkeley, Berkeley, CA 94720-1776; email@example.com.
Joyojeet Pal is a doctoral student in city and regional planning at UC Berkeley, researching community technology initiatives in developing regions. Contact him at 228 Wurster Hall, UC Berkeley, Berkeley, CA 94720-1850; firstname.lastname@example.org.
Madelaine Plauché is a postdoctoral researcher of linguistics at the International Computer Science Institute, where she investigates speech tools for limited-resource environments and speech disfluencies. Contact her at ICSI, 1947 Center St., Ste. 600, Berkeley, CA 94704-1198; email@example.com.
Sonesh Surana is a doctoral student in UC Berkeley's Technology and Infrastructure for Emerging Regions research group. His current work focuses on low-cost networking infrastructure, including using Wi-Fi for long distances and cellphones for rural data collection. He received a BS in computer science from Carnegie Mellon University. Contact him at firstname.lastname@example.org.