Official Blog
Insights from Googlers into our products, technology, and the Google culture
Voice Search in underrepresented languages
November 9, 2010
(Cross-posted from the Google Research Blog)
Welkom*!
Today we’re introducing Voice Search support for Zulu and Afrikaans, as well as South African-accented English. The addition of Zulu in particular represents our first effort in building Voice Search for underrepresented languages.
We define underrepresented languages as those which, while spoken by millions, have little presence in electronic and physical media, e.g., webpages, newspapers and magazines. Underrepresented languages have also often received little attention from the speech research community. Their phonetics, grammar, acoustics, etc., haven’t been extensively studied, making the development of ASR (automatic speech recognition) voice search systems challenging.
We believe that the speech research community needs to start working on many of these underrepresented languages to advance progress and build speech recognition, translation and other Natural Language Processing (NLP) technologies. The development of NLP technologies in these languages is critical for enabling information access for everybody. Indeed, these technologies have the potential to break language barriers.
We also think it’s important that researchers in these countries take a leading role in advancing the state of the art in their own languages. To this end, we’ve collaborated with the Multilingual Speech Technology group at South Africa’s North-West University, led by Prof. Etienne Barnard (also of the Meraka Research Institute), an authority in speech technology for South African languages. Our development effort was spearheaded by Charl van Heerden, a South African intern and a student of Prof. Barnard. With the help of Prof. Barnard’s team, we collected acoustic data in the three languages and developed lexicons and grammars, which Charl and others then used to build the three Voice Search systems. A team of language specialists traveled to several cities, collecting audio samples from hundreds of speakers in multiple acoustic conditions, such as street noise and background speech. Speakers were asked to read typical search queries into an Android app specifically designed for audio data collection.
For Zulu, we faced the additional challenge of few text sources on the web. We often analyze the search queries from local versions of Google to build our lexicons and language models. However, for Zulu there weren’t enough queries to build a useful language model. Furthermore, since it has few online data sources, native speakers have learned to use a mix of Zulu and English when searching for information on the web. So for our Zulu Voice Search product, we had to build a truly hybrid recognizer, allowing free mixture of both languages. Our phonetic inventory covers both English and Zulu and our grammars allow natural switching from Zulu to English, emulating speaker behavior.
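To make the idea of a hybrid recognizer a little more concrete, here is a minimal Python sketch of its two ingredients as described above: one lexicon whose phone inventory spans both languages, and one language model trained on queries that mix them. It is a toy illustration under our own assumptions, not the production system. The word lists, phone symbols, example queries and function names (merge_lexicons, train_bigrams, bigram_prob) are all hypothetical, and the add-one smoothed bigram model is a simple stand-in for whatever grammar the real recognizer uses.

```python
from collections import Counter

# Toy pronunciation lexicons (word -> phone sequence). The entries and the
# phone symbols are illustrative placeholders, not data from the real system.
ZULU_LEXICON = {
    "izulu": ["i", "z", "u", "l", "u"],
    "ukudla": ["u", "k", "u", "d", "l", "a"],
}
ENGLISH_LEXICON = {
    "weather": ["w", "e", "dh", "er"],
    "restaurant": ["r", "e", "s", "t", "r", "o", "n", "t"],
}


def merge_lexicons(*lexicons):
    """Combine per-language lexicons into one lexicon and one shared phone set.

    If a word appears in more than one lexicon, the later entry wins.
    """
    merged = {}
    for lexicon in lexicons:
        merged.update(lexicon)
    phones = {phone for pron in merged.values() for phone in pron}
    return merged, phones


def train_bigrams(queries):
    """Count word bigrams (with sentence-boundary markers) over the queries."""
    bigrams, contexts = Counter(), Counter()
    for query in queries:
        words = ["<s>"] + query.split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def bigram_prob(prev, cur, bigrams, contexts, vocab_size):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)


lexicon, phones = merge_lexicons(ZULU_LEXICON, ENGLISH_LEXICON)

# Made-up code-switched queries; each one mixes a Zulu and an English word.
queries = ["ukudla restaurant", "weather izulu"]
bigrams, contexts = train_bigrams(queries)
vocab_size = len(lexicon) + 1  # every word plus the end marker "</s>"

# A Zulu-to-English switch seen in training, and an unseen switch.
p_seen = bigram_prob("ukudla", "restaurant", bigrams, contexts, vocab_size)
p_unseen = bigram_prob("izulu", "weather", bigrams, contexts, vocab_size)
print(f"seen switch: {p_seen:.3f}, unseen switch: {p_unseen:.3f}")
```

Because both languages share a single vocabulary and a single model, a switch from a Zulu word to an English one is scored like any other word transition, and smoothing keeps unseen switches possible rather than ruling them out.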
This is our first release of Voice Search in a native African language, and we hope that it won’t be the last. We’ll continue to work on technology for languages that have until now received little attention from the speech recognition community.
Salani kahle!**
* “Welcome” in Afrikaans
** “Stay well” in Zulu
Posted by Pedro J. Moreno, Staff Research Scientist and Johan Schalkwyk, Senior Staff Engineer