Official Blog
Insights from Googlers into our products, technology, and the Google culture
A recent improvement for Arabic searches
February 2, 2010
This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.
We've learned that people searching on Google sometimes forget to separate words with spaces, and often repeat a letter by mistake within a single word. For instance, instead of the query [amazingly beautiful poem], you might type [amazingly beautiifullpoem].
These types of errors are much more common in languages like Arabic, where most of the letters are cursive. That means the shape of a letter changes based on its position in the word (initial, middle, final or isolated). Moreover, some Arabic letters act as word breaks: they never connect to the letter that follows, so the next letter must take its "initial" shape. In other words, if the last letter of one word is a word break, the next word looks the same whether or not a space separates them, so the space is easy to leave out.
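To make the word-break rule concrete, here is a minimal Python sketch. It assumes that the "word break" letters are the ones Unicode classifies as right-joining (آ أ ؤ إ ا ة د ذ ر ز و), which connect to the letter before them but never to the letter after. The function names `candidate_breaks` and `split_after_taa_marbuta` are hypothetical, and this is an illustration, not Google's implementation.

```python
# ASSUMPTION: "word break" letters = letters with Unicode Joining_Type R
# ("right-joining"). They join the letter before them, never the one after:
# آ أ ؤ إ ا ة د ذ ر ز و
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")

# Taa marbuta (ة) only ever ends a word, so one found in the middle of a
# token strongly suggests that the space after it was dropped.
TAA_MARBUTA = "\u0629"


def candidate_breaks(token: str) -> list[int]:
    """Positions i where a space could be inserted before token[i] without
    changing how the text looks, because token[i - 1] is right-joining."""
    return [i for i in range(1, len(token)) if token[i - 1] in RIGHT_JOINING]


def split_after_taa_marbuta(token: str) -> list[str]:
    """Split a run-together token after every ة that is not its last letter."""
    words, start = [], 0
    for i in range(1, len(token)):
        if token[i - 1] == TAA_MARBUTA:
            words.append(token[start:i])
            start = i
    words.append(token[start:])
    return words


if __name__ == "__main__":
    query = "وزارةالتعليم"
    print(candidate_breaks(query))         # [1, 2, 3, 4, 5, 6]
    print(split_after_taa_marbuta(query))  # ['وزارة', 'التعليم']
```

Only the ة rule is decisive here. Every letter of وزارة is right-joining, so most candidate breaks fall inside a real word; choosing among them needs another signal, such as a word list or query statistics.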
For example, the queries [وزارةالتعليم] and [وزارة التعليم] have the same meaning (Ministry of Education), and both forms are common in Arabic documents. They differ only in format: the first is written as a single word, the second as two. Google needs to understand that although they're written differently, they mean the same thing and should yield exactly the same search results. In this example, both queries are written correctly, just in different formats. But sometimes people simply make errors, like typing the same letter twice. For example, you might write [راائعة الجماال], repeating the letter "ا" in both words; the correct spelling is [رائعة الجمال]. It's important that Google Search recognizes your query despite spelling errors like these.
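One simple way to repair repeated letters is sketched below, assuming a hypothetical lexicon of known words: generate every spelling obtained by shrinking runs of a repeated letter to a single copy, and accept one only if it is a known word. Simply deleting every doubled letter would be wrong, because some correct words, such as ممكن ("possible"), contain the same letter twice in a row. The names `collapse_options` and `correct_word` are illustrative only.

```python
from itertools import groupby, product


def collapse_options(word: str) -> set[str]:
    """All spellings obtained by keeping each run of a repeated letter as
    typed, or shrinking it to a single copy."""
    choices = []
    for letter, run in groupby(word):
        n = len(list(run))
        choices.append({letter * n, letter})  # a set, so a run of 1 gives one choice
    return {"".join(parts) for parts in product(*choices)}


def correct_word(word: str, lexicon: set[str]) -> str:
    """Keep a known word as is; otherwise return a known collapsed spelling,
    preferring the one with the fewest letters removed. Words with no known
    spelling are returned unchanged."""
    if word in lexicon:
        return word
    known = [w for w in collapse_options(word) if w in lexicon]
    if not known:
        return word
    return max(sorted(known), key=len)  # longest = fewest letters removed


if __name__ == "__main__":
    lexicon = {"رائعة", "الجمال", "ممكن"}  # ASSUMPTION: hypothetical word list
    print([correct_word(w, lexicon) for w in "راائعة الجماال ممكن".split()])
```

Checking candidates against a word list, rather than collapsing unconditionally, is what keeps ممكن intact while still fixing راائعة and الجماال.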
To address issues like this, we recently developed a search ranking improvement that targets certain Arabic queries. Our algorithm combines rules of Arabic spelling and grammar with signals from historical search data to decide when spaces between words have been left out and when letters have been unnecessarily repeated. Now, when you type a query with missing spaces or repeated letters, we'll return better results based not only on what you typed, but also on what our algorithm understands to be the "correct" query. For example, here's what happens when you type [قصيدة راائعةالجماال] ([amazingly beautiful poem] in Arabic), which has repeated letters and a missing space between words.
As you can see in the results, the terms of the corrected query, قصيدة رائعة الجمال, appear in bold.
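The post does not describe the algorithm in detail, so the following is only a minimal Python sketch of the general idea of combining a spelling rule with data, not Google's method. It allows a missing space only after a right-joining letter, collapses repeated letters, and scores each candidate reading with a hypothetical table of query counts (`QUERY_COUNTS`) standing in for historical search data. The table contents, the `EDIT_PENALTY` value and all function names are assumptions made for illustration.

```python
import math
from functools import lru_cache
from itertools import groupby, product
from typing import Optional

# ASSUMPTION: hypothetical counts standing in for historical search data.
QUERY_COUNTS = {"قصيدة": 900, "رائعة": 700, "الجمال": 800, "وزارة": 500, "التعليم": 600}
TOTAL = sum(QUERY_COUNTS.values())
EDIT_PENALTY = 2.0  # ASSUMPTION: extra cost for a word whose repeated letters were collapsed

# Right-joining letters (آ أ ؤ إ ا ة د ذ ر ز و): dropping a space after one of
# these does not change how the text looks.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")


def collapse_options(chunk: str) -> set[str]:
    """Spellings with each run of a repeated letter kept or shrunk to one copy."""
    choices = [{ch * len(list(run)), ch} for ch, run in groupby(chunk)]
    return {"".join(parts) for parts in product(*choices)}


def word_cost(chunk: str) -> Optional[tuple[float, str]]:
    """Cheapest reading of `chunk` as one known word (negative log frequency,
    plus a penalty if letters had to be collapsed), or None if there is none."""
    best = None
    for option in collapse_options(chunk):
        if option not in QUERY_COUNTS:
            continue
        cost = -math.log(QUERY_COUNTS[option] / TOTAL)
        if option != chunk:
            cost += EDIT_PENALTY
        if best is None or cost < best[0]:
            best = (cost, option)
    return best


def segment(token: str) -> list[str]:
    """Split a token into known words (dynamic programming over break points),
    breaking only after right-joining letters; unknown tokens stay unchanged."""

    @lru_cache(maxsize=None)
    def best_from(i: int) -> tuple[float, tuple[str, ...]]:
        if i == len(token):
            return 0.0, ()
        best: tuple[float, tuple[str, ...]] = (math.inf, ())
        for j in range(i + 1, len(token) + 1):
            if j < len(token) and token[j - 1] not in RIGHT_JOINING:
                continue  # a space here would have changed the letter shapes
            hit = word_cost(token[i:j])
            if hit is None:
                continue
            rest_cost, rest_words = best_from(j)
            if hit[0] + rest_cost < best[0]:
                best = (hit[0] + rest_cost, (hit[1],) + rest_words)
        return best

    cost, words = best_from(0)
    return list(words) if cost < math.inf else [token]


def correct_query(query: str) -> str:
    """Correct each space-separated token and rejoin the words with spaces."""
    return " ".join(w for token in query.split() for w in segment(token))


if __name__ == "__main__":
    print(correct_query("قصيدة راائعةالجماال"))
    print(correct_query("وزارةالتعليم"))
```

In this sketch the spelling rule only limits where a break may go; the counts decide between the remaining candidates, which loosely mirrors the split between "rules of Arabic spelling and grammar" and "signals from historical search data" described above. Both [وزارةالتعليم] and [وزارة التعليم] normalize to the same two words.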
For most people, this might seem like a small enhancement. But for us, it's a big change: our tests show that it improves search for 10% of Arabic-language queries, which, when you think about it, adds up to a lot of people.
Posted by Moustafa Hammad and Mohamed Elhawary, Software Engineers, Search Quality Team