Official Blog
Insights from Googlers into our products, technology, and the Google culture
Google and open source OCR
June 21, 2007
Posted by T.V. Raman, Research Scientist
From time to time, our own T.V. Raman shares his tips on how to use Google from his perspective as a technologist who cannot see -- tips that sighted people, among others, may also find useful. - Ed.
As someone who cannot see, I prefer to live in a mostly paperless world. This means ruthlessly turning every piece of paper that enters my life into a set of bits that I can process digitally. I scan in everything. Until now, I have relied on commercial OCR packages to convert these images into readable text. OCR is perhaps one of the areas where the benefits of Moore's Law are most evident; today, OCR can do remarkably well when handed a page image. Until now, my only dissatisfaction with the status quo in this area has been that commercial OCR engines afford me little flexibility with respect to training them to do better on documents that are specific to me.
The advent of our own open source OCR initiative, OCRopus (source code: Ocropus Sources), is a welcome change in this regard. I recently introduced support for OCRopus in Emacspeak, and the HTML output this produces compares favorably with output from commercial OCR engines, provided you place the page at the right orientation on the scanner. OCRopus' extensibility and its ability to express the OCR result as a structured HTML document make it an ideal starting point for producing rich spoken output. The possibilities are enormous when people can collectively train, customize and improve an OCR engine.