Google has been in the Optical Character Recognition (OCR) business for some time now, and today they’ve updated this feature to support 29 new languages. In September 2009 Google acquired reCAPTCHA, and since then they’ve been fighting spam while simultaneously improving text recognition via the words that people type in. The update we’re seeing today is the fruit of the work they’ve done since bringing in that technology.
What is OCR?
If you’re just joining us, Optical Character Recognition is an automated process that converts an image of text into encoded, selectable text. Google uses OCR to scan your pictures and PDF files, then turns the result into an editable Google Docs document. Over the past two years, Google has been using human input from reCAPTCHA puzzles to get better at identifying difficult words.
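To make the idea concrete, here is a toy sketch in Python of the "image in, text out" step. It is not how Google's OCR works; it assumes a made-up 3x5 pixel font with three letters and fixed character spacing, and the names `TEMPLATES`, `hamming`, and `recognize` are invented for this illustration. Each character cell in the image is simply matched against the stored letter shape it differs from the least.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Hypothetical 3x5 pixel font; '#' is ink, '.' is background.
TEMPLATES = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}
GLYPH_WIDTH = 3  # assumed: every glyph is 3 pixels wide with a 1-pixel gap


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(image):
    """Turn a pixel grid into a string by matching each cell to its nearest template."""
    text = []
    for left in range(0, len(image[0]), GLYPH_WIDTH + 1):
        cell = tuple(row[left:left + GLYPH_WIDTH] for row in image)
        best = min(TEMPLATES, key=lambda ch: hamming(cell, TEMPLATES[ch]))
        text.append(best)
    return "".join(text)


image = [
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#...#.#",
    "###.###.#.#",
]
print(recognize(image))  # prints OCR
```

Because the match is "closest shape" rather than "exact shape", a stray pixel or two usually still produces the right letter. Messy real-world scans are much harder than this, which is where human input like reCAPTCHA answers comes in.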
What languages were added?
Along with the additional languages, Google also improved OCR quality for the 5 previously implemented languages: English, Italian, German, Spanish, and French. The 29 new languages that have been added are the following:
3. Chinese (Simplified Han)
When uploading images or PDF files to Google Docs, be sure to select the language that the text in your file is written in! To do so, put your file in the upload queue, then check the box for "Convert text from PDF or image files to Google Docs documents." A Document Language drop-down menu will appear, where you can select your language.
Have you tried out Google’s OCR technology for scanning old family journals, books, or whatever else you have lying around the house? You can also try it out on your iPhone or Android phone if you have the Google Goggles app!