Optical Character Recognition

OCR stands for Optical Character Recognition. It is a technology that converts different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.

OCR is used to electronically or mechanically convert images of typed, handwritten or printed text into machine-encoded text. This could be from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image. It’s widely used as a form of data entry from printed paper data records, whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static-data, or any suitable documentation. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision2. Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of accuracy for most fonts are now common.