Optical Character Recognition Software (OCR) Information

Optical character recognition (OCR) software translates images of handwritten or typewritten text into machine-editable text, which can then be opened and edited in word processors, desktop publishing software, and other text editors. OCR software can also convert scanned images into searchable PDF files.
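
To make the image-to-text step concrete, here is a minimal sketch of one classic approach, template matching. It assumes the page has already been segmented into individual glyphs and uses a hypothetical three-character "font" of 3x5 bitmaps. Each glyph is compared pixel by pixel against every template, and the closest template wins. Commercial OCR engines use far more robust features and classifiers; this only illustrates how pixels become editable characters.

```python
# Hypothetical 3x5 bitmap templates for a tiny "font" (1 = ink, 0 = paper).
TEMPLATES = {
    "I": ("010", "010", "010", "010", "010"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def mismatch(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize_glyph(glyph):
    """Return the template character with the fewest mismatched pixels."""
    return min(TEMPLATES, key=lambda ch: mismatch(TEMPLATES[ch], glyph))


def recognize_line(glyphs):
    """Convert a sequence of already segmented glyph bitmaps into text."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A 'T' with one stray ink pixel, as a dirty scan might produce.
noisy_t = ("111", "010", "011", "010", "010")
print(recognize_line([TEMPLATES["L"], noisy_t, TEMPLATES["I"]]))  # LTI
```

The noisy glyph differs from the "T" template by one pixel but from "I" by three, so it is still read correctly; heavier noise would eventually push it toward the wrong template.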


Organizations typically use OCR software to reduce data-entry errors and speed up document processing. OCR software makes typewritten or handwritten documents machine-readable by converting them into digital text.


OCR software typically isolates the textual parts of a document from other elements, such as images, charts, and tables. Most OCR software allows users to process an entire document or to select only specific parts or chapters. Search features vary among OCR systems; some store the results of a search for future use. After the text is selected, the software analyzes and interprets each character, then checks whole words against a standard and/or custom dictionary.
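
The dictionary check described above can be sketched with Python's standard `difflib` module. This is an illustrative assumption, not how any particular OCR product works: each recognized word that is not found in a combined standard and custom dictionary is replaced by its closest dictionary entry, provided the similarity ratio reaches a cutoff. The word lists and the 0.8 cutoff are hypothetical.

```python
import difflib

# Hypothetical word lists; a real product would ship far larger dictionaries.
STANDARD_DICTIONARY = {"optical", "character", "recognition", "software", "invoice", "total"}
CUSTOM_DICTIONARY = {"globex"}  # hypothetical user-added domain term


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the word unchanged if known, else its closest dictionary match."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # leave unknown words untouched


def correct_text(text):
    """Apply dictionary correction to each whitespace-separated word."""
    vocabulary = sorted(STANDARD_DICTIONARY | CUSTOM_DICTIONARY)
    return " ".join(correct_word(w, vocabulary) for w in text.split())


print(correct_text("0ptical character recogn1tion"))  # optical character recognition
```

The cutoff is a trade-off: a lower value fixes more misread characters but also risks "correcting" legitimate words that simply are not in the dictionary, which is one reason custom dictionaries matter.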


Product specifications for OCR software include character recognition accuracy, page layout reconstruction accuracy, supported languages, speed, and supported operating systems (OS). Support for searchable PDF output and the quality of the user interface are also important considerations.
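
Character recognition accuracy is often reported as one minus the character error rate: the number of character edits (insertions, deletions, substitutions) needed to turn the OCR output into the correct text, divided by the length of the correct text. The sketch below assumes that definition; vendors may measure accuracy differently, so published figures are not always directly comparable. The sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """1 - character error rate, clamped at 0 when edits exceed the length."""
    if not ground_truth:
        raise ValueError("ground truth text must not be empty")
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


truth = "Invoice total: 1,250.00"
scanned = "Inv0ice tota1: 1,250.00"  # two misread characters
print(f"{character_accuracy(truth, scanned):.3f}")  # 0.913
```

Two errors in 23 characters already drop accuracy to about 91%, which shows why small differences in quoted accuracy percentages can matter for data-entry workloads.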


Some OCR software can reproduce formatted output that closely approximates the original document, including images, columns, and other non-textual components. Pattern recognition, artificial intelligence, and machine vision are used to convert scanned images into text that is then added to searchable databases, allowing scanned images to be retrieved based on their content.

When selecting OCR software, also consider the quality and contrast of the scanned images. As a rule, images that are dirty, damaged, or printed on wrinkled paper are more difficult for OCR software to recognize accurately. The contrast between text and background also matters: black text on a white background provides the maximum possible (100%) contrast, which increases the likelihood that the software will interpret the text correctly.
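
One common way to put a number on text/background contrast is the Michelson formula, (high - low) / (high + low), applied to the average brightness of text and background. The sketch below assumes 8-bit grayscale pixel values (0 = black, 255 = white) as a stand-in for luminance and ignores gamma correction; other contrast definitions exist. Under this definition black text on white paper scores 1.0, matching the "100% contrast" case above, while a faded copy scores much lower. The pixel samples are hypothetical.

```python
from statistics import mean


def michelson_contrast(text_pixels, background_pixels):
    """Michelson contrast between mean text and mean background brightness."""
    text_level = mean(text_pixels)
    background_level = mean(background_pixels)
    high = max(text_level, background_level)
    low = min(text_level, background_level)
    if high + low == 0:
        return 0.0  # both regions fully black: no contrast at all
    return (high - low) / (high + low)


# Hypothetical grayscale samples (0 = black, 255 = white).
scans = {
    "black on white": ([0, 0, 0], [255, 255, 255]),
    "faded photocopy": ([110, 120, 130], [190, 200, 210]),
}
for name, (text, background) in scans.items():
    print(f"{name}: {michelson_contrast(text, background):.2f}")
# black on white: 1.00
# faded photocopy: 0.25
```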