===Pre-processing===
OCR software often pre-processes images to improve the chances of successful recognition. Techniques include:<ref name="nicomsoft">{{cite web|url=https://www.nicomsoft.com/optical-character-recognition-ocr-how-it-works/ |title=Optical Character Recognition (OCR) – How it works |publisher=Nicomsoft.com |access-date=2013-06-16}}</ref>
* De-[[Skew (fax)|skewing]]{{spaced ndash}}if the document was not aligned properly when scanned, it may need to be rotated a few degrees clockwise or counterclockwise to make lines of text perfectly horizontal or vertical.
* [[Despeckle|Despeckling]]{{spaced ndash}}removal of positive and negative spots, smoothing edges
* Binarization{{spaced ndash}}conversion of an image from color or [[greyscale]] to black-and-white (called a [[binary image]] because there are two colors). The task is performed as a simple way of separating the text (or any other desired image component) from the background.<ref name="Sezgin2004">{{cite journal|last1=Sezgin|first1=Mehmet|last2=Sankur|first2=Bulent|date=2004|title=Survey over image thresholding techniques and quantitative performance evaluation|url=http://webdocs.cs.ualberta.ca/~nray1/CMPUT605/track3_papers/Threshold_survey.pdf|journal=Journal of Electronic Imaging|volume=13|issue=1|page=146|bibcode=2004JEI....13..146S|doi=10.1117/1.1631315|archive-url=https://web.archive.org/web/20151016080410/http://webdocs.cs.ualberta.ca/~nray1/CMPUT605/track3_papers/Threshold_survey.pdf|archive-date=October 16, 2015|access-date=2 May 2015}}</ref> Binarization is necessary because most commercial recognition algorithms work only on binary images, which are simpler to process.<ref name="Gupta2007">{{cite journal|last1=Gupta|first1=Maya R.|last2=Jacobson|first2=Nathaniel P.|last3=Garcia|first3=Eric K.|date=2007|title=OCR binarisation and image pre-processing for searching historical documents.|url=http://www.rfai.li.univ-tours.fr/fr/ressources/_dh/DOC/DocOCR/OCRbinarisation.pdf|journal=Pattern Recognition|volume=40|issue=2|page=389|doi=10.1016/j.patcog.2006.04.043|bibcode=2007PatRe..40..389G|archive-url=https://web.archive.org/web/20151016080410/http://www.rfai.li.univ-tours.fr/fr/ressources/_dh/DOC/DocOCR/OCRbinarisation.pdf|archive-date=October 16, 2015|access-date=2 May 2015}}</ref> The effectiveness of binarization also strongly affects the quality of character recognition, so the binarization method must be chosen carefully for each input image type: how well a given method performs depends on the type of image (scanned document, [[scene text]] image, degraded historical document, etc.).<ref name=Trier1995>{{cite journal|last1=Trier|first1=Oeivind Due|last2=Jain|first2=Anil K.|title=Goal-directed evaluation of binarisation methods.|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=1995|volume=17|issue=12|pages=1191–1201|url=http://heim.ifi.uio.no/inf386/trier2.pdf |archive-url=https://web.archive.org/web/20151016080411/http://heim.ifi.uio.no/inf386/trier2.pdf |archive-date=2015-10-16 |url-status=live|access-date=2 May 2015|doi=10.1109/34.476511}}</ref><ref name="Milyaev2013">{{cite book|last1=Milyaev|first1=Sergey|last2=Barinova|first2=Olga|last3=Novikova|first3=Tatiana|last4=Kohli|first4=Pushmeet|last5=Lempitsky|first5=Victor|title=2013 12th International Conference on Document Analysis and Recognition |chapter=Image Binarization for End-to-End Text Understanding in Natural Images |date=2013|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/mbnlk_icdar2013.pdf |archive-url=https://web.archive.org/web/20171113184347/https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/mbnlk_icdar2013.pdf |archive-date=2017-11-13 |url-status=live |pages=128–132|doi=10.1109/ICDAR.2013.33|isbn=978-0-7695-4999-6|s2cid=8947361|access-date=2 May 2015}}</ref>
* Line removal{{spaced ndash}}cleaning up non-glyph boxes and lines
* [[Document Layout Analysis|Layout analysis]] or zoning{{spaced ndash}}identification of columns, paragraphs, captions, etc. as distinct blocks. Especially important in [[Column (typography)|multi-column layouts]] and [[Table (information)|tables]].
* Line and word detection{{spaced ndash}}establishment of a baseline for word and character shapes, separating words as necessary.
* Script recognition{{spaced ndash}}in multilingual documents, the script may change from word to word, so the script must be identified before the appropriate OCR can be invoked to handle it.<ref>{{Cite journal |last1=Pati |first1=P.B. |last2=Ramakrishnan |first2=A.G. |title=Word Level Multi-script Identification |date=2008 |journal=Pattern Recognition Letters |volume=29 |issue=9 |pages=1218–1229 |doi=10.1016/j.patrec.2008.01.027|bibcode=2008PaReL..29.1218P }}</ref>
* Character isolation or segmentation{{spaced ndash}}for per-character OCR, multiple characters that are connected due to image artifacts must be separated, and single characters that are broken into multiple pieces due to artifacts must be joined.
* Normalization of [[aspect ratio]] and [[Scale (ratio)|scale]]<ref>{{cite web|url=http://blog.damiles.com/2008/11/20/basic-ocr-in-opencv.html |title=Basic OCR in OpenCV {{!}} Damiles |publisher=Blog.damiles.com |access-date=2013-06-16|date=2008-11-20 }}</ref>

Segmentation of [[fixed-pitch font]]s is accomplished relatively simply by aligning the image to a uniform grid based on where vertical grid lines will least often intersect black areas. For [[proportional font]]s, more sophisticated techniques are needed because whitespace between letters can sometimes be greater than that between words, and vertical lines can intersect more than one character.<ref name="Tesseract overview" />
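De-skewing first needs an estimate of the skew angle. One common family of methods uses projection profiles: when text lines are horizontal, the number of ink pixels per row is sharply peaked. The following Python sketch is an illustration, not the method of any particular OCR engine. It assumes the ink pixels have already been extracted as <code>(x, y)</code> coordinates and simply tries candidate angles in fixed steps; the function names, the ±5° search range and the 0.5° step are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) of an ink pixel; y grows downward


def profile_score(points: List[Point], angle_deg: float) -> int:
    """Sum of squared row counts after rotating the ink pixels by -angle_deg.

    When text lines are horizontal, their pixels pile up in a few rows,
    so the horizontal projection profile is peaky and this score is large.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows: Dict[int, int] = {}
    for x, y in points:
        # Row after rotation: a line y = c + x * tan(theta) maps to a single row.
        r = round(y * cos_t - x * sin_t)
        rows[r] = rows.get(r, 0) + 1
    return sum(count * count for count in rows.values())


def estimate_skew(points: List[Point], max_angle: float = 5.0, step: float = 0.5) -> float:
    """Return the candidate angle (degrees) that makes the text lines most horizontal."""
    n = round(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))
```

The page would then be rotated by the negative of the estimated angle; that resampling step is not shown. A finer step gives a more precise angle at a proportionally higher cost.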
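Despeckling is often done with a small median filter. On a binary image, the median of a 3×3 neighbourhood reduces to a majority vote, which removes isolated ink spots (positive spots) and fills small holes (negative spots). The following is a minimal Python sketch; the function name and the choice to clip the window at the image border are illustrative assumptions.

```python
from typing import List

Binary = List[List[int]]  # 1 = ink, 0 = background


def despeckle(img: Binary) -> Binary:
    """3x3 majority filter; on full 3x3 windows this equals a binary median filter.

    Windows are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            # A pixel is ink only if ink is the strict majority of its window.
            out[y][x] = 1 if 2 * sum(window) > len(window) else 0
    return out
```

A filter of this kind can also erase strokes that are only one pixel wide, so the window size has to suit the scan resolution.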
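Many binarization methods choose a single global threshold. The sketch below uses [[Otsu's method]], one widely used global technique, picked here only as an example. It assumes a greyscale image stored as rows of integers from 0 to 255 with dark text on a light background; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # greyscale rows, 0 = black ... 255 = white


def otsu_threshold(image: Image) -> int:
    """Return the threshold t maximizing the between-class variance (Otsu's method).

    Pixels with value <= t form the dark class; all others form the light class.
    """
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue  # one class is empty, so t does not split the image
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image: Image) -> List[List[int]]:
    """Map dark pixels to 1 (ink) and light pixels to 0 (background)."""
    t = otsu_threshold(image)
    return [[1 if value <= t else 0 for value in row] for row in image]
```

A single global threshold tends to work poorly on unevenly lit or degraded pages, which is one reason the choice of binarization method depends on the image type.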
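Line removal can be approximated by erasing horizontal runs of ink that are much longer than any character, such as ruled lines on forms. In this hypothetical Python sketch, <code>min_len</code> is an assumed parameter that would have to exceed the width of the widest glyph; vertical rules could be handled in the same way on columns.

```python
from typing import List

Binary = List[List[int]]  # 1 = ink, 0 = background


def remove_horizontal_lines(img: Binary, min_len: int) -> Binary:
    """Return a copy of img with horizontal ink runs of length >= min_len erased."""
    out = [row[:] for row in img]
    for row in out:
        x = 0
        while x < len(row):
            if row[x] == 0:
                x += 1
                continue
            start = x
            while x < len(row) and row[x] == 1:
                x += 1
            # Runs this long are treated as rules rather than glyph strokes.
            if x - start >= min_len:
                row[start:x] = [0] * (x - start)
    return out
```

Glyph strokes that touch an erased rule lose the overlapping pixels; this sketch does not attempt to repair them.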
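A simple form of line and word detection uses projection profiles on an already de-skewed binary image: rows without ink separate text lines, and sufficiently wide runs of blank columns separate words. In this Python sketch the function names and the <code>min_gap</code> threshold are illustrative assumptions, and all ranges are half-open.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def find_text_lines(img: Binary) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of consecutive rows that contain ink."""
    lines = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines


def find_words(line: Binary, min_gap: int) -> List[Tuple[int, int]]:
    """Split a text line into (left, right) column ranges at blank gaps >= min_gap."""
    ink_cols = [x for x in range(len(line[0])) if any(row[x] for row in line)]
    if not ink_cols:
        return []
    words = []
    left = prev = ink_cols[0]
    for x in ink_cols[1:]:
        if x - prev - 1 >= min_gap:  # number of blank columns between ink
            words.append((left, prev + 1))
            left = x
        prev = x
    words.append((left, prev + 1))
    return words
```

A fixed <code>min_gap</code> fails for the reason given above for proportional fonts: the space between two letters can be wider than the space between two words.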
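Character isolation often starts from connected-component labelling, which groups touching ink pixels into candidate glyphs. The Python sketch below only finds 8-connected components and their bounding boxes. Splitting touching characters and joining broken ones, as described above, would require further heuristics applied to these boxes and is not shown. The names are illustrative.

```python
from collections import deque
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def connected_components(img: Binary) -> List[Box]:
    """Bounding boxes of 8-connected ink regions, sorted by left edge, then top."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right + 1, bottom + 1))
    return sorted(boxes)
```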
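One simple way to normalize scale is to resize every isolated glyph into a fixed square. The longer side is scaled to fit and the glyph is centred, so its aspect ratio is preserved. This is only one possible variant. The nearest-neighbour sampling and the function name are assumptions of this Python sketch.

```python
from typing import List

Binary = List[List[int]]  # 1 = ink, 0 = background


def normalize_glyph(glyph: Binary, size: int) -> Binary:
    """Scale a glyph bitmap into a size x size box, keeping its aspect ratio.

    Nearest-neighbour sampling keeps the output binary; the shorter side is
    centred with background padding.
    """
    h, w = len(glyph), len(glyph[0])
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    out = [[0] * size for _ in range(size)]
    for y in range(new_h):
        for x in range(new_w):
            # Map each output pixel back to its nearest source pixel.
            src_y = min(h - 1, int(y / scale))
            src_x = min(w - 1, int(x / scale))
            out[top + y][left + x] = glyph[src_y][src_x]
    return out
```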
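The grid alignment described above for fixed-pitch fonts can be sketched as follows. The sketch assumes that the character pitch in pixels is already known (for example, from the font size and scan resolution) and that the text-line image is binary, with 1 for ink. It tries every grid offset and keeps the one whose vertical grid lines cross the fewest ink pixels. The function name is hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def fixed_pitch_cells(line: Binary, pitch: int) -> List[Tuple[int, int]]:
    """Split a text line into (left, right) character cells on a uniform grid.

    Cells that contain no ink are dropped; ranges are half-open.
    """
    ink = [sum(col) for col in zip(*line)]  # ink pixels per column
    width = len(ink)
    # Grid lines sit at columns offset, offset + pitch, ...; minimize ink crossed.
    offset = min(range(pitch), key=lambda o: sum(ink[o::pitch]))
    edges = sorted({0, width, *range(offset, width, pitch)})
    return [(a, b) for a, b in zip(edges, edges[1:]) if any(ink[a:b])]
```

With proportional fonts no single pitch exists, so this search has nothing to lock onto.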