What is OCR?

OCR stands for Optical Character Recognition. It is the process of identifying and extracting text from an image, here using an artificial neural network.

How does it work?

There are many OCR algorithms; one of them is image-based sequence recognition. The paper uses an architecture called CRNN, which consists of three parts: 1) convolutional layers, which extract a feature sequence from the input image; 2) recurrent layers, which predict a label distribution for each frame, that is, a probability for every character in the set we want the model to know (a, b, c, …, z, ?, !, ^, $, etc.); 3) a transcription layer, which translates the per-frame predictions into the final label sequence (the final text).

Feature Extraction

The first part of the CRNN model, feature extraction, is built from convolution and max-pooling layers. These components extract a sequential feature representation from the input image.

The input image is first resized to a fixed dimension before it is fed to the network. The convolution and max-pooling layers then extract feature vectors that are fed to the next part, the recurrent layers.
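To make "a sequence of feature vectors" concrete, here is a minimal sketch in plain Python. It assumes the feature map is read column by column from left to right, so each column becomes one frame of the sequence. The function name map_to_sequence and the toy values are made up for illustration; a real model would do this on tensors inside a deep learning framework.

```python
def map_to_sequence(feature_map):
    """Turn a feature map indexed [channel][row][col] into one vector per column."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    sequence = []
    for col in range(width):
        # One frame = every channel and row value found in this column.
        frame = [feature_map[c][r][col] for c in range(channels) for r in range(height)]
        sequence.append(frame)
    return sequence


# Toy feature map: 2 channels, 2 rows, 3 columns (values are made up).
toy_map = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
frames = map_to_sequence(toy_map)
```

Here frames holds three vectors, one per column of the toy map, and each vector has channels × height = 4 values. The number of frames is what the recurrent layers later see as the sequence length.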

Sequence Labeling

A deep bidirectional recurrent layer (LSTM) is used in this phase.

This layer predicts a label distribution for each frame in the feature sequence produced by the convolutional layers.
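A label distribution is simply one probability per character for each frame. The sketch below leaves the recurrent layers out and only shows the last step, assuming each frame already comes with one raw score per label that is normalized with a softmax. The alphabet and the scores are hypothetical.

```python
import math

# Hypothetical label set; "-" stands for the blank label.
ALPHABET = ["-", "a", "b", "c"]


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_distributions(frame_scores):
    """Apply softmax to every frame independently."""
    return [softmax(scores) for scores in frame_scores]


# Two frames, one made-up raw score per label in ALPHABET.
frame_scores = [
    [0.1, 2.0, 0.3, 0.2],  # leans toward "a"
    [3.0, 0.5, 0.1, 0.4],  # leans toward blank
]
distributions = label_distributions(frame_scores)
```

Each entry of distributions lines up with ALPHABET, so the first frame gives its largest probability to "a" and the second to the blank label.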

Why LSTM?

LSTM is a type of recurrent neural network, which is powerful at capturing contextual information within a sequence. An LSTM feeds its previous output back in as an input and keeps an internal state whose gates decide what to store and what to forget. That matters here because each frame in the feature sequence needs context from its neighbors to produce a better label distribution.
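The store and forget behavior can be sketched with a single LSTM step on scalar values. This is a toy version under stated assumptions: real layers use vectors and learned weight matrices, while the weights below are all set to an arbitrary 0.5, and the names lstm_step, run_lstm, and run_bidirectional are made up for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step: gates decide what to forget, store, and expose."""
    forget = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])
    store = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])
    candidate = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])
    expose = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])
    c = forget * c_prev + store * candidate  # updated cell state
    h = expose * math.tanh(c)                # output for this frame
    return h, c


def run_lstm(sequence, w):
    """Run the cell over a sequence, carrying the state from frame to frame."""
    h, c, outputs = 0.0, 0.0, []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


def run_bidirectional(sequence, w_fwd, w_bwd):
    """Pair a left-to-right pass with a right-to-left pass for each frame."""
    forward = run_lstm(sequence, w_fwd)
    backward = run_lstm(sequence[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical weights: every parameter is 0.5, purely for illustration.
KEYS = ("wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo")
WEIGHTS = {key: 0.5 for key in KEYS}

context = run_bidirectional([0.2, -0.1, 0.7], WEIGHTS, WEIGHTS)
```

Each element of context pairs a forward output, which has only seen the frames to its left, with a backward output, which has only seen the frames to its right. That pairing is what gives every frame context from both sides.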

Transcription

Once we have the sequence of label distributions, we need something that converts it to text. We find the label sequence with the highest probability conditioned on the per-frame predictions, collapse repeated characters, remove the blanks, and then map each remaining character index to its character until we have the final result.
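The decoding steps above can be sketched as a greedy decoder. It takes the most likely label in each frame, which is a common approximation of the most probable label sequence, then collapses repeats and drops blanks in that order. The alphabet, the blank index, and the helper peaked are assumptions made for this example.

```python
BLANK = 0
# Hypothetical label set; index 0 ("-") is the blank label.
ALPHABET = ["-", "a", "b", "c", "t"]


def greedy_decode(distributions, alphabet=ALPHABET, blank=BLANK):
    """Pick the best label per frame, collapse repeats, then drop blanks."""
    best = [max(range(len(d)), key=d.__getitem__) for d in distributions]
    collapsed = []
    previous = None
    for index in best:
        if index != previous:  # keep only the first label of each run
            collapsed.append(index)
        previous = index
    return "".join(alphabet[i] for i in collapsed if i != blank)


def peaked(index, size=len(ALPHABET)):
    """Toy distribution that puts most of its mass on one label."""
    rest = 0.1 / (size - 1)
    return [0.9 if i == index else rest for i in range(size)]


# Per-frame best labels: c c - a a t
frames = [peaked(3), peaked(3), peaked(0), peaked(1), peaked(1), peaked(4)]
print(greedy_decode(frames))                             # cat
print(greedy_decode([peaked(1), peaked(0), peaked(1)]))  # aa
```

The second call shows why the blank label exists: without a blank between them, two identical neighboring labels would be collapsed into one, so a word with a doubled letter could never be produced.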

And that is how OCR works.
