Fast and accurate scene text understanding with image binarization and off-the-shelf OCR

Sergey Milyaev, Olga Barinova, Tatiana Novikova, Pushmeet Kohli, Victor Lempitsky

    Research output: Contribution to journal › Article › peer-review

    16 Citations (Scopus)

    Abstract

    While modern off-the-shelf OCR engines achieve high accuracy on scanned text, text detection and recognition in natural images remain a challenging problem. Here, we demonstrate that OCR engines can still perform well on this harder task as long as an appropriate image binarization is applied to the input photographs. We propose a new binarization algorithm that is particularly suitable for scene text and systematically evaluate its performance along with 12 existing binarization methods. While most existing binarization techniques are designed specifically either for text detection or for recognition of localized text, our method shows very similar results for both large images and localized text regions. Therefore, it can be applied to large images directly, with no need for re-binarization of localized text regions. We also propose a real-time variant of this method based on linear-time bilateral filtering. Evaluation across different metrics on established natural image text recognition benchmarks (ICDAR 2003 and ICDAR 2011) shows that our simple and fast image binarization method, combined with an off-the-shelf OCR engine, achieves state-of-the-art performance for end-to-end text understanding in natural images and outperforms more complex recent methods.

    Original language: English
    Pages (from-to): 169-182
    Number of pages: 14
    Journal: International Journal on Document Analysis and Recognition
    Volume: 18
    Issue number: 2
    Publication status: Published - 15 Jun 2015

    Keywords

    • Document binarization
    • Natural scene binarization
    • Scene text localization
