IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.




Optical Character Recognition (OCR)
Introduction & Overview

Michael Fuchs
Senior Product Marketing Manager
ABBYY Europe

fuchs@abbyy.com

Agenda
          ABBYY Technology in the IMPACT project
          Who is ABBYY?
           Company Overview
           Product Overview
           How is OCR used in real-life scenarios?

          Optical Character Recognition - Basics
           What is OCR?
           How does OCR work inside?
           OCR = Only Character Recognition?
           IMPACT – the areas of improvement

          Questions & Answers


IMPACT + ABBYY - OCR Introduction & Overview                                                                                                             2

                                 IMPACT & ABBYY





Improving Access to Text
      Mission of IMPACT: It aims to significantly improve access to historical text and
       remove the barriers that stand in the way of the mass digitisation of the
       European cultural heritage.

      Partners:
        Koninklijke Bibliotheek, The British Library, Österreichische Nationalbibliothek, Universität Innsbruck,

        Deutsche Nationalbibliothek, Bayerische Staatsbibliothek, Staats- und Universitätsbibliothek Göttingen

        ABBYY, IBM Israel – Science and Technology Ltd, Instituut voor Nederlandse Lexicologie

        National Centre for Scientific Research “Demokritos”

        Centrum für Informations- und Sprachverarbeitung, University of Munich

        University of Bath, University of Salford, Bibliothèque Nationale de France

      Web: www.impact-project.eu




IMPACT & ABBYY
     ABBYY is the OCR technology provider for IMPACT members

     IMPACT members work with ABBYY's OCR SDK (FineReader Engine),
      because:
              Only development toolkits allow developers to combine new or different modules,
               for example complex dictionaries
              Scientific research & tests have to be implemented in custom modules





IMPACT & ABBYY
     ABBYY is improving its core OCR technologies for the recognition
      of old documents; current focus areas are:
              Image pre-processing
              Character recognition

     IMPACT currently focuses on research rather than on setting up
      a production system ;o)

     Improvements in ABBYY recognition technologies that result from
      the IMPACT project will be added to future products
              Important: ABBYY FineReader 8/9/10 Professional (Box) has NO Fraktur OCR
              Fraktur OCR is only available in Recognition Server and FineReader Engine



ABBYY – an Overview




Who is ABBYY?

   Leading developer of artificial intelligence software in document
    recognition, data capture and linguistics

   Headquartered in Moscow, Russia

   Founded in 1989 by Mr. David Yang as BIT Software

   More than 880 employees worldwide

   8 offices worldwide

   Established sales and distribution network in more than
    130 countries worldwide



ABBYY & OCR for IMPACT
ABBYY Worldwide




    ABBYY Headquarters / ABBYY Russia (Moscow)
    ABBYY USA (Fremont)
    ABBYY Europe UK
    ABBYY Europe GmbH (Munich, Germany)
    ABBYY Ukraine (Kiev)
    ABBYY Japan
    ABBYY Taiwan




ABBYY in Western Europe
 ABBYY Europe GmbH

    Located in Munich, Germany

    Established in 2001

    Serves partners and customers in Western European countries

    Sales and Marketing
           Sales
              ●     Distribution, channel development, partner management

           Marketing
              ●     Product marketing, channel marketing, outbound marketing (PR, advertising, direct)


    More than 50 employees today




Product Overview




ABBYY Product Brands
Mainline Distribution
                         “Box” products:
                            ABBYY FineReader
                             Optical character recognition (OCR)/text processing
                             end user products

                            ABBYY FotoReader
                             Conversion of text photographed with digital cameras

                            ABBYY PDF Transformer
                             PDF conversion and creation for end users

                            ABBYY Lingvo
                             Electronic dictionaries, Russian and European languages




ABBYY Product Brands
Direct Sales and VAR Distribution
                         Licensing and integration products:
                            ABBYY Recognition Server
                             Server-based OCR

                            ABBYY FormReader and ABBYY FlexiCapture
                             Form processing, unstructured document processing, document
                             assembly

                            ABBYY FineReader Engine SDK
                             Comprehensive toolkit for integrating recognition and
                             data capture technologies into third-party applications

                            ABBYY Mobile OCR Engine
                             OCR for thin clients such as mobile phones, PDAs and Web
                             applications




ABBYY OCR Products – Usage View

                  Desktop/Workgroup             Server/Backend                   SDK/Integration
                  User-driven processing,       Automated processing,            Automated processing,
                  ready to use                  ready to use                     development needed

OCR & Document    FineReader                    Recognition Server               FineReader Engines
Conversion        (Professional, Corporate,     (Professional,                   (Windows, Linux, Mac OS X,
                  Site Licence Edition)         Extended Edition)                FreeBSD, Embedded Systems)
                  PDF Transformer                                                Mobile OCR Engine
                  FotoReader                                                     (Android, Symbian, Linux,
                  ScreenshotReader                                               Windows, Windows Mobile, iPhone)

Users             End users, companies,         Companies, scan service          Developers, scan service
                  (libraries)                   providers, libraries             providers, IMPACT research

OCR Basics




Designed not to be OCRed




What (ABBYY) OCR can read...

        Recognition Languages
       >191 languages altogether
       Alphabets: Cyrillic, Latin, Greek, Armenian, Hebrew, Thai
       34 languages with dictionary support and spell check
       Chinese, Japanese, Korean (CJK): 4 character sets
        (Chinese (traditional and simplified), Japanese, Korean)
       5 languages in FineReader XIX (Gothic and other 17th-20th century fonts)
       6 programming languages (Basic, C/C++, COBOL, Java, etc.)
       4 artificial languages (Esperanto, Interlingua, etc.)
       Simple chemical formulas

        Font Types
       Recognition of mixed font types
        (dot-matrix printer, typewriter, Gothic, etc.)
       OCR-A
       OCR-B
       MICR (E13B)
       CMC-7

OCR Processing Steps

        Step 1. Scanning, Image Loading, Pre-Processing and Modification
       Compensating for image defects and making the document easier to view and better suited to
        automatic OCR

        Step 2. Document Layout Analysis
       Detecting sections of a document, analysing the layout and finding barcodes

        Step 3. Character Recognition
       Automatic recognition of characters, applying the selected recognition languages, dictionaries
        and other settings

        Step 4. Verification by Operators (optional)
       Manual validation of suspicious characters and words

        Step 5. Document Synthesis and Export
       Generating an output document in the selected format




OCR Processing Steps

     Step 1. Image Loading, Pre-Processing and Modification
    Images from existing files or captured with a scanner

       Splitting images
       Scaling (e.g. low resolution images can be digitally magnified)
       Rotation (by 90, 180 or 270 degrees)
       Flipping and inverting images
       Cropping (selecting rectangular areas)
       Creating previews (small thumbnail images)
       Changing text colour and background in rectangular areas
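The image modifications listed above can be illustrated with a minimal pure-Python sketch. It assumes a grey-scale image stored as a list of pixel rows (0 = black, 255 = white); all function names are hypothetical and do not correspond to the FineReader Engine API, which works on its own image objects.

```python
from typing import List

Image = List[List[int]]  # list of pixel rows, grey values 0..255


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img: Image, degrees: int) -> Image:
    """Rotate clockwise by 90, 180 or 270 degrees (the only angles supported here)."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        img = rotate90(img)
    return img


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def invert(img: Image, max_value: int = 255) -> Image:
    """Swap dark and light, e.g. for white text on a black background."""
    return [[max_value - v for v in row] for row in img]


def crop(img: Image, top: int, left: int, bottom: int, right: int) -> Image:
    """Keep the rectangle rows [top, bottom) x columns [left, right)."""
    return [row[left:right] for row in img[top:bottom]]


def scale_up(img: Image, factor: int) -> Image:
    """Magnify a low-resolution image by pixel replication (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def preview(img: Image, step: int) -> Image:
    """Build a small preview by keeping every step-th pixel in each direction."""
    return [row[::step] for row in img[::step]]
```

Real engines interpolate when magnifying; pixel replication is simply the easiest scaling method to show.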




ABBYY OCR Processing Steps

     Step 1. Image Loading, Pre-Processing and Modification
    Compensating for scanning defects

       Automatic de-skewing to restore the proper
        straight position
       Straightening text lines
       Controlled de-speckling
        (removing stray dots)
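De-skewing and de-speckling can be sketched on a binary bitmap (1 = black, stored as a list of rows, an assumption for self-containment). The skew estimate below is a projection-profile search, a common textbook technique swapped in for illustration rather than ABBYY's method: for each candidate angle, black pixels are projected onto a rotated y axis, and the angle whose row histogram is most sharply peaked wins. De-speckling removes 8-connected black components smaller than a size threshold. The angle range, step and `min_size` defaults are assumptions.

```python
import math
from collections import deque
from typing import List

Bitmap = List[List[int]]  # list of rows, 1 = black pixel, 0 = white


def estimate_skew(img: Bitmap, max_angle: float = 5.0, step: float = 0.5) -> float:
    """Return the angle (degrees) at which the text lines look most horizontal.

    Positive angles mean lines that descend to the right (image y grows
    downwards); rotating the page back by this angle would straighten it.
    """
    points = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1
    n = int(round(max_angle / step))
    for i in range(-n, n + 1):
        angle = i * step
        a = math.radians(angle)
        sin_a, cos_a = math.sin(a), math.cos(a)
        # Project every black pixel onto the y axis of a frame rotated by -angle.
        bins = {}
        for x, y in points:
            yr = round(y * cos_a - x * sin_a)
            bins[yr] = bins.get(yr, 0) + 1
        # Straight lines pile up in few bins, so the sum of squares peaks there.
        score = sum(c * c for c in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def despeckle(img: Bitmap, min_size: int = 3) -> Bitmap:
    """Remove 8-connected black components with fewer than min_size pixels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0
    return out
```

A fixed `min_size` is the crudest form of "controlled" de-speckling; too large a value would also erase dots of i, j and punctuation.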




OCR Processing Steps

       Step 1. Image Loading, Pre-Processing and Modification

       Intelligent background filtering




       Adaptive Binarisation




    Global binarisation (one threshold for the whole image) cannot
    deliver good results for OCR
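Adaptive binarisation computes a separate threshold for each pixel from its neighbourhood instead of one threshold for the whole page. A minimal sketch, using Sauvola's local threshold T = m · (1 + k · (s / R - 1)), where m and s are the mean and standard deviation of a window around the pixel. This published textbook method is a stand-in, not ABBYY's binarisation; the defaults (window 15, k = 0.5, R = 128) and the list-of-rows image format are assumptions.

```python
import math
from typing import List

Gray = List[List[int]]    # grey values 0 (black) .. 255 (white)
Bitmap = List[List[int]]  # 1 = black (text), 0 = white (background)


def global_binarise(gray: Gray, threshold: int = 128) -> Bitmap:
    """One threshold for the whole page: breaks down on shadows and uneven lighting."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def sauvola_binarise(gray: Gray, window: int = 15, k: float = 0.5, r: float = 128.0) -> Bitmap:
    """Per-pixel threshold T = m * (1 + k * (s / r - 1)) from a local window."""
    h, w = len(gray), len(gray[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the image border.
            vals = [gray[yy][xx]
                    for yy in range(max(0, y - half), min(h, y + half + 1))
                    for xx in range(max(0, x - half), min(w, x + half + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / r - 1))
            out[y][x] = 1 if gray[y][x] < t else 0
    return out
```

On a page whose background darkens from left to right, `global_binarise` turns the darker background black, while the local threshold keeps it white and still marks the text. The direct window scan costs O(pixels × window²); a production version would use integral images.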

OCR Processing Steps

       Step 1. Image Loading, Pre-Processing and Modification

       Success during IMPACT

       [Figure panels: Original | State of the art | New; the new result shows no text from the other page]


New Binarization Examples

[Figure panels: Original scan | Previous binarization | New binarization]


Camera OCR
 Automatic correction of 3D perspective distortions




[Figure panels: Before | After]
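Once the four page corners have been located in a camera image (corner detection is not shown and is assumed to happen elsewhere), the perspective distortion can be undone with a planar homography. The sketch below solves the standard eight-unknown linear system from four point pairs and warps the page back to a rectangle with nearest-neighbour sampling. It illustrates the geometry only; it is not ABBYY's correction algorithm, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three points on a line?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (with H[2][2] = 1) mapping the 4 src points onto the 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    p = solve(a, b)
    return [[p[0], p[1], p[2]], [p[3], p[4], p[5]], [p[6], p[7], 1.0]]


def apply_homography(h: List[List[float]], x: float, y: float) -> Point:
    """Map one point through H, including the perspective division."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def rectify(img, corners: Sequence[Point], out_w: int, out_h: int):
    """Warp the page quadrilateral (TL, TR, BR, BL corners) to an out_w x out_h rectangle."""
    target = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    inv = homography(target, corners)  # maps output pixels back into the source image
    h, w = len(img), len(img[0])
    out = [[0] * out_w for _ in range(out_h)]  # pixels mapping outside the source stay 0
    for y in range(out_h):
        for x in range(out_w):
            sx, sy = apply_homography(inv, x, y)
            ix, iy = round(sx), round(sy)  # nearest-neighbour sampling
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = img[iy][ix]
    return out
```

Mapping each output pixel back into the source (rather than pushing source pixels forward) guarantees that every output pixel receives a value.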




Camera OCR
    ISO noise reduction (sensor noise in photos taken at high ISO settings)




[Figure panels: Before | After]
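A median filter is a simple, well-known way to suppress isolated noisy pixels while keeping character edges sharp. It is used here as a stand-in for illustration and is not ABBYY's noise-reduction method; the grey-scale list-of-rows format and the 3 × 3 window (`radius=1`) are assumptions.

```python
from typing import List

Gray = List[List[int]]  # grey values 0..255, list of rows


def median_filter(gray: Gray, radius: int = 1) -> Gray:
    """Replace every pixel by the median of its (2*radius+1)^2 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the border, so corner windows are smaller.
            window = sorted(gray[yy][xx]
                            for yy in range(max(0, y - radius), min(h, y + radius + 1))
                            for xx in range(max(0, x - radius), min(w, x + radius + 1)))
            out[y][x] = window[len(window) // 2]
    return out
```

Unlike an averaging blur, the median does not smear a sharp black/white step, which matters for the following binarisation and recognition steps.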




OCR Processing Steps

     Step 2. Document Layout Analysis
     Detecting sections of a document, analysing the layout and finding barcodes
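For clean pages, finding document sections can be approximated by an XY-cut: split the page at horizontal white gaps, then split each band at vertical white gaps. The two-level sketch below is a textbook technique, not ABBYY's layout analysis; it assumes a binary bitmap (1 = black) and hypothetical gap thresholds, and it ignores barcodes, tables and pictures.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # 1 = black pixel


def runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """(start, end) of non-empty stretches; white gaps shorter than min_gap are bridged."""
    spans = []
    start: Optional[int] = None
    end: Optional[int] = None
    gap = 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            end = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, end))
                start, gap = None, 0
    if start is not None:
        spans.append((start, end))
    return spans


def xy_cut(img: Bitmap, min_row_gap: int = 2, min_col_gap: int = 2):
    """Blocks as inclusive (top, left, bottom, right), top to bottom, then left to right."""
    width = len(img[0])
    blocks = []
    row_profile = [sum(row) for row in img]  # black pixels per row
    for top, bottom in runs(row_profile, min_row_gap):
        # Black pixels per column, counted only inside this horizontal band.
        col_profile = [sum(img[y][x] for y in range(top, bottom + 1)) for x in range(width)]
        for left, right in runs(col_profile, min_col_gap):
            blocks.append((top, left, bottom, right))
    return blocks
```

Narrow columns in old newspapers, mentioned later as a problem area, are exactly where fixed gap thresholds like these become fragile.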




OCR Processing Steps

       Step 3. Character Recognition
    After line detection, character recognition is applied with different classifiers
       Raster classifier
       Contour classifier
       Structure classifier
       Feature-differentiating classifier
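A raster classifier compares the glyph image with stored character images pixel by pixel. The sketch below resizes both to a fixed grid and scores the fraction of agreeing pixels. The grid size, scoring rule and function names are assumptions, not FineReader's implementation, and the contour, structure and feature-differentiating classifiers are not shown.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # 1 = black pixel


def resample(bitmap: Bitmap, size: int) -> Bitmap:
    """Nearest-neighbour resize to size x size so glyphs of any size can be compared."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


def raster_scores(glyph: Bitmap, templates: Dict[str, Bitmap],
                  size: int = 8) -> List[Tuple[str, float]]:
    """Rank template labels by the fraction of grid pixels that agree with the glyph."""
    g = resample(glyph, size)
    scores = {}
    for label, tmpl in templates.items():
        t = resample(tmpl, size)
        agree = sum(g[y][x] == t[y][x] for y in range(size) for x in range(size))
        scores[label] = agree / (size * size)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The ranked list, rather than a single answer, is what lets later stages (other classifiers, dictionaries) overrule a close call.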




OCR Optimization

       Step 3. Character Recognition – learn new symbols
    User pattern training to learn special characters at the pixel level
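Pattern training can be pictured as building one prototype image per user-labelled character from a few training samples. In this hypothetical sketch the samples are averaged and only pixels set in at least half of them are kept, so an occasional speck does not end up in the pattern. The class name, the 0.5 cutoff and Hamming-distance matching are assumptions, not FineReader's pattern format.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # 1 = black pixel


def resample(bitmap: Bitmap, size: int) -> Bitmap:
    """Nearest-neighbour resize to size x size."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


class PatternTrainer:
    """Learns one prototype bitmap per label by averaging training samples."""

    def __init__(self, size: int = 8):
        self.size = size
        self.sums: Dict[str, List[List[int]]] = {}  # label -> per-pixel counts
        self.counts: Dict[str, int] = {}             # label -> number of samples

    def train(self, label: str, bitmap: Bitmap) -> None:
        sample = resample(bitmap, self.size)
        acc = self.sums.setdefault(label, [[0] * self.size for _ in range(self.size)])
        for y in range(self.size):
            for x in range(self.size):
                acc[y][x] += sample[y][x]
        self.counts[label] = self.counts.get(label, 0) + 1

    def prototype(self, label: str, cutoff: float = 0.5) -> Bitmap:
        """Keep pixels that are black in at least `cutoff` of the samples."""
        n = self.counts[label]
        return [[1 if v / n >= cutoff else 0 for v in row] for row in self.sums[label]]

    def classify(self, bitmap: Bitmap) -> str:
        """Return the label whose prototype differs in the fewest pixels."""
        sample = resample(bitmap, self.size)

        def distance(label: str) -> int:
            proto = self.prototype(label)
            return sum(sample[y][x] != proto[y][x]
                       for y in range(self.size) for x in range(self.size))

        return min(self.sums, key=distance)
```

The test data uses tiny 5 × 5 drawings of a long s (ſ) and an f that differ only in the crossbar, a classic confusion in Fraktur.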




OCR Optimization

       Step 3. Character Recognition – back to the word level
    Applying the selected recognition languages and dictionaries

       User-defined languages and dictionaries are supported
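At the word level, a dictionary can repair single-character misreadings. A minimal sketch, assuming a plain word list and Levenshtein edit distance as the similarity measure (a common choice, not necessarily what FineReader uses); the lexicon of historical German spellings and the `max_dist` default are hypothetical examples.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(word: str, lexicon: Iterable[str], max_dist: int = 2) -> str:
    """Replace word by the closest dictionary entry, if it is close enough."""
    words = list(lexicon)
    if word in words:
        return word
    best = min(words, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_dist else word
```

A user-defined dictionary of period spellings (e.g. "seyn", "thun") keeps such a corrector from "modernising" correct historical forms.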




OCR Processing Steps

    Step 4. Verification by Operators (optional)
    Manual validation or correction of
       Layout Analysis Results
        ● Text blocks
        ● Image blocks
        ● Table blocks

       Suspicious characters, and word
        corrections using dictionaries

       Re-recognition with other
        language settings

       Recognition Server allows a
        quality level to be set and
        processing results to be logged
        in an XML file
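The quality-level idea can be sketched as follows: every word whose recognition confidence falls below the level is flagged for an operator, and all results are written to an XML log. Confidence values in [0, 1], the 0.8 default and the element and attribute names are hypothetical; Recognition Server's actual XML log format is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Word = Tuple[str, float]  # (recognised text, confidence in [0, 1])


def flag_suspicious(words: List[Word], quality_level: float = 0.8) -> List[int]:
    """Indices of words an operator should check."""
    return [i for i, (_, conf) in enumerate(words) if conf < quality_level]


def verification_log(words: List[Word], quality_level: float = 0.8) -> str:
    """Serialise all words, marking the suspicious ones, as an XML string."""
    suspicious = set(flag_suspicious(words, quality_level))
    root = ET.Element("verification", qualityLevel=str(quality_level))
    for i, (text, conf) in enumerate(words):
        ET.SubElement(root, "word", index=str(i), text=text,
                      confidence=f"{conf:.2f}",
                      suspicious="true" if i in suspicious else "false")
    return ET.tostring(root, encoding="unicode")
```

Raising the quality level sends more words to operators; the log makes that trade-off measurable across a batch.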



ABBYY OCR Processing Steps

     Step 5. Document Synthesis and Export
    Generating an output document in the selected format

       TXT, Office formats, PDF, etc.

       From version 9.0 onwards, ADRT
        (Adaptive Document
        Recognition Technology) is included.
        Goal: understanding the
        document structure and detecting
        elements such as headers, footers and footnotes.
        Version 10: also tables of contents

       SDKs and Recognition Server
        offer more export formats, e.g.
        ● XML
        ● Internal
          FineReader Engine Format
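How ADRT detects headers and footers is not documented here. As a rough illustration of the idea only, running headers and footers can be spotted as first or last lines that recur (ignoring page numbers) on most pages, and then dropped from a plain-text (TXT) export. The function names and the 60 % share threshold are assumptions for this sketch.

```python
from collections import Counter
from typing import List, Set, Tuple

Page = List[str]  # recognised text lines of one page, top to bottom


def normalise(line: str) -> str:
    # Ignore digits so that "Page 3" and "Page 4" count as the same footer.
    return "".join(ch for ch in line if not ch.isdigit()).strip()


def find_running_lines(pages: List[Page], min_share: float = 0.6) -> Tuple[Set[str], Set[str]]:
    """Normalised first/last lines that appear on at least min_share of the pages."""
    n = len(pages)
    first = Counter(normalise(p[0]) for p in pages if p)
    last = Counter(normalise(p[-1]) for p in pages if p)
    headers = {t for t, c in first.items() if t and c / n >= min_share}
    footers = {t for t, c in last.items() if t and c / n >= min_share}
    return headers, footers


def export_txt(pages: List[Page], min_share: float = 0.6) -> str:
    """Plain-text export with running headers and footers removed."""
    headers, footers = find_running_lines(pages, min_share)
    body = []
    for p in pages:
        lines = list(p)
        if lines and normalise(lines[0]) in headers:
            lines = lines[1:]
        if lines and normalise(lines[-1]) in footers:
            lines = lines[:-1]
        body.extend(lines)
    return "\n".join(body)
```

Richer formats such as XML or PDF would keep the detected headers and footers as tagged elements instead of dropping them.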




OCR in General & IMPACT in Particular




OCR = Only Character Recognition?

        Recreates the same layout as in the original document
       Resulting document looks just like the scanned original
       Information captured during Layout Analysis is used here

        Supports popular document formats
       ABBYY products support the popular output formats customers need:
        PDF, PDF/A, XML, HTML, TXT/CSV, Word, Excel, PowerPoint and DBF

        Supports image output
       BMP, PCX, JPEG, JPEG 2000, TIFF, PNG

        Regulatory compliance
       Support for selective-access password protection, document encryption,
        the PDF/A format, etc.




IMPACT = Step-by-Step Optimisation
        Step 1. Image Quality
       Problem areas: scans of microfilms, distortions, show-through characters
       Optimisation approach: image pre-processing, e.g. binarisation

        Step 2. Document Analysis
       Problem areas: layout of old print material, e.g. narrow columns in old newspapers
       Optimisation approach: improved layout/document analysis

        Step 3. Character Recognition & Languages
       Problem areas: fonts used, historical language (grammar & spelling)
       Optimisation approach: optimised patterns, adaptive OCR, creation of special dictionaries

        Step 4. Validation & Correction
       Problem areas: frequently recurring errors in Fraktur OCR, scalability of correction
       Optimisation approach: new approaches for mass verification

        Step 5. Document Synthesis, Export & Rating
       Problem areas: content classification, metadata generation, “reliable” formats
       Optimisation approach: XML, AltoXML, XML analysis, PDF/A, …
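The "mass verification" idea in Step 4 can be illustrated under one assumption: identical low-confidence words are usually the same recurring Fraktur misreading, so grouping them lets an operator confirm one correction for the whole group. This is a hypothetical sketch, not the approach developed in IMPACT; the 0.8 quality level and the confidence of 1.0 assigned to operator-confirmed words are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Word = Tuple[str, float]  # (recognised text, confidence in [0, 1])


def group_suspicious(words: List[Word],
                     quality_level: float = 0.8) -> List[Tuple[str, List[int]]]:
    """Group positions of low-confidence words by their recognised form, largest group first."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, (text, conf) in enumerate(words):
        if conf < quality_level:
            groups[text].append(i)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def apply_correction(words: List[Word], positions: List[int], corrected: str) -> List[Word]:
    """Apply one operator decision to every position in a group."""
    fixed = list(words)
    for i in positions:
        fixed[i] = (corrected, 1.0)  # operator-confirmed; 1.0 is an assumed convention
    return fixed
```

Sorting the largest groups first means one operator decision fixes the most occurrences, which is what makes correction scale.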

Thank you for your attention!

                                     Questions?




         Michael Fuchs
         Senior Product Marketing Manager
         ABBYY Europe
         fuchs@abbyy.com


The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Bratislava WS - Fuchs - Abbyy - OCR overview_pdf

  • 1. Optical Character Recognition (OCR): Introduction & Overview
    IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
    Michael Fuchs, Senior Product Marketing Manager, ABBYY Europe (fuchs@abbyy.com)
  • 2. Agenda
    - ABBYY Technology in the IMPACT project
    - Who is ABBYY?
      - Company Overview
      - Product Overview
      - How is OCR used in real-life scenarios?
    - Optical Character Recognition: Basics
      - What is OCR?
      - How does OCR work inside?
      - OCR = Only Character Recognition?
      - IMPACT – the areas of improvement
    - Questions & Answers
  • 3. IMPACT & ABBYY
  • 4. Improving Access to Text
    - Mission of IMPACT: to significantly improve access to historical text and to remove the barriers that stand in the way of the mass digitisation of the European cultural heritage.
    - Partners: Koninklijke Bibliotheek, The British Library, Österreichische Nationalbibliothek, Universität Innsbruck, Deutsche Nationalbibliothek, Bayerische Staatsbibliothek, Staats- und Universitätsbibliothek Göttingen, ABBYY, IBM Israel – Science and Technology Ltd, Instituut voor Nederlandse Lexicologie, National Centre for Scientific Research "Demokritos", Centrum für Informations- und Sprachverarbeitung (University of Munich), University of Bath, University of Salford, Bibliothèque Nationale de France
    - Web: www.impact-project.eu
  • 5. IMPACT & ABBYY
    - ABBYY is the OCR technology provider for the IMPACT members.
    - IMPACT members work with ABBYY's OCR SDK (FineReader Engine) because:
      - only development toolkits allow developers to combine new or different modules, for example complex dictionaries;
      - scientific research and tests have to be implemented in custom modules.
  • 6. IMPACT & ABBYY
    - ABBYY is improving its core OCR technologies for the recognition of old documents. The current focus areas are:
      - image pre-processing
      - character recognition
    - IMPACT currently focuses on research, not on setting up a production system ;o)
    - Improvements to ABBYY's recognition technologies that result from the IMPACT project will be added to future products.
    - Important: ABBYY FineReader 8/9/10 Professional (box product) has NO Fraktur OCR. Fraktur OCR is only available in Recognition Server and FineReader Engine.
  • 7. ABBYY – an Overview
  • 8. Who is ABBYY?
    - Leading developer of artificial intelligence software for document recognition, data capture and linguistics
    - Headquartered in Moscow, Russia
    - Founded in 1989 by David Yang as BIT Software
    - More than 880 employees worldwide
    - 8 offices worldwide
    - Established sales and distribution network in more than 130 countries
  • 9. ABBYY Worldwide
    [Map of ABBYY offices: ABBYY Headquarters/ABBYY Russia (Moscow), ABBYY USA (Fremont), ABBYY Europe GmbH (Munich, Germany), ABBYY Europe UK, ABBYY Ukraine (Kiev), ABBYY Japan, ABBYY Taiwan]
  • 10. ABBYY in Western Europe: ABBYY Europe GmbH
    - Located in Munich, Germany
    - Established in 2001
    - Serves partners and customers in Western European countries
    - Sales and marketing:
      - Sales: distribution, channel development, partner management
      - Marketing: product marketing, channel marketing, outbound marketing (PR, advertising, direct)
    - More than 50 employees today
  • 11. Product Overview
  • 12. ABBYY Product Brands: Mainline Distribution ("Box" Products)
    - ABBYY FineReader: optical character recognition (OCR) and text processing products for end users
    - ABBYY FotoReader: conversion of text photographed with digital cameras
    - ABBYY PDF Transformer: PDF conversion and creation for end users
    - ABBYY Lingvo: electronic dictionaries for Russian and European languages
  • 13. ABBYY Product Brands: Direct Sales and VAR Distribution (Licensing and Integration Products)
    - ABBYY Recognition Server: server-based OCR
    - ABBYY FormReader and ABBYY FlexiCapture: form processing, unstructured document processing, document assembly
    - ABBYY FineReader Engine SDK: comprehensive toolkit for integrating recognition and data capture technologies into third-party applications
    - ABBYY Mobile OCR Engine: OCR for thin clients such as mobile phones, PDAs and web applications
  • 14. ABBYY OCR Products: Usage View (OCR & Document Conversion)
    - Desktop/Workgroup: user-driven processing, ready to use
      - Products: FineReader (Professional, Corporate, Site Licence Edition), PDF Transformer, FotoReader, ScreenshotReader
      - Users: end users, companies, scan service providers (libraries)
    - Server/Backend: automated processing, ready to use
      - Products: Recognition Server (Professional, Extended Edition)
      - Users: companies, scan service providers, libraries
    - SDK/Integration: automated processing, development needed
      - Products: FineReader Engines (Windows, Linux, Mac OS X, FreeBSD, embedded systems), Mobile OCR Engine (Android, Symbian, Linux, Windows, Windows Mobile, iPhone)
      - Users: developers, IMPACT research
  • 15. OCR Basics
  • 16. Designed Not to Be OCRed
  • 17. What (ABBYY) OCR Can Read
    - Recognition languages:
      - More than 191 languages in total
      - Alphabets: Cyrillic, Latin, Greek, Armenian, Hebrew, Thai
      - 34 languages with dictionary support and spell checking
      - Chinese, Japanese, Korean (CJK): 4 character sets (Chinese Traditional, Chinese Simplified, Japanese, Korean)
      - 5 languages in FineReader XIX (Gothic and other 17th–20th century fonts)
      - 6 programming languages (Basic, C/C++, COBOL, Java, etc.)
      - 4 artificial languages (Esperanto, Interlingua, etc.)
      - Simple chemical formulas
    - Font types:
      - Recognition of mixed font types (dot-matrix printer, typewriter, Gothic, etc.)
      - OCR-A
      - OCR-B
      - MICR (E13B)
      - CMC-7
  • 18. OCR Processing Steps
    - Step 1. Scanning, image loading, pre-processing and modification: compensating for image defects and making the document easier to view and better suited to automatic OCR
    - Step 2. Document layout analysis: detect the sections of a document, analyze the layout and find barcodes
    - Step 3. Character recognition: automatic recognition of characters, applying the selected recognition languages, dictionaries and other settings
    - Step 4. Verification by operators (optional): manual validation of suspicious characters and words
    - Step 5. Document synthesis and export: generating an output document in the selected format
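The five steps above can be read as a linear pipeline in which step 4 is optional. The following Python sketch shows only that control flow. The `Document` class, the stage names and `run_pipeline` are hypothetical and do not correspond to any ABBYY API; each stage is a placeholder that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container handed from one stage to the next."""
    source: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only appends its name to the log."""
    def run(doc: Document) -> Document:
        doc.log.append(name)
        return doc
    return run


preprocess = make_stage("pre-processing")             # step 1
analyse_layout = make_stage("layout analysis")        # step 2
recognise = make_stage("character recognition")       # step 3
verify = make_stage("operator verification")          # step 4 (optional)
export = make_stage("synthesis and export")           # step 5


def run_pipeline(doc: Document, with_verification: bool = False) -> Document:
    """Run the stages in order, inserting verification only when requested."""
    stages: List[Stage] = [preprocess, analyse_layout, recognise]
    if with_verification:
        stages.append(verify)
    stages.append(export)
    for stage in stages:
        doc = stage(doc)
    return doc
```

Keeping the stages as interchangeable functions is what lets individual steps be swapped for improved modules, which is the kind of combination the IMPACT slides attribute to development toolkits.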
  • 19. OCR Processing Steps
    - Step 1. Image loading, pre-processing and modification, for images from existing files or captured with a scanner:
      - Splitting images
      - Scaling (e.g. low-resolution images can be digitally magnified)
      - Rotation (by 90, 180 or 270 degrees)
      - Flipping and inverting images
      - Cropping (selecting rectangular areas)
      - Creating previews (small thumbnail images)
      - Changing text colour and background in rectangular areas
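Several of these step 1 operations are simple geometric transforms. Here is a minimal sketch. It assumes a grayscale image stored as a list of rows with values from 0 to 255. That representation is chosen only for illustration and is not FineReader Engine's image model. The sketch shows rotation by multiples of 90 degrees, flipping, inversion and cropping.

```python
from typing import List

Image = List[List[int]]  # assumed representation: rows of 0..255 gray values


def rotate90(img: Image) -> Image:
    """Rotate clockwise by 90 degrees: reverse the rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img: Image, degrees: int) -> Image:
    """Rotate clockwise by a multiple of 90 degrees (negative = counter-clockwise)."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        img = rotate90(img)
    return img


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def invert(img: Image, max_value: int = 255) -> Image:
    """Swap dark and light, e.g. for white text on a black background."""
    return [[max_value - p for p in row] for row in img]


def crop(img: Image, left: int, top: int, width: int, height: int) -> Image:
    """Select a rectangular area."""
    return [row[left:left + width] for row in img[top:top + height]]
```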
  • 20. ABBYY OCR Processing Steps
    - Step 1. Image loading, pre-processing and modification: compensating for scanning defects
      - Automatic de-skewing to a straight position
      - Straightening text lines
      - Controlled de-speckling (removing stray dots)
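De-speckling can be approximated by deleting connected groups of dark pixels that are too small to be part of a character. The sketch below assumes a binary image (1 = ink, 0 = background). It uses 8-connectivity and a hypothetical `min_size` threshold. It shows the general idea only. It is not ABBYY's "controlled" de-speckle, whose details the slide does not give.

```python
from collections import deque
from typing import List

Binary = List[List[int]]  # assumed: 1 = ink, 0 = background


def despeckle(img: Binary, min_size: int = 3) -> Binary:
    """Remove 8-connected ink components with fewer than min_size pixels."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]          # never modify the input
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:  # too small to be text: erase it
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

The delicate part is choosing `min_size`. If it is set too high, the dots on i and j and some punctuation disappear as well.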
  • 21. OCR Processing Steps
    - Step 1. Image loading, pre-processing and modification
      - Intelligent background filtering
      - Adaptive binarisation: global binarisation at the image level (one threshold for the whole page) cannot deliver good results for OCR
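The contrast between global and adaptive binarisation can be illustrated with local mean thresholding. This is a Bradley-style method used here as a stand-in, since the slide does not say which algorithm ABBYY uses. Each pixel is compared with the mean of a window around it. The window sum is computed in constant time from an integral image. `window` and `t` are assumed parameters.

```python
from typing import List

Gray = List[List[int]]    # assumed: rows of 0..255 values, dark ink on light paper
Binary = List[List[int]]  # 1 = ink, 0 = background


def integral_image(img: Gray) -> List[List[int]]:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def global_binarise(img: Gray, threshold: int = 128) -> Binary:
    """One threshold for the whole page."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


def adaptive_binarise(img: Gray, window: int = 15, t: float = 0.15) -> Binary:
    """Mark a pixel as ink if it is more than t (fraction) darker than its local mean."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Compare pixel * area with the window sum to avoid a division.
            out[y][x] = 1 if img[y][x] * area < total * (1 - t) else 0
    return out
```

On an unevenly lit page, a single global threshold either misses faint ink on the bright side or turns the dark background into ink. The local comparison handles both cases. Sharp illumination steps can still produce artefacts right at the boundary.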
  • 22. OCR Processing Steps
    - Step 1. Image loading, pre-processing and modification: success during IMPACT
      [Figure: original scan, state-of-the-art binarisation and new binarisation; the new result contains no text from the other page]
  • 23. New Binarization Examples
    [Figure: original scan, previous binarization, new binarization]
  • 24. Camera OCR: automatic correction of 3D perspective distortions
    [Figure: before and after]
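Perspective correction for camera images is commonly modelled as a planar homography. The sketch below illustrates that general technique and is not ABBYY's implementation. It estimates the 3×3 matrix from four page corners, which it assumes have already been detected. It does this by solving the standard 8×8 linear system with h33 fixed to 1, and then maps points through the matrix. Warping the full image would additionally require inverse mapping and resampling, which is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Homography mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(hm: List[List[float]], x: float, y: float) -> Point:
    """Map one point, dividing by the projective coordinate w."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)
```

As a usage example, mapping the four corners of a photographed page onto a rectangle such as 210 × 297 units gives a matrix that "flattens" the page.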
  • 25. Camera OCR: ISO noise reduction
    [Figure: before and after]
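A median filter is one common, simple way to suppress sensor (ISO) noise while keeping edges sharp. It is used here as an illustrative substitute, since the slide does not describe ABBYY's method. The sketch assumes a grayscale list-of-rows image. It uses `statistics.median_low` so that every output value is an actual neighbouring pixel value, even where a border window has an even number of pixels.

```python
from statistics import median_low
from typing import List

Gray = List[List[int]]  # assumed: rows of 0..255 values


def median_filter(img: Gray, radius: int = 1) -> Gray:
    """Replace each pixel with the (low) median of its clipped square neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = median_low(window)
    return out
```

Unlike a mean (blur) filter, the median removes isolated bright or dark pixels without smearing the step between ink and paper. This matters for the later character recognition.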
  • 26. OCR Processing Steps
    - Step 2. Document layout analysis: detecting the sections of a document, analyzing the layout and finding barcodes
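A basic building block of layout analysis is the horizontal projection profile. Summing the ink in each row and splitting the page at empty rows yields candidate text lines. The sketch below assumes a de-skewed, single-column binary image. Real layout analysis is considerably more involved, including the column, table and barcode detection mentioned on the slide. `find_text_lines` and `min_ink` are hypothetical names.

```python
from typing import List, Tuple

Binary = List[List[int]]  # assumed: 1 = ink, 0 = background


def find_text_lines(binary: Binary, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines; bottom is exclusive."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                       # a line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))        # a gap ends the line
            start = None
    if start is not None:                   # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines
```

Applying the same idea to vertical profiles within each line is one simple way to find word and character gaps, which is where character recognition picks up.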
  • 27. OCR Processing Steps
    - Step 3. Character recognition: after line detection, character recognition is applied with different classifiers:
      - Raster classifier
      - Contour classifier
      - Structure classifier
      - Feature-differentiating classifier
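Of the classifiers listed, the raster classifier is the easiest to illustrate. A glyph bitmap is compared pixel by pixel with stored templates, and the best-matching label wins. The sketch assumes glyphs have already been segmented, binarised and scaled to the template size. It does not show the contour, structure and feature-differentiating classifiers, or how ABBYY combines them.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # assumed: binary glyph, same size as every template


def raster_score(glyph: Bitmap, template: Bitmap) -> float:
    """Fraction of pixel positions where glyph and template agree."""
    total = agree = 0
    for grow, trow in zip(glyph, template):
        for g, t in zip(grow, trow):
            total += 1
            agree += (g == t)
    return agree / total


def classify(glyph: Bitmap, templates: Dict[str, Bitmap]) -> Tuple[str, float]:
    """Return the best-matching label and its score."""
    scores = {label: raster_score(glyph, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The score doubles as a crude confidence value. A low best score is a natural trigger for the "suspicious character" handling in step 4.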
  • 28. OCR Optimization
    - Step 3. Character recognition: learning new symbols
      - Custom pattern training to learn special characters at the pixel level
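Pattern training can be sketched as averaging several labelled samples of a special character into a template of per-pixel ink probabilities. New glyphs are then matched by mean absolute difference. This is a simplified illustration under that assumption. It does not reflect the format or algorithm of ABBYY's pattern training. The example uses the long s (ſ), common in Fraktur, as a hypothetical trained symbol.

```python
from typing import Dict, List

Bitmap = List[List[float]]  # assumed: same-size glyphs, 1 = ink, 0 = background


def train_pattern(samples: List[Bitmap]) -> Bitmap:
    """Average labelled samples into a per-pixel ink probability template."""
    h, w, n = len(samples[0]), len(samples[0][0]), len(samples)
    return [[sum(s[y][x] for s in samples) / n for x in range(w)] for y in range(h)]


def distance(glyph: Bitmap, pattern: Bitmap) -> float:
    """Mean absolute difference between a glyph and a trained pattern."""
    cells = [abs(g - p) for grow, prow in zip(glyph, pattern) for g, p in zip(grow, prow)]
    return sum(cells) / len(cells)


def recognise(glyph: Bitmap, patterns: Dict[str, Bitmap]) -> str:
    """Label of the closest trained pattern."""
    return min(patterns, key=lambda label: distance(glyph, patterns[label]))
```

Averaging keeps pixels that vary between samples as intermediate values such as 0.5. As a result, they count only partially against a match.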
  • 29. OCR Optimization
    - Step 3. Character recognition: back to the word level
      - Applying the selected recognition languages and dictionaries
      - Custom languages and dictionaries can be defined
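Applying a dictionary at the word level can be illustrated by replacing an unknown OCR word with the closest dictionary entry by Levenshtein edit distance, provided that entry is close enough. This post-recognition correction is a simpler stand-in for how a recognition engine actually uses dictionaries. The word list and the `max_distance` threshold below are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correct(word: str, dictionary: Iterable[str], max_distance: int = 2) -> str:
    """Return word if known, else the nearest entry within max_distance, else word."""
    words = set(dictionary)
    if word in words:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(words, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_distance else word
```

For historical material, the dictionary would need to contain historical spellings. Otherwise, correct old forms get "modernised" into different words.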
  • 30. OCR Processing Steps
    - Step 4. Verification by operators (optional): manual validation or correction of
      - layout analysis results (text blocks, image blocks, table blocks)
      - suspicious characters, and word corrections using dictionaries
      - re-recognition with other language settings
    - Recognition Server allows users to set a quality level and to log processing results in an XML file
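Operator verification is typically driven by per-character confidence: anything below a threshold is routed to a person. The sketch below flags such characters and writes them to a small XML log with Python's standard `xml.etree.ElementTree`. The element names, the attributes and the 0.6 threshold are hypothetical. They do not reproduce Recognition Server's actual log format.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def suspicious(chars: List[Tuple[str, float]], threshold: float = 0.6) -> List[Tuple[int, str, float]]:
    """(index, character, confidence) for every character below the threshold."""
    return [(i, c, conf) for i, (c, conf) in enumerate(chars) if conf < threshold]


def verification_log(page_id: str, chars: List[Tuple[str, float]], threshold: float = 0.6) -> str:
    """Serialise the suspicious characters of one page as an XML string."""
    root = ET.Element("verification", page=page_id, threshold=str(threshold))
    for index, char, conf in suspicious(chars, threshold):
        ET.SubElement(root, "suspicious", index=str(index), char=char,
                      confidence=f"{conf:.2f}")
    return ET.tostring(root, encoding="unicode")
```

Raising the threshold sends more characters to operators and improves accuracy at the cost of manual effort. That trade-off is the scalability problem listed for step 4 on the last slide.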
  • 31. ABBYY OCR Processing Steps
    - Step 5. Document synthesis and export: generating an output document in the selected format
      - TXT, Office formats, PDF, etc.
      - From version 9.0 on, ADRT (Adaptive Document Recognition Technology) is included. Its goal is to understand the document structure and detect elements such as headers, footers and footnotes; version 10 adds the table of contents.
      - The SDKs and Recognition Server offer more export formats, e.g. XML and the internal FineReader Engine format
  • 32. OCR in General & IMPACT in Particular
  • 33. OCR = Only Character Recognition?
    - Recreates the layout of the original document
      - The resulting document looks just like the scanned original
      - Information captured during layout analysis is used here
    - Supports popular document formats
      - ABBYY products support all popular output formats customers need: PDF, PDF/A, XML, HTML, TXT/CSV, Word, Excel, PowerPoint and DBF
    - Supports image output: BMP, PCX, JPEG, JPEG 2000, TIFF, PNG
    - Compliance with regulations: support for selective access password protection, document encryption, PDF/A, etc.
  • 34. IMPACT = "Step by Step" Optimisation
    - Step 1. Image quality
      - Problem areas: scans of microfilms, distortions, show-through characters
      - Optimisation approach: image pre-processing, e.g. binarisation
    - Step 2. Document analysis
      - Problem areas: layout of old print material, e.g. narrow columns in old newspapers
      - Optimisation approach: improved layout/document analysis
    - Step 3. Character recognition & languages
      - Problem areas: fonts used, old language (grammar & spelling)
      - Optimisation approach: optimised patterns, adaptive OCR, creation of special dictionaries
    - Step 4. Validation & correction
      - Problem areas: frequently recurring errors in Fraktur OCR, scalability of correction
      - Optimisation approach: new approaches for mass verification
    - Step 5. Document synthesis, export & rating
      - Problem areas: content classification, metadata generation, "reliable" formats
      - Optimisation approach: XML, AltoXML, XML analysis, PDF/A, …
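To make the export side of step 5 concrete, the sketch below writes recognised words, with their coordinates and confidences, into XML loosely modelled on ALTO. It uses `TextBlock`, `TextLine` and `String` with `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT` and `WC`, plus `SP` for spaces. It omits the namespace, the `Description` section and other parts of the real schema. The output is therefore illustrative and not schema-valid ALTO. The `Word` class is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical OCR result for one word; coordinates in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int
    confidence: float


def to_alto_like(lines: List[List[Word]], page_width: int, page_height: int) -> str:
    """Serialise one text block of recognised lines into ALTO-like XML."""
    alto = ET.Element("alto")
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(layout, "Page", WIDTH=str(page_width), HEIGHT=str(page_height))
    block = ET.SubElement(ET.SubElement(page, "PrintSpace"), "TextBlock")
    for words in lines:
        line = ET.SubElement(block, "TextLine")
        for i, w in enumerate(words):
            if i:  # explicit space element between consecutive words
                ET.SubElement(line, "SP")
            ET.SubElement(line, "String", CONTENT=w.text, HPOS=str(w.x), VPOS=str(w.y),
                          WIDTH=str(w.width), HEIGHT=str(w.height),
                          WC=f"{w.confidence:.2f}")  # WC = word confidence
    return ET.tostring(alto, encoding="unicode")
```

Keeping coordinates and confidences next to the text makes such an export useful downstream. It supports highlighting hits on the page image and rating OCR quality.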
  • 35. Thank you for your attention! Questions?
    Michael Fuchs, Senior Product Marketing Manager, ABBYY Europe (fuchs@abbyy.com)