SlideShare una empresa de Scribd logo
1 de 19
Real time OCR
using Tesseract
12BCE094
SHOBHIT CHITTORA
Brief History Of Tesseract
 Open Source OCR engine sponsored by Google since 2006.
 One of the most accurate open source OCR engines currently
available.
 Originally developed by HP between 1985-1994.
 Lot of it is written in C and C++.
TessOCR Architecture
Adaptive Thresholding is Essential
Baselines are rarely perfectly
straight
Spaces between words are tricky
too
 Italics, digits, punctuation all create special-case font-dependent
spacing.
 Fully justified text in narrow columns can have vastly varying spacing
on different lines.
Tesseract Word Recognizer
Outline Approximation
 Polygonal approximation is a double-edged sword.
 Noise and some pertinent information are both lost.
Why it’s called Tesseract?
 Elements of the polygonal approximation, clustered within a
character/font combination.
 x, y position, direction, and length (as a multiple of feature length)
Character Classifier (Features and
Matching)
 Static classifier uses outline fragments as features. Broken characters are
easily recognizable by a small->large matching process in classifier. (This is
slow.)
 Adaptive classifier uses the same technique!
Classifier as Histogram of Gradients
 Quantize character area.
 Compute gradients within.
 Histograms of gradients map to fixed dimension feature vector.
Character Segmentation
 Segmentation Graphs
Rating and Certainty
 Rating = Distance * Outline length
○ Total rating over a word (or line if you prefer) is normalized
○ Different length transcriptions are fairly comparable
 Certainty = -20 * Distance
○ Measures the absolute classification confidence
○ Surrogate for log probability and is used to decide what needs
more work.
Tesseract Training
Implementation using Tess-two( Tess
port for Android)
 The Tess-two library is an open source port of Tesseract engine for
Android.
 Only the most basic and popular functionalities are ported.
 Things such as deep neutral nets are not ported.
 A lot of tweaking is required to produce desired results.
DEMO
Implementing Real Time OCR and
challenges
 Image processing on memory limited devices is difficult.
 Limited clock speeds to process huge matrices.
 Running the Camera Surface Holder in MainUI and preprocessing
and OCR on user threads.
 Maintaining huge Bitmaps for preprocessing and sending to multiple
threads.
 Avoiding Garbage Collection of important preprocessed data.
Thank You

Más contenido relacionado

La actualidad más candente

State-of-Art Optical Character Recognition case
State-of-Art Optical Character Recognition caseState-of-Art Optical Character Recognition case
State-of-Art Optical Character Recognition caseKristina Piltyay
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERTshaurya uppal
 
5 character classifiers
5 character classifiers5 character classifiers
5 character classifiersSolin TEM
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysisodsc
 
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Understanding Autoencoder  (Deep Learning Book, Chapter 14)Understanding Autoencoder  (Deep Learning Book, Chapter 14)
Understanding Autoencoder (Deep Learning Book, Chapter 14)Entrepreneur / Startup
 
BrailleOCR: An Open Source Document to Braille Converter Application
BrailleOCR: An Open Source Document to Braille Converter ApplicationBrailleOCR: An Open Source Document to Braille Converter Application
BrailleOCR: An Open Source Document to Braille Converter Applicationpijush15
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarSenthil Kumar M
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentationbhavesh_physics
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow
 
Introduction to phython programming
Introduction to phython programmingIntroduction to phython programming
Introduction to phython programmingASIT Education
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
basics of object oriented programming
basics of object oriented programmingbasics of object oriented programming
basics of object oriented programmingRaghav Gupta
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memoriesRIILP
 
Object Oriented Programming Languages
Object Oriented Programming LanguagesObject Oriented Programming Languages
Object Oriented Programming LanguagesMannu Khani
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to AutoencodersYan Xu
 
arithmetic and adaptive arithmetic coding
arithmetic and adaptive arithmetic codingarithmetic and adaptive arithmetic coding
arithmetic and adaptive arithmetic codingAyush Gupta
 

La actualidad más candente (20)

State-of-Art Optical Character Recognition case
State-of-Art Optical Character Recognition caseState-of-Art Optical Character Recognition case
State-of-Art Optical Character Recognition case
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
5 character classifiers
5 character classifiers5 character classifiers
5 character classifiers
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Understanding Autoencoder  (Deep Learning Book, Chapter 14)Understanding Autoencoder  (Deep Learning Book, Chapter 14)
Understanding Autoencoder (Deep Learning Book, Chapter 14)
 
BrailleOCR: An Open Source Document to Braille Converter Application
BrailleOCR: An Open Source Document to Braille Converter ApplicationBrailleOCR: An Open Source Document to Braille Converter Application
BrailleOCR: An Open Source Document to Braille Converter Application
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
3 training
3 training3 training
3 training
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
Introduction to phython programming
Introduction to phython programmingIntroduction to phython programming
Introduction to phython programming
 
Lzw coding technique for image compression
Lzw coding technique for image compressionLzw coding technique for image compression
Lzw coding technique for image compression
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
basics of object oriented programming
basics of object oriented programmingbasics of object oriented programming
basics of object oriented programming
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
Object Oriented Programming Languages
Object Oriented Programming LanguagesObject Oriented Programming Languages
Object Oriented Programming Languages
 
Introduction to Autoencoders
Introduction to AutoencodersIntroduction to Autoencoders
Introduction to Autoencoders
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
arithmetic and adaptive arithmetic coding
arithmetic and adaptive arithmetic codingarithmetic and adaptive arithmetic coding
arithmetic and adaptive arithmetic coding
 

Destacado

OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents Viet-Trung TRAN
 
2 architecture anddatastructures
2 architecture anddatastructures2 architecture anddatastructures
2 architecture anddatastructuresSolin TEM
 
CS_rapport_final_fr_v3_1
CS_rapport_final_fr_v3_1CS_rapport_final_fr_v3_1
CS_rapport_final_fr_v3_1Solin TEM
 
Gps Navigation Survey
Gps Navigation SurveyGps Navigation Survey
Gps Navigation SurveyDaniel Kim
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners Sujith Kumar
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python amiable_indian
 
Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Paige Bailey
 
Déposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALDéposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALOAccsd
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to PythonNowell Strite
 

Destacado (13)

ocr
ocrocr
ocr
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
 
project indesh
project indeshproject indesh
project indesh
 
2 architecture anddatastructures
2 architecture anddatastructures2 architecture anddatastructures
2 architecture anddatastructures
 
CS_rapport_final_fr_v3_1
CS_rapport_final_fr_v3_1CS_rapport_final_fr_v3_1
CS_rapport_final_fr_v3_1
 
Gps Navigation Survey
Gps Navigation SurveyGps Navigation Survey
Gps Navigation Survey
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)
 
Raspberry pi
Raspberry pi Raspberry pi
Raspberry pi
 
Python Presentation
Python PresentationPython Presentation
Python Presentation
 
Déposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALDéposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HAL
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
 

Similar a OCR using Tesseract

Script Identification Using MATLAB
Script Identification Using MATLABScript Identification Using MATLAB
Script Identification Using MATLABAnimesh Mishra
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Divya Gera
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate RecognitionGilbert
 
Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxJanagi Raman S
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...Dataconomy Media
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET Journal
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontIRJET Journal
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...iosrjce
 
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...IndicThreads
 
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thriftBuilding scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thriftTalentica Software
 
Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015József Makai
 
Scripting in InduSoft Web Studio
Scripting in InduSoft Web StudioScripting in InduSoft Web Studio
Scripting in InduSoft Web StudioAVEVA
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
 
Developing Actors in Azure with .net
Developing Actors in Azure with .netDeveloping Actors in Azure with .net
Developing Actors in Azure with .netMarco Parenzan
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image ProcessingIJERA Editor
 

Similar a OCR using Tesseract (20)

Script Identification Using MATLAB
Script Identification Using MATLABScript Identification Using MATLAB
Script Identification Using MATLAB
 
Rust presentation convergeconf
Rust presentation convergeconfRust presentation convergeconf
Rust presentation convergeconf
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
License Plate Recognition
License Plate RecognitionLicense Plate Recognition
License Plate Recognition
 
Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptx
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
 
Turtlebot Poster_Summer 2016
Turtlebot Poster_Summer 2016Turtlebot Poster_Summer 2016
Turtlebot Poster_Summer 2016
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using Android
 
Modern C++
Modern C++Modern C++
Modern C++
 
Design and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English FontDesign and Description of Feature Extraction Algorithm for Old English Font
Design and Description of Feature Extraction Algorithm for Old English Font
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
 
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...
 
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thriftBuilding scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thrift
 
Ocr 1
Ocr 1Ocr 1
Ocr 1
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
 
Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015Optimization of Incremental Queries CloudMDE2015
Optimization of Incremental Queries CloudMDE2015
 
Scripting in InduSoft Web Studio
Scripting in InduSoft Web StudioScripting in InduSoft Web Studio
Scripting in InduSoft Web Studio
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
Developing Actors in Azure with .net
Developing Actors in Azure with .netDeveloping Actors in Azure with .net
Developing Actors in Azure with .net
 
Ary Mouse for Image Processing
Ary Mouse for Image ProcessingAry Mouse for Image Processing
Ary Mouse for Image Processing
 

Último

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 

Último (20)

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 

OCR using Tesseract

  • 1. Real time OCR using Tesseract 12BCE094 SHOBHIT CHITTORA
  • 2. Brief History Of Tesseract  Open Source OCR engine sponsored by Google since 2006.  One of the most accurate open source OCR engines currently available.  Originally developed by HP between 1985-1994.  Lot of it is written in C and C++.
  • 5. Baselines are rarely perfectly straight
  • 6. Spaces between words are tricky too  Italics, digits, punctuation all create special-case font-dependent spacing.  Fully justified text in narrow columns can have vastly varying spacing on different lines.
  • 8. Outline Approximation  Polygonal approximation is a double-edged sword.  Noise and some pertinent information are both lost.
  • 9. Why it’s called Tesseract?  Elements of the polygonal approximation, clustered within a character/font combination.  x, y position, direction, and length (as a multiple of feature length)
  • 10. Character Classifier (Features and Matching)  Static classifier uses outline fragments as features. Broken characters are easily recognizable by a small->large matching process in classifier. (This is slow.)  Adaptive classifier uses the same technique!
  • 11. Classifier as Histogram of Gradients  Quantize character area.  Compute gradients within.  Histograms of gradients map to fixed dimension feature vector.
  • 13.
  • 14. Rating and Certainty  Rating = Distance * Outline length ○ Total rating over a word (or line if you prefer) is normalized ○ Different length transcriptions are fairly comparable  Certainty = -20 * Distance ○ Measures the absolute classification confidence ○ Surrogate for log probability and is used to decide what needs more work.
  • 16. Implementation using Tess-two( Tess port for Android)  The Tess-two library is an open source port of Tesseract engine for Android.  Only the most basic and popular functionalities are ported.  Things such as deep neutral nets are not ported.  A lot of tweaking is required to produce desired results.
  • 17. DEMO
  • 18. Implementing Real Time OCR and challenges  Image processing on memory limited devices is difficult.  Limited clock speeds to process huge matrices.  Running the Camera Surface Holder in MainUI and preprocessing and OCR on user threads.  Maintaining huge Bitmaps for preprocessing and sending to multiple threads.  Avoiding Garbage Collection of important preprocessed data.