How I built an ML-human hybrid workflow using Computer Vision
Amir Shitrit
Software Architect
amirs@codevalue.net
@amir_shitrit
http://codevalue.net
What to (not) expect
About Me
Amir Shitrit
 Software Architect
 Love the cloud and also distributed systems
 And animals!
A short story
The “catalog”
Also
Selling on Simania
Photo by Ed Robertson on Unsplash
What do I have to do with it?
Me, when I was younger
First things first
 Build my own digital catalog
Then, search for the book in the catalog
 But how?
Option 1
 BARCODE
 DANACODE
 WHATEVERCODE
Option 2
 OCR
Some history about OCR
The challenge
Using Google’s OCR Services
 TEXT_DETECTION
 DOCUMENT_TEXT_DETECTION
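A minimal sketch of how either OCR mode might be requested through the Vision API's REST endpoint (`images:annotate`). The helper name `build_ocr_request` and the sample bytes are hypothetical, not taken from the talk; the sketch only builds the JSON body and does not send it, so no credentials are needed to run it.

```python
import base64
import json

# REST endpoint for synchronous image annotation (calling it needs an API key or OAuth token).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# The two OCR feature types named on the slide.
OCR_FEATURES = ("TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION")


def build_ocr_request(image_bytes, feature="TEXT_DETECTION"):
    """Build the JSON body for a single OCR request (hypothetical helper)."""
    if feature not in OCR_FEATURES:
        raise ValueError(f"unsupported OCR feature: {feature}")
    return {
        "requests": [
            {
                # Image content is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


# Placeholder bytes stand in for a real photo of a book cover.
body = build_ocr_request(b"fake-image-bytes", "DOCUMENT_TEXT_DETECTION")
print(json.dumps(body, indent=2))
```

Swapping the `feature` argument is all it takes to compare the two modes on the same photo.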
Problems with OCR
 Not exactly accurate
 Photographing each book individually
The best one!
About to give up
Photo by Steve Johnson on Unsplash
Aha!
When you come to think of it …
 Who needs accuracy anyway?
Photo by Katerina Holmes from Pexels
Some classification-related terms
accuracy = (TP + TN) / total
precision = TP / (TP + FP), i.e. true positives over everything predicted positive
recall = TP / (TP + FN), i.e. true positives over everything actually positive
 Accuracy
 Precision
 Recall
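As a rough illustration, the three terms can be computed from confusion-matrix counts. The function name and the sample counts below are made up for the example; TP, FP, TN and FN are true/false positives and negatives.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    predicted_positive = tp + fp   # everything the model flagged as positive
    actual_positive = tp + fn      # everything that really is positive
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / predicted_positive if predicted_positive else 0.0,
        "recall": tp / actual_positive if actual_positive else 0.0,
    }


# Invented counts: 100 samples, 13 of them truly positive.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy looks high mostly because negatives dominate, while recall shows that several positives were missed.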
Precision over accuracy
Search in “broken” text
Still some differences
 ‫ספורי‬ ‫אוצר‬
6
‫לפני‬ Y
 ‫הענה‬ ‫לפני‬ ‫ספורים‬ ‫אוצר‬
Fuzzy search in Elasticsearch
 https://en.wikipedia.org/wiki/Levenshtein_distance
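A small sketch of the idea behind fuzzy matching: Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The second helper builds an Elasticsearch `match` query body with `fuzziness` set to `AUTO`; the field name `title` is an assumption, and the talk does not show which query type was actually used.

```python
def levenshtein(a, b):
    """Edit distance between two strings, using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_title_query(ocr_text, field="title"):
    """Query body for a fuzzy full-text match (field name is hypothetical)."""
    return {"query": {"match": {field: {"query": ocr_text, "fuzziness": "AUTO"}}}}


print(levenshtein("kitten", "sitting"))
print(fuzzy_title_query("treasury of storeis"))
```

Elasticsearch caps fuzziness at an edit distance of 2 per term, so heavily broken OCR output may still need another fallback.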
And if that doesn’t work?
 HaaS = Human as a Service
 HITL = Human in the loop
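One way the human fallback could be wired in, sketched under assumptions: search results carry a relevance score normalized to 0..1, and a match is accepted automatically only when it is both strong and clearly ahead of the runner-up. The `Candidate` type, the `route` function and both thresholds are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    title: str
    score: float  # assumed to be normalized to the 0..1 range


def route(candidates: List[Candidate],
          min_score: float = 0.8,
          min_margin: float = 0.1) -> Tuple[str, Optional[str]]:
    """Return ("auto", title) for a confident match, else ("human", None)."""
    if not candidates:
        return ("human", None)
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    if best.score < min_score:
        return ("human", None)  # best match is too weak
    if len(ranked) > 1 and best.score - ranked[1].score < min_margin:
        return ("human", None)  # two candidates are too close to call
    return ("auto", best.title)


print(route([Candidate("Book A", 0.95), Candidate("Book B", 0.50)]))
print(route([Candidate("Book A", 0.90), Candidate("Book B", 0.85)]))
```

Raising either threshold trades more human work for fewer wrong automatic matches, which is the precision-first choice.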
What about multiple books?
Again, Vision API to the rescue
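A sketch of how several books in one photo might be separated, assuming the Vision API's `OBJECT_LOCALIZATION` feature is used: its response lists `localizedObjectAnnotations`, each with a `name`, a `score` and a `boundingPoly` of `normalizedVertices` in the 0..1 range. The helper name, the `"Book"` label filter, the score threshold and the sample response are assumptions for illustration.

```python
def book_boxes(response, image_width, image_height, label="Book", min_score=0.5):
    """Convert normalized bounding polygons into pixel crop boxes.

    Returns a list of (left, top, right, bottom) tuples, one per detected book.
    """
    boxes = []
    for obj in response.get("localizedObjectAnnotations", []):
        if obj.get("name") != label or obj.get("score", 0.0) < min_score:
            continue
        vertices = obj["boundingPoly"]["normalizedVertices"]
        # Zero-valued coordinates can be omitted from the JSON, so default to 0.
        xs = [v.get("x", 0.0) for v in vertices]
        ys = [v.get("y", 0.0) for v in vertices]
        boxes.append((
            round(min(xs) * image_width),
            round(min(ys) * image_height),
            round(max(xs) * image_width),
            round(max(ys) * image_height),
        ))
    return boxes


# Hand-written sample in the shape of a Vision API response (not real output).
sample_response = {
    "localizedObjectAnnotations": [
        {"name": "Book", "score": 0.9, "boundingPoly": {"normalizedVertices": [
            {"x": 0.25, "y": 0.25}, {"x": 0.5, "y": 0.25},
            {"x": 0.5, "y": 0.75}, {"x": 0.25, "y": 0.75}]}},
        {"name": "Table", "score": 0.8, "boundingPoly": {"normalizedVertices": [
            {"y": 0.5}, {"x": 1.0, "y": 0.5}, {"x": 1.0, "y": 1.0}, {"y": 1.0}]}},
    ]
}

boxes = book_boxes(sample_response, image_width=1000, image_height=800)
print(boxes)
```

Each box can then be cropped out and sent to OCR on its own, so one photo covers many books.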
Back to the metrics
 Object Detection
 OCR
 Fuzzy Search
Which metrics should I use for: Object Detection
 Precision
Which metrics should I use for: OCR
Which metrics should I use for: (Fuzzy) search in catalog
Demo
What we just saw
Code
Costs
 Google Cloud Vision API
 First 1K object detections: free
 First 1K OCR requests: also free
 Each additional 1K: $1.50
 Azure
 Storage – $0.0196 per GB
 (hot LRS standard storage)
 Egress - $0.05 per GB
 (for serving the beautiful web UI)
 Compute – currently on-prem
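A back-of-the-envelope calculator for the figures on this slide. It assumes the free 1K and the per-1K price apply per feature and per billing period, and that usage beyond the free tier is charged pro rata; the function names and sample volumes are invented for the example.

```python
VISION_FREE_UNITS = 1000        # free units per feature (from the slide)
VISION_PRICE_PER_1000 = 1.50    # USD per additional 1K units (from the slide)
STORAGE_PRICE_PER_GB = 0.0196   # USD, hot LRS standard storage (from the slide)
EGRESS_PRICE_PER_GB = 0.05      # USD (from the slide)


def vision_cost(units):
    """Cost of one Vision feature for the given number of units."""
    billable = max(0, units - VISION_FREE_UNITS)
    return billable / 1000 * VISION_PRICE_PER_1000


def total_cost(detections, ocr_requests, storage_gb, egress_gb):
    """Sum of Vision, storage and egress costs; compute is on-prem, so excluded."""
    return (
        vision_cost(detections)
        + vision_cost(ocr_requests)
        + storage_gb * STORAGE_PRICE_PER_GB
        + egress_gb * EGRESS_PRICE_PER_GB
    )


# Invented volumes: 5K detections, 800 OCR calls, 10 GB stored, 2 GB served.
print(f"${total_cost(5000, 800, 10, 2):.2f}")
```

At hobby-project volumes most of the Vision usage stays inside the free tier, so storage and egress are close to the whole bill.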
So…
Key takeaways
 No need to be a math whiz
 Cloud services are easy to use, but
 Choose the right metrics for the right steps
 AI + NI = Better together
 If you can’t join them, beat (the crap out of) them
What’s next?
 Taking on Amazon
Resources
 Google Cloud Vision API
 https://cloud.google.com/vision
 Elasticsearch – Fuzzy query
 https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
 HITL on Wikipedia
 https://en.wikipedia.org/wiki/Human-in-the-loop
 Binary classification metrics
 https://en.wikipedia.org/wiki/Binary_classification
 https://medium.com/@shrutisaxena0617/precision-vs-recall-386cf9f89488
Q&A

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

How I built a ml human hybrid workflow using computer vision - Amir Shitrit

Editor's notes

  1. Hello everyone, and thank you very much for coming all the way here to hear my talk. I'm going to talk about how artificial intelligence can be combined with human intelligence in various business processes.
  2. TEASER – show WEB UI
  3. I also want to set expectations. You won't see sophisticated ML algorithms and models here, because the hard work has already been done for us. What I will demonstrate is how a complex process can be built by combining existing services in the right way.
  4. I'm Amir, a software architect at CodeValue. I work quite a bit with the cloud and with distributed systems in general.
  5. I assume all of us are occasionally required to do monotonous, repetitive work. For example, making a great many pots, or just counting money. A particularly boring activity, you'll agree.
  6. I don't know about you, but as a certified expert in laziness, I'd rather do something else with my time and let software work for me. You'll soon see why that matters.
  7. I want to tell you a short story about cats and dogs and a nice woman named Ortal who lives in Be'er Sheva. I don't know if you're aware, but the situation of street cats in Be'er Sheva and the surrounding area isn't great, and every now and then there are also abandoned dogs that need help. So Ortal decided she wanted to change things a bit and started a project called ספרים ומצילים ("Books and Rescuers"). As part of the project, people donate used books to Ortal, which she sells for 10 NIS per book. What's great about this project is that it's social, ecological, and for the benefit of animals.
  8. To that end, Ortal set up a Facebook page with a link to an Excel file listing the books in stock. There is no sorting or filtering, and no notifications about new books. Pretty minimalistic.
  9. In addition to the Excel file, the books can also be found on the Simania website, which lets anyone sell used books.
  10. This is what the process of listing a book for sale looks like: search, mark, update details.
  11. In any case, the project has been running for almost a year and things are going too well for Ortal: many people contact her about book donations and she simply can't keep up, since all of this comes out of her free time anyway.
  12. OK, interesting story, but how am I connected to it? What do I have to do with this project? I'm not even from the area! The short answer is that I learned about the project and about Ortal through my partner, who is an animal-welfare activist, and we both thought the process could perhaps be made more efficient with software that shortens some of the steps.
  13. First, we need a huge database of all the books in the country, to serve as the reference when looking up books. There are various ways to obtain this list; how is less important right now.
  14. Now, once we have the list, every book that comes in is looked up against it and the inventory is updated. Two questions arise here: (a) how do we find the book in the database? (b) how can several books be entered in one go, the way Einat does?
  15. We could type in the book manually, even with autocomplete, but we already have that with Simania, and we want to do better.
  16. So the obvious option is to scan the book's ISBN. I looked into it and realized it's not trivial at all: some books have a Danacode or another format instead, and some have no code at all. In some books the code is on the inside cover. And whatever the case, this method still requires going through each book individually.
  17. The second option, also an obvious one, is... any guesses? OCR, of course (part of ML). With this method I photograph the book's cover, use OCR to extract the title, and look it up in the title column of the list I prepared in advance.
  18. OCR has been with us for many years, long before we started seeing ML everywhere. What is relatively new in this field is the method. The old methods did not use learning algorithms; the new ones, which also work better across different scripts and fonts, do use ML, and that led to a revival of the whole field.
  19. The specific challenge I face here is the wide variety of fonts, colors, angles, backgrounds, and textures. Sometimes the title comes with niqqud (vowel points) and sometimes without. The photography conditions also affect recognition, of course.
  20. I looked into it and played a bit with the Google Cloud Vision API, which offers many image-processing services, OCR among them. I chose it because it supports Hebrew, the language of most of the books people donate. It can also detect the language itself, and of course Google has a huge training set with countless forms of writing.
  21. Now, the Vision API offers two OCR variants: text detection in any image (TEXT_DETECTION) and text detection in documents (DOCUMENT_TEXT_DETECTION). I tested both, and surprisingly the document variant gave better results.
  22. There were two main problems. First, the text I got back wasn't entirely accurate, as you can see in the picture here, whereas my book list contains exact titles. Second, each book still has to be photographed separately.
  23. Another example of text extracted by the OCR engine.
  24. And another one.
  25. And this one is my personal favorite. But honestly, you can see why the OCR struggled here: the reflection of the light.
  26. In short, after quite a bit of trial and error, I was very close to giving up on the whole OCR idea. After all, I was taught that if it's a bit hard, you give up. And besides, Oded said to raise our hands (the Hebrew idiom for giving up).
  27. But then came the aha moment!
  28. What I realized is that I don't really need title recognition to give accurate results.
  29. Accuracy: the share of cases I got right (positive or negative). Precision: of all the guesses I made, how many were correct. Recall: of everything I should have found, how much I found and how much I missed. F1 score: irrelevant.
  30. Accuracy: how many I got right, counting TP + TN. Precision: out of those I got, how many were right. Recall: out of all the right ones, how many I got. F1 score: irrelevant.
  31. Now, why did I say accuracy matters less than precision? What if, instead of searching for the OCR-detected text in the title column of the full book list, I go over all the books in the catalog, run OCR on them as well, and add the result as an extra column? And this is what it looks like!
  32. Then, when a new book arrives, I photograph it and extract the title as usual, but unlike before, I search for the title in this new column, so I'm effectively comparing apples to apples. I tried this, but ran into a different problem: the OCR results weren't consistent either, because a book's official image differs from the one I photograph, so the detected title can be slightly different and/or contain different spelling errors.
  33. So for cases like these I used Elasticsearch, specifically its fuzzy search feature, which allows searching despite spelling errors and other inaccuracies. The method Elasticsearch uses is called Levenshtein distance.
  34. Levenshtein's algorithm works by counting the number of changes a word must undergo to become another word. Elasticsearch lets you run this kind of query while specifying the maximum tolerated distance, and the search results come with a score that I then use to sort them.
  35. Even with all that, there were still mistakes here and there, and this is where human intelligence comes in. That is, part of the book-identification process will involve people. Some call this HITL = Human In The Loop, and the idea isn't new at all: on Highway 6, for example, it has always been used for license plate recognition.
  36. What about the problem of photographing each book separately? Why not photograph a pile of books in one go? That's problematic because the OCR would mix up the titles. But there is a solution: another Google Cloud Vision service that lets me find individual objects within a single image.
  37. So I break one image into many small images, and from there I run the process we've covered so far. This service is called Object Detection.
  38. We talked a bit about metrics in relation to our recognition stages; now it's time to see which metric suits which stage.
  39. I care less about FPs as long as TP is high, because I can always ignore FPs; besides, the OCR and Elasticsearch won't find text and books, respectively, in them anyway!
  40. Here we need precision – as long as we get consistent results with the catalog
  41. We need high recall, because we don't want to miss the actual book, even if it means more HITL! The more precise the results, the less HITL we need. 999 TNs don't help much if the actual book was not returned! Increasing the FUZZINESS parameter => lots of FPs. Too low a value => FNs.
  42. Let's see how it works... Open the IMAGE file, then copy it to the folder. Show how the process works. Show the BLOB containers. Show the Jobs index (POSTMAN). Open the email and follow the link to open the WEB UI.
  43. That, in short, is what happens... At the end, of course, I use HaaS = Human as a Service.
  44. Detect individual books (how are images uploaded). Textify books. Search in Elasticsearch.
  45. In terms of costs, this project doesn't cost much. For example, on Google Cloud you can perform 1,000 OCR operations per month for free; beyond that, I start paying $1.5 for every additional 1,000, which isn't bad at all. As for Azure, my main usage is Storage & Network, and there too I don't exceed the free quota.
  46. What I loved seeing in this project is how combining artificial intelligence with various tools and, of course, with natural intelligence can give fuller coverage than pure ML would on its own. In other words, beyond the great importance of humans in the training process, involving humans in a business process is no less important.
  47. You don't need to be a math wizard. Cloud services are very easy to use and sometimes cheap, but it's important to choose which metrics to look at, and in the right context. And of course, the combination of the two kinds of intelligence is a winning combination.
  48. So what did we have? We started with manual search and a bit of copy-paste, and ended up with an almost automatic process with a little human involvement. As for the project's future, we'll obviously try to compete with Amazon, buy them, and set up a complete online store. This is also the time to say that we're looking for volunteers to help with development, so get in touch with me if you're interested. There's plenty of work to do.
  49. Not using Google, because: 1. I don’t get scores for each result 2. I’ll need to scrape Google (and later the source site) anyway, which I’m not sure is legal. Anyhow, see this regarding bot detection and blocking. 3. Still need to maintain my own catalog 4. Side-covers are photographed + inaccurate results to the point of it being inefficient
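
Notes 29, 30 and 41 lean on accuracy, precision and recall. As a minimal sketch (the function name and the zero-division handling are assumptions made for illustration, not part of the talk), the three metrics can be computed from the four confusion-matrix counts like this:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision and recall from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A ratio with an empty denominator is reported as 0.0,
    which is an illustrative convention only.
    """
    total = tp + fp + tn + fn
    predicted_positive = tp + fp   # everything that was returned
    actual_positive = tp + fn      # everything that should have been returned
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / predicted_positive if predicted_positive else 0.0,
        "recall": tp / actual_positive if actual_positive else 0.0,
    }


# The situation from note 41: 999 true negatives, but the one real book is missed.
missed_book = classification_metrics(tp=0, fp=0, tn=999, fn=1)
```

Here `missed_book["accuracy"]` is 0.999 while `missed_book["recall"]` is 0.0, which is why a high accuracy figure says little about the catalog search step.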
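
Note 34 describes Levenshtein distance as the number of changes one word needs in order to become another. The sketch below is a plain textbook implementation (not Elasticsearch's internal code) that counts single-character insertions, deletions and substitutions:

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b.

    Counts single-character insertions, deletions and substitutions,
    using the classic dynamic-programming table reduced to two rows.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# An OCR result that dropped the final letter of a word from the catalog title.
distance = levenshtein("ספורי", "ספורים")
```

In this example `distance` is 1, because a single inserted letter turns the OCR output into the catalog word.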
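
Notes 33 and 34 mention running a fuzzy query in Elasticsearch with a maximum tolerated distance and sorting by score. A hedged sketch of what such a request body might look like follows; the `match` query with a `fuzziness` option is standard Elasticsearch query DSL, while the field name `ocr_title`, the helper name and the default values are hypothetical and not taken from the talk:

```python
import json


def build_fuzzy_title_query(ocr_text: str,
                            field: str = "ocr_title",  # hypothetical field name
                            fuzziness: str = "AUTO",
                            size: int = 5) -> dict:
    """Build a search body that tolerates small spelling differences.

    `fuzziness` is the maximum edit distance allowed per term ("AUTO" lets
    Elasticsearch pick it from the term length). Hits come back ordered by
    relevance score by default, so no explicit sort is added.
    """
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": ocr_text,
                    "fuzziness": fuzziness,
                }
            }
        },
    }


body = build_fuzzy_title_query("אוצר ספורי לפני")
request_json = json.dumps(body, ensure_ascii=False)
```

This ties in with note 41: a larger `fuzziness` returns more candidates (more false positives for the human reviewer), and a smaller one risks missing the actual book.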
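
Notes 36 and 37 describe splitting one photo of several books into one small image per book. In the Vision REST API the object feature is named `OBJECT_LOCALIZATION`, and each detected object carries a `score` and a `boundingPoly` with `normalizedVertices` in the 0..1 range. The sketch below assumes annotations in that JSON shape; the helper names and the 0.5 score threshold are illustrative assumptions, and the actual cropping is left to an imaging library:

```python
import math


def to_pixel_box(normalized_vertices: list, width: int, height: int) -> tuple:
    """Convert normalized polygon vertices to a (left, top, right, bottom) box.

    Coordinates equal to zero may be missing from the JSON, so they default
    to 0.0. The result is clamped to the image bounds.
    """
    xs = [vertex.get("x", 0.0) for vertex in normalized_vertices]
    ys = [vertex.get("y", 0.0) for vertex in normalized_vertices]
    left = max(0, math.floor(min(xs) * width))
    top = max(0, math.floor(min(ys) * height))
    right = min(width, math.ceil(max(xs) * width))
    bottom = min(height, math.ceil(max(ys) * height))
    return (left, top, right, bottom)


def crop_boxes(annotations: list, width: int, height: int,
               min_score: float = 0.5) -> list:
    """Return one pixel box per detected object whose score passes min_score."""
    return [
        to_pixel_box(item["boundingPoly"]["normalizedVertices"], width, height)
        for item in annotations
        if item.get("score", 0.0) >= min_score
    ]


# Hand-written sample in the assumed response shape (not real API output).
sample = [
    {"name": "Book", "score": 0.9, "boundingPoly": {"normalizedVertices": [
        {"x": 0.25}, {"x": 0.75}, {"x": 0.75, "y": 0.5}, {"x": 0.25, "y": 0.5}]}},
    {"name": "Book", "score": 0.2, "boundingPoly": {"normalizedVertices": [
        {"x": 0.5, "y": 0.5}, {"x": 1.0, "y": 0.5}, {"x": 1.0, "y": 1.0}, {"x": 0.5, "y": 1.0}]}},
]
boxes = crop_boxes(sample, width=800, height=600)
```

Each tuple uses the (left, top, right, bottom) order, which is the same order that Pillow's `Image.crop` expects, so every box could be cropped and sent through the OCR and search steps on its own.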
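
Note 45 gives the pricing used in the talk: the first 1,000 OCR operations per month are free, and each additional 1,000 cost $1.5. A small sketch of that arithmetic follows; the linear proration of partial thousands is an assumption, and the provider's actual billing granularity may differ:

```python
def monthly_ocr_cost(operations: int,
                     free_operations: int = 1000,
                     price_per_thousand: float = 1.5) -> float:
    """Estimate the monthly OCR bill in USD using the figures from note 45.

    Operations beyond the free quota are charged proportionally, which is
    an illustrative assumption rather than a documented billing rule.
    """
    billable = max(0, operations - free_operations)
    return billable / 1000 * price_per_thousand


# 5,000 book photos in a month: 4,000 billable operations.
estimate = monthly_ocr_cost(5000)
```

With these figures `estimate` comes to 6.0 dollars, and anything up to 1,000 operations costs nothing.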