How I built an ML-human hybrid workflow using Computer Vision - Amir Shitrit

Machine Learning is not new, but it has been on the rise over the past years, both because data is ubiquitous and because Cloud Computing has been widely adopted. In recent years ML has become more prevalent than ever, mainly thanks to its ease of use and its accessibility to non-mathematicians. In some cases, ML can do things that would have been extremely difficult, if not impossible, to achieve in the past. In other cases, ML is here to assist us rather than replace us, by relieving us of our most boring and repetitive tasks; this often has to do with the limited accuracy of ML models. In this talk we are going to build business workflows that combine the efforts of humans and software to automate those boring tasks, while compensating for the inaccuracy of ML with human intervention.

1. How I built an ML-human hybrid workflow using Computer Vision. Amir Shitrit, Software Architect, amirs@codevalue.net, @amir_shitrit, http://codevalue.net

2. (no text)

3. What to (not) expect

4. About Me: Amir Shitrit; Software Architect; love the cloud and also distributed systems; and animals!

5. (no text)

6. (no text)

7. A short story

8. The “catalog”

9. Also

10. Selling on Simania

11. Photo by Ed Robertson on Unsplash

12. What do I have to do with it? (Me, when I was younger)

13. First things first: my own digital catalog

14. Then, search the book in the catalog. But how?

15. (no text)

16. Option 1: BARCODE; DANACODE; WHATEVERCODE

17. Option 2: OCR

18. Some history about OCR

19. The challenge

20. Using Google’s

21. OCR Services: TEXT_DETECTION; DOCUMENT_TEXT_DETECTION

22. Problems with OCR: not exactly accurate; photographing each book individually

23. (no text)

24. (no text)

25. The best one!

26. About to give up (Photo by Steve Johnson on Unsplash)

27. Aha!

28. When you come to think of it… Who needs accuracy anyway? (Photo by Katerina Holmes from Pexels)

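29. Some classification-related terms: $\text{accuracy} = \frac{TP + TN}{\text{total}}$; $\text{precision} = \frac{TP}{TP + FP}$ (the share of returned results that are correct); $\text{recall} = \frac{TP}{TP + FN}$ (the share of correct results that were returned)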

30. Some classification-related terms: Accuracy; Precision; Recall

31. Precision over accuracy

32. Search for in “broken” text

33. Still some differences: ספורי אוצר 6 לפני Y; הענה לפני ספורים אוצר

34. Fuzzy search in Elasticsearch: https://en.wikipedia.org/wiki/Levenshtein_distance

35. Fuzzy search in Elasticsearch: https://en.wikipedia.org/wiki/Levenshtein_distance

36. And if that doesn’t work? HaaS = Human as a Service; HITL

37. What about multiple books?

38. Again, Vision API to the rescue

39. Back to the metrics: Object Detection; OCR; Fuzzy Search

40. Which metrics should I use for: Object Detection

41. Which metrics should I use for: OCR. Precision

42. Which metrics should I use? (Fuzzy) Search in catalog

43. Demo

44. What we just saw

45. Code

46. Costs: Google Cloud Vision API: first 1K OBJECT_DETECTIONS = free, first 1K OCR = also free, every additional 1K is $1.50; Azure: Storage, $0.0196 per GB (hot LRS standard storage), Egress, $0.05 per GB (for serving the beautiful web UI); Compute: currently on-prem

47. So…

48. Key takeaways: no need to be a math wiz; cloud services are easy to use, but choose the right metrics for the right steps; AI + NI = better together; if you can’t join them, beat (the crap out of) them

49. What’s next? Taking on Amazon

50. Resources: Google Cloud Vision API, https://cloud.google.com/vision; Elasticsearch fuzzy query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html; HITL on Wikipedia, https://en.wikipedia.org/wiki/Human-in-the-loop; Binary classification metrics, https://en.wikipedia.org/wiki/Binary_classification and https://medium.com/@shrutisaxena0617/precision-vs-recall-386cf9f89488

51. Q&A

52. Amir Shitrit, Software Architect, amirs@codevalue.net, @amir_Shitrit, http://codevalue.net

Editor's notes

Hello everyone, and thank you for coming all the way here to hear my talk. I'm going to talk about how artificial intelligence can be combined with human intelligence in various business processes.
Teaser: show the web UI.
It's also important for me to set expectations. You won't see sophisticated ML algorithms and models here, because the hard work has already been done for us. What I will demonstrate is how a complex process can be built by correctly combining existing services.
I'm Amir, a software architect at CodeValue. I work quite a bit with the cloud and with distributed systems in general.
I assume all of us are required, from time to time, to do monotonous, repetitive work. For example, making a great many clay pots, or just counting money. A particularly boring activity, you'll agree.
I don't know about you, but as a certified expert in laziness, I'd rather do something else with my time and let software work for me. You'll soon see why this matters.
I want to tell you a short story about cats and dogs and a nice woman named Ortal who lives in Be'er Sheva. You may not know this, but the situation of street cats in Be'er Sheva and the surrounding area is not great, and every now and then there are also abandoned dogs that need help. So Ortal decided she wanted to change things a bit and started a project called "ספרים ומצילים" (roughly, "Books and Rescuers"). As part of the project, people donate used books to Ortal, which she sells for 10 NIS per book. What's great about this project is that it is social, ecological, and for the benefit of animals, all at once.
To that end, Ortal set up a Facebook page with a link to an Excel file listing the books in stock. There is no sorting or filtering and no notifications about new books. Quite minimalistic.
In addition to the Excel file, the books can also be found on the Simania website, which lets anyone sell used books.
This is what the process of listing a book for sale looks like: search, select, update details.
Anyway, the project has been running for almost a year, and things are going too well for Ortal: many people contact her about book donations and she simply can't keep up, since it all comes out of her free time anyway.
OK, interesting story, but how am I connected to it? What do I have to do with this project? I'm not even from the area! The short answer is that I learned about the project and about Ortal through my partner, who is an animal-rights activist, and we both thought this process could perhaps be made more efficient with software that shortens some of the steps.
First, we'll need a huge database of all the books in Israel, to serve as a reference for looking up books. There are various ways to obtain this list; how is less important right now.
Now, once we have the list, every book that comes in is looked up against it and the inventory is updated. Two questions arise here: (a) how do we find the book in the database, and (b) how can several books be entered in one go, the way Einat does?
We could type in the book manually, even with autocomplete, but we already have that with Simania and we want to do better.
So the very obvious option is to scan the book's ISBN. I looked into it and realized it's not trivial at all: some books have a Danacode or some other format, and some have no code at all. In some books the code is on the inside cover. And in any case, this method still requires handling each book individually.
The second option, also a very obvious one, is... anyone want to guess? OCR, of course, which is part of ML. With this method I photograph the book's cover, use OCR to extract the title, and search for it in the title column of the list I prepared in advance.
OCR has been with us for many years, long before we started seeing ML everywhere. What is relatively new in this field is the method. The old methods did not use learning algorithms, whereas the new methods, which also work better across different scripts and fonts, do use ML, and that has led to a revival of the whole field.
The specific challenge I face here is the wide variety of fonts, colors, angles, backgrounds, and textures. Sometimes the title comes with niqqud (vowel marks) and sometimes without. The shooting conditions also affect recognition, of course.
I looked into it and played a bit with the Google Cloud Vision API, which offers many image-processing services, OCR among them. I chose it because it supports Hebrew, which is the language of most of the books people donate. It can also detect the language itself, and of course Google has a huge training set with countless forms of writing.
Now, the Vision API offers two variants of OCR: text detection in any image (TEXT_DETECTION) and text detection in documents (DOCUMENT_TEXT_DETECTION). I tested both, and surprisingly the document variant gave better results.
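
As a rough illustration of the two OCR variants, the sketch below builds the JSON body for the Vision API's REST `images:annotate` method and switches between `TEXT_DETECTION` and `DOCUMENT_TEXT_DETECTION`. The helper name `build_ocr_request` is hypothetical and not taken from the talk; authentication and the HTTP call itself are deliberately left out.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, document_mode: bool = True) -> dict:
    """Build a request body for the Vision API `images:annotate` REST method.

    `document_mode` picks between the two OCR feature types named on the slide.
    This helper is an illustrative assumption, not code from the talk.
    """
    feature = "DOCUMENT_TEXT_DETECTION" if document_mode else "TEXT_DETECTION"
    return {
        "requests": [
            {
                # The REST API expects the image bytes base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


if __name__ == "__main__":
    body = build_ocr_request(b"placeholder bytes, not a real image")
    print(json.dumps(body, indent=2))
```

Sending the same cover photo once with each feature type and comparing the returned text is one way to repeat the comparison described above.
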
There were two main problems. The text I got back was not entirely accurate, as you can see in the picture here, while my book list contains exact titles. The second problem is that each book still has to be photographed individually.
Another example of text extracted by the OCR engine.
And another one.
And this one is my personal favorite. But honestly, you can see why the OCR struggled here: because of the light reflection.
In short, after quite a bit of trial and error, I was really close to giving up on the whole OCR idea. After all, I was taught that if things get a bit hard, you give up. And besides, Oded said to give up.
But then came the aha moment!
What dawned on me was that I don't really need the title recognition to produce accurate results.
Accuracy: how often I was right (positive or negative), relative to the total. Precision: of all the guesses I made, how many were correct. Recall: how many I found out of everything I should have found. F1 score: irrelevant here.
Accuracy: how many I got right, counting both TP and TN. Precision: out of those I got, how many were right. Recall: out of all the right ones, how many I got. F1 score: irrelevant here.
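
To make the three definitions concrete, here is a minimal sketch that computes them from raw counts. The function name and the example numbers are illustrative assumptions, not figures from the talk.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything returned as positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that should have been found, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical search over a 1,000-book catalog that returns one wrong
    # book and misses the right one: accuracy stays high, recall is zero.
    print(classification_metrics(tp=0, fp=1, fn=1, tn=998))
    # Hypothetical search that returns five books, one of them correct.
    print(classification_metrics(tp=1, fp=4, fn=0, tn=995))
```

The first example shows why accuracy alone can mislead in a large catalog: almost every book is a true negative, so accuracy stays near 1 even when the search fails.
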
Now, why did I say accuracy matters less than precision? What if, instead of searching for the OCR-detected text in the title column of the full book list, I go over all the books in the catalog, run OCR on them as well, and add the result as an additional column? And this is what it looks like!
Then, when a new book arrives, I photograph it as usual and extract its title, but unlike before, I search for the title in this new column, so I'm effectively comparing apples to apples. I tried this, but ran into another problem: the OCR results were not consistent either, because the official image of a book differs from the one I photograph, so the detected title can be slightly different and/or contain different spelling errors.
So for cases like these I used Elasticsearch, specifically its fuzzy search feature, which allows searching despite typos and other inaccuracies. The technique Elasticsearch uses is called Levenshtein distance.
Levenshtein's algorithm works by counting the number of changes a word must undergo to become another word. Elasticsearch lets you run this kind of query while specifying the maximum tolerated distance, and the search results come with a score, which I then use to sort the results.
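
The counting described above can be sketched as a small dynamic-programming routine. This is a textbook version written for illustration; it is not Elasticsearch's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A small distance between the OCR output of a photographed cover and the OCR output stored in the catalog is what lets a fuzzy query still match the two.
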
And with all that, there were still mistakes here and there, and this is where human intelligence comes in. That is, part of the book-identification process involves people. Some call this HITL = Human In The Loop, and the idea is not new at all: on the Highway 6 toll road, for example, it has always been used for license plate recognition.
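
One way a human-in-the-loop step could be wired in is sketched below: search results that look unambiguous are accepted automatically, and everything else is queued for a person. The talk does not say whether any matches are auto-accepted, so the `route` function, the `Candidate` type, and both thresholds are purely hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    title: str
    score: float  # relevance score returned by the search engine


def route(candidates: List[Candidate],
          min_score: float = 10.0,
          min_margin: float = 2.0) -> Tuple[str, Optional[Candidate]]:
    """Return ("auto", best) for a clear winner, otherwise ("human", best).

    The thresholds are made-up illustration values and would need tuning
    against real search scores.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked:
        return ("human", None)  # nothing found: a person has to identify it
    best = ranked[0]
    runner_up_score = ranked[1].score if len(ranked) > 1 else 0.0
    if best.score >= min_score and best.score - runner_up_score >= min_margin:
        return ("auto", best)
    return ("human", best)


if __name__ == "__main__":
    print(route([Candidate("Book A", 12.0), Candidate("Book B", 11.5)]))
```

Raising either threshold sends more books to people and fewer wrong matches into the inventory; lowering them does the opposite.
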
What about the problem of photographing each book separately? Why not photograph a whole pile of books at once? That's problematic because the OCR would mix up the titles. But there is a solution: another Google Cloud Vision service that lets me find individual objects in a single image.
Then I split one image into many small images, and from there I run the process we've covered so far. This service is called Object Detection.
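
The splitting step can be sketched as follows, assuming the response shape of the Vision API's object localization feature (`OBJECT_LOCALIZATION`), where each annotation carries a `boundingPoly` with `normalizedVertices` in the 0..1 range. The function name and the score threshold are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def crop_boxes(annotations: List[Dict], width: int, height: int,
               min_score: float = 0.3) -> List[Box]:
    """Convert normalized bounding polygons into pixel crop boxes.

    A low `min_score` keeps doubtful detections, on the reasoning that a
    false positive costs little if later steps simply find nothing in it.
    """
    boxes = []
    for annotation in annotations:
        if annotation.get("score", 0.0) < min_score:
            continue
        vertices = annotation["boundingPoly"]["normalizedVertices"]
        # Coordinates equal to 0 may be omitted from the JSON, so default them.
        xs = [vertex.get("x", 0.0) for vertex in vertices]
        ys = [vertex.get("y", 0.0) for vertex in vertices]
        boxes.append((
            int(min(xs) * width), int(min(ys) * height),
            int(max(xs) * width), int(max(ys) * height),
        ))
    return boxes
```

Each tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the sub-images can be cut out and passed to OCR one at a time.
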
We talked a bit about metrics in relation to our recognition steps; now it's time to see which metric fits which step.
For object detection, I care less about false positives (FP) as long as the number of true positives (TP) is high. I can always ignore false positives, and in any case the OCR won't find text in them and Elasticsearch won't find matching books.
Here we need precision, as long as we get results that are consistent with the catalog.
We need high recall, because we don't want to miss the actual book, even if it means more HITL. The more precise the results, the less HITL we need. 999 TN doesn't help much if the actual book was not returned! I can increase the fuzziness parameter, which yields lots of FP; too low a value yields FN.
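
The fuzziness trade-off can be seen in the query body itself. The sketch below builds an Elasticsearch `match` query with a `fuzziness` setting; whether the talk used `match` or the term-level `fuzzy` query is not stated, and the field name `ocr_title` is a hypothetical stand-in for the catalog's OCR column.

```python
def fuzzy_title_query(ocr_text: str, fuzziness="AUTO", size: int = 5,
                      field: str = "ocr_title") -> dict:
    """Build a search body that tolerates small edit distances per term.

    `fuzziness` may be 0, 1, 2, or "AUTO". Higher values widen the net
    (more false positives); lower values risk missing the right book.
    """
    return {
        "size": size,  # how many candidates to hand to the next step
        "query": {
            "match": {
                field: {"query": ocr_text, "fuzziness": fuzziness}
            }
        },
    }


if __name__ == "__main__":
    print(fuzzy_title_query("treasury of bedtme stories", fuzziness=2))
```

Returning a handful of scored candidates rather than a single hit fits the recall-first goal: the right book only needs to be somewhere in the list for a person to confirm it.
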
Let's see how it works... Open the image file, then copy it to the folder. Show how the process works. Show the blob containers. Show the Jobs index (Postman). Open the email and follow the link to open the web UI.
That, in short, is what happens... At the end, of course, I use HaaS = Human as a Service.
Detect individual books (how images are uploaded); textify the books; search in Elasticsearch.
In terms of cost, this project doesn't cost much. For example, on Google Cloud you can perform 1,000 OCR operations per month for free. Beyond that, I start paying $1.50 for every additional 1,000, which isn't bad at all. As for Azure, my main usage is Storage and Network, and there too I don't exceed the free quota.
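
The pricing rule quoted above can be turned into a quick estimate. This sketch assumes the figures from the talk (1,000 free operations per month, then $1.50 per additional 1,000) and further assumes that usage is billed in whole blocks of 1,000; actual billing granularity and current prices may differ.

```python
import math


def monthly_vision_cost(units: int, free_units: int = 1000,
                        price_per_block: float = 1.50,
                        block_size: int = 1000) -> float:
    """Estimate the monthly cost in USD for one Vision API feature.

    Assumes whole-block billing beyond the free tier (an assumption made
    for illustration, not a statement about Google's billing).
    """
    billable = max(0, units - free_units)
    return math.ceil(billable / block_size) * price_per_block


if __name__ == "__main__":
    for units in (800, 1000, 5000):
        print(units, monthly_vision_cost(units))
```

Because OCR and object detection each have their own free tier on the slide, the estimate would be computed once per feature and summed.
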
What I liked seeing in this project is how combining artificial intelligence with various tools and, of course, with natural intelligence can give fuller coverage than pure ML would on its own. That is, beyond the great importance of people in the training process, involving people in a business process is no less important.
You don't need to be a math wizard. Cloud services are very easy to use and sometimes cheap, but it's important to choose which metrics to pay attention to, and in the right context. And of course, combining the two kinds of intelligence is a winning combination.
So what did we have? We started with manual search and a bit of copy-paste, and ended with an almost automatic process with a little human involvement. As for the project's future, we'll obviously try to compete with Amazon, buy them out, and set up a full online store. And this is the time to say that we're looking for volunteers to help with development, so contact me if you're interested. There's a lot of work to do.
Not using Google, because: 1. I don’t get scores for each result. 2. I’d need to scrape Google (and later the source site) anyway, which I’m not sure is legal. Anyhow, see this regarding bot detection and blocking. 3. I’d still need to maintain my own catalog. 4. Side covers are photographed, and the results are inaccurate to the point of being inefficient.
