Choosing the Right Document
Processing Solution for
Healthcare Organizations
Presented by:
Iskandar Sitdikov, ML Solutions Architect @ Provectus
Stepan Pushkarev, CTO @ Provectus
Webinar Objectives
1. Provide an overview of the market for document processing solutions
2. Outline critical factors for choosing the right document processing solution
for your healthcare use case
3. Strategize on whether to purchase a ready-made solution
or build a custom solution of your own
4. Get qualified for the Provectus IDP Solution Discovery Program
Agenda
1. Introduction
2. Healthcare use cases
3. Document processing in 60 seconds
4. Solutions map, advantages, and problems
5. Evaluation
Introductions
Iskandar Sitdikov
ML Solutions Architect
Provectus
Stepan Pushkarev
Chief Technology Officer
Provectus
AI-first Consultancy & Solutions Provider
500 employees and
growing
Established in 2010
HQ in Palo Alto
Offices in North America,
LATAM, and Europe
Machine Learning DevOps
Big Data Analytics
We are obsessed with leveraging cloud, data, and AI to reimagine the way
businesses operate, compete, and deliver customer value
Our Clients
Innovative Tech Vendors
Seeking niche expertise to
differentiate and win the market
Midsize to Large Enterprises
Seeking to accelerate innovation
and achieve operational excellence
Healthcare Use Cases
Document processing 101
Use cases:
Clinical notes, medical records,
insurance medical claims, clinical
studies, medical imaging reports, lab
reports, and transfers. Administrative
overhead to process data from these
types of documents is huge.
Main benefits:
Operational speed and cost reduction. In
our practice, we see 2-8x cost reduction
compared to a fully manual process and
30%+ savings in comparison to legacy
OCR solutions.
Healthcare Use Cases
Clinician notes
Claims
Transfer summaries
Medical imaging reports
Lab reports
Medical record
Clinical studies
The general goal is to locate the main entities in the
document (paragraphs, forms, tables, etc.)
and then recognize the written text
in them (segmentation and OCR).
The two problems can be solved separately or
together with end-to-end networks.
IDP / CV
Context search on data from OCR + segmentation
Forms and tables greatly impact overall performance. Data extraction from forms is largely solved (thanks to their
straightforward key-value structure). Tables are still a pain point for all data extractors. For unstructured text,
deep networks are the current solution. Example: BERT is good at finding key-value (question / answer) pairs
in context.
IDP / Data Extraction
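To illustrate why forms are the easier case, here is a minimal sketch of key-value extraction. It assumes the OCR step has already produced plain text with one "Key: Value" pair per line. The sample text, the field names, and the function name extract_key_values are hypothetical; real forms usually also need layout information, which this sketch ignores.

```python
import re

# Hypothetical OCR output for a simple form: one "Key: Value" pair per line.
OCR_TEXT = """\
Patient Name: Jane Doe
Date of Birth: 1980-02-14
Policy Number: AB-1234567
Notes
"""

# The key is the text before the first colon; the value is the rest of the line.
KEY_VALUE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_key_values(text: str) -> dict:
    """Return the key-value pairs found in the text, skipping lines without a colon."""
    pairs = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


fields = extract_key_values(OCR_TEXT)
```

Tables and free text do not follow such a line-level pattern, which is one reason they remain the harder part of extraction.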
Evaluation of the document
processing model is an ongoing
task.
Results with a low-confidence
score and missing information
are forwarded to human experts.
Samples of successfully extracted
information are also forwarded to
human experts for evaluation.
IDP / Evaluation and Monitoring
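The routing described above can be sketched as a small decision function. This is only an illustration under assumed values: the 0.8 confidence threshold, the 5% audit rate, and the names Extraction and route are hypothetical and do not come from the webinar.

```python
import random
from dataclasses import dataclass


@dataclass
class Extraction:
    doc_id: str
    fields: dict        # extracted field name -> value (None or "" if missing)
    confidence: float   # model confidence in [0, 1]


def route(result, required_fields, threshold=0.8, audit_rate=0.05, rng=random):
    """Decide where an extraction result goes next."""
    missing = [name for name in required_fields if not result.fields.get(name)]
    # Low-confidence results and results with missing information go to human experts.
    if missing or result.confidence < threshold:
        return "human_review"
    # A random sample of successful extractions is also sent to experts for evaluation.
    if rng.random() < audit_rate:
        return "human_audit"
    return "auto_accept"


example = Extraction("doc-1", {"patient_name": "Jane Doe"}, confidence=0.42)
decision = route(example, required_fields=["patient_name"])
```

Keeping the audit sample separate from the review queue makes it possible to estimate accuracy on documents the model was confident about.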
Data lake + Ontology specifications
Fast Healthcare Interoperability Resources (FHIR)
is a standard describing data formats and
elements and an application programming
interface for exchanging electronic health records.
The standard was created by the Health Level
Seven International healthcare standards
organization.
IDP / Storage
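As a rough illustration of what storing extracted data in FHIR form can look like, the sketch below maps one lab value to a FHIR Observation resource expressed as JSON. Only a small subset of Observation fields is shown; the patient identifier, the measured value, and the helper name lab_result_to_fhir are hypothetical.

```python
import json


def lab_result_to_fhir(patient_id, loinc_code, display, value, unit):
    """Build a minimal FHIR Observation (as a dict) for one numeric lab result."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": unit,
        },
    }


# Hypothetical blood glucose result extracted from a lab report.
observation = lab_result_to_fhir(
    "example", "2339-0", "Glucose [Mass/volume] in Blood", 95, "mg/dL"
)
print(json.dumps(observation, indent=2))
```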
Storage
Hospitals
Providers
Pharmaceutical
companies
Patients
Labs
Health plans
Automation encapsulates all processes mentioned above
and unites them into one single product, featuring:
● Document capture
● Model lifecycle
○ Labeling
○ (Re)Training
○ Evaluation
○ Monitoring
● Human-in-the-loop
● Integrations
● System monitoring
IDP / Automation
IDP is more than just OCR. To resolve the problem in-house, you need
to take care of data capture, data ingestion, preprocessing, OCR, data
extraction, evaluation, and further integrations to destination systems.
Bottleneck: Tables and unstructured text
IDP / Takeaways
Solutions Landscape
Market Overview
Documents are everywhere... and solutions for document processing are everywhere, too!
Competitive Landscape
Major technology platforms offer general-
purpose technology components for
document processing, such as:
● Amazon Textract + Comprehend
● Google Document AI
● Microsoft Azure Form Recognizer
Solutions: Cloud Vendors
Pros:
● Cloud infrastructure and integration
● Long lifespan and support
● Constant development
Cons:
● General purpose, i.e., they require
additional work to extract the necessary
information and integrate with current
workflows
These are emerging use case-focused vendors
that offer solutions using AI-native platforms to
tackle the most demanding automation
challenges. They can handle more complex
documents with a greater variability. As a result,
they often deliver a better business impact than
obsolete technologies. Since they are free from
legacy technical debt, it is easier for them to
build next-gen, future-oriented solutions.
Solutions: Startups
Pros:
● Modern tech
● Constant development
● More focused applications
● Support — For a new independent player, support is
one of the highest priorities to gain customer loyalty
Cons:
● Only a few startups in this market can survive
competition with big vendors
● Challenging to customize
● May not align with your cloud strategy
● Support — On the other hand, new startups might
struggle with support
Legacy vendors typically build IDP
solutions on top of legacy platforms.
Niche vendors focus on limited
types of documents and use cases. You
might find hidden gems here!
Other vendors restructure your document
workflow by introducing standard types of
documents, which are really easy to
process.
Solutions: Other Vendors
Pros:
● Wide variety of integrations
● Niche use cases
● Large portfolio of clients
Cons:
● In some cases, they rely on outdated,
less performant technologies
● May require restructuring your document flow
System Integrators may offer IDP
as part of their portfolio of
solutions. Their IDP offering may
be a solution from another IDP
vendor or developed in-house.
Solutions: System Integrators
Vendors Evaluation Methodology
What to Choose?
Now, you have all the information about
possible go-to solutions in your market
segment. What’s next?
You need to fairly compare each and every
solution to choose one that fits and aligns
with your use case the most.
Deep evaluation is key to making the right
decision.
Data
● EDA (exploratory data analysis) — Knowing your
data is the key to success
● Sample data based on EDA
● Use this data as the evaluation dataset for
measuring performance of solutions on the
market / in the segment
Composite Index
● F1, Accuracy, Recall, etc.
● Robustness
● Key, value extraction
● Table data
● Language, character recognition, spelling,
handwritten text
Provectus Evaluation Methodology
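The slides do not state how the individual dimensions are combined, so the sketch below is only one plausible reading: score key-value extraction with precision, recall, and F1, then combine per-dimension scores into a composite index as a weighted average. The weights, the dimension names, and the sample scores are hypothetical.

```python
def precision_recall_f1(predicted: dict, expected: dict):
    """Score exact-match key-value extraction against a ground-truth dict."""
    true_pos = sum(1 for k, v in predicted.items() if k in expected and expected[k] == v)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def composite_index(scores: dict, weights: dict) -> float:
    """Weighted average of per-dimension scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical ground truth and one provider's output for a single document.
expected = {"name": "Jane Doe", "dob": "1980-02-14", "policy": "AB-1"}
predicted = {"name": "Jane Doe", "dob": "1980-02-41"}
precision, recall, f1 = precision_recall_f1(predicted, expected)

# Hypothetical per-dimension scores and weights for the same provider.
scores = {"key_value": f1, "tables": 0.5, "text": 0.9, "robustness": 0.7}
weights = {"key_value": 2, "tables": 2, "text": 1, "robustness": 1}
index = composite_index(scores, weights)
```

Choosing the weights is a business decision: a use case dominated by tables would weight the table dimension more heavily than this example does.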
Evaluation / Composite Index
Name        Score
Provider 1  0.64
Provider 2  0.81
Provider 3  0.78
Composite Index
Dimensions
Evaluation / Text Index
Text index
Evaluation / Robustness Index
Spacing index
Noise index
TCO and Case Study: Under NDA Client
General TCO structure:
● Infrastructure (data pipelines, storage, control panel)
● CV, NLP, Human-in-the-loop
● R&D costs (if building in house)
● Support
TCO targets for end-to-end solution:
~20-30 cents per document for simple use cases and 50+
cents for more complex documents
Result:
The cost of processing one document was reduced from 24
to 11 cents, since the right OCR/CV vendor was selected (it
saved almost 10 cents per document). Also, serverless
architecture was leveraged to reduce infrastructure costs.
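The arithmetic behind the case-study result can be worked through as follows. The 24 and 11 cent figures come from the case study; the monthly volume of 100,000 documents is a hypothetical input chosen only to make the savings concrete.

```python
COST_BEFORE_CENTS = 24  # per document, from the case study
COST_AFTER_CENTS = 11   # per document, from the case study


def monthly_savings_usd(docs_per_month: int) -> float:
    """Savings in US dollars for a given monthly document volume."""
    saved_cents = (COST_BEFORE_CENTS - COST_AFTER_CENTS) * docs_per_month
    return saved_cents / 100


# Relative reduction in the per-document cost, in percent.
reduction_pct = 100 * (COST_BEFORE_CENTS - COST_AFTER_CENTS) / COST_BEFORE_CENTS

# Hypothetical volume: 100,000 documents per month.
savings = monthly_savings_usd(100_000)
print(f"{reduction_pct:.1f}% lower cost per document, ${savings:,.0f} saved per month")
```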
OCR/CV solutions performance vs. cost:
For a given use case, the most expensive
solution delivered the worst result. The
second-best result was delivered by the vendor
with the second-cheapest solution.
Performance vs. price
Buy vs. Customize vs. Build
Cloud OCR + extraction APIs
vs. Custom model
For high document volumes, it is
worth investing in a custom model built in-house
to reduce the cost of extra services (e.g.,
form and table APIs) in the long run.
On average, the break-even point for a custom IDP
extraction model vs. APIs comes at around the 8th month.
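A break-even comparison of this kind can be sketched as below. All inputs are hypothetical and were chosen so that the example lands on month 8; the webinar gives only the average break-even point, not the underlying costs. Amounts are in integer cents to keep the arithmetic exact.

```python
def break_even_month(upfront_cents, api_cents_per_doc, custom_cents_per_doc,
                     docs_per_month, horizon_months=60):
    """Return the first month in which the cumulative cost of a custom model
    is no higher than the cumulative cost of pay-per-use APIs, or None if
    that does not happen within the horizon."""
    for month in range(1, horizon_months + 1):
        api_total = api_cents_per_doc * docs_per_month * month
        custom_total = upfront_cents + custom_cents_per_doc * docs_per_month * month
        if custom_total <= api_total:
            return month
    return None


# Hypothetical inputs: $40,000 to build the model, 7 cents per document via
# APIs, 2 cents per document with the custom model, 100,000 documents a month.
month = break_even_month(4_000_000, 7, 2, 100_000)

# At a low volume the custom model does not pay off within the horizon.
low_volume = break_even_month(4_000_000, 7, 2, 1_000)
```

The result is very sensitive to volume, which is why the recommendation applies to high-volume cases.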
Takeaways
1. Ecosystem matters: data integration with built-in industry-specific connectors, data
pipelines, OCR, NLP, security, storage, and a human-in-the-loop workflow. All these
elements should be integrated with each other for optimal performance.
2. Use an unbiased benchmarking framework to evaluate the real performance of different
providers, based on your use case and datasets.
3. Work with Provectus to reduce your document processing costs:
a. By 2-8x compared to manual workflows
b. By 30%+ compared to legacy OCR solutions
c. By 10%+ compared to modern cloud solutions
Getting Started:
Unbiased Evaluation for IDP
by Provectus
Commitments & Deliverables
Helping businesses choose the right document processing solution for their healthcare
use cases. A fully funded engagement for qualified customers.
IDP Solution Discovery Program. Unbiased!
Schedule a 30 min. pre-assessment session here:
IDP Solution Discovery Program
You provide:
1. Business use cases overview
2. Access to datasets
3. Commitment to support
the engagement
We deliver:
1. Solutions evaluation report
based on your unique data
2. Solution architecture
3. TCO estimate
125 University Avenue
Suite 295, Palo Alto
California, 94301
provectus.com
Questions, details?
We would be happy to answer!
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.

Más contenido relacionado

La actualidad más candente

Unlocking the Power of UiPath: A Journey into Document Understanding
Unlocking the Power of UiPath: A Journey into Document UnderstandingUnlocking the Power of UiPath: A Journey into Document Understanding
Unlocking the Power of UiPath: A Journey into Document Understanding
DianaGray10
 
Tracxn - Smart Cars Startup Landscape
Tracxn - Smart Cars Startup LandscapeTracxn - Smart Cars Startup Landscape
Tracxn - Smart Cars Startup Landscape
Tracxn
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Yasir Khan
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 

La actualidad más candente (20)

Building a Data Analytics Center of Excellence - Digital Transformation
Building a Data Analytics Center of Excellence - Digital TransformationBuilding a Data Analytics Center of Excellence - Digital Transformation
Building a Data Analytics Center of Excellence - Digital Transformation
 
Procure-to-Pay (P2P) Outsourcing: Unlocking Value from End-to-End Process Out...
Procure-to-Pay (P2P) Outsourcing: Unlocking Value from End-to-End Process Out...Procure-to-Pay (P2P) Outsourcing: Unlocking Value from End-to-End Process Out...
Procure-to-Pay (P2P) Outsourcing: Unlocking Value from End-to-End Process Out...
 
Unlocking the Power of UiPath: A Journey into Document Understanding
Unlocking the Power of UiPath: A Journey into Document UnderstandingUnlocking the Power of UiPath: A Journey into Document Understanding
Unlocking the Power of UiPath: A Journey into Document Understanding
 
Embracing Intelligent Automation
Embracing Intelligent AutomationEmbracing Intelligent Automation
Embracing Intelligent Automation
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Cloud PoV
Cloud PoVCloud PoV
Cloud PoV
 
Advanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITIAdvanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITI
 
AI: From Data to ROI
AI: From Data to ROIAI: From Data to ROI
AI: From Data to ROI
 
Tracxn - Smart Cars Startup Landscape
Tracxn - Smart Cars Startup LandscapeTracxn - Smart Cars Startup Landscape
Tracxn - Smart Cars Startup Landscape
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Text summarization
Text summarizationText summarization
Text summarization
 
How to optimize the supply chain with ai
How to optimize the supply chain with ai How to optimize the supply chain with ai
How to optimize the supply chain with ai
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
 
The Industrialist: Trends & Innovations - February 2023
The Industrialist: Trends & Innovations - February 2023The Industrialist: Trends & Innovations - February 2023
The Industrialist: Trends & Innovations - February 2023
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

Similar a Intelligent Document Processing in Healthcare. Choosing the Right Solutions.

Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Data analytics presentation- Management career institute
Data analytics presentation- Management career institute Data analytics presentation- Management career institute
Data analytics presentation- Management career institute
PoojaPatidar11
 
Human Factors In Groupware Applications
Human Factors In Groupware ApplicationsHuman Factors In Groupware Applications
Human Factors In Groupware Applications
ESS
 
PPT_Management of Large and Complex Software Projects
PPT_Management of Large and Complex Software ProjectsPPT_Management of Large and Complex Software Projects
PPT_Management of Large and Complex Software Projects
Sudipta Das
 

Similar a Intelligent Document Processing in Healthcare. Choosing the Right Solutions. (20)

Choosing the right IDP Solution
Choosing the right IDP SolutionChoosing the right IDP Solution
Choosing the right IDP Solution
 
Choosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsChoosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare Organizations
 
Choosing The Right Data Annotation Option: Pros And Cons
Choosing The Right Data Annotation Option: Pros And ConsChoosing The Right Data Annotation Option: Pros And Cons
Choosing The Right Data Annotation Option: Pros And Cons
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Frequently Asked Questions About IDP
Frequently Asked Questions About IDPFrequently Asked Questions About IDP
Frequently Asked Questions About IDP
 
Abdul ETL Resume
Abdul ETL ResumeAbdul ETL Resume
Abdul ETL Resume
 
eBook-DataSciencePlatform
eBook-DataSciencePlatformeBook-DataSciencePlatform
eBook-DataSciencePlatform
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)
 
A technical Introduction to Big Data Analytics
A technical Introduction to Big Data AnalyticsA technical Introduction to Big Data Analytics
A technical Introduction to Big Data Analytics
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
Data analytics presentation- Management career institute
Data analytics presentation- Management career institute Data analytics presentation- Management career institute
Data analytics presentation- Management career institute
 
ARC_MPS & AIM 3.7MB
ARC_MPS & AIM 3.7MBARC_MPS & AIM 3.7MB
ARC_MPS & AIM 3.7MB
 
Pmac It Project Management 2010
Pmac It Project Management 2010Pmac It Project Management 2010
Pmac It Project Management 2010
 
Achieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsAchieve New Heights with Modern Analytics
Achieve New Heights with Modern Analytics
 
James hall ch 14
James hall ch 14James hall ch 14
James hall ch 14
 
Human Factors In Groupware Applications
Human Factors In Groupware ApplicationsHuman Factors In Groupware Applications
Human Factors In Groupware Applications
 
MIS 18 Enterprise Management System
MIS 18 Enterprise Management SystemMIS 18 Enterprise Management System
MIS 18 Enterprise Management System
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
PPT_Management of Large and Complex Software Projects
PPT_Management of Large and Complex Software ProjectsPPT_Management of Large and Complex Software Projects
PPT_Management of Large and Complex Software Projects
 

Más de Provectus

AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
Provectus
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
Provectus
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
Provectus
 

Más de Provectus (20)

MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
 
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K..."Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
 
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ..."How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
 
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky..."Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
 
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2..."Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
 
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma..."Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
 
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ..."Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
 
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
 
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
 
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti..."Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
 
How to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAMHow to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAM
 
Yurii Gavrilin | ML Interpretability: From A to Z | Kazan ODSC Meetup
Yurii Gavrilin | ML Interpretability: From A to Z | Kazan ODSC MeetupYurii Gavrilin | ML Interpretability: From A to Z | Kazan ODSC Meetup
Yurii Gavrilin | ML Interpretability: From A to Z | Kazan ODSC Meetup
 
Andrei Grigoriev | Version Control in Data Science | Kazan ODSC Meetup
Andrei Grigoriev | Version Control in Data Science | Kazan ODSC MeetupAndrei Grigoriev | Version Control in Data Science | Kazan ODSC Meetup
Andrei Grigoriev | Version Control in Data Science | Kazan ODSC Meetup
 
Modern word embeddings | Andrei Kulagin | Kazan ODSC Meetup
Modern word embeddings | Andrei Kulagin | Kazan ODSC MeetupModern word embeddings | Andrei Kulagin | Kazan ODSC Meetup
Modern word embeddings | Andrei Kulagin | Kazan ODSC Meetup
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Intelligent Document Processing in Healthcare. Choosing the Right Solutions.

  • 1. Choosing the Right Document Processing Solution for Healthcare Organizations Presented by: Iskandar Sitdikov, ML Solutions Architect @ Provectus Stepan Pushkarev, CTO @ Provectus
  • 2. Webinar Objectives 1. Provide an overview of the market for document processing solutions 2. Outline critical factors for choosing the right document processing solution for your healthcare use case 3. Strategize on whether you should look for a ready-made solution to purchase, or build a custom solution of your own 4. Get qualified for the Provectus IDP Solution Discovery Program
  • 3. Agenda 1. Introduction 2. Healthcare use cases 3. Document processing in 60 seconds 4. Solutions map, advantages, and problems 5. Evaluation
  • 4. Introductions Iskandar Sitdikov ML Solutions Architect Provectus Stepan Pushkarev Chief Technology Officer Provectus
  • 5. AI-first Consultancy & Solutions Provider 500 employees and growing Established in 2010 HQ in Palo Alto Offices in North America, LATAM, and Europe Machine Learning DevOps Big Data Analytics We are obsessed with leveraging cloud, data, and AI to reimagine the way businesses operate, compete, and deliver customer value
  • 6. Our Clients Innovative Tech Vendors Seeking niche expertise to differentiate and win the market Midsize to Large Enterprises Seeking to accelerate innovation and achieve operational excellence
  • 8. Use cases: Clinical notes, medical records, insurance medical claims, clinical studies, medical imaging reports, lab reports, and transfers. Administrative overhead to process data from these types of documents is huge. Main benefits: Operational speed and cost reduction. In our practice, we see 2-8x cost reduction compared to a fully manual process and 30%+ savings in comparison to legacy OCR solutions. Healthcare Use Cases
  • 9. Use cases: Clinical notes, medical records, insurance medical claims, clinical studies, medical imaging reports, lab reports, and transfers. Administrative overhead to process data from these types of documents is huge. Main benefits: Operational speed and cost reduction. In our practice, we see 2-8x cost reduction compared to a fully manual process and 30%+ savings in comparison to legacy OCR solutions. Healthcare Use Cases Clinician notes Claims Transfer summaries Medical imaging reports Lab reports Medical record Clinical studies
  • 10.
  • 11. General goal is to spot main entities in the document (paragraphs, forms, tables, etc.) and then successfully identify written text in them (segmentation and OCR). Both problems can be resolved separately or using end-to-end networks. IDP / CV
  • 12. Context search on data from OCR + segmentation Forms and tables greatly impact overall performance. Data extraction from forms is resolved (due to a straightforward key-value structure). Tables are still a pain point for all data extractors. For unstructured texts, deep networks are a solution at this point. Ex: BERT — good for finding key-value (question / answer) pairs in context. IDP / Data Extraction
  • 13. Evaluation of the document processing model is a task in progress. Results with a low-confidence score and missing information are forwarded to human experts. Samples of successfully extracted information are also forwarded to human experts for evaluation. IDP / Evaluation and Monitoring
  • 14. Data lake + Ontology specifications Fast Healthcare Interoperability Resources (FHIR) is a standard describing data formats and elements and an application programming interface for exchanging electronic health records. The standard was created by the Health Level Seven International healthcare standards organization. IDP / Storage
  • 15. Data lake + Ontology specifications Fast Healthcare Interoperability Resources (FHIR) is a standard describing data formats and elements and an application programming interface for exchanging electronic health records. The standard was created by the Health Level Seven International healthcare standards organization. IDP / Storage Storage Hospitals Providers Pharmaceutical companies Patients Labs Health plans
  • 16. Automation encapsulates all processes mentioned above and unites them into one single product, featuring: ● Document capture ● Model lifecycle ○ Labeling ○ (Re)Training ○ Evaluation ○ Monitoring ● Human-in-the-loop ● Integrations ● System monitoring IDP / Automation
  • 17. IDP is more than just OCR. To resolve the problem in-house, you need to take care of data capture, data ingestion, preprocessing, OCR, data extraction, evaluation, and further integrations to destination systems. Bottleneck: Tables and unstructured text IDP / Takeaways
  • 19. Documents are everywhere... and solutions for document processing are everywhere, too! Competitive Landscape
  • 20. Major technology platforms offer general-purpose technology components for document processing, such as: ● Amazon Textract + Comprehend ● Google Document AI ● Microsoft Azure Form Recognizer Solutions: Cloud Vendors Pros: ● Cloud infrastructure and integration ● Long lifespan and support ● Constant development Cons: ● General purpose, i.e., they require additional work to extract the necessary information and integrate with current workflows
  • 21. These are emerging use case-focused vendors that offer solutions using AI-native platforms to tackle the most demanding automation challenges. They can handle more complex documents with greater variability. As a result, they often deliver a better business impact than obsolete technologies. Since they are free from legacy technical debt, it is easier for them to build next-gen, future-oriented solutions. Solutions: Startups Pros: ● Modern tech ● Constant development ● More focused applications ● Support: for a new independent player, support is one of the highest priorities to gain customer loyalty Cons: ● Only a few startups in this market can survive competition with big vendors ● Challenging to customize ● May not align with your cloud strategy ● Support: on the other hand, new startups might struggle with support
  • 22. Legacy vendors typically build IDP solutions on top of legacy platforms. Niche vendors that are focused on limited types of documents and use cases. You might find hidden gems here! Vendors that restructure your documents workflow by introducing standard types of documents, which are really easy to process. Solutions: Other Vendors Pros: ● Wide variety of integrations ● Niche use cases ● Large portfolio of clients Cons: ● In some cases, they rely on outdated, less performant technologies ● Document flow restructure
  • 23. System Integrators may offer IDP as part of their portfolio of solutions. Their IDP offering may be a solution from another IDP vendor or developed in-house. Solutions: System Integrators
  • 25. What to Choose? Now, you have all the information about possible go-to solutions in your market segment. What’s next? You need to fairly compare each and every solution to choose one that fits and aligns with your use case the most. Deep evaluation is key to making the right decision.
  • 26. Data ● EDA (exploratory data analysis) — Knowing your data is the key to success ● Sample data based on EDA ● Use this data as the evaluation dataset for measuring performance of solutions on the market / in the segment Composite Index ● F1, Accuracy, Recall, etc. ● Robustness ● Key, value extraction ● Table data ● Language, character recognition, spelling, handwritten text Provectus Evaluation Methodology
  • 27. Evaluation / Composite Index (Name: Score) Provider 1: 0.64, Provider 2: 0.81, Provider 3: 0.78. Composite Index Dimensions
  • 28. Evaluation / Text Index Text index
  • 29. Evaluation / Robustness Index Spacing index Noise index
  • 30. TCO and Case Study: Under NDA Client General TCO structure: ● Infrastructure (data pipelines, storage, control panel) ● CV, NLP, Human-in-the-loop ● R&D costs (if building in house) ● Support TCO targets for end-to-end solution: ~20-30 cents per document for simple use cases and 50+ cents for more complex documents Result: The cost of processing one document was reduced from 24 to 11 cents, since the right OCR/CV vendor was selected (it saved almost 10 cents per document). Also, serverless architecture was leveraged to reduce infrastructure costs. OCR/CV solutions performance vs. cost: For a given use case, the most expensive solution delivered the worst result. The second-best result was demonstrated by the vendor with the second-cheapest solution. Performance vs. price
  • 31. Buy vs. Customize vs. Build Cloud OCR + extraction APIs vs. Custom model In cases with a high volume of documents, it’s worth investing in a custom model built in-house to reduce the costs of extra services (e.g., form and table APIs) in the long run. On average, the break-even point for the custom IDP extraction model vs. APIs comes at around the 8th month
  • 32. Takeaways 1. Ecosystem matters: Data integration with built-in industry-specific connectors, data pipelines, OCR, NLP, security, storage, and a human-in-the-loop workflow: all these elements should be integrated with each other for optimal performance. 2. Use an unbiased benchmarking framework to evaluate the real performance of different providers, based on your use case and datasets. 3. Work with Provectus to reduce your Document Processing costs a. By 2-8x compared to manual workflows b. By 30%+ compared to legacy OCR solutions c. By 10%+ compared to modern cloud solutions.
  • 33. Getting Started: Unbiased Evaluation for IDP by Provectus
  • 34. Commitments & Deliverables Helping businesses choose the right document processing solution for their healthcare use cases. A fully funded engagement for qualified customers. IDP Solution Discovery Program. Unbiased! Schedule a 30 min. pre-assessment session here: IDP Solution Discovery Program You provide: 1. Business use cases overview 2. Access to datasets 3. Commitment to support the engagement We deliver: 1. Solutions evaluation report based on your unique data 2. Solution architecture 3. TCO estimate
  • 35. 125 University Avenue Suite 295, Palo Alto California, 94301 provectus.com Questions, details? We would be happy to answer!
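
The composite index shown on the evaluation slides can be illustrated as a weighted average of per-dimension sub-indexes, followed by a ranking of providers. The sketch below is only an assumption about how such an index might be combined. The slides do not give the formula, and the sub-index names, weights, provider names, and scores here are all hypothetical and are not results from the webinar.

```python
from typing import Dict, List, Tuple


def composite_index(sub_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of sub-index scores (each expected to be in [0, 1])."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-index scores: {sorted(missing)}")
    weighted_sum = sum(sub_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


def rank_providers(
    providers: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (provider, composite score) pairs, best score first."""
    scored = {name: composite_index(s, weights) for name, s in providers.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: tune them to the business use case being optimized.
WEIGHTS = {"text": 0.3, "form": 0.3, "table": 0.2, "handwritten": 0.1, "robustness": 0.1}

# Hypothetical sub-index scores measured on your own evaluation dataset.
PROVIDERS = {
    "Provider A": {"text": 0.8, "form": 0.9, "table": 0.5, "handwritten": 0.4, "robustness": 0.7},
    "Provider B": {"text": 0.7, "form": 0.8, "table": 0.8, "handwritten": 0.6, "robustness": 0.8},
}

if __name__ == "__main__":
    for name, score in rank_providers(PROVIDERS, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Keeping the weights separate from the scores means the same measurements can be re-ranked when priorities change, for example when table extraction matters more than handwriting for a given document type.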

Editor's notes

  1. Hello everyone! I’m excited to welcome you to our webinar dedicated to Document Processing Solutions for healthcare providers, health insurance companies, and pharmaceutical companies. This webinar is brought to you by Provectus, an AI-first consultancy and solutions provider.
  2. To set your expectations: We will keep the level of detail at 200 feet, to provide you with a detailed strategy for moving forward and to continue our conversation after this webinar. At the end of the webinar, we will tell you more about the unique opportunity to apply for a fully-funded IDP Solution Discovery Program by Provectus.
  3. Let's quickly go through the agenda. The webinar is grouped into 5 blocks. Each of them is a logical continuation of the previous one. We will start with an introduction and set up the context for this webinar. Then, we will proceed to the main section: First, we will take a look at Healthcare use cases that can be addressed with the IDP solution. Second, we will reiterate what IDP really is and which building blocks it encapsulates. Third, with a clear understanding of the IDP solution components, we will review how not to get lost in the ocean of available solutions. We will segment the market and compare the pros and cons of each segment. We will talk about positioning yourself within the boundaries of those markets and figuring out the exact solutions you need. And finally, we will discuss how to choose the right solution for you.
  4. Quick introduction. I’m Iskandar, an ML Solutions Architect at Provectus. For the last 5 years I’ve been working at Provectus on a variety of ML use cases. Stepan is the CTO at Provectus, who brings deep expertise in cloud, distributed systems, and machine learning to this webinar. Please do not hesitate to ask questions in the Questions Tab as we move through the webinar content.
  5. A quick overview of Provectus. We are obsessed with generating value from data through Artificial Intelligence and deploying it to the real world. We work on some of the hardest problems in the world, like simulating turbulence in supernova explosions using deep learning models or predicting binding sites for protein-peptide interactions, as well as on Applied AI solutions, for example computer vision based disease screening or intelligent document processing for the Healthcare and Insurance industries. Quick facts: we are headquartered in Silicon Valley, have 500 employees worldwide, and serve clients ranging from cutting-edge startups to large Fortune 500 enterprises.
  6. Some of our clients are listed here. We are comfortable working with big Enterprises and driving multi-million transformational projects with AI, as well as helping startups, partnering with them from the early stages through all of the funding rounds and a successful acquisition or IPO. I’m proud to note a wide range of clients from the Healthcare and Life Sciences industries.
  7. Health data is frequently incomplete and inconsistent, and is often unstructured, with information contained in clinical notes, laboratory reports, insurance claims, medical images, and time series data across disparate document formats and systems. Every health care provider, payer, and life sciences company is trying to solve the problem of processing documents and structuring the data, because if they do, they can make better patient support decisions, design better clinical trials, and operate more efficiently. Document processing and healthcare are a long-running couple. So, you probably know all of these use cases: .... All of them share the same framework for processing the documents. Main benefits: operational speed and cost reduction. In our practice we see 2-8x+ cost reduction compared to a fully manual process and 30%+ reduction compared to legacy OCR solutions. Stepan: An important note: we are not going to discuss the details, KPIs, and business ROI of each particular business use case for healthcare. Although we can help you discover and prioritize a use case, we expect that you know your business really well and that you have already identified and justified a particular project for a 2021 roadmap. In that sense, this webinar is biased toward action and execution rather than outlining the strategy. Let's move to what document processing is.
  9. Here you can see a short demo of how our document processing solution works, from document upload to review of actual results.
  10. The first part of the system is Computer Vision (CV). It is usually called OCR (optical character recognition), but that term is outdated: CV is broader and includes OCR as one of its components. So, the general goal is to identify the main entities within a document (paragraphs, forms, tables, etc.) and successfully identify the written text in them (segmentation and OCR).
  11. Next, we move to data extraction. Here we are solving a context search problem: we try to extract knowledge from the semi-structured data produced by the CV/OCR step and form structured answers from it. The bottleneck here is tables and unstructured text… Keep in mind that the CV and extraction steps can be resolved using an end-to-end approach. For example, a single model can handle both tasks (although internally it is still separated into those logical blocks).
  12. The next step is evaluation, which is overlooked in a lot of cases. Machines are really good at reducing the processing time of repetitive operations, but humans are great at complex context tasks like data extraction. So, the solution here is to build a hybrid system. Plus, we should always have verification mechanisms, especially in highly regulated areas like healthcare. By hybrid approach and verification, I mean that low-confidence results and missing information are forwarded to human experts for review. Also, a sample of successfully extracted information is forwarded to human experts, because you always have to monitor the performance of your system.
  13. Storage is another topic which is often missed in document processing solutions. Moreover, the storage requirements for the healthcare industry are different in a good way. Yes, we have the same object storage, relational databases, etc., but alongside that we have to blend in ontology specifications, i.e., a standard describing data formats and elements and an application programming interface for exchanging records. What it essentially means is that data should be annotated in a specific format to be queryable and discoverable by other machines. Think of it as a knowledge graph. And the healthcare industry has its own format, which is called FHIR.
  15. Last but not least is automation. Automation is the glue for all of the pieces we’ve mentioned before, as it helps build a final product out of them. Automation encapsulates: data capture, monitoring …
  16. As you can tell, IDP goes way beyond just OCR. To resolve a document processing problem, you should take care of lots of things like data capture, crowdsourcing, etc. One of the weak points in any document processing solution is algorithms for extracting information. And most of the struggle is in unstructured text and table information.
  17. Now, let’s move to the next section of our webinar: Solutions map, where we will provide an overview of the market of possible solutions for IDP.
  18. Documents are everywhere, right? For healthcare, it’s clinical records, lab reports, medical charts. For insurance, it’s claims. For HR, it’s resumes or CVs. For finance, it’s invoices. And there are solutions everywhere! You may not know it, but there are more than 200 different vendors offering IDP solutions. As you can imagine, it’s super easy to get lost in the variety of all of these options. A good way to start is to refer to something like Gartner quadrants or Forrester waves. They are really good for giving you a glance at the market. But in 90% of cases, you have to double click on specific areas of your use case, as the leaders of those reports are not necessarily the best for your specific use cases and datasets. So let’s dive one step deeper and segment the document processing market.
  19. The first market segment is big cloud vendors. You all know them. Technological giants. They provide general-purpose technology components for document processing. AWS has Textract and Comprehend. Google has Document AI. Microsoft has Form Recognizer. Working with these types of solutions has its own pros and cons: …
  20. This is a younger group of up-and-coming vendors who have built solutions using AI-native platforms. Generally, startups can handle documents that are more complex or have greater variation. As far as advantages go: … On the other hand, startups have disadvantages: …
  21. Legacy vendors typically build IDP solutions on top of a legacy platform. Niche vendors focused on limited types of documents/use cases. You might find hidden gems here! Vendors that restructure your documents workflow by introducing standard types of documents which are really easy to process. Pros: .. Cons: …
  22. It is what it sounds like. Integrators help you glue different pieces of your document processing solution together. System Integrators may offer IDP as part of their portfolio of solutions. Their IDP offering may be a solution from another IDP vendor or developed in house.
  23. There are more than 200 different vendors offering similar capabilities... at least based on vendors’ marketing websites :) So, let’s find out how you can evaluate their offerings and choose the right document processing solution for your healthcare organization.
  24. How would you move from a high-level, Gartner-style PowerPoint based comparison to a real, metrics-based evaluation that compares apples to apples? How would you make a decision between build vs. buy vs. hybrid approach when you buy components and then integrate/customize? As a reminder, we have cloud vendors, independent startups, niche and legacy vendors, as well as System Integrators.
  25. Introducing Provectus Evaluation Methodology, an UNBIASED benchmarking tool for choosing the right Intelligent Document Processing Solution for your business. Some vendors demonstrate better quality when processing lab reports, others do really well with handwritten notes in clinical records. We need to consider all of these factors, and we at Provectus have developed a methodology to make it easier to choose the best option. The steps are pretty simple to understand: Define dataset -> define metrics -> evaluate -> rank -> pick or repeat. Any metric should be based on a specific dataset. It’s extremely important to know not only what metrics to use, but also the data they are computed on. Exploratory Data Analysis is a must. We always start with data and not with vendors or ML models. Knowing the types of documents, the ratio between handwritten and printed text, tables, forms and paragraphs, images, and noise levels, and also assessing a long tail of custom edge cases, is crucial. Based on EDA, we generate an evaluation dataset, which should be a representative sample of the data, so we can make a fair assumption about target performance metrics. Once we have an evaluation dataset, we pick the metrics that will be used for our final composite index. The exact list of metrics is usually based on the specific business use case you would like to optimize through AI.
  26. Let’s double click on the composite index we use for solution evaluation. We compose the index out of numerous sub-indexes. For example: the text index characterizes how well a solution extracts metadata from unstructured text (paragraphs and notes); the handwritten index, how well it extracts handwritten text; the form index, how good the solution is at recognizing key-value pairs. The price index may seem straightforward, but it has multiple components: variable price per document, maintenance cost, as well as customization and integration cost. Let’s double click on a couple of sub-indexes.
  27. The text index is responsible for quantifying how well a solution fixes grammar errors, deals with noise characters, and finally maps entities to a predefined ontology. Also, some providers may deal better with one language than another, so it is important to evaluate and quantify this.
  28. Another interesting index is the robustness index. What it essentially does is measure how good the solutions are at handling different edge cases, like large spaces between key-value pairs, or noise in a document which might be caused by a poor scan, the age of the document, etc. It usually impacts the cost of handling the long tail of documents. Obviously, if an IDP solution is not robust enough, it puts pressure on the total cost of your business process, which includes human experts in the loop.
  29. Now, I want to show you a case study and TCO of one of our clients, where we were using this evaluation methodology.
  30. Build - does not include initial investments in R&D.
  31. Alright, hope now you have a solid strategy for moving forward. Here is the next practical step for you to get started.
  32. Provectus has designed a dedicated program to help you choose the best vendor and solution for your Document Processing use case. Here is how the Program works: You provide a business use case and access to your datasets, as well as executive sponsorship to support the engagement. We explore your dataset and run it through the Provectus benchmarking framework against different OCR/IDP vendors. As a result, we will determine the most cost-efficient solution with the highest quality output for your specific healthcare use case. The offering is available to a limited number of customers and is a fully funded engagement. Apply now, so we can help you learn more about the program and get you qualified. The application link is available in the downloadable handout slides to this webinar and will also be sent to you in the follow-up email after the webinar. https://provectus.com/intelligent-document-processing-discovery-program/
  33. Why should I opt for building a custom solution vs. buying a ready-made one? (buy vs. build) - You can’t avoid customization one way or another. What other criteria should I consider when choosing a document processing solution? - Specific metrics. Consider TCO. How do you handle sensitive information for model training? - The entire system is sealed within the boundaries of your private cloud. We want to process medical charts for extracting patient data. How can we get started and what else should we consider? - Data analysis, evaluation. How long does it take to set up a solution? - If you have data -> POC 1-2 weeks; usually 2-4 weeks, but setting up business processes may prolong this time
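
The hybrid review flow described in the evaluation note (low-confidence results and missing information go to human experts, and a sample of successful extractions is also sent for monitoring) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the Provectus implementation: the `ExtractedField` type, the 0.85 confidence threshold, the 5% audit rate, and the queue names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]
    confidence: float  # model confidence in [0, 1]


def route_document(
    fields: List[ExtractedField],
    required: Iterable[str],
    threshold: float = 0.85,   # hypothetical cut-off for "low confidence"
    audit_rate: float = 0.05,  # hypothetical share of good results sampled for audit
    rng: Optional[random.Random] = None,
) -> str:
    """Decide which queue a processed document goes to."""
    if rng is None:
        rng = random.Random()
    by_name = {f.name: f for f in fields}

    # Missing information: a required field is absent or empty.
    for name in required:
        field = by_name.get(name)
        if field is None or not field.value:
            return "human_review"

    # Low-confidence results are forwarded to human experts.
    if any(f.confidence < threshold for f in fields):
        return "human_review"

    # A random sample of successful extractions is audited for monitoring.
    if rng.random() < audit_rate:
        return "human_audit"
    return "auto_accept"


if __name__ == "__main__":
    doc = [
        ExtractedField("patient_name", "Jane Doe", 0.97),
        ExtractedField("date_of_service", "2021-03-01", 0.62),
    ]
    print(route_document(doc, required=["patient_name", "date_of_service"]))
```

Separating the audit sample from the review queue keeps the two signals distinct: the review queue fixes individual documents, while the audit sample estimates how often confident extractions are actually wrong.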