Healthcare organizations generate piles of documents and forms in different formats, making it difficult to achieve operational excellence and streamline business processes. Manual entry and OCR are no longer viable, and healthcare entities are looking for new solutions to handle documents.
In this presentation you can learn about:
- Healthcare document types and use cases
- IDP framework: building blocks for document processing solutions
- The document processing market landscape
- Methodology for solution evaluation: comparing apples to apples
Whether you are looking for a ready-made solution or plan to build a custom solution of your own, this webinar will help you find the best fit for your healthcare use cases.
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
1. Choosing the Right Document
Processing Solution for
Healthcare Organizations
Presented by:
Iskandar Sitdikov, ML Solutions Architect @ Provectus
Stepan Pushkarev, CTO @ Provectus
2. Webinar Objectives
1. Provide an overview of the market for document processing solutions
2. Outline critical factors for choosing the right document processing solution
for your healthcare use case
3. Strategize on whether you should look for a ready-made solution to purchase
or build a custom solution of your own
4. Get qualified for the Provectus IDP Solution Discovery Program
5. AI-first Consultancy & Solutions Provider
500 employees and
growing
Established in 2010
HQ in Palo Alto
Offices in North America,
LATAM, and Europe
Machine Learning DevOps
Big Data Analytics
We are obsessed with leveraging cloud, data, and AI to reimagine the way
businesses operate, compete, and deliver customer value
6. Our Clients
Innovative Tech Vendors
Seeking niche expertise to
differentiate and win the market
Midsize to Large Enterprises
Seeking to accelerate innovation,
achieve operational excellence
8. Use cases:
Clinical notes, medical records,
insurance medical claims, clinical
studies, medical imaging reports, lab
reports, and transfers. Administrative
overhead to process data from these
types of documents is huge.
Main benefits:
Operational speed and cost reduction. In
our practice, we see 2-8x cost reduction
compared to a fully manual process and
30%+ savings in comparison to legacy
OCR solutions.
Healthcare Use Cases
9. Use cases:
Clinician notes
Claims
Transfer summaries Medical imaging reports Lab reports
Medical record
Clinical studies
11. The general goal is to identify the main entities in the
document (paragraphs, forms, tables, etc.)
and then recognize the written text
in them (segmentation and OCR).
The two problems can be solved separately or
with end-to-end networks.
IDP / CV
12. Context search on data from OCR + segmentation
Forms and tables greatly impact overall performance. Data extraction from forms is largely solved (thanks to their
straightforward key-value structure). Tables are still a pain point for all data extractors. For unstructured text,
deep networks are the current solution. For example, BERT is good at finding key-value (question / answer) pairs
in context.
IDP / Data Extraction
13. Evaluation of the document
processing model is an ongoing
task.
Results with a low-confidence
score and missing information
are forwarded to human experts.
Samples of successfully extracted
information are also forwarded to
human experts for evaluation.
IDP / Evaluation and Monitoring
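A minimal sketch of the routing rule described on this slide, assuming every extracted field carries a confidence score between 0 and 1. The `ExtractedField` type, the `route_for_review` function, the 0.85 threshold, and the 5% audit rate are hypothetical choices for illustration, not values from the presentation.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]  # None means the extractor found nothing
    confidence: float     # 0.0 .. 1.0, as reported by the extractor (assumed)


def route_for_review(fields, threshold=0.85, audit_rate=0.05, rng=None):
    """Split fields into (needs_review, audit_sample, auto_accepted).

    - missing or low-confidence fields always go to a human expert
    - a random share of confident fields is also sent for auditing,
      so the system's performance can be monitored over time
    """
    rng = rng or random.Random()
    needs_review, audit_sample, auto_accepted = [], [], []
    for field in fields:
        if field.value is None or field.confidence < threshold:
            needs_review.append(field)
        elif rng.random() < audit_rate:
            audit_sample.append(field)
        else:
            auto_accepted.append(field)
    return needs_review, audit_sample, auto_accepted
```

In practice the threshold and audit rate would be tuned per document type, since a higher threshold trades more manual work for fewer unnoticed errors.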
14. Data lake + Ontology specifications
Fast Healthcare Interoperability Resources (FHIR)
is a standard describing data formats and
elements and an application programming
interface for exchanging electronic health records.
The standard was created by the Health Level
Seven International healthcare standards
organization.
IDP / Storage
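To make the idea of storing extracted data in a FHIR-shaped form more concrete, here is a minimal sketch that wraps one extracted lab value in a FHIR-style Observation resource, built as a plain Python dict. The function name, the patient identifier, and the sample values are hypothetical, and the result is not validated against a FHIR server or profile.

```python
import json


def lab_result_to_fhir_observation(patient_id, loinc_code, display, value, unit):
    """Build a minimal FHIR-style Observation as a plain dict (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": value, "unit": unit},
    }


# Hypothetical values, as they might come out of the data extraction step.
observation = lab_result_to_fhir_observation(
    patient_id="example-123",
    loinc_code="2345-7",
    display="Glucose [Mass/volume] in Serum or Plasma",
    value=95,
    unit="mg/dL",
)
print(json.dumps(observation, indent=2))
```

A real integration would need more fields (for example, timestamps and performer) and should be checked against the FHIR version used by the destination system.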
15. Data lake + Ontology specifications
Storage
Hospitals
Providers
Pharmaceutical
companies
Patients
Labs
Health plans
16. Automation encapsulates all processes mentioned above
and unites them into one single product, featuring:
● Document capture
● Model lifecycle
○ Labeling
○ (Re)Training
○ Evaluation
○ Monitoring
● Human-in-the-loop
● Integrations
● System monitoring
IDP / Automation
17. IDP is more than just OCR. To resolve the problem in-house, you need
to take care of data capture, data ingestion, preprocessing, OCR, data
extraction, evaluation, and further integrations to destination systems.
Bottleneck: Tables and unstructured text
IDP / Takeaways
19. Documents are everywhere... and solutions for document processing are everywhere, too!
Competitive Landscape
20. Major technology platforms offer general-
purpose technology components for
document processing, such as:
● Amazon Textract + Comprehend
● Google Document AI
● Microsoft Azure Form Recognizer
Solutions: Cloud Vendors
Pros:
● Cloud infrastructure and integration
● Long lifespan and support
● Constant development
Cons:
● General purpose, i.e. they require
additional work to extract necessary
information and integrate with current
workflows
21. These are emerging use case-focused vendors
that offer solutions using AI-native platforms to
tackle the most demanding automation
challenges. They can handle more complex
documents with a greater variability. As a result,
they often deliver a better business impact than
obsolete technologies. Since they are free from
legacy technical debt, it is easier for them to
build next-gen, future-oriented solutions.
Solutions: Startups
Pros:
● Modern tech
● Constant development
● More focused applications
● Support — For a new independent player, support is
one of the highest priorities to gain customer loyalty
Cons:
● Only a few startups in this market can survive
competition with big vendors
● Challenging to customize
● May not align with your cloud strategy
● Support — On the other hand, new startups might
struggle with support
22. Legacy vendors typically build IDP
solutions on top of legacy platforms.
Niche vendors focus on limited
types of documents and use cases. You
might find hidden gems here!
Some vendors restructure your document
workflow by introducing standard types of
documents, which are really easy to
process.
Solutions: Other Vendors
Pros:
● Wide variety of integrations
● Niche use cases
● Large portfolio of clients
Cons:
● In some cases, they rely on outdated,
less performant technologies
● Document flow restructuring
23. System Integrators may offer IDP
as part of their portfolio of
solutions. Their IDP offering may
be a solution from another IDP
vendor or developed in-house.
Solutions: System Integrators
25. What to Choose?
Now, you have all the information about
possible go-to solutions in your market
segment. What’s next?
You need to fairly compare each and every
solution to choose one that fits and aligns
with your use case the most.
Deep evaluation is key to making the right
decision.
26. Data
● EDA (exploratory data analysis) — Knowing your
data is the key to success
● Sample data based on EDA
● Use this data as the evaluation dataset for
measuring performance of solutions on the
market / in the segment
Composite Index
● F1, Accuracy, Recall, etc.
● Robustness
● Key, value extraction
● Table data
● Language, character recognition, spelling,
handwritten text
Provectus Evaluation Methodology
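As one example of the metrics listed above, the sketch below scores key-value extraction against a hand-labeled document using precision, recall, and F1. The exact-match rule and the function name `key_value_scores` are assumptions made for illustration; the presentation does not specify how matches are counted.

```python
def key_value_scores(predicted: dict, expected: dict) -> dict:
    """Precision, recall, and F1 for extracted key-value pairs.

    A pair counts as correct only if the key exists in the labeled
    document and the value matches exactly (an assumed, strict rule).
    """
    correct = sum(
        1 for key, value in predicted.items()
        if key in expected and expected[key] == value
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labeled document and one vendor's output.
expected = {"name": "Jane Doe", "dob": "1980-01-01", "policy": "A-1", "total": "100"}
predicted = {"name": "Jane Doe", "dob": "1980-01-01", "policy": "B-2"}
print(key_value_scores(predicted, expected))
```

Running the same function over every document in the evaluation dataset and averaging the results gives one comparable number per vendor for this dimension.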
27. Evaluation / Composite Index
Name Score
Provider 1 0.64
Provider 2 0.81
Provider 3 0.78
Composite Index
Dimensions
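One simple way to combine per-dimension scores into a single composite score is a weighted average, sketched below. The presentation does not say how the index is actually aggregated, so the weighting scheme, the dimension names, the provider names, and all numbers here are hypothetical.

```python
def composite_index(sub_scores: dict, weights: dict) -> float:
    """Weighted average of sub-index scores (each assumed to be in 0..1)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(sub_scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


def rank_providers(providers: dict, weights: dict) -> list:
    """Return (provider, composite score) pairs, best first."""
    scored = {name: composite_index(scores, weights) for name, scores in providers.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: this use case cares most about unstructured text.
weights = {"text": 2, "form": 1, "table": 1}
providers = {
    "Provider A": {"text": 0.8, "form": 0.6, "table": 0.4},
    "Provider B": {"text": 0.5, "form": 0.9, "table": 0.9},
}
for name, score in rank_providers(providers, weights):
    print(f"{name}: {score:.2f}")
```

The weights are where the business use case enters: changing them can change the ranking, which is why the same vendors may score differently for different document types.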
30. TCO and Case Study: Under NDA Client
General TCO structure:
● Infrastructure (data pipelines, storage, control panel)
● CV, NLP, Human-in-the-loop
● R&D costs (if building in house)
● Support
TCO targets for end-to-end solution:
~20-30 cents per document for simple use cases and 50+
cents for more complex documents
Result:
The cost of processing one document was reduced from 24
to 11 cents, since the right OCR/CV vendor was selected (it
saved almost 10 cents per document). Also, serverless
architecture was leveraged to reduce infrastructure costs.
OCR/CV solutions performance vs. cost:
For a given use case, the most expensive
solution delivered the worst result. The
second-best result was demonstrated by the vendor
with the second-cheapest solution.
Performance vs. price
31. Buy vs. Customize vs. Build
Cloud OCR + extraction APIs
vs. Custom model
In cases with a high volume of documents, it’s
worth investing in a custom model built
in house to reduce the cost of extra services (e.g.
form and table APIs) in the long run.
On average, the break-even point for the IDP custom
extraction model vs. APIs is around the 8th month
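The break-even reasoning can be sketched as simple arithmetic: the upfront cost of a custom model is paid back by the per-document saving over paid APIs. All numbers below are made up for illustration and are not the figures behind the slide; amounts are kept in whole cents to avoid rounding surprises.

```python
import math


def break_even_month(upfront_cost_cents, docs_per_month,
                     api_cents_per_doc, custom_cents_per_doc):
    """First month in which cumulative savings cover the upfront cost.

    Returns None if the custom model is not cheaper per document.
    """
    saving_per_month = docs_per_month * (api_cents_per_doc - custom_cents_per_doc)
    if saving_per_month <= 0:
        return None
    return math.ceil(upfront_cost_cents / saving_per_month)


# Hypothetical scenario: $30,000 to build, 100,000 documents per month,
# 6 cents per document via APIs vs. 1 cent with the custom model.
print(break_even_month(3_000_000, 100_000, 6, 1))
```

At lower volumes the same upfront cost takes proportionally longer to recover, which is why the build option mainly pays off for high-volume use cases.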
32. Takeaways
1. Ecosystem matters: Data integration with built-in industry-specific connectors, data
pipelines, OCR, NLP, security, storage, and a human-in-the-loop workflow. All these
elements should be integrated with each other for optimal performance.
2. Use an unbiased benchmarking framework to evaluate the real performance of different
providers, based on your use case and datasets.
3. Work with Provectus to reduce your document processing costs
a. By 2-8x compared to manual workflows
b. By 30%+ compared to legacy OCR solutions
c. By 10%+ compared to modern cloud solutions.
34. Commitments & Deliverables
Helping businesses choose the right document processing solution for their healthcare
use cases. A fully funded engagement for qualified customers.
IDP Solution Discovery Program. Unbiased!
Schedule a 30 min. pre-assessment session here:
IDP Solution Discovery Program
You provide:
1. Business use cases overview
2. Access to datasets
3. Commitment to support
the engagement
We deliver:
1. Solutions evaluation report
based on your unique data
2. Solution architecture
3. TCO estimate
35. 125 University Avenue
Suite 295, Palo Alto
California, 94301
provectus.com
Questions, details?
We would be happy to answer!
Editor's notes
Hello everyone! I’m excited to welcome you to our webinar dedicated to Document Processing Solutions for healthcare providers, health insurance companies, and pharmaceutical companies.
This webinar is brought to you by Provectus, an AI-first consultancy and solutions provider.
To set your expectations: we will keep the level of detail at 200 feet, to give you a detailed strategy for moving forward and to continue our conversation after this webinar.
At the end of the webinar, we will tell you more about the unique opportunity to apply for a fully funded IDP Solution Discovery Program by Provectus.
Let's quickly go through the agenda.
The webinar is grouped into 5 blocks. Each of them is a logical continuation of the previous one.
We will start with an introduction and set up the context for this webinar.
Then, we will proceed to the main section:
First, we will take a look at healthcare use cases that can be addressed with an IDP solution.
Second, we will reiterate what IDP really is and which building blocks it encapsulates.
Third, with a clear understanding of the IDP solution components, we will review how not to get lost in the ocean of available solutions. We will segment the market and compare the pros and cons of each segment. We will talk about positioning yourself within the boundaries of those markets and figuring out the exact solutions you need.
And finally, we will discuss how to choose the right solution for you.
Quick introduction.
I’m Iskandar, an ML Solutions Architect at Provectus. For the last 5 years I’ve been working at Provectus on a variety of ML use cases.
Stepan is a CTO at Provectus who brings deep expertise in cloud, distributed systems, and machine learning to this webinar.
Please do not hesitate to ask questions in the Questions Tab as we move through the webinar content.
A quick overview of Provectus.
We are obsessed with generating value from data through Artificial Intelligence and deploying it to the real world.
We work on some of the hardest problems in the world, like simulating turbulence in supernova explosions using deep learning models or predicting binding sites for protein-peptide interactions, as well as on applied AI solutions, for example computer vision-based disease screening or intelligent document processing for the healthcare and insurance industries.
Quick facts: we are headquartered in Silicon Valley, have 500 employees worldwide, and serve clients ranging from cutting-edge startups to large Fortune 500 enterprises.
Some of our clients are listed here. We are comfortable working with big enterprises and driving multi-million transformational projects with AI, as well as helping startups, partnering with them from the early stages through all of the funding rounds to a successful acquisition or IPO.
I’m proud to note a wide range of clients from the healthcare and life sciences industry.
Health data is frequently incomplete and inconsistent, and is often unstructured, with information contained in clinical notes, laboratory reports, insurance claims, medical images, and time series data across disparate document formats and systems. Every health care provider, payer, and life sciences company is trying to solve the problem of processing documents and structuring the data, because if they do, they can make better patient support decisions, design better clinical trials, and operate more efficiently.
Document processing and healthcare have a long history together. So, you probably know all of these use cases: .... All of them share the same framework for processing the documents.
Main benefits: operational speed and cost reduction. In our practice we see a 2-8x+ cost reduction compared to a fully manual process and a 30%+ reduction compared to legacy OCR solutions.
Stepan: An important note: we are not going to discuss details, KPIs and business ROI of each particular business use case for healthcare. Although we can help you discover and prioritize a use case, we expect that you know your business really well and you have already justified and identified a particular project for a 2021 roadmap. In that sense, this webinar is really biased to action and execution rather than outlining the strategy.
Let's move to what document processing is.
Here you can see a short demo of how our document processing solution works, from document upload to review of actual results.
The first part of the system is Computer Vision (CV).
It's usually called OCR, which stands for optical character recognition, but that is an outdated term. CV is a broader term that includes OCR as one of its components.
So, the general goal is to identify the main entities within a document (paragraphs, forms, tables, etc.) and recognize the written text in them (segmentation and OCR).
Next, we move to data extraction.
Here we are solving a context search problem: we are trying to extract knowledge from the semi-structured data produced by the CV/OCR step and form structured answers out of it.
The bottleneck here is tables and unstructured text.
Keep in mind that the CV and extraction steps can be handled with an end-to-end approach. For example, a single model can handle both tasks (although internally it is still separated into those logical blocks).
The next step is evaluation, which is overlooked in a lot of cases.
Machines are really good at reducing the processing time of repetitive operations, but humans are great at complex, context-dependent tasks like data extraction.
So, the solution here is to build a hybrid system.
Plus, we should always have verification mechanisms, especially in highly regulated areas like healthcare.
By hybrid approach and verification, I mean that low-confidence results and missing information are forwarded to human experts for review. Also, a sample of successfully extracted information is forwarded to human experts, because you always have to monitor the performance of your system.
Storage is another topic that is often overlooked in document processing solutions. Moreover, the storage requirements for the healthcare industry are different, in a good way. Yes, we have the same object storage, relational databases, etc., but alongside that we have to blend in ontology specifications, that is, a standard describing data formats and elements and an application programming interface for exchanging records.
What it essentially means is that data should be annotated in a specific format to be queryable and discoverable by other machines.
Think of it as a knowledge graph.
And the healthcare industry has its own format, which is called FHIR.
Last but not least is automation. Automation is the glue for all of the pieces we’ve mentioned before, as it helps build a final product out of these pieces.
Automation encapsulates: data capture, monitoring …
As you can tell, IDP goes way beyond just OCR. To solve a document processing problem, you have to take care of lots of things like data capture, crowdsourcing, etc.
One of the weak points in any document processing solution is the algorithms for extracting information. And most of the struggle is with unstructured text and table information.
Now, let’s move to the next section of our webinar, the solutions map, where we will provide an overview of the market of possible solutions for IDP.
Documents are everywhere, right?
For healthcare, it’s clinical records, lab reports, medical charts. For insurance, it’s claims. For HR, it’s resumes or CVs. For finance, it’s invoices.
And there are solutions everywhere! You may not know it, but there are more than 200 different vendors offering IDP solutions.
As you can imagine, it’s super easy to get lost in the variety of all of these options.
A good way to start is to refer to something like Gartner quadrants or Forrester waves. They are really good for giving you a glance at the market.
But in 90% of cases, you have to double-click on the specific areas of your use case, as the leaders in those reports are not necessarily the best for your specific use cases and datasets.
So let’s dive one step deeper and segment the document processing market.
The first market segment is big cloud vendors. You all know them. Technological giants. They provide general-purpose technology components for document processing.
AWS has Textract and Comprehend. Google has Document AI. Microsoft has Form Recognizer.
Working with these types of solutions has its own pros and cons: …
This is a younger group of up-and-coming vendors who have built solutions using AI-native platforms.
Generally, startups can handle documents that are more complex or have greater variation.
As far as advantages go: …
On the other hand, startups have disadvantages: …
Legacy vendors typically build IDP solutions on top of a legacy platform.
Niche vendors focus on limited types of documents/use cases. You might find hidden gems here!
Some vendors restructure your document workflow by introducing standard types of documents, which are really easy to process.
Pros: ..
Cons: …
It is what it sounds like. Integrators help you glue different pieces of your document processing solution together.
System Integrators may offer IDP as part of their portfolio of solutions. Their IDP offering may be a solution from another IDP vendor or developed in house.
There are more than 200 different vendors offering similar capabilities... at least based on vendors’ marketing websites :) So, let’s find out how you can evaluate their offerings and choose the right document processing solution for your healthcare organization.
How would you move from a high-level, Gartner-style, PowerPoint-based comparison to a real, metrics-based evaluation that compares apples to apples? How would you decide between a build, buy, or hybrid approach, where you buy components and then integrate/customize them?
As a reminder, we have cloud vendors, independent startups, niche and legacy vendors, as well as System Integrators.
Introducing the Provectus Evaluation Methodology, an UNBIASED benchmarking tool for choosing the right Intelligent Document Processing solution for your business.
Some vendors demonstrate better quality when processing lab reports, others do really well with handwritten notes in clinical records. We need to consider all of these factors, and we at Provectus have developed a methodology to make it easier to choose the best option.
The steps are pretty simple to understand: define dataset -> define metrics -> evaluate -> rank -> pick or repeat.
Any metric should be based on a specific dataset. It’s extremely important to know not only which metrics to use, but also the data they are computed on.
Exploratory Data Analysis is a must. We always start with data and not with vendors or ML models.
Knowing the types of documents, the ratio between handwritten and printed text, tables, forms and paragraphs, images, and noise levels, and also assessing the long tail of custom edge cases, is crucial.
Based on EDA, we generate an evaluation dataset, which should be a representative sample of the data, so we can make fair assumptions about target performance metrics.
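One way to get a representative evaluation dataset from EDA results is stratified sampling: keep the same mix of document types in the sample as in the full corpus. The sketch below assumes each document already has a type label from EDA; the function name, the `type` field, and the counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(documents, key, sample_size, seed=0):
    """Sample documents so each group keeps roughly its share of the corpus.

    Every group contributes at least one document, so rare document types
    (the long tail) are still represented. Because of rounding, the result
    can differ slightly from sample_size.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for doc in documents:
        groups[key(doc)].append(doc)

    total = len(documents)
    sample = []
    for docs in groups.values():
        quota = max(1, round(sample_size * len(docs) / total))
        sample.extend(rng.sample(docs, min(quota, len(docs))))
    return sample


# Hypothetical corpus: 80 printed forms, 15 handwritten notes, 5 table-heavy reports.
corpus = (
    [{"id": i, "type": "printed"} for i in range(80)]
    + [{"id": i, "type": "handwritten"} for i in range(80, 95)]
    + [{"id": i, "type": "table"} for i in range(95, 100)]
)
evaluation_set = stratified_sample(corpus, key=lambda d: d["type"], sample_size=20)
print(len(evaluation_set))
```

The same idea extends to more than one attribute, for example grouping by document type and scan quality together.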
Once we have an evaluation dataset, we pick the metrics that will be used for our final composite index. The exact list of metrics is usually based on the specific business use case you would like to optimize through AI.
Let’s double-click on the composite index we use for solution evaluation.
We compose the index out of numerous sub-indexes. For example:
Text index: how well the solution extracts metadata from unstructured text (paragraphs and notes)
Handwritten index: how well the solution extracts handwritten text
Form index: how good the solution is at recognizing key-value pairs
Price index: may seem straightforward, but it has multiple components: variable price per document, maintenance cost, as well as customization and integration cost.
Let’s double-click on a couple of sub-indexes.
The text index quantifies how well the solution fixes grammar errors, deals with noise characters, and finally maps entities to a predefined ontology.
Also, some providers may handle one language better than another, so it is important to evaluate and quantify this.
Another interesting index is a robustness index.
Essentially, it measures how good the solutions are at handling different edge cases,
such as large spaces between key-value pairs, or noise in the document caused by a poor scan, the age of the document, etc.
It usually impacts the cost of handling the long-tail distribution of documents. Obviously, if an IDP solution is not robust enough, it puts pressure on the total cost of your business process, which includes human experts in the loop.
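A robustness score could be sketched as the share of clean-document quality that a solution retains on deliberately degraded documents. This definition, the function name, and the numbers below are assumptions for illustration only; the webinar does not define how the robustness index is computed.

```python
def robustness_index(clean_score: float, perturbed_scores: dict) -> float:
    """Ratio of average quality on degraded documents to quality on clean ones.

    clean_score: e.g. F1 on the unmodified evaluation dataset
    perturbed_scores: the same metric per edge-case variant of that dataset
    Returns a value in 0..1, where 1.0 means no measurable degradation.
    """
    if not perturbed_scores:
        raise ValueError("at least one perturbed score is required")
    if clean_score <= 0:
        return 0.0
    mean_perturbed = sum(perturbed_scores.values()) / len(perturbed_scores)
    return min(1.0, mean_perturbed / clean_score)


# Hypothetical F1 scores for one vendor on degraded copies of the same documents.
scores = {"poor_scan": 0.72, "wide_key_value_spacing": 0.81, "aged_paper": 0.63}
print(round(robustness_index(0.90, scores), 2))
```

A low value suggests that more documents from the long tail will end up with human experts, which feeds directly into the cost considerations mentioned above.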
Now, I want to show you a case study and the TCO of one of our clients, where we used this evaluation methodology.
Build: does not include initial investments in R&D.
Alright, I hope you now have a solid strategy for moving forward.
Here is the next practical step for you to get started.
Provectus has designed a dedicated program to help you choose the best vendor and solution for your Documents Processing use case.
Here is how the Program works:
You provide a business use case and access to your datasets, as well as executive sponsorship to support the engagement.
We explore your dataset and run it through the Provectus benchmarking framework against different OCR/IDP vendors.
As a result, we will determine the most cost-efficient solution with the highest quality output for your specific healthcare use case.
The offering is available to a limited number of customers and is a fully funded engagement. Apply now, so we can help you learn more about the program and get you qualified. The application link is available in the downloadable handout slides to this webinar and will also be sent to you in the follow-up email after the webinar.
https://provectus.com/intelligent-document-processing-discovery-program/
Why should I opt for building a custom solution vs. buying a ready-made one? (buy vs. build) - You can’t avoid customization one way or another.
What other criteria should I consider when choosing a document processing solution? - Specific metrics. Consider TCO.
How do you handle sensitive information for model training? - The entire system is sealed within the boundaries of your private cloud.
We want to process medical charts to extract patient data. How can we get started and what else should we consider? - Data analysis, evaluation.
How long does it take to set up a solution? - If you have data -> POC in 1-2 weeks; usually 2-4 weeks, but setting up business processes may prolong this time.