© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
S U M M I T
Automating document analysis and text extraction with Amazon Textract
AIM202
Randall Hunt
Senior technical evangelist and software engineer
AWS
Documents are important
Primary tool of record keeping, communicating, collaborating, and transacting
Finance
Insurance
Real estate
Accounting
Tax management
Medical
Legal
Business management
Education
And many more
16.3 million US mortgage applications ($2.1 trillion) in 2016*
About 240 million W-2 tax forms processed for FY 2018 in the US*
Need for processing documents
How documents are processed today
Optical character recognition (OCR)
Manual processing
Rules and template-based extraction
Challenges for processing documents: Manual processing
Expensive
Error-prone
Time-consuming
Challenges for processing documents: Manual processing
Output
1. Exempt is true
2. 28 is true
3. CPP/QPP is true
4. RPC/RRQ is true
Challenges for processing documents: OCR
Error-prone
Flat bag of words
Simple documents only
Challenges for processing documents: OCR
Output
Extract data quickly & No code or templates to accurately maintain
Challenges for processing documents: OCR
Output
Start Date End Date Employer Name Position Held Reason for leaving
1/15/2009 6/30/2013 Any Company Head Baker Family relocated
Challenges for processing documents
Output
Full Name Date of Birth Gender
John X Doe 01 01 1971
Male
First Middle Last MM DD YYYY
Female
Challenges for processing documents: Rules and template-based extraction
Limited by accuracy of OCR
Significant development and management overhead
Templates are brittle
Challenges for processing documents: Rules and template-based extraction
The well-known W-2 US tax form has hundreds of variants each year
It looks easy, but …
…not a single corresponding pixel value in common
Amazon Textract features
Text extraction
Table extraction
Form extraction
Amazon Textract: Text extraction
Blocks: PAGE, PARAGRAPH, LINE, WORD
is washed by waves, and cooled
Amazon Textract text extraction API: DetectDocumentText
Request parameters
Name            Description
Document        Blob or Amazon S3 object
Response fields
Name            Description
Blocks          List of blocks identified from the document
ID              Unique ID of the unit
Relationships   CHILD
Block type      PAGE, PARAGRAPH, LINE, WORD
Pages           Contains number of pages in the document
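
As a rough illustration of the response shape described on this slide, the Python sketch below collects the text of each LINE block from a DetectDocumentText-style response. The `sample_response` dictionary is hand-written for illustration and is not real service output, and the helper name `lines_from_response` is an assumption. With the AWS SDK for Python, a real response would typically come from `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})`.

```python
from typing import Any, Dict, List

# Hand-written stand-in for a DetectDocumentText response (illustrative only).
sample_response: Dict[str, Any] = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"BlockType": "LINE", "Id": "l1", "Confidence": 99.1,
         "Text": "is washed by waves, and cooled",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"BlockType": "WORD", "Id": "w1", "Text": "is", "Confidence": 99.5},
        {"BlockType": "WORD", "Id": "w2", "Text": "washed", "Confidence": 98.7},
    ],
}


def lines_from_response(response: Dict[str, Any],
                        min_confidence: float = 0.0) -> List[str]:
    """Return the text of every LINE block at or above min_confidence."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


print(lines_from_response(sample_response))
```

Filtering on the confidence score is one way to route low-confidence lines to human review instead of accepting them automatically.
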
Amazon Textract: Table extraction
Blocks: PAGE, TABLE, CELL
For each block, you get
• Text
• Confidence score
• Block relationships (e.g., cells within a table)
Amazon Textract table extraction API: AnalyzeDocument with TABLES as the FeatureTypes parameter
Request parameters
Name            Description
Document        Blob or Amazon S3 object
FeatureTypes    TABLES
Response fields
Name            Description
Blocks          List of blocks identified from the document
ID              Unique ID of the unit
Relationships   CHILD
Block type      PAGE, TABLE, CELL
Pages           Contains number of pages in the document
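
To make the TABLE and CELL relationships concrete, here is a minimal Python sketch that rebuilds rows and columns from an AnalyzeDocument-style block list. The sample blocks are hand-written (a two-column slice of the employment table used elsewhere in this deck), and the helper names `table_rows`, `_word`, and `_cell` are assumptions for illustration. A real block list would typically come from `analyze_document(Document=..., FeatureTypes=["TABLES"])` in the AWS SDK for Python.

```python
def _word(block_id, text):
    """Build a minimal WORD block (illustrative helper)."""
    return {"BlockType": "WORD", "Id": block_id, "Text": text}


def _cell(block_id, row, col, word_ids):
    """Build a minimal CELL block whose CHILD relationship lists its words."""
    return {"BlockType": "CELL", "Id": block_id, "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


sample_blocks = [
    {"BlockType": "TABLE", "Id": "t1",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w1", "w2"]), _cell("c2", 1, 2, ["w3", "w4"]),
    _cell("c3", 2, 1, ["w5"]), _cell("c4", 2, 2, ["w6"]),
    _word("w1", "Start"), _word("w2", "Date"), _word("w3", "End"),
    _word("w4", "Date"), _word("w5", "1/15/2009"), _word("w6", "6/30/2013"),
]


def table_rows(blocks):
    """Return one list of rows per TABLE block, each row a list of cell strings."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for i in rel["Ids"]]

    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = {}
        for cell_id in child_ids(table):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [by_id[w]["Text"] for w in child_ids(cell)
                     if by_id[w]["BlockType"] == "WORD"]
            # RowIndex and ColumnIndex are 1-based positions within the table.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            tables.append([])
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables


print(table_rows(sample_blocks))
```
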
Amazon Textract: Form extraction
Blocks: PAGE, KEY_VALUE_SET
For each block of your document
• Form field name (key) and field value (value) association
• Confidence score
• Page number
• Block relationships
Amazon Textract forms extraction API: AnalyzeDocument with FORMS as the FeatureTypes parameter
Request parameters
Name            Description
Document        Blob or Amazon S3 object
FeatureTypes    FORMS
Response fields
Name            Description
Blocks          List of blocks identified from the document
ID              Unique ID of the unit
Relationships   KEY, VALUE, CHILD
Block type      PAGE, KEY_VALUE_SET
Pages           Contains number of pages in the document
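
The key/value association can be followed in code roughly as below. This is a sketch under stated assumptions: the blocks are hand-written, the helper name `key_value_pairs` is hypothetical, and a field with no detected value is mapped to `None` (mirroring the `null` shown later in this deck). In a real response, KEY_VALUE_SET blocks carry an `EntityTypes` list of `KEY` or `VALUE`; a key block points to its value block through a `VALUE` relationship and to its own words through `CHILD`.

```python
sample_blocks = [
    {"BlockType": "KEY_VALUE_SET", "Id": "k1", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"BlockType": "KEY_VALUE_SET", "Id": "v1", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"BlockType": "KEY_VALUE_SET", "Id": "k2", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                       {"Type": "CHILD", "Ids": ["w3"]}]},
    # A value block with no children stands for an empty form field.
    {"BlockType": "KEY_VALUE_SET", "Id": "v2", "EntityTypes": ["VALUE"]},
    {"BlockType": "WORD", "Id": "w1", "Text": "First:"},
    {"BlockType": "WORD", "Id": "w2", "Text": "John"},
    {"BlockType": "WORD", "Id": "w3", "Text": "Middle:"},
]


def key_value_pairs(blocks):
    """Map each form key's text to its value text, or None when the value is empty."""
    by_id = {b["Id"]: b for b in blocks}

    def related(block, rel_type):
        return [by_id[i] for rel in block.get("Relationships", [])
                if rel["Type"] == rel_type for i in rel["Ids"]]

    def text_of(block):
        words = [c["Text"] for c in related(block, "CHILD")
                 if c["BlockType"] == "WORD"]
        return " ".join(words) or None

    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if is_key:
            values = related(block, "VALUE")
            pairs[text_of(block)] = text_of(values[0]) if values else None
    return pairs


print(key_value_pairs(sample_blocks))
```
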
Amazon Textract: Sync and async
Sync: Supports single-page documents such as images (e.g., mobile capture)
Async: For multi-page documents, up to 3,000 pages
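
For the asynchronous path, a minimal polling sketch in Python might look like the following. It assumes a client exposing `start_document_text_detection` and `get_document_text_detection`, as the boto3 Textract client does; the function name `detect_text_async` and the `FakeTextractClient` class are hypothetical and exist only so the flow can be exercised offline. The sketch treats any final status other than `SUCCEEDED` as an error.

```python
import time


def detect_text_async(client, bucket, key, poll_seconds=5.0, sleep=time.sleep):
    """Start an async text-detection job, wait for it, and return all blocks."""
    job_id = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        page = client.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        sleep(poll_seconds)

    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with {page['JobStatus']}")

    # Results are paginated; follow NextToken until it is absent.
    blocks = list(page.get("Blocks", []))
    while page.get("NextToken"):
        page = client.get_document_text_detection(
            JobId=job_id, NextToken=page["NextToken"])
        blocks.extend(page.get("Blocks", []))
    return blocks


class FakeTextractClient:
    """Offline stand-in mimicking the two calls used above (not part of boto3)."""

    def __init__(self):
        self._responses = [
            {"JobStatus": "IN_PROGRESS"},
            {"JobStatus": "SUCCEEDED", "NextToken": "t1",
             "Blocks": [{"BlockType": "LINE", "Text": "page one"}]},
            {"JobStatus": "SUCCEEDED",
             "Blocks": [{"BlockType": "LINE", "Text": "page two"}]},
        ]

    def start_document_text_detection(self, DocumentLocation):
        return {"JobId": "job-123"}

    def get_document_text_detection(self, JobId, NextToken=None):
        return self._responses.pop(0)


blocks = detect_text_async(FakeTextractClient(), "example-bucket", "scan.pdf",
                           sleep=lambda seconds: None)
print([b["Text"] for b in blocks])
```

Polling keeps the example short. The start call also accepts a `NotificationChannel` so that completion can be announced through Amazon SNS, which may suit larger workloads better.
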
Amazon Textract: Text extraction simplified
Output
Extract data quickly & accurately
No code or templates to maintain
Amazon Textract: Table extraction simplified
Output {
Start Date: 1/15/2009
End Date: 6/30/2013
Employer Name: Any Company
Position Held: Head Baker
Reason for leaving: Family relocated
}
Amazon Textract: Form extraction simplified
Output
Full Name:
  First: John
  Middle: X
  Last: Doe
Date of Birth:
  MM: 01
  DD: 01
  YYYY: 1971
Gender:
  Male: True
  Female: False
Text extraction: OCR reimagined
Orientation
Text extraction: OCR reimagined
Structure variability
Text extraction: OCR reimagined
Document variability
Beyond OCR: Segmentation and rectification
Photometric
Beyond OCR: Segmentation and rectification
Geometric
Beyond OCR: Table and cell detection
Understand document structure and context to find tables
Beyond OCR: Table and cell detection
Understand cells even without explicit boundaries
Beyond OCR: Table and cell detection
Variable-sized rows and columns
Beyond OCR: Field name (key) and value extraction
Detect phrases or groups of words
Output
Full Name:
  First: John
  Middle: X
  Last: Doe
Date of Birth:
  MM: 01
  DD: 01
  YYYY: 1971
Gender:
  Male: True
  Female: False
Beyond OCR: Inferring key/value association
Detect structures of the same form without templates
Beyond OCR: Inferring key/value association
Key/value association
Beyond OCR: Inferring key/value association
Infer empty values
Output
Full Name:
  First: John
  Middle: null
  Last: Doe
Date of Birth:
  MM: 01
  DD: 01
  YYYY: 1971
Gender:
  Male: True
  Female: False
Reference architecture: Index and search documents
Input: Uploaded document images such as tax forms, credit applications, or medical notes
Amazon S3: Uploaded documents are stored in a data lake
AWS Lambda: A Lambda function is triggered to initiate document analysis using the Amazon Textract API
Amazon Textract: Automatically extract text, including key-value pairs and tables
Amazon Elasticsearch Service: Extracted data and confidence scores are indexed to enable document search
Output: Perform contextual search on millions of documents or integrate data into your document management system
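
The Lambda step in this architecture could be sketched in Python roughly as follows. Everything here is illustrative: the function names, the shape of the search document, and the `index` hook (a placeholder for whatever writes to the search index) are assumptions, not part of the original deck. The Textract call uses the synchronous `analyze_document` operation and is imported lazily so the rest of the sketch runs without AWS credentials; multi-page documents would go through the asynchronous operations instead.

```python
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def analyze_with_textract(bucket, key):
    """Call Textract on an S3 object; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    response = boto3.client("textract").analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )
    return response["Blocks"]


def to_search_document(bucket, key, blocks):
    """Flatten LINE blocks into one document, keeping the lowest confidence."""
    lines = [b for b in blocks if b["BlockType"] == "LINE"]
    return {
        "source": f"s3://{bucket}/{key}",
        "text": "\n".join(b["Text"] for b in lines),
        "min_confidence": min((b["Confidence"] for b in lines), default=None),
    }


def handler(event, context, analyze=analyze_with_textract, index=print):
    """Lambda-style entry point; `index` is a hypothetical hook for the search index."""
    documents = [to_search_document(bucket, key, analyze(bucket, key))
                 for bucket, key in s3_objects(event)]
    for doc in documents:
        index(doc)
    return documents


# Offline usage with a hand-written event and a fake analyzer.
sample_event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                    "object": {"key": "forms/tax+form.png"}}}]}
fake_blocks = [{"BlockType": "LINE", "Text": "Wages 50,000", "Confidence": 97.5}]
indexed = []
result = handler(sample_event, None,
                 analyze=lambda bucket, key: fake_blocks, index=indexed.append)
```

Storing a confidence value alongside the text lets the search layer filter or flag documents whose extraction may need review.
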
Reference architecture: Form capture
Input: Customer uses mobile app to capture a photo of a W-2 form
Amazon Textract: The Amazon Textract API is integrated into the end-user application to automatically extract text from the W-2 form and auto-populate the form fields
Customer application: Customers experience real-time capture of tax information by taking a photo instead of performing manual data entry
Database: User-submitted data is loaded into a database for use in tax preparation
Reference architecture: Extract for NLP
Quickly turn extracted text and data into actionable insights
Input: Uploaded document images of medical notes, explanation of benefits, and patient forms
Amazon S3: Uploaded documents are stored in S3
Amazon Textract: Automatically extract words, lines of text, and tables
NLP: Use natural language processing to extract insights from medical documents
Amazon Elasticsearch Service: Easily search through extracted data and text insights
Output: Discover medical insights to improve patient care
Amazon Textract: Launch customers
Amazon Textract: Benefits
Thank you!
Randall Hunt

Automating document analysis and text extraction with Amazon Textract - AIM202 - Chicago AWS Summit

 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 

Automating document analysis and text extraction with Amazon Textract - AIM202 - Chicago AWS Summit

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT. Automating document analysis and text extraction with Amazon Textract (AIM202). Randall Hunt, Senior technical evangelist and software engineer, AWS.
  • 2. Documents are important: the primary tool of record keeping, communicating, collaborating, and transacting. Examples: finance, insurance, real estate, accounting, tax management, medical, legal, business management, education, and many more.
  • 3. 16.3 million US mortgage applications ($2.1 trillion) in 2016.
  • 4. About 240 million W-2 tax forms processed for FY 2018 in the US.
  • 5. Need for processing documents.
  • 6. How documents are processed today: optical character recognition (OCR); manual processing; rules and template-based extraction.
  • 7. Challenges for processing documents: Manual processing. Expensive; error-prone; time-consuming.
  • 8. Challenges for processing documents: Manual processing. Output: 1. Exempt is true; 2. 28 is true; 3. CPP/QPP is true; 4. RPC/RRQ is true.
  • 9. Challenges for processing documents: OCR. Error-prone; flat bag of words; simple documents only.
  • 10. Challenges for processing documents: OCR. Output: "Extract data quickly & No code or templates to accurately maintain"
  • 11. Challenges for processing documents: OCR. Output: "Start Date End Date Employer Name Position Held Reason for leaving 1/15/2009 6/30/2013 Any Company Head Baker Family relocated"
  • 12. Challenges for processing documents. Output: "Full Name Date of Birth Gender John X Doe 01 01 1971 Male First Middle Last MM DD YYYY Female"
  • 13. Challenges for processing documents: Rules and template-based extraction. Limited by accuracy of OCR; significant development and management overhead; templates are brittle.
  • 14. Challenges for processing documents: Rules and template-based extraction. The well-known W-2 US tax form has hundreds of variants each year.
  • 15. It looks easy, but … not a single corresponding pixel value in common.
  • 17. Amazon Textract features: text extraction; table extraction; form extraction.
  • 18. Amazon Textract: Text extraction. Blocks: PAGE, PARAGRAPH, LINE, WORD. Example text: "is washed by waves, and cooled"
  • 19. Amazon Textract text extraction API: DetectDocumentText. Input (Name: Description): Document: Blob or Amazon S3 object. Output (Name: Description): Blocks: list of blocks identified from the document; ID: unique ID of the unit; Relationships: CHILD; Block type: PAGE, PARAGRAPH, LINE, WORD; Pages: contains number of pages in the document.
  • 20. Amazon Textract: Table extraction. Blocks: PAGE, TABLE, CELL. For each block, you get: text; confidence score; block relationships (e.g., cells within a table).
  • 21. Amazon Textract table extraction API: Analyze document with tables as FeatureTypes parameter. Input (Name: Description): Document: Blob or Amazon S3 object; FeatureTypes: TABLES. Output (Name: Description): Blocks: list of blocks identified from the document; ID: unique ID of the unit; Relationships: CHILD; Block type: PAGE, TABLE, CELL; Pages: contains number of pages in the document.
  • 22. Amazon Textract: Form extraction. Blocks: PAGE, KEY_VALUE_SET. For each block of your document: form field name (key) and field value (value) association; confidence score; page number; block relationships.
  • 23. Amazon Textract forms extraction API: Analyze document with forms as FeatureTypes parameter. Input (Name: Description): Document: Blob or Amazon S3 object; FeatureTypes: FORMS. Output (Name: Description): Blocks: list of blocks identified from the document; ID: unique ID of the unit; Relationships: KEY, VALUE, CHILD; Block type: PAGE, KEY_VALUE_SET; Pages: contains number of pages in the document.
  • 24. Amazon Textract: Sync and async. Sync: supports single-page documents such as images (e.g., mobile capture). Async: for multi-page documents, up to 3,000 pages.
  • 25. Amazon Textract: Text extraction simplified. Output: "Extract data quickly & accurately"; "No code or templates to maintain"
  • 26. Amazon Textract: Table extraction simplified. Output: { Start Date: 1/15/2009; End Date: 6/30/2013; Employer Name: Any Company; Position Held: Head Baker; Reason for leaving: Family relocated }
  • 27. Amazon Textract: Form extraction simplified. Output: Full Name: { First: John; Middle: X; Last: Doe }; Date of Birth: { MM: 01; DD: 01; YYYY: 1971 }; Gender: { Male: True; Female: False }
  • 29. Text extraction: OCR reimagined. Orientation.
  • 30. Text extraction: OCR reimagined. Structure variability.
  • 31. Text extraction: OCR reimagined. Document variability.
  • 32. Beyond OCR: Segmentation and rectification. Photometric.
  • 33. Beyond OCR: Segmentation and rectification. Geometric.
  • 34. Beyond OCR: Table and cell detection. Understand document structure and context to find tables.
  • 35. Beyond OCR: Table and cell detection. Understand cells even without explicit boundaries.
  • 36. Beyond OCR: Table and cell detection. Variable-sized rows and columns.
  • 37. Beyond OCR: Field name (key) and value extraction. Detect phrases or groups of words. Output: Full Name: { First: John; Middle: X; Last: Doe }; Date of Birth: { MM: 01; DD: 01; YYYY: 1971 }; Gender: { Male: True; Female: False }
  • 38. Beyond OCR: Inferring key/value association. Detect structures of the same form without templates.
  • 39. Beyond OCR: Inferring key/value association. Key/value association.
  • 40. Beyond OCR: Inferring key/value association. Infer empty values. Output: Full Name: { First: John; Middle: null; Last: Doe }; Date of Birth: { MM: 01; DD: 01; YYYY: 1971 }; Gender: { Male: True; Female: False }
  • 42. Reference architecture: Index and search documents. Input: uploaded document images such as tax forms, credit applications, or medical notes. Amazon S3: uploaded documents are stored in a data lake. AWS Lambda: a Lambda function is triggered to initiate document analysis using the Amazon Textract API. Amazon Textract: automatically extracts text, including key-value pairs and tables. Amazon Elasticsearch Service: extracted data and confidence scores are indexed to enable document search. Output: perform contextual search on millions of documents or integrate data into your document management system.
  • 43. Reference architecture: Form capture. Input: customer uses a mobile app to capture a photo of a W-2 form. Amazon Textract: the Amazon Textract API is integrated into the end-user application to automatically extract text from the W-2 form and auto-populate the form fields. Customer application: customers experience real-time capture of tax information by taking a photo instead of performing manual data entry. Database: user-submitted data is loaded into a database for use in tax preparation.
  • 44. Reference architecture: Extract for NLP. Quickly turn extracted text and data into actionable insights. Input: uploaded document images of medical notes, explanation of benefits, and patient forms. Amazon S3: uploaded documents are stored in S3. Amazon Textract: automatically extracts words, lines of text, and tables. NLP: use natural language processing to extract insights from medical documents. Amazon Elasticsearch Service: easily search through extracted data and text insights. Output: discover medical insights to improve patient care.
  • 46. Amazon Textract: Launch customers.
  • 47. Amazon Textract: Benefits.
  • 48. Thank you! Randall Hunt
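
The block model described in slides 18 to 23 (blocks linked by CHILD, KEY, and VALUE relationships) can be walked with a few small helpers. The following is a minimal sketch, not code from the talk: it assumes the response is a plain dictionary using the field names of the Textract response format (`Blocks`, `BlockType`, `Id`, `Text`, `Relationships`, `RowIndex`, `ColumnIndex`, `EntityTypes`), and `SAMPLE_RESPONSE` is a hand-written, hypothetical stand-in for real service output. The helper names are illustrative only.

```python
from collections import defaultdict


def related_ids(block, relation_type):
    """Return the block Ids listed under one relationship type (CHILD, VALUE, ...)."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == relation_type:
            ids.extend(rel["Ids"])
    return ids


def child_text(block, by_id):
    """Join the text of a block's WORD children, in the order they are listed."""
    children = [by_id[i] for i in related_ids(block, "CHILD")]
    return " ".join(c["Text"] for c in children if c["BlockType"] == "WORD")


def extract_lines(response):
    """Text extraction: return the text of every LINE block."""
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def extract_tables(response):
    """Table extraction: rebuild each TABLE as a list of rows from its CELL children."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        rows = defaultdict(dict)
        for cell_id in related_ids(block, "CHILD"):
            cell = by_id[cell_id]
            rows[cell["RowIndex"]][cell["ColumnIndex"]] = child_text(cell, by_id)
        tables.append([[rows[r][c] for c in sorted(rows[r])] for r in sorted(rows)])
    return tables


def extract_form_fields(response):
    """Form extraction: map each KEY block's text to the text of its VALUE block."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = "KEY" in block.get("EntityTypes", [])
        if block["BlockType"] == "KEY_VALUE_SET" and is_key:
            values = [child_text(by_id[i], by_id) for i in related_ids(block, "VALUE")]
            # An empty value block becomes None (compare "Infer empty values").
            fields[child_text(block, by_id)] = " ".join(values) or None
    return fields


def _word(block_id, text):
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def _cell(block_id, row, col, word_ids):
    return {"Id": block_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


def _kv(block_id, entity, relationships=None):
    block = {"Id": block_id, "BlockType": "KEY_VALUE_SET", "EntityTypes": [entity]}
    if relationships:
        block["Relationships"] = relationships
    return block


# Hypothetical, hand-written response; real output carries more fields
# (Confidence, Geometry, Page) that this sketch ignores.
SAMPLE_RESPONSE = {"Blocks": [
    {"Id": "l1", "BlockType": "LINE", "Text": "Employer Name Any Company"},
    _word("w1", "Employer"), _word("w2", "Name"),
    _word("w3", "Any"), _word("w4", "Company"), _word("w5", "Middle"),
    _word("w6", "Start"), _word("w7", "Date"), _word("w8", "End"), _word("w9", "Date"),
    _word("w10", "1/15/2009"), _word("w11", "6/30/2013"),
    _kv("k1", "KEY", [{"Type": "VALUE", "Ids": ["v1"]},
                      {"Type": "CHILD", "Ids": ["w1", "w2"]}]),
    _kv("v1", "VALUE", [{"Type": "CHILD", "Ids": ["w3", "w4"]}]),
    _kv("k2", "KEY", [{"Type": "VALUE", "Ids": ["v2"]},
                      {"Type": "CHILD", "Ids": ["w5"]}]),
    _kv("v2", "VALUE"),  # no children: the field was left blank
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w6", "w7"]), _cell("c2", 1, 2, ["w8", "w9"]),
    _cell("c3", 2, 1, ["w10"]), _cell("c4", 2, 2, ["w11"]),
]}

if __name__ == "__main__":
    print(extract_lines(SAMPLE_RESPONSE))
    print(extract_tables(SAMPLE_RESPONSE))
    print(extract_form_fields(SAMPLE_RESPONSE))
```

In practice the dictionary would presumably come from a call such as boto3's `detect_document_text` or `analyze_document` with `FeatureTypes=["TABLES", "FORMS"]`; that call is omitted here so the sketch runs without AWS credentials.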