From Ingestion to Insights:
How to Deliver Business Value at Scale
Qlik Data Integration & Analytics Summit for Financial Services
February 26, 2020
Misha Goussev
Principal Solutions Architect
Financial Services Partner Technology – AWS
goussev@amazon.com
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Financial Services Industry data trends
Financial institutions collect both structured data, including CRM and transaction
data, and unstructured data, including chatlog transcriptions, social media, and
mobile interactions
2.2 million terabytes of new data is created every day [1]
90% of data worldwide has been generated in the last five years
1. Axis Corporate, Understanding Big Data in Financial Services, https://axiscorporate.com/us/infographic/understanding-big-data-in-financial-services-infographic/
Financial institutions are collecting an unprecedented amount of data
The challenges of Big Data
Data silos: having pockets of data in different places, controlled by different groups,
inherently obscures data.
Analyzing diverse datasets: the challenge of using different systems and approaches to data
management is that the data structures and information vary; different systems may also hold
the same type of information but label it differently.
Managing data access: with data stored in so many locations, it's difficult to both access all
of it and to link to external tools for analysis.
Accelerating machine learning: requires a powerful data lake foundation, because ML and AI
thrive on large, diverse datasets.
Reference: Werner Vogels, "How Amazon is solving big-data challenges with data lakes," 30 January 2020, https://www.allthingsdistributed.com/2020/01/aws-datalake.html
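The "same information, labeled differently" problem above can be illustrated with a minimal sketch. The field names and the two source systems here are hypothetical and do not come from the deck; the point is only that records from separate silos need a shared vocabulary before they can be analyzed together.

```python
# Hypothetical aliases used by two siloed systems for the same information.
CANONICAL_FIELDS = {
    "cust_id": "customer_id",
    "customerId": "customer_id",
    "txn_amt": "amount",
    "transactionAmount": "amount",
}


def harmonize(record: dict) -> dict:
    """Rename known aliases to canonical names; keep unknown fields as-is."""
    return {CANONICAL_FIELDS.get(key, key): value for key, value in record.items()}


# Two rows describing the same kind of fact with different labels (made-up data).
crm_row = {"cust_id": 42, "txn_amt": 19.99}
core_banking_row = {"customerId": 42, "transactionAmount": 5.00, "branch": "NYC"}

if __name__ == "__main__":
    print(harmonize(crm_row))
    print(harmonize(core_banking_row))
```

In practice this mapping would more likely live in a data catalog or transformation job than in application code.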
AWS Technology Stack for
Data Storage and Analytics
Purpose-built databases – Working Backwards
Lift and shift, ERP, CRM, finance
Real-time bidding, shopping cart, customer preferences
Content management, personalization, mobile
Leaderboards, real-time analytics
Fraud detection, social networking, recommendation engine
IoT applications, event tracking
Systems of record, supply chain, health care, registrations, financial
Industrial equipment maintenance, fleet management, and route optimization
S3 data lakes provide a flexible foundation for analytics and innovation
Structured data: Data that are highly normalized with a common schema and stored in relational databases, powering transactional line-of-business applications (ERP, CRM, LOB applications)
Semistructured data: Data that contain identifiers without conforming to a predefined schema (mobile, social, sensors, POS terminals)
Unstructured data: Data that do not conform to a data model and are typically stored as individual files (phone calls, images, videos, email)
Batch load (AWS Glue): Extracts data from various data sources at periodic intervals and moves them to the data lake
Streaming (Amazon Kinesis): Ingests data that are generated from multiple sources such as log files, telemetry, mobile applications, and social networks
Amazon S3 data lake (Amazon S3): Cloud-scale centralized and scalable architecture that enables enterprise data science
Analytics: Data warehouses are repositories of normalized data and provide the foundational technology for BI. Data stored in the data lake can also be made directly searchable and queryable. Services: Amazon Redshift, Amazon Athena, Amazon QuickSight, Amazon EMR, Amazon MSK
Machine Learning: Storing data in an Amazon S3 data lake enables customers to leverage predictive or prescriptive analytics; perform ad-hoc analyses; and use AI/ML for automation and efficiency. Services: Amazon SageMaker, AWS Deep Learning AMIs, Amazon EMR
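One way to picture how batch loads land in the S3 data lake is through the object key layout. The sketch below assumes Hive-style partitioning (year=/month=/day=), a common convention that query engines can use to skip irrelevant data; the deck does not prescribe any layout, and the layer, dataset, and file names are hypothetical.

```python
from datetime import date


def partitioned_key(layer: str, dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    return (
        f"{layer}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Hypothetical daily batch extract landing in a "raw" prefix.
example_key = partitioned_key("raw", "trades", date(2020, 2, 26), "part-0000.json")

if __name__ == "__main__":
    print(example_key)
```

Keeping the date in the key lets a batch job write each periodic extract to its own prefix without touching earlier loads.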
Data lakes are the future of data management
Used for all use cases including
machine learning, real-time
streaming analytics, data discovery,
and business intelligence
Data is stored as-is,
without having to first
structure the data
Centralized repository
that allows structured and
unstructured data to be
stored at any scale
Access to historical data within
seconds without the cost of
managing infrastructure
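Storing data as-is, without structuring it first, is often described as schema-on-read: the raw records are kept untouched and a schema is applied only when someone queries them. A minimal sketch of that idea, with made-up records and a hypothetical schema:

```python
import json

# Raw events stored exactly as they arrived (made-up data, inconsistent fields).
RAW_EVENTS = [
    '{"symbol": "ABC", "price": "101.5", "venue": "X"}',
    '{"symbol": "XYZ", "price": "20"}',
]

# A schema chosen by the reader at query time: field name -> type cast.
PRICE_SCHEMA = {"symbol": str, "price": float}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping only schema fields and casting their types."""
    for line in lines:
        raw = json.loads(line)
        yield {name: cast(raw[name]) for name, cast in schema.items() if name in raw}


if __name__ == "__main__":
    for row in read_with_schema(RAW_EVENTS, PRICE_SCHEMA):
        print(row)
```

A different consumer could read the same raw lines with a different schema, which is the flexibility the slide points at.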
How financial institutions are using
AWS data lakes
Industry-leading financial institutions are building data lakes on AWS
Nasdaq moves mountains of data to an AWS data lake every day
Amazon
Redshift
Nasdaq needed to provide
greater accessibility to data for
internal groups and regulators.
Nasdaq built a data lake on Amazon S3
and chose Amazon Redshift to realize
cost efficiencies and fulfill security and
regulatory requirements.
Nasdaq moves an average of 30 billion
rows into Amazon Redshift every day
(with 60 billion on a peak day), and
uses the service to power its data
analytics applications.
Amazon
S3
“Nasdaq has been a user of Amazon Redshift since it was released and we are
extremely happy with it. Currently, our system is moving an average of 5.5 billion
rows into Amazon Redshift every day.”
- Nate Sammons, Principal Architect, Nasdaq
Dow Jones is using an AWS data lake to better serve its customers
Dow Jones was looking to develop
new products and craft targeted
customer communications.
The company built a data lake on
Amazon S3 that also relies heavily on
Amazon Redshift to enable cost-
effective, ad-hoc querying of large data
sets, including anonymized clickstream
data generated by multiple products.
By building a data lake on AWS,
Dow Jones enabled more than 100
data scientists to access multiple
data sets, build dashboards that
generate custom insights, and
experiment with machine learning
using Amazon SageMaker.
Amazon
S3
“It wasn’t enough to just put the raw data into the platform. We
needed to make it useable to our users, so our version of the data
lake is cleansed, it’s performance-optimized, and it’s keyed so that
it can be used as a standalone item.”
– Colleen Camuccio, VP, Program Management, Dow Jones
Amazon
Redshift
FINRA uses an AWS data lake to oversee over 3,000 securities firms
FINRA needed a platform that could
ingest, process, and store 36 billion
market events on an average day and
dynamically scale up to handle 100
billion events on a peak day.
FINRA built a data lake on
AWS using Amazon S3 and
EMR to store and analyze data
from 3,700 broker dealers and
12 exchanges.
FINRA’s flexible platform can adapt
to changing market dynamics while
providing analysts with the tools
needed to query the data set.
Amazon
S3
Amazon
EMR
“We got some huge pleasant surprises out of [going all in on AWS] that we weren’t
expecting at all. First of those is amazing performance improvements. On average,
400 times improvement to interactive queries. The investigative capacity to our
surveillance team has expanded dramatically.”
– Steve Randich, CIO, FINRA
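To put the FINRA volumes in perspective, the daily figures can be converted to a sustained per-second rate. This is back-of-the-envelope arithmetic only, and it assumes events arrive evenly over 24 hours, which real market activity does not; the inputs are the 36 billion (average) and 100 billion (peak) events per day quoted on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: float) -> float:
    """Average sustained rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


average_rate = events_per_second(36e9)   # average day from the slide
peak_rate = events_per_second(100e9)     # peak day from the slide
peak_to_average = 100e9 / 36e9           # headroom the platform must scale to

if __name__ == "__main__":
    print(f"average day: {average_rate:,.0f} events/s")
    print(f"peak day:    {peak_rate:,.0f} events/s")
    print(f"peak/average ratio: {peak_to_average:.2f}")
```

Because trading is concentrated in market hours, actual burst rates would be higher than these even-spread averages.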
Demo:
‘The Art of the Possible’
Financial Data Innovation Workshop: A glimpse into the art of the possible
Amazon S3
AWS Data Exchange
Comprehend
Amazon
Athena
Amazon
Redshift
Translate
Amazon
QuickSight
SageMaker
Traditional data
Data science
teams
Traders
Developers
Risk managers
Analysts
AWS Glue
Proprietary data
Alternative data
Forecast
Demo: Amazon S3 and AWS Glue data catalog
AWS Glue data catalog
Amazon S3 data files
Demo: Interactive query using Athena & data visualization using QuickSight
Amazon QuickSight
Amazon Athena query
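The demo itself is not captured in the transcript, so the following is only a sketch of what submitting an interactive Athena query might look like. It builds the request parameters for the Athena StartQueryExecution API as exposed by boto3's `start_query_execution`; the SQL, database name, and results bucket are hypothetical placeholders, and the code stops short of calling AWS so that it runs without credentials.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database, and results location; not taken from the demo.
request = athena_query_request(
    sql="SELECT symbol, AVG(price) AS avg_price FROM trades GROUP BY symbol",
    database="findata_demo",
    output_location="s3://example-athena-results/demo/",
)

# With boto3 installed and AWS credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)

if __name__ == "__main__":
    print(request)
```

The query results written to the output location could then back a QuickSight dataset for visualization.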
Sample Reference Architecture
with Qlik products
Server-based data lake ingest pipeline with Qlik and EC2 (or EMR)
[Architecture diagram, within an AWS Region. Stages: Ingest, Transform, Consumer. Sources: ISV data, databases, files, legacy files. Components: raw layer (S3); EC2 via Qlik Data Catalyst (transform); processed layer (S3); EC2 via Qlik Data Catalyst (enrich); consumption layer (S3); Hive Metastore; Redshift Spectrum. Numbered callouts:]
1. Data from an ISV and legacy systems is brought in raw form into the raw layer S3 buckets. Attunity Replicate may be used to ingest data from RDBMS databases with the continuous CDC option.
2. Data from the raw data buckets is processed and stored in the processed layer using a transient EC2 instance (or EMR cluster).
3. Data from the processed layer is transformed based on business rules (derived, flattened, enriched) and copied into the consumption layer by a transient EC2 instance (or EMR cluster).
4. Hive tables are created for data landing in the different layers as needed for consumption.
5. Data in the consumption layer is used by analytics teams via an EMR analytics cluster and visualized via BI tools such as Qlik.
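The raw, processed, and consumption layers in the steps above can be sketched in a few lines. This is an illustration only: in-memory Python lists stand in for the S3 buckets, plain functions stand in for the transient EC2 or EMR jobs, and the row fields and the business rule (total amount per account) are hypothetical.

```python
def to_processed(raw_rows):
    """Raw -> processed: clean fields and drop rows that fail basic parsing."""
    processed = []
    for row in raw_rows:
        try:
            processed.append(
                {"account": row["account"].strip(), "amount": float(row["amount"])}
            )
        except (KeyError, ValueError):
            continue  # a real pipeline would likely quarantine bad rows instead
    return processed


def to_consumption(processed_rows):
    """Processed -> consumption: apply a business rule (total per account)."""
    totals = {}
    for row in processed_rows:
        totals[row["account"]] = totals.get(row["account"], 0.0) + row["amount"]
    return totals


# Made-up raw records, including one that cannot be parsed.
raw_layer = [
    {"account": " A-1 ", "amount": "100.50"},
    {"account": "A-1", "amount": "25"},
    {"account": "B-7", "amount": "not-a-number"},
    {"account": "B-7", "amount": "10"},
]
processed_layer = to_processed(raw_layer)
consumption_layer = to_consumption(processed_layer)

if __name__ == "__main__":
    print(processed_layer)
    print(consumption_layer)
```

Keeping each layer as a separate output means a failed transformation can be rerun from the previous layer without re-ingesting the source data.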
[Diagram labels, continued: Attunity Replicate (CDC), shown on two ingest paths; Catalog; Glue Catalog; Databases; Attunity Compose]
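Change data capture (CDC), mentioned for the ingest step, means shipping a stream of row-level changes instead of repeated full extracts. The sketch below shows the general idea of replaying such changes onto a snapshot. The event format is hypothetical and is not the format Attunity Replicate actually emits.

```python
def apply_changes(table: dict, changes: list) -> dict:
    """Apply ordered change events to a copy of `table`, keyed by primary key."""
    result = dict(table)
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            result[key] = change["row"]
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Made-up snapshot and change log (hypothetical event shape).
snapshot = {1: {"name": "Ann", "balance": 100}}
change_log = [
    {"op": "insert", "key": 2, "row": {"name": "Bob", "balance": 50}},
    {"op": "update", "key": 1, "row": {"name": "Ann", "balance": 80}},
    {"op": "delete", "key": 2},
]
current = apply_changes(snapshot, change_log)

if __name__ == "__main__":
    print(current)
```

Order matters here: replaying the same events in a different sequence could produce a different final state.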
Getting started
Inquire about the Financial Data Innovation Workshop
Financial Data Innovation
Workshop Description
Learn how to effectively ingest,
catalog, integrity check, perform
analytics, use ML and AI capabilities
against structured and semi-
structured financial data, sourced from
AWS Data Exchange and real-time
market data providers.
Contacts:
Balaji Gopalan, Sr. Partner Solutions Architect, Financial Services Industry, Amazon Web Services, balajgop@amazon.com
Nichole Brown, Sr. Partner Development Manager, Analytics, Amazon Web Services, nicbrwn@amazon.com
Leverage AWS resources to start building a data lake on AWS today
Ready to start building?
Work with your account
team to schedule a Big
Data Immersion Day
Work with an APN
Partner to implement
solutions on AWS.
Work with the AWS Professional
Services team to set up an AWS Data
Lake Workshop, AWS Data Lake
Assessment, or AWS Data Lake
Accelerator
Appendix
How to build a data lake on AWS
Essential elements of a data lake and analytics solution
Analytics
Machine
Learning
Real-time Data
Movement
On-premises
Data Movement
Data lake
on AWS
Data Movement
Import your data from on-premises, and in real-time
Data Lake
Store any type of data securely, from gigabytes to exabytes
Analytics
Analyze your data with the broadest selection of
analytics services
Machine Learning
Predict future outcomes, and prescribe actions for
rapid response
Step 1: Data movement
Data Movement
The first step to building data lakes on AWS is to move data
to the cloud. AWS makes data transfer simple by providing
the widest range of options to transfer data to the cloud.
On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Storage Gateway
Real-time data movement: AWS IoT Core, Amazon Kinesis Video Streams, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams
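Real-time movement services typically accept records in bounded batches, so a producer has to chunk its output before sending. The sketch below shows that chunking step only. The default of 500 records per batch is an assumption based on the per-call record limit documented for Kinesis Data Firehose batch puts and should be checked against current service quotas; byte-size limits are not handled here.

```python
def batches(records, max_records=500):
    """Yield consecutive slices of `records`, each at most `max_records` long."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    for start in range(0, len(records), max_records):
        yield records[start:start + max_records]


# Made-up payloads standing in for serialized events.
events = [f"event-{i}".encode("utf-8") for i in range(1200)]
batch_sizes = [len(batch) for batch in batches(events)]

if __name__ == "__main__":
    print(batch_sizes)
```

Each batch would then be handed to whichever client call the chosen service provides, with retries for any records the service reports as failed.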
Step 2: Data lake
Data lake
Once data is ready for the cloud, AWS makes it easy to store it in
any format, securely and at massive scale, with Amazon S3 and
Amazon Glacier. To make it easy for end users to discover the
relevant data to use in their analysis, AWS Glue automatically
creates a single searchable and queryable catalog.
Object Storage: Amazon S3
Backup and Archive: Amazon Glacier
Data Catalog: AWS Glue
Amazon S3 Glacier Deep Archive
Amazon S3 Object Lock
Amazon S3 Intelligent-Tiering
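Moving older objects from S3 into the archive tiers is usually expressed as a bucket lifecycle configuration. The sketch below builds such a configuration as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix and day counts are hypothetical, the deck does not specify any retention policy, and the storage-class transition rules should be checked against the S3 documentation before use.

```python
def archive_lifecycle(prefix: str, glacier_after_days: int,
                      deep_archive_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that archives objects under `prefix`."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("deep archive transition must come after the Glacier one")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


# Hypothetical policy: raw data to Glacier after 90 days, Deep Archive after a year.
lifecycle = archive_lifecycle("raw/", 90, 365)

# With boto3 and credentials, this could be applied to a (hypothetical) bucket:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=lifecycle)

if __name__ == "__main__":
    print(lifecycle)
```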
Step 3: Perform analytics
Analytics
AWS provides the broadest and most cost-effective set of
analytics services that run on data lakes.
Interactive analytics: Amazon Athena
Big data processing: Amazon EMR
Data warehousing: Amazon Redshift
Real-time analytics: Amazon Kinesis
Operational analytics: Amazon Elasticsearch Service
Dashboards and visualizations: Amazon QuickSight
Step 4: Machine learning
Machine learning
For predictive analytics use cases, AWS provides a broad
set of machine learning services and tools that run on
your AWS data lake.
Frameworks and interfaces: AWS Deep Learning AMIs
Platform services: Amazon SageMaker
Application services: AWS provides solution-oriented APIs for computer vision and natural language processing
AWS Lake Formation will make setting up a data lake as simple as defining where your
data sits and what data access and security policies you want to apply.
• Collects and catalogs data from databases and object storage
• Moves the data into your new Amazon S3 data lake
• Cleans and classifies data using ML algorithms
• Secures access to your sensitive data
• Leverage data sets with Amazon analytics and ML services
AWS Lake Formation will allow FIs to build a secure data lake in days
AWS Lake Formation
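As an illustration of the "secures access" point, table-level access in Lake Formation is granted to a principal rather than managed per bucket. The sketch below builds the request parameters in the shape accepted by boto3's Lake Formation `grant_permissions` call. The role ARN, database, and table names are hypothetical placeholders, and the code does not call AWS.

```python
def grant_select_request(principal_arn: str, database: str, table: str) -> dict:
    """Build keyword arguments granting SELECT on one catalog table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role and catalog objects.
grant = grant_select_request(
    principal_arn="arn:aws:iam::123456789012:role/ExampleAnalystRole",
    database="findata_demo",
    table="trades",
)

# With boto3 and credentials, this could be submitted as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)

if __name__ == "__main__":
    print(grant)
```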
Thank you!

Más contenido relacionado

La actualidad más candente

Why Finance Should Consider Agile Modern Data Delivery Platform
Why Finance Should Consider Agile Modern Data Delivery PlatformWhy Finance Should Consider Agile Modern Data Delivery Platform
Why Finance Should Consider Agile Modern Data Delivery Platformsyed_javed
 
Why HR Should Consider Agile Modern Data Delivery Platform
Why HR Should Consider Agile Modern Data Delivery PlatformWhy HR Should Consider Agile Modern Data Delivery Platform
Why HR Should Consider Agile Modern Data Delivery Platformsyed_javed
 
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksUnlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksDatabricks
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationDenodo
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Skender Kollcaku
 
MapInfo Pro v2021 - Next Generation Location Analytics Made Easy
MapInfo Pro v2021 - Next Generation Location Analytics Made EasyMapInfo Pro v2021 - Next Generation Location Analytics Made Easy
MapInfo Pro v2021 - Next Generation Location Analytics Made EasyPrecisely
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...confluent
 
Why IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery PlatformWhy IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery Platformsyed_javed
 
Life is a Stream of Events
Life is a Stream of Events Life is a Stream of Events
Life is a Stream of Events confluent
 
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Databricks
 
Big Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowBig Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowSnapLogic
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkDatabricks
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Dataconfluent
 
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...apidays
 
BI Reporting Application Comparison
BI Reporting Application ComparisonBI Reporting Application Comparison
BI Reporting Application ComparisonScott Mitchell
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersVoltDB
 
4870 ibm-storage-solutions-final_nov26_18_34019934_usen
4870  ibm-storage-solutions-final_nov26_18_34019934_usen4870  ibm-storage-solutions-final_nov26_18_34019934_usen
4870 ibm-storage-solutions-final_nov26_18_34019934_usenduc_spt
 

La actualidad más candente (20)

Why Finance Should Consider Agile Modern Data Delivery Platform
Why Finance Should Consider Agile Modern Data Delivery PlatformWhy Finance Should Consider Agile Modern Data Delivery Platform
Why Finance Should Consider Agile Modern Data Delivery Platform
 
Why HR Should Consider Agile Modern Data Delivery Platform
Why HR Should Consider Agile Modern Data Delivery PlatformWhy HR Should Consider Agile Modern Data Delivery Platform
Why HR Should Consider Agile Modern Data Delivery Platform
 
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksUnlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
 
Cloud Modernization with Data Virtualization
Cloud Modernization with Data VirtualizationCloud Modernization with Data Virtualization
Cloud Modernization with Data Virtualization
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
 
MapInfo Pro v2021 - Next Generation Location Analytics Made Easy
MapInfo Pro v2021 - Next Generation Location Analytics Made EasyMapInfo Pro v2021 - Next Generation Location Analytics Made Easy
MapInfo Pro v2021 - Next Generation Location Analytics Made Easy
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
 
Why IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery PlatformWhy IT Should Consider Agile Modern Data Delivery Platform
Why IT Should Consider Agile Modern Data Delivery Platform
 
Life is a Stream of Events
Life is a Stream of Events Life is a Stream of Events
Life is a Stream of Events
 
Hybrid IT: Legg Mason
Hybrid IT: Legg MasonHybrid IT: Legg Mason
Hybrid IT: Legg Mason
 
Qlik vs. Tableau: High-Level Comparison
Qlik vs. Tableau: High-Level ComparisonQlik vs. Tableau: High-Level Comparison
Qlik vs. Tableau: High-Level Comparison
 
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
 
Big Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowBig Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To Know
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
 
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...
apidays LIVE Singapore - Democratising data access with APIs by Tarush Aggarw...
 
BI Reporting Application Comparison
BI Reporting Application ComparisonBI Reporting Application Comparison
BI Reporting Application Comparison
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
4870 ibm-storage-solutions-final_nov26_18_34019934_usen
4870  ibm-storage-solutions-final_nov26_18_34019934_usen4870  ibm-storage-solutions-final_nov26_18_34019934_usen
4870 ibm-storage-solutions-final_nov26_18_34019934_usen
 

Similar a From ingest to insights with AWS

Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWSAmazon Web Services
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeAmazon Web Services
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summits
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSAmazon Web Services
 
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitBuilding Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitAmazon Web Services
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Summits
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
Deriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesDeriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesAmazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Data Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data StrategyData Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data StrategyData Con LA
 

Similar a From ingest to insights with AWS (20)

Building a modern data platform in AWS
Building a modern data platform in AWSBuilding a modern data platform in AWS
Building a modern data platform in AWS
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_Singapore
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
 
Databases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWSDatabases - Choosing the right Database on AWS
Databases - Choosing the right Database on AWS
 
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS SummitBuilding Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/MLPreparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
Deriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesDeriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML Architectures
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWS
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
Data Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data StrategyData Con LA 2022 - Modern Data Strategy
Data Con LA 2022 - Modern Data Strategy
 


From ingest to insights with AWS

  • 1. From Ingestion to Insights: How to Deliver Business Value at Scale Qlik Data Integration & Analytics Summit for Financial Services February 26, 2020 Misha Goussev Principal Solutions Architect Financial Services Partner Technology – AWS goussev@amazon.com © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 3. Financial institutions are collecting an unprecedented amount of data. Financial institutions collect both structured data, including CRM and transaction data, and unstructured data, including chatlog transcriptions, social media, and mobile interactions. 2.2 million terabytes of new data is created every day [1]. 90% of data worldwide has been generated in the last five years. [1] Axis Corporate, Understanding Big Data in Financial Services, https://axiscorporate.com/us/infographic/understanding-big-data-in-financial-services-infographic/
  • 4. The challenges of Big Data. Data silos: having pockets of data in different places, controlled by different groups, inherently obscures data. Analyzing diverse datasets: the challenge of using different systems and approaches to data management is that the data structures and information vary; different systems may also hold the same type of information under different labels. Managing data access: with data stored in so many locations, it's difficult both to access all of it and to link it to external tools for analysis. Accelerating machine learning: requires a powerful data lake foundation, because ML and AI thrive on large, diverse datasets. Reference: How Amazon is solving big-data challenges with data lakes, by Werner Vogels, 30 January 2020, https://www.allthingsdistributed.com/2020/01/aws-datalake.html
  • 5. AWS Technology Stack for Data Storage and Analytics
  • 6. Purpose-built databases – Working Backwards. Use cases by database type: lift and shift, ERP, CRM, finance; real-time bidding, shopping cart, customer preferences; content management, personalization, mobile; leaderboards, real-time analytics; fraud detection, social networking, recommendation engine; IoT applications, event tracking; systems of record, supply chain, health care, registrations, financial; industrial equipment maintenance, fleet management, and route optimization.
  • 7. S3 data lakes provide a flexible foundation for analytics and innovation. Structured data: data that are highly normalized with common schema and stored in relational databases, powering transactional line-of-business applications (ERP, CRM, LOB applications). Semistructured data: data that contain identifiers without conforming to a predefined schema (mobile, social, sensors, POS terminals). Unstructured data: data that do not conform to a data model and are typically stored as individual files (phone calls, images, videos, email). Batch load: extracts data from various data sources at periodic intervals and moves them to the data lake (AWS Glue). Streaming: ingests data that are generated from multiple sources such as log files, telemetry, mobile applications, and social networks (Amazon Kinesis). Amazon S3 data lake: cloud-scale centralized and scalable architecture that enables enterprise data science (Amazon S3). Analytics: data warehouses are repositories of normalized data and provide the foundational technology for BI, and data stored in the data lake can also be made directly searchable and queryable (Amazon Redshift, Amazon Athena, Amazon QuickSight, Amazon EMR, Amazon MSK). Machine learning: storing data in an Amazon S3 data lake enables customers to leverage predictive or prescriptive analytics; perform ad-hoc analyses; and use AI/ML for automation and efficiency (Amazon SageMaker, AWS Deep Learning AMIs, Amazon EMR).
  • 8. Data lakes are the future of data management. Used for all use cases including machine learning, real-time streaming analytics, data discovery, and business intelligence. Data is stored as-is, without having to first structure the data. Centralized repository that allows structured and unstructured data to be stored at any scale. Access to historical data within seconds without the cost of managing infrastructure.
  • 9. How financial institutions are using AWS data lakes
  • 10. Industry-leading financial institutions are building data lakes on AWS
  • 11. Nasdaq moves mountains of data to an AWS data lake every day. Nasdaq needed to provide greater accessibility to data for internal groups and regulators. Nasdaq built a data lake on Amazon S3 and chose Amazon Redshift to realize cost efficiencies and fulfill security and regulatory requirements. Nasdaq moves an average of 30 billion rows into Amazon Redshift every day (with 60 billion on a peak day), and uses the service to power its data analytics applications. “Nasdaq has been a user of Amazon Redshift since it was released and we are extremely happy with it. Currently, our system is moving an average of 5.5 billion rows into Amazon Redshift every day.” - Nate Sammons, Principal Architect, Nasdaq. Services shown: Amazon S3, Amazon Redshift.
  • 12. Dow Jones is using an AWS data lake to better serve its customers. Dow Jones was looking to develop new products and craft targeted customer communications. The company built a data lake on Amazon S3 that also relies heavily on Amazon Redshift to enable cost-effective, ad-hoc querying of large data sets, including anonymized clickstream data generated by multiple products. By building a data lake on AWS, Dow Jones enabled more than 100 data scientists to access multiple data sets, build dashboards that generate custom insights, and experiment with machine learning using Amazon SageMaker. “It wasn’t enough to just put the raw data into the platform. We needed to make it useable to our users, so our version of the data lake is cleansed, it’s performance-optimized, and it’s keyed so that it can be used as a standalone item.” – Colleen Camuccio, VP, Program Management, Dow Jones. Services shown: Amazon S3, Amazon Redshift.
  • 13. FINRA uses an AWS data lake to oversee over 3,000 securities firms. FINRA needed a platform that could ingest, process, and store 36 billion market events on an average day and dynamically scale up to handle 100 billion events on a peak day. FINRA built a data lake on AWS using Amazon S3 and EMR to store and analyze data from 3,700 broker dealers and 12 exchanges. FINRA’s flexible platform can adapt to changing market dynamics while providing analysts with the tools needed to query the data set. “We got some huge pleasant surprises out of [going all in on AWS] that we weren’t expecting at all. First of those is amazing performance improvements. On average, 400 times improvement to interactive queries. The investigative capacity to our surveillance team has expanded dramatically.” – Steve Randich, CIO, FINRA. Services shown: Amazon S3, Amazon EMR.
  • 14. Demo: ‘The Art of the Possible’
  • 15. Financial Data Innovation Workshop: A glimpse into the art of the possible. Data sources: traditional data, proprietary data, alternative data. Services shown: AWS Data Exchange, Amazon S3, AWS Glue, Amazon Athena, Amazon Redshift, Amazon QuickSight, SageMaker, Comprehend, Translate, Forecast. Users: data science teams, traders, developers, risk managers, analysts.
  • 16. Demo: Amazon S3 and AWS Glue data catalog. Screenshots: Amazon S3 data files; AWS Glue data catalog.
  • 17. Demo: Interactive query using Athena & data visualization using QuickSight. Screenshots: Amazon Athena query; Amazon QuickSight.
  • 19. Server-based data lake ingest pipeline with Qlik and EC2 (or EMR). Diagram labels: Region; ISV data, databases, files, legacy files; Attunity Replicate (CDC); Attunity Compose; raw layer (S3); EC2 via Qlik Data Catalyst (transform); processed layer (S3); EC2 via Qlik Data Catalyst (enrich); consumption layer (S3); catalog (Hive Metastore or Glue Catalog); Redshift Spectrum; consumer. Stages: ingest, transform. 1. Data from an ISV and legacy systems is brought in raw form into the raw layer S3 buckets. Attunity Replicate may be used to ingest data from RDBMS databases with a continuous CDC option. 2. Data from the raw data buckets is processed and stored in the processed layer using a transient EC2 instance (or EMR cluster). 3. Data from the processed layer is transformed based on business rules (derived, flattened, enriched) and copied into the consumption layer by a transient EC2 instance (or EMR cluster). 4. Hive tables are created for data landing in different layers as needed for consumption. 5. Data in the consumption layer is used by analytics teams via an EMR analytics cluster and visualized via BI tools such as Qlik.
  • 21. Inquire about the Financial Data Innovation Workshop. Financial Data Innovation Workshop description: learn how to effectively ingest, catalog, integrity-check, and analyze structured and semi-structured financial data sourced from AWS Data Exchange and real-time market data providers, and how to apply ML and AI capabilities to it. Contacts: Balaji Gopalan, Sr. Partner Solutions Architect, Financial Services Industry, Amazon Web Services, balajgop@amazon.com; Nichole Brown, Sr. Partner Development Manager, Analytics, Amazon Web Services, nicbrwn@amazon.com
  • 22. Leverage AWS resources to start building a data lake on AWS today. Ready to start building? Work with your account team to schedule a Big Data Immersion Day. Work with an APN Partner to implement solutions on AWS. Work with the AWS Professional Services team to set up an AWS Data Lake Workshop, AWS Data Lake Assessment, or AWS Data Lake Accelerator.
  • 24. How to build a data lake on AWS
  • 25. Essential elements of a data lake and analytics solution. Data movement (on-premises and real-time): import your data from on-premises, and in real time. Data lake: store any type of data securely, from gigabytes to exabytes. Analytics: analyze your data with the broadest selection of analytics services. Machine learning: predict future outcomes, and prescribe actions for rapid response.
  • 26. Step 1: Data movement. The first step to building data lakes on AWS is to move data to the cloud. AWS makes data transfer simple by providing the widest range of options to transfer data to the cloud. On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Storage Gateway. Real-time data movement: AWS IoT Core, Amazon Kinesis Video Streams, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams.
  • 27. Step 2: Data lake. Once data is ready for the cloud, AWS makes it easy to store in any format, securely and at massive scale with Amazon S3 and Amazon Glacier. To make it easy for end users to discover the relevant data to use in their analysis, AWS Glue automatically creates a single searchable and queryable catalog. Object storage: Amazon S3. Backup and archive: Amazon Glacier. Data catalog: AWS Glue. Also shown: Amazon S3 Glacier Deep Archive, Amazon S3 Object Lock, Amazon S3 Intelligent Tiering.
  • 28. Step 3: Perform analytics. AWS provides the broadest and most cost-effective set of analytics services that run on data lakes. Interactive analytics: Amazon Athena. Big data processing: Amazon EMR. Data warehousing: Amazon Redshift. Real-time analytics: Amazon Kinesis. Operational analytics: Amazon Elasticsearch Service. Dashboards and visualizations: Amazon QuickSight.
  • 29. Step 4: Machine learning. For predictive analytics use cases, AWS provides a broad set of machine learning services and tools that run on your AWS data lake. Frameworks and interfaces: Amazon Deep Learning AMIs. Platform services: Amazon SageMaker. Application services: AWS provides solution-oriented APIs for computer vision and natural language processing.
  • 30. AWS Lake Formation will allow FIs to build a secure data lake in days. AWS Lake Formation will make setting up a data lake as simple as defining where your data sits and what data access and security policies you want to apply. It collects and catalogs data from databases and object storage; moves the data into your new Amazon S3 data lake; cleans and classifies data using ML algorithms; secures access to your sensitive data; and lets you leverage data sets with Amazon analytics and ML services.
  • 31. Thank you! © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
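
The layered ingest pipeline on slide 19 (raw layer, processed layer, consumption layer) can be illustrated with a minimal Python sketch. This is not the Qlik or AWS implementation: plain dictionaries stand in for the three S3 layers, and the record fields (`trade_id`, `quantity`, `price`), the derived `notional` field, and the object key are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
# Object key -> records. A dict stands in for an S3 layer (assumption).
Layer = Dict[str, List[Record]]


def ingest(source_rows: List[Record], raw: Layer, key: str) -> None:
    """Step 1: land source rows unchanged in the raw layer."""
    raw[key] = [dict(row) for row in source_rows]


def process(raw: Layer, processed: Layer) -> None:
    """Step 2: clean raw records (trim whitespace, drop rows without a trade_id)."""
    for key, rows in raw.items():
        cleaned = [{k: v.strip() for k, v in row.items()} for row in rows]
        processed[key] = [row for row in cleaned if row.get("trade_id")]


def transform(processed: Layer, consumption: Layer) -> None:
    """Step 3: apply a business rule, here a derived notional = quantity * price."""
    for key, rows in processed.items():
        consumption[key] = [
            {**row, "notional": str(int(row["quantity"]) * float(row["price"]))}
            for row in rows
        ]


def run_pipeline(source_rows: List[Record], key: str) -> Tuple[Layer, Layer, Layer]:
    """Run steps 1 to 3 and return the raw, processed, and consumption layers."""
    raw: Layer = {}
    processed: Layer = {}
    consumption: Layer = {}
    ingest(source_rows, raw, key)
    process(raw, processed)
    transform(processed, consumption)
    return raw, processed, consumption


if __name__ == "__main__":
    rows = [
        {"trade_id": " T1 ", "quantity": "100", "price": "2.5"},
        {"trade_id": "  ", "quantity": "1", "price": "1.0"},  # dropped in step 2
    ]
    _, _, consumption_layer = run_pipeline(rows, "trades/2020-02-26.csv")
    print(consumption_layer)
```

Keeping the raw layer untouched means the later layers can be rebuilt whenever the cleaning or business rules change, which is the main reason for separating the three layers.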