© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Olivier Klein 奧樂凱
Emerging Technologies Solutions
Architect, Asia-Pacific
Modern Data Architectures for
Business Insights at Scale
Data analysis for a better customer experience
• Your business creates and stores
data and logs all the time
• Data points and logs allow you to
understand individual customer
experience and improve it
• Analysis of logs and trails helps
you gain insights
Ever Increasing Big Data
Volume
Velocity
Variety
95% of the 1.2 zettabytes
of data in the digital
universe is unstructured
70% of this is user-generated content
Unstructured data growth is
explosive, with an estimated
compound annual growth
rate (CAGR) of 62%
from 2008 to 2012.
Source: IDC
Big Data: Unconstrained data growth
[Chart: generated data vs. data available for analysis, 1990 to 2020, on a scale from GB through TB, PB and EB to ZB, showing the data volume gap between the two]
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Big Data Evolution
Batch
Reports
Real-time
Alerts
Prediction
Forecasts
Plethora of Tools
Amazon
Glacier
S3 DynamoDB
RDS
EMR
Amazon
Redshift
Data Pipeline
Amazon
Kinesis
Kinesis-enabled
app
Lambda ML
SQS
ElastiCache
DynamoDB
Streams
Amazon Elasticsearch
Service
Big Data Challenges
Is there a reference architecture?
What tools should I use?
How?
Why?
Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and
create new digital services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven
automation, fraud detection
Outcome 4 : Automate for expansive reach
• Automation of business processes and physical
infrastructure
Driving Business Outcomes via Data Analytics
Deliver continuous differentiation
Personalization | Interactive | Modernize/consolidate
A full-service residential real estate brokerage
Redfin manages data on
hundreds of millions
of properties and
millions of customers
The Hot Homes algorithm
automatically calculates
the likelihood that a home will sell soon by analyzing
more than 500 attributes
of each home
Fully AWS-native
since day one
https://aws.amazon.com/solutions/case-studies/redfin/
Hot Homes
There's an 80% chance this home will sell in the next 11 days – go tour it soon.
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Amazon S3
Data lake
Amazon EMR
Amazon
Kinesis
Amazon Redshift
Answers &
Insights
Hot Homes
Users
Properties
Agents
User Profile
Recommendation
Hot Homes
Similar Homes
Agent Follow-up
Agent Scorecard
Marketing
A/B Testing
Real Time Data
…
Amazon
DynamoDB
BI / Reporting
Redfin Manages Data on Hundreds of Millions of Properties Using AWS
“Once we solved the infrastructure problem, we
could dream a little bigger. Now we can deliver results
without worrying about how to scale.”
Yong Huang, Director, Big Data and Analytics
• Zero on-premises infrastructure
• Using Spot pricing for EC2, Redfin saved 90%
compared to running On-Demand
• Using AWS, Redfin maintains a small technical team,
with much simpler server management and
an easier transition to DevOps
• Redfin is able to launch products like Hot Homes that
greatly improve the buyer experience by leveraging
the agility and scale of AWS
American upscale fashion retailer
Nordstrom has
323 stores operating
in 38 U.S. states
and in Canada, the
largest number of
stores and
geographic footprint
among its retail competitors
Fashion retailer that sells
clothing, shoes,
cosmetics, and
accessories
Nordstrom is
going all in on AWS
https://aws.amazon.com/solutions/case-studies/nordstrom/
NORDSTROM
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Outcomes
& Insights
Personalized
recommendations within
seconds (from 15-20 min)
Scale the expertise of
stylists to all shoppers
Reduce costs by 2X order
of magnitude
…
Mobile Users
Desktop Users
Analytics
Tools
Online Stylist
Amazon
Redshift
Amazon
Kinesis
AWS
Lambda
Amazon
DynamoDB
AWS
Lambda
Amazon S3
Data Storage
NORDSTROM
Nordstrom gives personalized style recommendations in seconds
“Alert me when the internet is down ...”
Keith Homewood, Cloud Product Owner, Nordstrom
• Nordstrom Recommendation is the online version of a
stylist. It can analyze and deliver personalized
recommendations in seconds
• Going all in on AWS has reduced costs
by 2X
• Continuous delivery allows Nordstrom to deliver
multiple production launches a day in a single
application
• A personalized recommendation that used to take
15-20 minutes of processing can now be created in
seconds
• Nordstrom's Cloud Product Owner considers AWS
reliable and available enough that, as long as the
internet is working, Nordstrom Recommendation is
working
Nordstrom
Technology that helps brick-and-mortar retailers optimize performance
Trusted by over
500 global brands
in 45 countries worldwide
and counting
Euclid analyzes customer
movement data to
correlate traffic with
marketing campaigns and
to help retailers optimize
hours for peak traffic
Fully AWS-native
since day one
https://aws.amazon.com/solutions/case-studies/euclid/
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Answers &
Insights
Euclid Analytics
Campaigns
WiFi - Foot traffic
Transactions
Walk-Bys
New & Return Visitors
Visit Duration
Engagement Rate
Bounce Rate
Storefront Potential &
Conversion
Customer segmentation
and loyalty assessment
Regional and categorical
roll-up reporting
Zoning for large-format
locations
Euclid EventIQ
Amazon S3
Data lake
Amazon RDS
for MySQL
Amazon EMR
Amazon
Redshift
Amazon EC2
Amazon
Elastic
Beanstalk
Elastic Load
Balancing
Euclid analytics processes POS analytics for 600 global brands in hours
“We were totally amazed at the
speed - a simple count of rows
that would take 5½ hours
using MySQL only took 30
seconds with Amazon Redshift”
Dexin Wang, Director of Platform Engineering, Euclid
• Process tens of TB in hours vs. 2 weeks
• 80-90% reduction in costs
• Euclid has a network of traffic-counting sensors in
nearly 400 shopping centers, malls, and street
locations
• Euclid analyzes 10+ billion events monthly and 300
million shopping sessions yearly
• “We might have to re-compute up to 18 months of
customer data. That requires a lot of computational
power, which spikes traffic. We need resources that
can scale up on demand and scale down when we
don’t need it.”
Experiment and scale based on your business needs
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Answers &
Insights
SHORT LIST
BUSINESS CASES
Modernization Automation
Experiment and scale based on your business needs
MATCH
AVAILABLE DATA
Metrics and
Monitoring
Workflow
Logs
ERP
Transactions
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Answers &
Insights
Experiment and scale based on your business needs
AWS
Import/ Export
Amazon S3
Amazon
Kinesis
Amazon
EMR
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Answers &
Insights
Amazon
Redshift
Amazon
QuickSight
Amazon
SQS
CHOOSE
BEST FIT
[Pipeline diagram: Data → Ingest/Collect → Store → Process/analyze → Consume/visualize]
Amazon S3
Data lake
Amazon EMR
Amazon
Kinesis
Amazon Redshift
Answers &
Insights
Hot Homes
Users
Properties
Agents
User Profile
Recommendation
Hot Homes
Similar Homes
Agent Follow-up
Agent Scorecard
Marketing
A/B Testing
Real Time Data
…
Amazon
DynamoDB
BI / Reporting
A platform to build business outcomes from data
[Pipeline diagram: Ingest/Collect → Store → Process/analyze → Consume/visualize]
COLLECT
Types of Data
Database records
Search documents
Log files
Messaging events
Devices / sensors / IoT stream
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Stream
storage
IoT
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
Database
Applications
AWS Import/Export
Snowball
Logging
AWS
CloudTrail
DOCUMENTS
FILES
Search
File store
Logging | Transport
Messaging
Message MESSAGES Queue
Messaging
Store
Amazon Kinesis
Firehose
Amazon Kinesis
Streams
Apache Kafka
Amazon DynamoDB
Streams
Amazon SQS
Amazon SQS
• Managed message queue service
Apache Kafka
• High throughput distributed messaging
system
Amazon Kinesis Streams
• Managed stream storage + processing
Amazon Kinesis Firehose
• Managed data delivery
Amazon DynamoDB
• Managed NoSQL database
• Tables can be stream-enabled
Message & Stream Storage
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
IoT
Messaging
Message MESSAGES
Messaging
Queue
Stream
Why Stream Storage?
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Streaming MapReduce
• Parallel consumption
[Diagram: Producers 1 to n write keyed records (for example, key = violet), numbered 1 to 4 per key, into shard 1 / partition 1 and shard 2 / partition 2. Each consumer reads one shard: Consumer 1 counts red = 4 and violet = 4; Consumer 2 counts blue = 4 and green = 4.]
DynamoDB stream Amazon Kinesis stream Kafka topic
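The shard diagram above can be sketched in a few lines of Python. This is only an illustration of the idea (key-based partitioning, per-key ordering, one consumer per shard), not how any of the three services is implemented: the modulo-of-MD5 routing, the function names `shard_for`, `produce` and `consume`, and the sample records are all assumptions made for the example.

```python
import hashlib
from collections import Counter, defaultdict

NUM_SHARDS = 2  # matches the two shards/partitions in the diagram


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard with a stable hash (simplified routing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def produce(records, num_shards: int = NUM_SHARDS):
    """Append (key, sequence) records to per-shard lists in arrival order."""
    shards = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key, num_shards)].append((key, seq))
    return shards


def consume(shard_records):
    """One consumer per shard: count the records seen for each key."""
    return Counter(key for key, _ in shard_records)


# Four records per color, interleaved the way several producers might send them.
records = [(color, n) for n in range(1, 5)
           for color in ("red", "violet", "blue", "green")]
shards = produce(records)
counts = {shard_id: consume(recs) for shard_id, recs in shards.items()}
total = sum(counts.values(), Counter())
print(dict(total))
```

Because every record with the same key lands in the same shard, each consumer can count its keys independently and the per-key order (1, 2, 3, 4) is preserved inside the shard.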
Amazon Kinesis Firehose
• Fully managed data streaming service to ingest and
capture data into your storage or data warehouse
• Ability to batch load, compress or encrypt streaming
data
• Elastic to scale to any throughput (no more sharding)
• Charged only per GB processed ($0.035 per GB)
What Stream Storage should I use?
                      | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose  | Apache Kafka       | Amazon SQS
AWS managed service   | Yes                     | Yes                    | Yes                      | No                 | Yes
Guaranteed ordering   | Yes                     | Yes                    | Yes                      | Yes                | No
Delivery              | exactly-once            | at-least-once          | exactly-once             | at-least-once      | at-least-once
Data retention period | 24 hours                | 7 days                 | N/A                      | Configurable       | 14 days
Availability          | 3 AZ                    | 3 AZ                   | 3 AZ                     | Configurable       | 3 AZ
Scale / throughput    | No limit / ~ table IOPS | No limit / ~ shards    | No limit / automatic     | No limit / ~ nodes | No limits / automatic
Parallel clients      | Yes                     | Yes                    | No                       | Yes                | No
Stream MapReduce      | Yes                     | Yes                    | N/A                      | Yes                | N/A
Record/object size    | 400 KB                  | 1 MB                   | Amazon Redshift row size | Configurable       | 256 KB
Cost                  | Higher (table cost)     | Low                    | Low                      | Low (+admin)       | Low-medium
Hot Warm
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
Database
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Search
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Hot
Stream
Amazon S3
Amazon SQS
Message
Amazon S3
File
Logging | IoT | Applications | Transport | Messaging
File Storage
Amazon S3
• Highly available object storage
• Designed for 99.999999999% annual
data durability
• Replicated across 3 facilities
• Virtually unlimited scale
• Pay only for what you use, with no
need to pre-provision
• Allows event notifications to trigger
further action
• Native support by big data frameworks
Amazon S3
Cost Conscious Design
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase
my team’s use of Amazon S3. Hoping you could answer
some questions. The current iteration of the design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
Request rate (writes/sec) | Object size (bytes) | Total size (GB/month) | Objects per month
300                       | 2,048               | 1,483                 | 777,600,000

Amazon S3 or Amazon DynamoDB?

           | Request rate (writes/sec) | Object size (bytes) | Total size (GB/month) | Objects per month
Scenario 1 | 300                       | 2,048               | 1,483                 | 777,600,000
Scenario 2 | 300                       | 32,768              | 23,730                | 777,600,000
Amazon S3
Amazon DynamoDB
use
use
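The sizing figures in the two scenarios can be reproduced with a little arithmetic. The sketch below assumes a 30-day month and binary gigabytes (1 GB = 1024³ bytes), which is what the slide's numbers imply; the function names are made up for the example.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30        # assumption: the slide's figures imply a 30-day month
BYTES_PER_GB = 1024 ** 3   # assumption: the totals match binary gigabytes


def objects_per_month(writes_per_sec: int) -> int:
    """Number of objects written in a month at a steady request rate."""
    return writes_per_sec * SECONDS_PER_DAY * DAYS_PER_MONTH


def total_gb_per_month(writes_per_sec: int, object_size_bytes: int) -> float:
    """Data volume written per month, in GB."""
    return objects_per_month(writes_per_sec) * object_size_bytes / BYTES_PER_GB


for name, size in (("Scenario 1", 2_048), ("Scenario 2", 32_768)):
    print(name, objects_per_month(300), round(total_gb_per_month(300, size)))
```

Both scenarios write the same 777,600,000 objects per month; only the object size, and therefore the stored volume, differs, and that is what drives the cost comparison between the two services.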
No need to
move data
Query S3 directly
& right away
No infrastructure to
setup & manage
Fast results
within seconds
Pay for just the
queries you run
Amazon Athena
Interactive query service that makes it
easy to analyze data in Amazon S3
using standard SQL
What about HDFS & Amazon Glacier?
• Use HDFS for very frequently accessed (hot)
data
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard – IA for infrequently
accessed data
• Use Amazon Glacier for archiving cold data
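As a rough sketch, the four-tier guidance above could be written as a lookup on access frequency. The numeric cut-offs and the names `Tier` and `choose_tier` are purely hypothetical; the slide only ranks the tiers from hot to cold and gives no thresholds.

```python
from enum import Enum


class Tier(Enum):
    HDFS = "HDFS"
    S3_STANDARD = "Amazon S3 Standard"
    S3_STANDARD_IA = "Amazon S3 Standard - IA"
    GLACIER = "Amazon Glacier"


# Hypothetical cut-offs, in reads per object per month (not from the slide).
HOT_READS = 1000
FREQUENT_READS = 1


def choose_tier(reads_per_month: float) -> Tier:
    """Pick a storage tier from how often an object is read."""
    if reads_per_month >= HOT_READS:
        return Tier.HDFS            # very frequently accessed (hot) data
    if reads_per_month >= FREQUENT_READS:
        return Tier.S3_STANDARD     # frequently accessed data
    if reads_per_month > 0:
        return Tier.S3_STANDARD_IA  # infrequently accessed data
    return Tier.GLACIER             # cold data kept for archiving


print(choose_tier(5000).value, "|", choose_tier(0).value)
```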
Cache, database, search
COLLECT STORE
Mobile apps
Web apps
Data centers
AWS Direct
Connect
RECORDS
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
DOCUMENTS
FILES
Messaging
Message MESSAGES
Devices
Sensors &
IoT platforms
AWS IoT STREAMS
Apache Kafka
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Streams
Hot
Stream
Amazon SQS
Message
Amazon Elasticsearch
Service
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Search | SQL | NoSQL | Cache | File
Logging | IoT | Applications | Transport | Messaging
Hot | Warm
Database Anti-pattern
Database tier
Best Practice - Use the Right Tool for the Job
Data Tier
Search
Amazon
Elasticsearch
Service
Cache
Amazon
ElastiCache
• Redis
• Memcached
SQL
• Amazon Aurora
• MySQL
• PostgreSQL
• Oracle
• SQL Server
NoSQL
• Amazon
DynamoDB
• Cassandra
• HBase
• MongoDB
Database tier options
BREAK
Next up: Real-Time Analytics and Engagement
PROCESS /
ANALYZE
COLLECT STORE
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
Hot | Hot | Warm
Search | SQL | NoSQL | Cache | File | Message
Stream
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Logging | IoT | Applications | Transport | Messaging
Process /
analyze
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
PROCESS / ANALYZE
Amazon Machine
Learning
Presto
Amazon
EMR
Fast | Slow | Fast
Batch | Message | Interactive | Stream | ML
Amazon EMR
Amazon EC2
Amazon EC2
Tools and Frameworks
Machine Learning
• Amazon ML, Amazon EMR (Spark ML), Amazon Rekognition
Interactive
• Amazon Redshift, Amazon EMR (Presto, Spark)
Batch
• Amazon EMR (MapReduce, Hive, Pig, Spark)
Messaging
• Amazon SQS application on Amazon EC2
Streaming
• Micro-batch: Spark Streaming, KCL
• Real-time: Amazon Kinesis Analytics, Storm,
AWS Lambda, KCL
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
PROCESS / ANALYZE
Amazon Machine
Learning
Presto
Amazon
EMR
Fast | Slow | Fast
Batch | Message | Interactive | Stream | ML
Amazon EC2
Amazon EC2
Amazon EMR
• Amazon EMR is a fully managed
Hadoop cluster
• Transient and long running clusters
• Direct integration into Amazon S3
• Easy to scale and enable burstable
capacity
• Integration with AWS Spot Market
Amazon EMR
• Amazon EMR supports all common
Hadoop Frameworks such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct Integration with DynamoDB
and S3
Amazon S3
Amazon DynamoDB
Amazon EMR
1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
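The equation above is instance-hour arithmetic, sketched below. It assumes the job parallelizes perfectly across instances, and the hourly prices are hypothetical placeholders, not AWS list prices.

```python
def cluster_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Cost of running `instances` machines for `hours` each."""
    return instances * hours * price_per_instance_hour


def wall_clock_hours(total_instance_hours: float, instances: int) -> float:
    """Elapsed time if the work splits evenly across instances (an assumption)."""
    return total_instance_hours / instances


ON_DEMAND_PRICE = 0.10  # hypothetical $/instance-hour
SPOT_PRICE = 0.03       # hypothetical $/instance-hour

small = cluster_cost(1, 100, ON_DEMAND_PRICE)     # 1 instance x 100 hours
large = cluster_cost(100, 1, ON_DEMAND_PRICE)     # 100 instances x 1 hour
large_spot = cluster_cost(100, 1, SPOT_PRICE)     # same cluster on Spot

print(small == large, wall_clock_hours(100, 100), large_spot < large)
```

At the same hourly price the two clusters cost the same, but the larger one finishes in one hour instead of one hundred; a lower Spot price then makes the faster run cheaper as well.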
Amazon Redshift
• Fully managed petabyte-scale data
warehouse
• Scalable number of cluster nodes
• ODBC/JDBC connector for BI tools
using SQL
• Loads data from Amazon DynamoDB and
Amazon S3
• Less than a tenth of the cost of traditional
solutions
Amazon Redshift
Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads
such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the
performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that
adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of
transactions that are independent to accelerate throughput
P state & C state control – provides granular performance tuning for cores and sleep
states to improve overall application performance
New X1 Instance - Tons of Memory
• Designed for large-scale, in-memory
applications in the cloud
• Ideal for in-memory databases like SAP
HANA and big data processing apps like
Spark and Presto
• Powered by Intel® Xeon® E7 8880 v3
Haswell processors
• Features up to 2TB of memory and up to
128 vCPUs per instance
• 8X the memory offered by any other Amazon EC2
instance
3. Affordable Petabyte-scale Analytics
AWS helps customers maximize the value of Big Data
investments while reducing overall IT costs
Amazon S3: secure, highly durable storage, $28.16 / TB / month
Amazon Glacier: data archiving, $7.16 / TB / month
Amazon Kinesis: real-time streaming data load, $0.035 / GB
Amazon EMR: 10-node Spark cluster, $0.15 / hr
Amazon Redshift: petabyte-scale data warehouse, $0.25 / hr
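To get a feel for these numbers, the slide's prices can be plugged into a small estimator. This is a sketch only: the prices are the ones quoted on this slide (they change over time), the slide does not say whether the hourly rates are per node or per cluster so they are applied here as a flat hourly rate, and the workload figures are invented.

```python
# Prices as quoted on the slide; treat them as illustrative, not current.
PRICES = {
    "s3_per_tb_month": 28.16,
    "glacier_per_tb_month": 7.16,
    "streaming_per_gb": 0.035,
    "emr_per_hour": 0.15,       # applied as a flat hourly rate (assumption)
    "redshift_per_hour": 0.25,  # applied as a flat hourly rate (assumption)
}


def monthly_estimate(s3_tb: float, glacier_tb: float, streamed_gb: float,
                     emr_hours: float, redshift_hours: float) -> float:
    """Sum the monthly cost of each component, rounded to cents."""
    total = (
        s3_tb * PRICES["s3_per_tb_month"]
        + glacier_tb * PRICES["glacier_per_tb_month"]
        + streamed_gb * PRICES["streaming_per_gb"]
        + emr_hours * PRICES["emr_per_hour"]
        + redshift_hours * PRICES["redshift_per_hour"]
    )
    return round(total, 2)


# Invented workload: 10 TB in S3, 50 TB archived, 1,000 GB streamed,
# 100 hours of EMR and a Redshift node running all month (720 hours).
print(monthly_estimate(10, 50, 1000, 100, 720))
```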
Artificial Intelligence
Predictions via Machine Learning
ML gives computers the ability to learn without being explicitly
programmed
Machine learning algorithms:
• Supervised learning ← “teach” program
- Classification ← Is this transaction fraud? (yes / no)
- Regression ← Customer life-time value?
• Unsupervised learning ← Let it learn by itself
- Clustering ← Market segmentation
Amazon Machine Learning
• Easy to use, managed machine
learning service built for developers
• Machine learning technology based
on Amazon’s internal systems
• Create models using data stored in
Amazon S3, Amazon RDS or Amazon
Redshift
• Request predictions in batch or in
real time
Amazon Machine
Learning
Machine Learning Algorithms
• Classification
• Sentiment analysis – Do people like my new product?
• Linear Regression
• Trend prediction – How much revenue next month?
• Clustering
• Recommendation - Other people bought this!
• Association
• Market basket analysis – Bundled products
• Neural Networks
• Pattern recognition - Speech recognition
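The "trend prediction" use of linear regression from the list above can be illustrated with an ordinary least-squares fit in plain Python. The monthly revenue figures are made up, and this is not how Amazon Machine Learning works internally; it only shows what fitting a line and extrapolating one month ahead means.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: revenue (in thousands) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
revenue = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, revenue)
forecast_month_7 = slope * 7 + intercept
print(slope, intercept, forecast_month_7)
```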
Amazon Machine
Learning
Amazon EMR +
Spark MLlib
GPU Optimized
EC2 Instance
Amazon Rekognition
Image recognition and analysis
powered by deep learning, which
lets you search, verify and organize
millions of images
• Easy to use
• Batch analysis
• Real-time analysis
• Continually improving
• Low cost
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
Demographic Data
Facial Landmarks
Sentiment Expressed
Image Quality
Brightness: 25.84
Sharpness: 160
General Attributes
Serverless Rekognition Demo
Serverless website that uses Rekognition to identify
faces and classify pictures
Amazon S3
AWS Lambda
Amazon API
Gateway
Amazon
DynamoDB
Amazon
Rekognition
Mobile
CodeFor.Cloud/image
Unlimited
Replays
Returns an MP3
or audio stream
Lightning Fast
Response
Fully Managed and
Low Cost
Amazon Polly
Turn text into lifelike speech using deep
learning technologies to synthesize
speech that sounds like a human voice
Amazon Polly
“The temperature
in WA is 75°F”
“The temperature
in Washington is 75 degrees
Fahrenheit”
Amazon Polly: Text In, Life-like Speech Out
Amazon Lex
Conversational interfaces for your
applications, powered by the same
Natural Language Understanding
(NLU) & Automatic Speech Recognition
(ASR) models as Alexa
Integrated
development in
AWS console
Trigger AWS
Lambda
functions
Multi-step
conversations
Continually improving
ASR & NLU models
Enterprise
connectors
Fully Managed
Intents
A particular goal that the
user wants to achieve
Utterances
Spoken or typed phrases
that invoke your intent
Slots
Data the user must provide to fulfill the
intent
Prompts
Questions that ask the user to input
data
Fulfillment
The business logic required to fulfill the
user’s intent
BookHotel
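The five concepts above fit together roughly as in the sketch below, using the BookHotel intent as the example. This models the vocabulary only; it is not the Amazon Lex API, and the class names, sample utterances, slot names and prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Slot:
    name: str    # data the user must provide
    prompt: str  # question that asks the user for it


@dataclass
class Intent:
    name: str
    utterances: List[str]                     # phrases that invoke the intent
    slots: List[Slot]
    fulfill: Callable[[Dict[str, str]], str]  # business logic

    def next_prompt(self, filled: Dict[str, str]) -> Optional[str]:
        """Return the prompt for the first missing slot, or None if all are filled."""
        for slot in self.slots:
            if slot.name not in filled:
                return slot.prompt
        return None


def fulfill_booking(values: Dict[str, str]) -> str:
    return f"Booked {values['nights']} night(s) in {values['city']}."


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I want to make a hotel reservation"],
    slots=[Slot("city", "Which city?"), Slot("nights", "How many nights?")],
    fulfill=fulfill_booking,
)


def step(intent: Intent, filled: Dict[str, str]) -> str:
    """One conversation turn: ask for a missing slot or fulfill the intent."""
    prompt = intent.next_prompt(filled)
    return prompt if prompt is not None else intent.fulfill(filled)


print(step(book_hotel, {}))
print(step(book_hotel, {"city": "Seattle", "nights": "2"}))
```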
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
COLLECT | STORE | PROCESS / ANALYZE | CONSUME
Amazon Machine
Learning
Presto
Amazon
EMR
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
Hot | Hot | Warm
Fast | Slow | Fast
Batch | Message | Interactive | Stream | ML
Search | SQL | NoSQL | Cache | File | Message
Stream
Amazon EC2
Amazon EC2
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Logging | IoT | Applications | Transport | Messaging
ETL
What About ETL?
https://aws.amazon.com/big-data/partner-solutions/
ETL | STORE | PROCESS / ANALYZE
AWS Glue
Easily understand your data sources,
prepare the data, and load it reliably to
data stores and your analytics pipeline
Integrated with:
S3, RDS, Redshift & any JDBC-
compliant data store
Build Your Data
Catalog
Generate And Edit
Transformations
Schedule And Run
Your Jobs
CONSUME
STORE | PROCESS / ANALYZE | CONSUME
Amazon QuickSight
Apps & Services
Analysis & visualization | Notebooks | IDE | API
Applications & API
Analysis and visualization
Notebooks
IDE
Business
users
Data scientist,
developers
COLLECT ETL
Amazon QuickSight
• Fast, cloud-powered, BI service that
makes it easy to build visualizations,
perform ad-hoc analysis, and get insights
from data.
• Connectors for files, third party platforms,
AWS services and other partner BI tools
• In-memory calculation engine (SPICE)
to accelerate analysis and visualization
• $9 per user per month
Athena & QuickSight Demo
Amazon
S3
Amazon
Athena
Amazon
QuickSight
Analyze past flight performance data stored in S3
Bureau of Transportation Flight Data Statistics
www.transtats.bts.gov
Create visualizations from S3 with Athena & QuickSight
Putting It All Together
Amazon SQS apps
Streaming
Amazon Kinesis
Analytics
Amazon KCL
apps
AWS Lambda
Amazon Redshift
COLLECT | STORE | PROCESS / ANALYZE | CONSUME
Amazon Machine
Learning
Presto
Amazon
EMR
Amazon Elasticsearch
Service
Apache Kafka
Amazon SQS
Amazon Kinesis
Streams
Amazon Kinesis
Firehose
Amazon DynamoDB
Amazon S3
Amazon ElastiCache
Amazon RDS
Amazon DynamoDB
Streams
Hot | Hot | Warm
Fast | Slow | Fast
Batch | Message | Interactive | Stream | ML
Search | SQL | NoSQL | Cache | File | Queue | Stream
Amazon EC2
Amazon EC2
Mobile apps
Web apps
Devices
Messaging
Message
Sensors &
IoT platforms
AWS IoT
Data centers
AWS Direct
Connect
AWS Import/Export
Snowball
Logging
Amazon
CloudWatch
AWS
CloudTrail
RECORDS
DOCUMENTS
FILES
MESSAGES
STREAMS
Amazon QuickSight
Apps & Services
Analysis & visualization | Notebooks | IDE | API
Reference architecture
Logging | IoT | Applications | Transport | Messaging
ETL
Let’s talk business outcomes of data analytics!
Suncorp is moving "all-in" on cloud.
Project Ignite will extract benefits of $170 million
- Group CEO Patrick Snowball
Insurance Policy Insurance Claim Core Banking Life Admin
Kinesis
for Real-
Time
10TB/day
Amazon
S3
AdRoll: AWS Lambda for log files
Valentino Volonghi
CTO, AdRoll
“Polling is not a scalable strategy to
figure out when new files are added to S3,
especially when you add 17M of them per
month. So we moved Lambda in front of
S3.”
• Cross-platform, cross-device
advertising platform
• Offers retargeting based on
clickstream data
300 TB new data/month
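The pattern in the AdRoll quote, reacting to S3 event notifications instead of polling for new files, might look like the Lambda handler sketched below. The event layout follows the S3 notification format (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys); the handler name, the returned summary and the sample event are assumptions, and this is not AdRoll's actual code.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Invoked by S3 for each batch of 'object created' notifications."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        # A real function would fetch and parse the log file here.
        processed.append({"bucket": bucket, "key": key, "size": size})
    return {"processed": processed}


# Hypothetical event for a local dry run.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-logs"},
                "object": {"key": "logs/2017/file+one.gz", "size": 1024}}}
    ]
}
result = handler(sample_event, None)
print(result)
```

Each new object triggers the function directly, so there is no listing or polling loop to scale as the number of files grows.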
Rethink how to become a data-driven business
• Business outcomes - start with the insights and actions you
want to drive, then work backwards to a streamlined design
• Experimentation - start small, test many ideas, keep the
good ones and scale those up, paying only for what you
consume
• Agile and timely - deploy data processing infrastructure in
minutes, not months. Take advantage of a rich platform of
services to respond quickly to changing business needs
Thank You!
Next up: Q&A

Más contenido relacionado

La actualidad más candente

Intro to Game Development & Operations on AWS
Intro to Game Development & Operations on AWSIntro to Game Development & Operations on AWS
Intro to Game Development & Operations on AWSAmazon Web Services
 
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...AWS Korea 금융산업팀
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
INTEGRATE 2022 - Data Mapping in the Microsoft Cloud
INTEGRATE 2022 - Data Mapping in the Microsoft CloudINTEGRATE 2022 - Data Mapping in the Microsoft Cloud
INTEGRATE 2022 - Data Mapping in the Microsoft CloudDaniel Toomey
 
E Discovery and Archiving in Microsoft Office 365 - Presented by Atidan
E Discovery and Archiving in Microsoft Office 365 - Presented by AtidanE Discovery and Archiving in Microsoft Office 365 - Presented by Atidan
E Discovery and Archiving in Microsoft Office 365 - Presented by AtidanDavid J Rosenthal
 
Dkos(mesos기반의 container orchestration)
Dkos(mesos기반의 container orchestration)Dkos(mesos기반의 container orchestration)
Dkos(mesos기반의 container orchestration)Won-Chon Jung
 
Various Cloud offerings AWS/AZURE/GCP
Various Cloud offerings AWS/AZURE/GCPVarious Cloud offerings AWS/AZURE/GCP
Various Cloud offerings AWS/AZURE/GCPMohammad Imran Ansari
 
Cloud computing and sustainability
Cloud computing and sustainabilityCloud computing and sustainability
Cloud computing and sustainabilityOffice365UK
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesEastBanc Tachnologies
 
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...HostedbyConfluent
 
Graph Databases for Master Data Management
Graph Databases for Master Data ManagementGraph Databases for Master Data Management
Graph Databases for Master Data ManagementNeo4j
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 

La actualidad más candente (20)

Intro to Game Development & Operations on AWS
Intro to Game Development & Operations on AWSIntro to Game Development & Operations on AWS
Intro to Game Development & Operations on AWS
 
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...
[금융사를 위한 AWS Generative AI Day 2023] 2_세상을 바꾸고 있는 Generative AI에...
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
INTEGRATE 2022 - Data Mapping in the Microsoft Cloud
INTEGRATE 2022 - Data Mapping in the Microsoft CloudINTEGRATE 2022 - Data Mapping in the Microsoft Cloud
INTEGRATE 2022 - Data Mapping in the Microsoft Cloud
 
Introduction to AWS IoT
Introduction to AWS IoTIntroduction to AWS IoT
Introduction to AWS IoT
 
E Discovery and Archiving in Microsoft Office 365 - Presented by Atidan
E Discovery and Archiving in Microsoft Office 365 - Presented by AtidanE Discovery and Archiving in Microsoft Office 365 - Presented by Atidan
E Discovery and Archiving in Microsoft Office 365 - Presented by Atidan
 
Databases on AWS Workshop.pdf
Databases on AWS Workshop.pdfDatabases on AWS Workshop.pdf
Databases on AWS Workshop.pdf
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Dkos(mesos기반의 container orchestration)
Dkos(mesos기반의 container orchestration)Dkos(mesos기반의 container orchestration)
Dkos(mesos기반의 container orchestration)
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
Various Cloud offerings AWS/AZURE/GCP
Various Cloud offerings AWS/AZURE/GCPVarious Cloud offerings AWS/AZURE/GCP
Various Cloud offerings AWS/AZURE/GCP
 
Cloud computing and sustainability
Cloud computing and sustainabilityCloud computing and sustainability
Cloud computing and sustainability
 
AWS in Financial Services
AWS in Financial ServicesAWS in Financial Services
AWS in Financial Services
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics services
 
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...
Keep Your Cache Always Fresh with Debezium! with Gunnar Morling | Kafka Summi...
 
Graph Databases for Master Data Management
Graph Databases for Master Data ManagementGraph Databases for Master Data Management
Graph Databases for Master Data Management
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 

Modern Data Architectures for Business Insights at Scale

  • 1. Modern Data Architectures for Business Insights at Scale. Olivier Klein 奧樂凱, Emerging Technologies Solutions Architect, Asia-Pacific. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Data analysis for a better customer experience • Your business creates and stores data and logs all the time • Data points and logs allow you to understand each customer's experience and improve it • Analysis of logs and trails helps you gain insights
  • 3. Ever-Increasing Big Data: Volume, Velocity, Variety
  • 4. Big Data: Unconstrained data growth. 95% of the 1.2 zettabytes of data in the digital universe is unstructured; 70% of this is user-generated content. Unstructured data growth is explosive, with an estimated compound annual growth rate (CAGR) of 62% from 2008 to 2012. Source: IDC. Scale shown: GB, TB, PB, EB, ZB.
  • 5. [Chart: Data volume gap, 1990 to 2020. Generated data versus data available for analysis.] Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares.
  • 7. Plethora of Tools: Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Kinesis-enabled app, Lambda, ML, SQS, ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service
  • 8. Big Data Challenges Is there a reference architecture? What tools should I use? How? Why?
  • 9. Driving Business Outcomes via Data Analytics. Outcome 1: Modernize and consolidate • Insights to enhance business applications and create new digital services. Outcome 2: Innovate for new revenues • Personalization, demand forecasting, risk analysis. Outcome 3: Real-time engagement • Interactive customer experience, event-driven automation, fraud detection. Outcome 4: Automate for expansive reach • Automation of business processes and physical infrastructure
  • 10. Deliver continuous differentiation: Personalization, Interactive, Modernize/consolidate
  • 12. Redfin: a full-service residential real estate brokerage. Redfin manages data on hundreds of millions of properties and millions of customers. The Hot Homes algorithm automatically calculates the likelihood that a home will sell quickly by analyzing more than 500 attributes of each home. Fully AWS-native since day one. https://aws.amazon.com/solutions/case-studies/redfin/
  • 13. Hot Homes: "There's an 80% chance this home will sell in the next 11 days – go tour it soon."
  • 14. [Architecture diagram: Redfin] Stages: Ingest/Collect, Store, Process/Analyze, Consume/Visualize. Services: Amazon Kinesis, Amazon S3 (data lake), Amazon EMR, Amazon Redshift, Amazon DynamoDB, BI/Reporting. Data: Users, Properties, Agents. Answers & Insights: User Profile, Recommendation, Hot Homes, Similar Homes, Agent Follow-up, Agent Scorecard, Marketing, A/B Testing, Real Time Data, …
  • 15. Redfin Manages Data on Hundreds of Millions of Properties Using AWS. “Once we solved the infrastructure problem, we could dream a little bigger. Now we can deliver results without worrying about how to scale.” Yong Huang, Director, Big Data and Analytics • Zero on-premises infrastructure • Using spot pricing for EC2, Redfin saved 90% compared to running on-demand • Using AWS, Redfin maintains a small technical team, which simplifies server management and eases the transition to DevOps • Redfin is able to launch products like Hot Homes that greatly improve the buyer experience by leveraging the agility and scale of AWS
  • 17. American upscale fashion retailer Nordstrom has 323 stores operating in 38 of the United States and also in Canada; the largest in number of stores and geographic footprint of its retail competitors Fashion retailer that sells clothing, shoes, cosmetics, and accessories Nordstrom is going all in on AWS https://aws.amazon.com/solutions/case-studies/nordstrom/ NORDSTROM
  • 19. Ingest/Collect Consume/visualize Store Process/analyze Data Outcomes & Insights Personalized recommendations within seconds (from 15-20 min) Scale the expertise of stylists to all shoppers Reduce costs by 2X order of magnitude … Mobile Users Desktop Users Analytics Tools Online Stylist Amazon Redshift Amazon Kinesis AWS Lambda Amazon DynamoDB AWS Lambda Amazon S3 Data Storage NORDSTROM
  • 20. Nordstrom gives personalized style recommendations in seconds. “Alert me when the internet is down ...” Keith Homewood, Cloud Product Owner, Nordstrom • Nordstrom Recommendation is the online version of a stylist. It can analyze and deliver personalized recommendations in seconds • Going All-In on AWS has reduced costs by 2X • Continuous delivery allows Nordstrom to deliver multiple production launches a day in a single application • Can now create a personalized recommendation in seconds, which used to take 15-20 minutes of processing • Nordstrom's Cloud Product Owner finds AWS so reliable and available that as long as the internet is working, Nordstrom Recommendation is working
  • 22. Technology that helps brick-and-mortar retailers optimize performance Trusted by over 500 global brands in 45 countries worldwide and counting Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic Was fully AWS-native since day one https://aws.amazon.com/solutions/case-studies/euclid/
  • 24. Ingest/Collect Consume/visualize Store Process/analyze Data Answers & Insights Euclid Analytics Campaigns WiFi - Foot traffic Transactions Walk-Bys New & Return Visitors Visit Duration Engagement Rate Bounce Rate Storefront Potential & Conversion Customer segmentation and loyalty assessment Regional and categorical roll-up reporting Zoning for large-format locations Euclid EventIQ Amazon S3 Data lake Amazon RDS for MySQL Amazon EMR Amazon Redshift Amazon EC2 AWS Elastic Beanstalk Elastic Load Balancing
  • 25. Euclid analytics processes POS analytics for 600 global brands in hours. “We were totally amazed at the speed - a simple count of rows that would take 5½ hours using MySQL only took 30 seconds with Amazon Redshift.” Dexin Wang, Director of Platform Engineering, Euclid • Process 10s of TB in hours vs. 2 weeks • 80-90% reduction in costs • Euclid has a network of traffic counting sensors in nearly 400 shopping centers, malls, and street locations • Euclid analyzes 10+ billion events monthly and 300 million shopping sessions yearly • “We might have to re-compute up to 18 months of customer data. That requires a lot of computational power, which spikes traffic. We need resources that can scale up on demand and scale down when we don’t need it.”
  • 26. Experiment and scale based on your business needs Ingest/ Collect Consume/ visualize Store Process/ analyze Data 1 4 0 9 5 Answers & Insights SHORT LIST BUSINESS CASES Modernization Automation
  • 27. Experiment and scale based on your business needs MATCH AVAILABLE DATA Metrics and Monitoring Workflow Logs ERP Transactions Ingest/ Collect Consume/ visualize Store Process/ analyze Data 1 4 0 9 5 Answers & Insights
  • 28. Experiment and scale based on your business needs AWS Import/ Export Amazon S3 Amazon Kinesis Amazon EMR Ingest/ Collect Consume/ visualize Store Process/ analyze Data 1 4 0 9 5 Answers & Insights Amazon Redshift Amazon QuickSight Amazon SQS CHOOSE BEST FIT
  • 29. Ingest/Collect Consume/visualize Store Process/analyze Data Amazon S3 Data lake Amazon EMR Amazon Kinesis Amazon Redshift Answers & Insights Hot Homes Users Properties Agents User Profile Recommendation Hot Homes Similar Homes Agent Follow-up Agent Scorecard Marketing A/B Testing Real Time Data … Amazon DynamoDB BI / Reporting
  • 30. A platform to build business outcomes from data Ingest/ Collect Consume/ visualize Store Process/ analyze 1 4 0 9 5
  • 32. Types of Data Database records Search documents Log files Messaging events Devices / sensors / IoT stream Devices Sensors & IoT platforms AWS IoT STREAMS Stream storage IoT COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS Database Applications AWS Import/Export Snowball Logging AWS CloudTrail DOCUMENTS FILES Search File store LoggingTransport Messaging Message MESSAGES Queue Messaging
  • 33. Store
  • 34. Amazon Kinesis Firehose Amazon Kinesis Streams Apache Kafka Amazon DynamoDB Streams Amazon SQS Amazon SQS • Managed message queue service Apache Kafka • High throughput distributed messaging system Amazon Kinesis Streams • Managed stream storage + processing Amazon Kinesis Firehose • Managed data delivery Amazon DynamoDB • Managed NoSQL database • Tables can be stream-enabled Message & Stream Storage Devices Sensors & IoT platforms AWS IoT STREAMS IoT Messaging Message MESSAGES Messaging Queue Stream
  • 35. Why Stream Storage? • Decouple producers & consumers • Persistent buffer • Collect multiple streams • Preserve client ordering • Streaming MapReduce • Parallel consumption 4 4 3 3 2 2 1 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 4 3 3 2 2 1 1 Producer 1 shard 1 / partition 1 shard 2 / partition 2 Consumer 1 Count of red = 4 Count of violet = 4 Consumer 2 Count of blue = 4 Count of green = 4 Producer 2 Producer 3 Producer n Key = violet DynamoDB stream Amazon Kinesis stream Kafka topic
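The "preserve client ordering" and "parallel consumption" points on this slide both follow from routing records by key. The sketch below illustrates the idea in plain Python. It assumes an MD5-based mapping of the partition key into a 128-bit hash space that is split evenly across shards. The helper names `shard_for_key` and `route` are hypothetical and are not part of any AWS SDK or Kafka client.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer is below 2**128


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index (hypothetical helper).

    Assumes the hash space is divided evenly between the shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def route(records, shard_count):
    """Append (key, payload) records to per-shard lists in arrival order."""
    shards = defaultdict(list)
    for key, payload in records:
        shards[shard_for_key(key, shard_count)].append((key, payload))
    return shards


# Made-up records from two producers, keyed by colour as on the slide.
records = [("violet", 1), ("red", 1), ("violet", 2), ("red", 2), ("violet", 3)]
shards = route(records, shard_count=2)
```

Because every record with the same key lands in the same shard, a single consumer per shard sees that key's records in the order they were produced, while different shards can be read in parallel.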
  • 36. Amazon Kinesis Firehose • Fully managed data streaming service to ingest and capture data into your storage or data warehouse • Ability to batch load, compress or encrypt streaming data • Elastic to scale to any throughput (no more sharding) • Charged only per GB processed ($0.035 per GB)
  • 37. What Stream Storage should I use?

    | | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS |
    |---|---|---|---|---|---|
    | AWS managed service | Yes | Yes | Yes | No | Yes |
    | Guaranteed ordering | Yes | Yes | Yes | Yes | No |
    | Delivery | exactly-once | at-least-once | exactly-once | at-least-once | at-least-once |
    | Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days |
    | Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ |
    | Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limits / automatic |
    | Parallel clients | Yes | Yes | No | Yes | No |
    | Stream MapReduce | Yes | Yes | N/A | Yes | N/A |
    | Record/object size | 400 KB | 1 MB | Amazon Redshift row size | Configurable | 256 KB |
    | Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium |

    Hot Warm
  • 38. COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS Database AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Search Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Hot Stream Amazon S3 Amazon SQS Message Amazon S3 File LoggingIoTApplicationsTransportMessaging File Storage
  • 39. Amazon S3 • Highly available object storage • Designed for 99.999999999% annual data durability • Replicated across 3 facilities • Virtually unlimited scale • Pay only for what you use, you don’t need to pre-provision • Allows event notifications to trigger further action • Native support by big data frameworks Amazon S3
  • 40. Cost Conscious Design Example: Should I use Amazon S3 or Amazon DynamoDB? “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…” Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2048 1483 777,600,000
  • 41. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2,048 1,483 777,600,000 Amazon S3 or Amazon DynamoDB?
  • 42. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month Scenario 1: 300 2,048 1,483 777,600,000 Scenario 2: 300 32,768 23,730 777,600,000 Amazon S3 Amazon DynamoDB use use
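The totals in these scenarios can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and binary gigabytes (2^30 bytes), which is what the slide's figures imply. The function names are illustrative only.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, as the slide's totals imply
BYTES_PER_GB = 2 ** 30                 # the slide's GB figures match binary gigabytes


def objects_per_month(writes_per_sec: int) -> int:
    """Number of objects written in a month at a constant request rate."""
    return writes_per_sec * SECONDS_PER_MONTH


def total_gb_per_month(writes_per_sec: int, object_bytes: int) -> int:
    """Monthly data volume, truncated to whole GB as on the slide."""
    total_bytes = objects_per_month(writes_per_sec) * object_bytes
    return total_bytes // BYTES_PER_GB


for label, size in (("Scenario 1", 2_048), ("Scenario 2", 32_768)):
    print(label, objects_per_month(300), total_gb_per_month(300, size))
```

Both scenarios write the same number of objects. Only the object size changes, so the monthly volume grows sixteenfold while the request count stays the same.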
  • 43. No need to move data Query S3 directly & right away No infrastructure to setup & manage Fast results within seconds Pay for just the queries you run Amazon Athena Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
  • 44. What about HDFS & Amazon Glacier? • Use HDFS for very frequently accessed (hot) data • Use Amazon S3 Standard for frequently accessed data • Use Amazon S3 Standard – IA for infrequently accessed data • Use Amazon Glacier for archiving cold data
  • 45. Cache, database, search COLLECT STORE Mobile apps Web apps Data centers AWS Direct Connect RECORDS AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail DOCUMENTS FILES Messaging Message MESSAGES Devices Sensors & IoT platforms AWS IoT STREAMS Apache Kafka Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Streams Hot Stream Amazon SQS Message Amazon Elasticsearch Service Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS SearchSQLNoSQLCacheFile LoggingIoTApplicationsTransportMessaging HotWarm
  • 47. Best Practice - Use the Right Tool for the Job Data Tier Search Amazon Elasticsearch Service Cache Amazon ElastiCache • Redis • Memcached SQL • Amazon Aurora • MySQL • PostgreSQL • Oracle • SQL Server NoSQL • Amazon DynamoDB • Cassandra • HBase • MongoDB Database tier options
  • 48. BREAK Next up: Real-Time Analytics and Engagement
  • 50. COLLECT STORE Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm SearchSQLNoSQLCacheFileMessage Stream Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENTS FILES MESSAGES STREAMS LoggingIoTApplicationsTransportMessaging Process / analyze Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift PROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR FastSlowFast BatchMessageInteractiveStreamML Amazon EMR Amazon EC2 Amazon EC2
  • 51. Tools and Frameworks Machine Learning • Amazon ML, Amazon EMR (Spark ML), Amazon Rekognition Interactive • Amazon Redshift, Amazon EMR (Presto, Spark) Batch • Amazon EMR (MapReduce, Hive, Pig, Spark) Messaging • Amazon SQS application on Amazon EC2 Streaming • Micro-batch: Spark Streaming, KCL • Real-time: Amazon Kinesis Analytics, Storm, AWS Lambda, KCL Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift PROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR FastSlowFast BatchMessageInteractiveStreamML Amazon EC2 Amazon EC2
  • 52. Amazon EMR • Amazon EMR is a fully managed Hadoop cluster • Transient and long running clusters • Direct integration into Amazon S3 • Easy to scale and enable burstable capacity • Integration with AWS Spot Market
  • 53. Amazon EMR • Amazon EMR supports all common Hadoop frameworks such as: • Spark, Pig, Hive, Hue, Oozie … • HBase, Presto, Impala … • Decouples storage from compute • Allows independent scaling • Direct integration with DynamoDB and S3 Amazon S3 Amazon DynamoDB Amazon EMR
  • 54. 1 instance x 100 hours = 100 instances x 1 hour (and with Spot Pricing not only faster but also cheaper)
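The equality on this slide is plain instance-hour arithmetic, sketched below. The hourly prices are hypothetical placeholders, not AWS list prices, and the sketch assumes a job that parallelizes perfectly across instances.

```python
from decimal import Decimal


def cluster_cost(instances: int, hours: int, price_per_instance_hour: Decimal) -> Decimal:
    """Cost of running `instances` machines for `hours` hours each."""
    return instances * hours * price_per_instance_hour


ON_DEMAND = Decimal("0.10")  # hypothetical on-demand price per instance-hour
SPOT = Decimal("0.03")       # hypothetical spot price per instance-hour

serial = cluster_cost(1, 100, ON_DEMAND)         # 1 instance x 100 hours
parallel = cluster_cost(100, 1, ON_DEMAND)       # 100 instances x 1 hour
parallel_spot = cluster_cost(100, 1, SPOT)       # same work at the spot price
```

Under these assumptions the serial and parallel runs cost the same, but the parallel run finishes in one hour instead of one hundred. At the lower spot price it is also cheaper.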
  • 55. Amazon Redshift • Fully managed petabyte-scale data warehouse • Scalable number of cluster nodes • ODBC/JDBC connector for BI tools using SQL • Supports loading data from Amazon DynamoDB and Amazon S3 • Less than a tenth of the cost of traditional solutions Amazon Redshift
  • 56. Intel® Processor Technologies Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads Intel Transactional Synchronization (TSX) Extensions – Enables execution of transactions that are independent to accelerate throughput P state & C state control – provides granular performance tuning for cores and sleep states to improve overall application performance
  • 57. New X1 Instance - Tons of Memory • Designed for large-scale, in-memory applications in the cloud • Ideal for in-memory databases like SAP HANA and big data processing apps like Spark and Presto • Powered by Intel® Xeon® E7 8880 v3 Haswell processors • Features up to 2TB of memory and up to 128 vCPUs per instance • 8X the memory offered by any other Amazon EC2 instance
  • 58. 3. Affordable Petabyte-scale Analytics AWS helps customers maximize the value of Big Data investments while reducing overall IT costs Secure, Highly Durable storage $28.16 / TB / month Data Archiving $7.16 / TB / month Real-time streaming data load $0.035 / GB 10-node Spark Cluster $0.15 / hr Petabyte-scale Data Warehouse $0.25 / hr Amazon Glacier Amazon S3 Amazon RedshiftAmazon EMRAmazon Kinesis
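To turn the per-unit prices on this slide into a monthly estimate, a rough calculation could look like the sketch below. The prices are the ones quoted on the slide and will have changed since. The workload volumes are made-up assumptions, and `monthly_cost` is an illustrative helper rather than an AWS tool.

```python
from decimal import Decimal

# Prices as quoted on the slide.
S3_PER_TB_MONTH = Decimal("28.16")
GLACIER_PER_TB_MONTH = Decimal("7.16")
FIREHOSE_PER_GB = Decimal("0.035")


def monthly_cost(hot_tb: int, archive_tb: int, ingested_gb: int) -> Decimal:
    """Storage plus streaming-ingest cost for one month (compute excluded)."""
    return (hot_tb * S3_PER_TB_MONTH
            + archive_tb * GLACIER_PER_TB_MONTH
            + ingested_gb * FIREHOSE_PER_GB)


# Hypothetical workload: 10 TB in S3, 100 TB archived, 900 GB streamed in.
example = monthly_cost(hot_tb=10, archive_tb=100, ingested_gb=900)
```

`Decimal` is used instead of floats so that the currency arithmetic stays exact.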
  • 60. Predictions via Machine Learning ML gives computers the ability to learn without being explicitly programmed Machine learning algorithms: • Supervised learning ← “teach” program - Classification ← Is this transaction fraud? (yes / no) - Regression ← Customer life-time value? • Unsupervised learning ← Let it learn by itself - Clustering ← Market segmentation
  • 61. Amazon Machine Learning • Easy to use, managed machine learning service built for developers • Machine learning technology based on Amazon’s internal systems • Create models using data stored in Amazon S3, Amazon RDS or Amazon Redshift • Request predictions in batch or in real time Amazon Machine Learning
  • 62. Machine Learning Algorithms • Classification • Sentiment analysis – Do people like my new product? • Linear Regression • Trend prediction – How much revenue next month? • Clustering • Recommendation - Other people bought this! • Association • Market basket analysis – Bundled products • Neural Networks • Pattern recognition - Speech recognition Amazon Machine Learning Amazon EMR + Spark MLlib GPU Optimized EC2 Instance
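As a small illustration of the "trend prediction" case, the sketch below fits a straight line by ordinary least squares in plain Python. It is a teaching example with made-up revenue figures, and it does not show how Amazon Machine Learning or Spark MLlib implement regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly revenue figures, for illustration only.
months = [1, 2, 3, 4, 5, 6]
revenue = [100.0, 110.0, 125.0, 130.0, 145.0, 150.0]

slope, intercept = fit_line(months, revenue)
forecast_month_7 = slope * 7 + intercept  # "How much revenue next month?"
```

A managed service adds data handling, evaluation and hosting on top, but the underlying question is the same: learn a function from past observations and apply it to the next input.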
  • 63. Amazon Rekognition Image recognition and analysis powered by deep learning, which lets you search, verify and organize millions of images Easy to use Batch Analysis Real-time Analysis Continually Improving Low Cost
  • 65. Demographic Data Facial Landmarks Sentiment Expressed Image Quality Brightness: 25.84 Sharpness: 160 General Attributes
  • 66. Serverless Rekognition Demo Serverless website that uses Rekognition to identify faces and classify pictures Amazon S3 AWS Lambda Amazon API Gateway Amazon DynamoDB Amazon Rekognition Mobile CodeFor.Cloud/image
  • 68. Unlimited Replays Returns an MP3 or audio stream Lightning Fast Response Fully Managed and Low Cost Amazon Polly Turn text into lifelike speech using deep learning technologies to synthesize speech that sounds like a human voice
  • 69. Amazon Polly “The temperature in WA is 75°F” “The temperature in Washington is 75 degrees Fahrenheit” Amazon Polly: Text In, Life-like Speech Out
  • 70. Amazon Lex Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) & Automatic Speech Recognition (ASR) models as Alexa Integrated development in AWS console Trigger AWS Lambda functions Multi-step conversations Continually improving ASR & NLU models Enterprise connectors Fully Managed
  • 71. Intents A particular goal that the user wants to achieve Utterances Spoken or typed phrases that invoke your intent Slots Data the user must provide to fulfill the intent Prompts Questions that ask the user to input data Fulfillment The business logic required to fulfill the user’s intent BookHotel
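The relationship between the five terms on this slide can be modelled as a small data structure. The sketch below is plain Python and does not use the Amazon Lex API. The class names, the sample utterances, the slot names and the fulfillment message are all hypothetical, chosen to mirror the slide's BookHotel example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Slot:
    name: str
    prompt: str  # question asked while the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: List[str]                     # phrases that invoke the intent
    slots: List[Slot]                         # data the user must provide
    fulfill: Callable[[Dict[str, str]], str]  # business logic
    values: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Prompt for the first unfilled slot, or None when all are filled."""
        for slot in self.slots:
            if slot.name not in self.values:
                return slot.prompt
        return None

    def provide(self, slot_name: str, value: str) -> None:
        self.values[slot_name] = value

    def complete(self) -> str:
        if self.next_prompt() is not None:
            raise ValueError("cannot fulfill: some slots are still empty")
        return self.fulfill(self.values)


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I need a room"],
    slots=[Slot("city", "Which city?"), Slot("nights", "How many nights?")],
    fulfill=lambda v: f"Booked {v['nights']} night(s) in {v['city']}",
)
```

A multi-step conversation then amounts to repeating `next_prompt()` and `provide()` until no prompt remains, and calling `complete()` to run the fulfillment logic.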
  • 72. Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift COLLECT STORE CONSUMEPROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm FastSlowFast BatchMessageInteractiveStreamML SearchSQLNoSQLCacheFileMessage Stream Amazon EC2 Amazon EC2 Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENTS FILES MESSAGES STREAMS LoggingIoTApplicationsTransportMessaging ETL
  • 74. AWS Glue Easily understand your data sources, prepare the data, and load it reliably to data stores and your analytics pipeline Integrated with: S3, RDS, Redshift & any JDBC-compliant data store
  • 79. STORE CONSUMEPROCESS / ANALYZE Amazon QuickSight Apps & Services Analysis&visualizationNotebooksIDEAPI Applications & API Analysis and visualization Notebooks IDE Business users Data scientist, developers COLLECT ETL
  • 80. Amazon QuickSight • Fast, cloud-powered BI service that makes it easy to build visualizations, perform ad-hoc analysis, and get insights from data. • Connectors for files, third-party platforms, AWS services and other partner BI tools • In-memory calculation engine (SPICE) to accelerate analysis and visualization • $9 per user per month
  • 82. Athena & QuickSight Demo Amazon S3 Amazon Athena Amazon QuickSight Analyze past flight performance data stored in S3 Bureau of Transportation Flight Data Statistics www.transtats.bts.gov Create visualizations from S3 with Athena & QuickSight
  • 83. Putting It All Together
  • 84. Amazon SQS apps Streaming Amazon Kinesis Analytics Amazon KCL apps AWS Lambda Amazon Redshift COLLECT STORE CONSUMEPROCESS / ANALYZE Amazon Machine Learning Presto Amazon EMR Amazon Elasticsearch Service Apache Kafka Amazon SQS Amazon Kinesis Streams Amazon Kinesis Firehose Amazon DynamoDB Amazon S3 Amazon ElastiCache Amazon RDS Amazon DynamoDB Streams HotHotWarm FastSlowFast BatchMessageInteractiveStreamML SearchSQLNoSQLCacheFileQueueStream Amazon EC2 Amazon EC2 Mobile apps Web apps Devices Messaging Message Sensors & IoT platforms AWS IoT Data centers AWS Direct Connect AWS Import/Export Snowball Logging Amazon CloudWatch AWS CloudTrail RECORDS DOCUMENTS FILES MESSAGES STREAMS Amazon QuickSight Apps & Services Analysis&visualizationNotebooksIDEAPI Reference architecture LoggingIoTApplicationsTransportMessaging ETL
  • 85. Let’s talk business outcomes of data analytics!
  • 86. Suncorp is moving "all-in" on cloud. Project Ignite will extract benefits of $170 million - Group CEO Patrick Snowball Insurance Policy Insurance Claim Core Banking Life Admin
  • 92. AdRoll: AWS Lambda for log files Valentino Volonghi CTO, AdRoll “Polling is not a scalable strategy to figure out when new files are added to S3, especially when you add 17M of them per month. So we moved Lambda in front of S3.” • Cross-platform, cross-device advertising platform • Offers retargeting based on clickstream data 300 TB new data/month
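The pattern described in the quote, having S3 notify a Lambda function instead of polling for new files, can be sketched as below. The handler reads the bucket and key from the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). The bucket name, the key and the `print` placeholder are illustrative assumptions, and this is not AdRoll's actual code.

```python
from urllib.parse import unquote_plus


def new_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    found = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        found.append((bucket, key))
    return found


def handler(event, context):
    """Lambda entry point, invoked by S3 when new objects are created."""
    objects = new_objects(event)
    for bucket, key in objects:
        print(f"new log file: s3://{bucket}/{key}")  # placeholder for real processing
    return {"processed": len(objects)}


# Hand-written sample event with a hypothetical bucket and key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-logs"},
                "object": {"key": "2015/06/click+stream.log"}}}
    ]
}
```

With this arrangement nothing runs, and nothing is billed, until a file actually arrives.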
  • 93. Rethink how to become a data-driven business • Business outcomes - start with the insights and actions you want to drive, then work backwards to a streamlined design • Experimentation - start small, test many ideas, keep the good ones and scale those up, paying only for what you consume • Agile and timely - deploy data processing infrastructure in minutes, not months. Take advantage of a rich platform of services to respond quickly to changing business needs

Editor's notes

  1. 50 mins
  2. Volume – 100 to 150 TB a day. Velocity – 1 million reads and writes per second is becoming the norm. Variety – IoT, log data, streaming data, transactional data, file data. Fixed schema: CSV, Parquet, Avro. Schema-free: JSON, key-value. Small files, large files.
  3. Hourly server logs: were your systems misbehaving 1 hr ago? Weekly / monthly bill: what you spent this billing cycle. Daily customer-preferences report from your web site’s click stream: what deal or ad to try next time. Daily fraud reports: was there fraud yesterday? Real-time alerts: what went wrong now? Real-time spending caps: prevent overspending now. Real-time analysis: what to offer the current customer now. Real-time detection: block fraudulent use now. “I need to harness big data, fast.” “I want more happy customers.” “I want to save/make more money.”
  4. Hive Spark Storm Kafka HBase Flume Impala Cascading EMR DynamoDB S3 Redshift Kinesis RDS Glacier
  5. Is there a reference architecture? What tools should I use? How? Why?
  6. Primary Drivers: Maximize revenue by delivering consistent and personalized marketing and multichannel shopping experiences and keeping a fresh assortment of merchandise in stock. Streamline supply chain operations by analyzing wholesale, inventory, RFID, and POS retail data in real time, automating data exchange with small suppliers, and leveraging consistent supplier data. Secondary Drivers: Optimize store operations. Boost performance and increase operational efficiency by archiving inactive data from key retail applications. Empower customer service reps to manage issues effectively and be active in social media. If you look at the retail market drivers, it is no surprise to find personalisation, and its impact down the chain, as the number one priority. This is the number one priority for all retailers, but we need to qualify a personalisation exercise: we need to understand the perimeter of personalisation, be strategic about the different engagements we want to propose to our customers, and be specific about the benefits the customer is expecting. The benefits could be related to basket transformation ratios, churn prevention strategies through tailored landing pages based on a simple analysis of search terms, or a wider conversion rate optimisation exercise. Concept: Qualify the perimeter and the success criteria. Start small, prove your point and have a clear road map of what good looks like by qualifying difficulty/effort vs. return. It’s great to talk about omnichannel, but you will go nowhere if you don’t have a clear road map of events and cannot demonstrate value. The second priority is the bottom line: supply chain, logistics, stock management. This is a more complex area because of the tools currently in use and their legacy aspects. However, the market is changing: supply chain is becoming a commodity, and most supply chain tools are moving to a SaaS model.
Also, it’s fair to say that there is a close link between the primary drivers: product assortment has an impact on supply chain, and providing a brand experience requires adapting the entire operation from both the sales and support angles. AWS will definitely play a part in this market driver, and we are currently helping organisations in a few of these areas, for example NISA developing a mobile OCS, Kellogg's using analytics to optimize trade spend and avoid waste, or Unilever decreasing the time to market of campaigns. The key here is to understand the benefits we are bringing to the organisations.
  7. Sifting through data is challenging. We need a solution to store and process the data and translate it into knowledge and insights. Matchmaking millions of users with 100 million properties and thousands of agents. Users: Clickstream (View, Search), Contacts, Tours, Open Houses, Offers... Properties: Property facts & history, Neighborhood & POI. Agents: Availability, Performance, Survey…
  8. "Redfin Hot Homes gives my clients the ultimate insider information," said Keith Thomas, a Redfin real estate agent in Orange County. "Now we know which homes we need to see today, and which ones can wait until next week." Users: Clickstream (View, Search, ) Contacts, Tours, Open Houses, Offers... Properties: Property facts & history Neighborhood & POI Agents: Availability Performance, Survey…
  9. Talk about the services AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests. With AWS Lambda you pay only for the requests served and the compute time required to run your code. Billing is metered in increments of 100 milliseconds, making it cost-effective and easy to scale automatically from a few requests per day to thousands per second. Amazon DynamoDB Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
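The metering rule described in this note, billing in increments of 100 milliseconds, is a round-up to the next increment. The sketch below shows that arithmetic only. The 100 ms value comes from the note and may not reflect current Lambda pricing, and `billed_ms` is an illustrative helper.

```python
def billed_ms(duration_ms: int, increment_ms: int = 100) -> int:
    """Round a run time up to the next billing increment."""
    if duration_ms <= 0:
        return 0
    # Ceiling division using integers only, then scale back to milliseconds.
    return -(-duration_ms // increment_ms) * increment_ms
```

Under this rule a function that runs for 101 ms is metered the same as one that runs for 200 ms, so trimming a handler from 210 ms to 190 ms saves a full increment per invocation.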
  10. Product owner shares journey to all in https://www.youtube.com/watch?v=TXmkj2a0fRE
  11. Euclid, a fast-growing technology start-up, helps brick-and-mortar retailers optimize marketing, merchandising, and operations performance by measuring foot traffic, store visits, walk-by conversion, bounce rate, visit duration, and customer loyalty. Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic, among other activities. Euclid stores up to 30 GB of uncompressed data per day in Amazon S3. Euclid stores information on Amazon Simple Storage Service (Amazon S3), and processes data in parallel with Amazon Elastic MapReduce (Amazon EMR). Initially, the company ran its data store on MySQL but moved to Amazon Redshift to improve performance for analytic workloads. “Using Amazon Redshift, our analysts can work with large data sets and run SQL-based queries to our stack quickly,” Dexin Wang, Director of Platform Engineering, reports. “We were totally amazed at the speed: a simple count of rows that would take 5 1/2 hours using MySQL only took 30 seconds with Amazon Redshift.” Wang estimates that it only took a few days to port production data over to Amazon Redshift and start running analysis on it. “Amazon Redshift is very easy to scale with minimal management requirements,” he comments. “It’s also cost effective. We saw a 90 percent cost reduction moving from our previous database system to Amazon Redshift.” The analytics team leverages Amazon EMR and Hadoop to aggregate and analyze data. “Amazon EMR does most of the heavy lifting,” says Leung. “I used Hadoop in my previous work and we had to spend time installing and managing the cluster. We don’t have to do that with AWS.
We only use the service when we need it, which is a great cost savings.” Figure 1 below demonstrates Euclid’s environment on AWS. As the company continues to grow, it takes advantage of Amazon Redshift and Amazon EMR to run complex queries on large and growing data sets with improved performance. “We’ve collected 1 to 30 GB of data per day over the last three years,” notes Leung. “By running on AWS and taking advantage of Amazon Redshift, we can scale to provide the computational power to complete a task on our entire data set, tens of terabytes, in a couple of hours—a task that used to take two weeks. Overall, compared to what we would have to spend to build an infrastructure capable of meeting our peak compute load requirements, we’re saving 80 to 90 percent using AWS.” Wang adds, “We didn’t want to worry about infrastructure or scaling. We just want to be able to ask questions and get answers. AWS helps us get answers quickly.”
  12. Turn on Euclid Express today and get key insights you never had before: • Walk-Bys • New & Return Visitors • Visit Duration • Engagement Rate • Bounce Rate • Storefront Potential & Conversion New insights across all your locations: • Identify leaders and laggards across your chain by KPI • Quickly replicate best practices from your top-performing stores • Pinpoint key trends and regional differences • Get a clearer overall picture by integrating existing systems Comprehensive features. Powerful insights: • Customer segmentation and loyalty assessment • Regional and categorical roll-up reporting • Zoning for large-format locations • Labor optimization and staffing schedules • Automated insights and predictive analysis • Analyze events, resets and promotions with Euclid EventIQ • Industry, segment, and geographic benchmarking
  13. So, as you get started, you'll want to first shortlist the business cases that you are looking to address, and then work backwards from there. Once you've gotten down to the one or two things that you believe you can change with the right insights, this is the launch point. I'll share with you what a starting point is for some customers: a combination of automating part of the business and modernizing systems in the process. An example would be improving response times in a call center. You might have different response approaches and times, depending on the channel through which a customer communicates with you. But you may want to standardize your first response to a customer to happen within 1 hour, regardless of channel, whether that is the call center, a complaint on Twitter, or feedback left in an app store. You would identify that a starting place will be to modernize the event and data capture systems, and focus on those that are used by your most valued customers. The key action is to automate the capture of events and data. Then, we will eventually look to automate the response that gets triggered to the customer.
  14. With that as a starting point, you would then look back to what sources of data would be usable to automate that response. In this case, it may be using existing metrics and monitoring systems to establish a baseline and see where the current responses are happening and not. The next source could be the actual workflow logs, let's say in your call center. This could give you an indication of keywords to look for, and how to start automating detection and response. Finally, there could be valuable information available within your ERP systems. This could be information like orders, returns, customer feedback, etc. The important thing is that you take a targeted set of data, and for a limited time scope to begin with
  15. Once you have found the specific data that you expect will provide you insights, you start thinking about lean design. You would ask, "what's the least amount of infrastructure that I need to turn this data into insights?" This is the key to unlocking a new world of solving challenges in an agile way. Start with a small, fully decoupled design. Each part of the system can scale up as you add more users, more data, and more use cases. It may be the first time you've had the chance to pay only for what is giving you benefit. In the scenario of improving response time to customer feedback, we would look at using AWS Import/Export to load data from the ERP and workflow systems. We may also be using our monitoring systems to detect customer feedback through other channels, and we would use Amazon Kinesis to capture that in real time. Then, all data would be put into S3 for very inexpensive and durable storage and staging. Then, recall the pattern we saw earlier of using Redshift as a flexible, purpose-built data warehouse. Data that arrives unstructured and needs context added would first be processed in EMR, the fully managed Hadoop service, while keeping the data in S3 and not locking yourself into a large persistent cluster. That processed data can then be moved straight into Redshift. And for the users who will create and consume these insights? The key system requirement here is to automate response. So, Amazon Simple Queue Service, or SQS, would be used to connect to your existing applications which service your customers. For measuring performance and customer satisfaction, you could then use our cloud-native business intelligence and data visualization tool, which will be available later this year. Amazon QuickSight will give business users BI capabilities against sources like Redshift for just $9 per month per user.
  16. "Redfin Hot Homes gives my clients the ultimate insider information," said Keith Thomas, a Redfin real estate agent in Orange County. "Now we know which homes we need to see today, and which ones can wait until next week." Users: Clickstream (View, Search, ) Contacts, Tours, Open Houses, Offers... Properties: Property facts & history Neighborhood & POI Agents: Availability Performance, Survey…
  17. But there's a flaw to this, right? This is just the data. Traditionally, when we look at this, we say "start with all your data, then ... question mark ... question mark ... profit!" That's really hard! This is the most expensive way to go about it, because you're paying all your costs upfront, trying to capture everything to begin with and hoping that there are some results. But what we're really after is things like revenue lift. We want to enter and expand in new markets. We want customer delight and brand advocacy. We want operational excellence. These are the real goals. So what we'll talk about is how our customers are starting here: they're starting with what they need to get done and finding the shortest path with the least amount of data to get there. This is where we talk about these iterations and innovation cycles. So we'll cover some parts of the platform -- what it is that you will use as you start your journey -- what's the smallest amount of "stuff" that you can use to get started.
  18. We have a lot of services, right? AWS has got over seventy services, and if you're using AWS for the first time it might be hard to know exactly where to get started. So if you're looking at things like scaling your analytics and beginning a Big Data project, this is where you can drill in and start. From a data warehousing standpoint, I've talked about there being a lot of cost benefits of moving to Redshift, but more importantly you rethink what it means to run a data warehouse. It's no longer about buying a massive appliance. You can start with these really small clusters that are tuned for a particular group. So you can build a set of small, specialized data warehouses that are very inexpensive, very scalable. If you're working with unstructured data, which may include touching mobile data for the first time, and you want to run Hadoop, and you've got the skills internally but you're tired of trying to manage your own Hadoop clusters, because it's not a great experience on-premises, moving over to something that's fully managed lets you say "right now I'm running 10 nodes and I want to change and run 50 nodes for one hour." That's a few clicks of a button. When you're done with that, you just shut it down and stop paying for the Hadoop cluster, because you've decoupled the storage, which lives in S3. This combination of Redshift, plus EMR, plus S3 is a really common combination of services with our customers. If you're also doing real-time streaming, Amazon Kinesis, our fully managed stream processing service, is a fit for capturing and moving the data. This often goes together with DynamoDB, our fully managed NoSQL service, for the real-time serving of data to customers. These are often the key services used for Big Data and analytics initiatives. We also have predictive modeling, with our machine learning service, Amazon Machine Learning.
Now, a common pattern we see, because these are all interoperable, is that customers use one of these services and then drop the data back down into that very inexpensive storage layer, S3, consume from that and then push it back down. And we also provide backup services with Glacier. So, this may be a pattern that is also attractive to you.
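  The resize example in these notes (run 10 nodes, jump to 50 for one hour, then shrink again) can be put in node-hours. The Python sketch below is only arithmetic, with no prices and no AWS calls; the 24-hour window and the function name are assumptions made for illustration.

```python
def node_hours(base_nodes, burst_nodes, burst_hours, total_hours):
    """Compare node-hours for a cluster that bursts briefly versus one
    that is permanently sized for the peak.

    Returns (elastic, peak_sized) node-hours over the window.
    """
    # Elastic: pay for the base size all the time, plus the extra nodes
    # only during the burst.
    elastic = base_nodes * total_hours + (burst_nodes - base_nodes) * burst_hours
    # Peak-sized: pay for the burst size for the whole window.
    peak_sized = burst_nodes * total_hours
    return elastic, peak_sized


if __name__ == "__main__":
    # Assumed window of 24 hours: 10 nodes normally, 50 nodes for one hour.
    elastic, peak_sized = node_hours(10, 50, 1, 24)
    print(f"elastic: {elastic} node-hours, peak-sized: {peak_sized} node-hours")
```

  Under these assumptions the elastic cluster uses 280 node-hours against 1,200 for a cluster kept at peak size all day. Shutting the cluster down entirely between jobs, which the notes say is possible because the data lives in S3, would reduce the figure further.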
  19. Types of Data Database Records Search Documents Log Files Messaging Events Devices / Sensors / IoT Stream
  20. Huge buffer…
  21. http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-BE3BA3E4-1AC5-4E7A-B542-015056D8EDAF
Kinesis -> $52.14 per month
SQS -> $133.42 per month for puts or $400/month (put, get, delete)
DynamoDB -> $3809.88 per month (10TB of storage cost itself is $2500/month)
Cost (100 rps x 35KB): $52/month; $133/month * 2 = $266/month ?
Amazon DynamoDB Service (US-East) $ Provisioned Throughput Capacity: $120 Indexed Data Storage: $2560.90 DynamoDB Streams: $1.3
Amazon SQS Service (US-East)
Pricing Example
Let’s assume that our data producers put 100 records per second in aggregate, and each record is 35KB. In this case, the total data input rate is 3.4MB/sec (100 records/sec*35KB/record). For simplicity, we assume that the throughput and data size of each record are stable and constant throughout the day. Please note that we can dynamically adjust the throughput of our Amazon Kinesis stream at any time. We first calculate the number of shards needed for our stream to achieve the required throughput. As one shard provides a capacity of 1MB/sec data input and supports 1000 records/sec, four shards provide a capacity of 4MB/sec data input and support 4000 records/sec. So a stream with four shards satisfies our required throughput of 3.4MB/sec at 100 records/sec. We then calculate our monthly Amazon Kinesis costs using Amazon Kinesis pricing in the US-East Region:
Shard Hour: One shard costs $0.015 per hour, or $0.36 per day ($0.015*24). Our stream has four shards so that it costs $1.44 per day ($0.36*4). For a month with 31 days, our monthly Shard Hour cost is $44.64 ($1.44*31).
PUT Payload Unit (25KB): As our record is 35KB, each record contains two PUT Payload Units. Our data producers put 100 records or 200 PUT Payload Units per second in aggregate. That is 267,840,000 records or 535,680,000 PUT Payload Units per month. As one million PUT Payload Units cost $0.014, our monthly PUT Payload Units cost is $7.499 ($0.014*535.68).
Adding the Shard Hour and PUT Payload Unit costs together, our total Amazon Kinesis costs are $1.68 per day, or $52.14 per month. For $1.68 per day, we have a fully-managed streaming data infrastructure that enables us to continuously ingest 4MB of data per second, or 337GB of data per day in a reliable and elastic manner.
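  The Kinesis arithmetic above can be reproduced in a few lines of Python. This is a sketch using only the figures quoted in these notes (1MB/sec and 1000 records/sec per shard, $0.015 per shard hour, $0.014 per million 25KB PUT Payload Units, a 31-day month); current pricing may differ, and the function name and return structure are made up for the example.

```python
import math


def kinesis_monthly_cost(records_per_sec, record_kb, days=31,
                         shard_hour_usd=0.015, put_unit_usd_per_million=0.014):
    """Estimate monthly Kinesis stream cost from the figures in the notes."""
    shard_mb_per_sec = 1.0        # input capacity per shard, as quoted
    shard_records_per_sec = 1000  # record limit per shard, as quoted
    put_unit_kb = 25              # size of one PUT Payload Unit, as quoted

    # Enough shards to cover both the byte rate and the record rate.
    input_mb_per_sec = records_per_sec * record_kb / 1024
    shards = max(math.ceil(input_mb_per_sec / shard_mb_per_sec),
                 math.ceil(records_per_sec / shard_records_per_sec))

    shard_cost = shards * shard_hour_usd * 24 * days

    # Each record is billed in whole 25KB units, so a 35KB record is 2 units.
    units_per_record = math.ceil(record_kb / put_unit_kb)
    put_units = records_per_sec * units_per_record * 86400 * days
    put_cost = put_units / 1_000_000 * put_unit_usd_per_million

    return {
        "shards": shards,
        "shard_cost": shard_cost,
        "put_units": put_units,
        "put_cost": put_cost,
        "total": shard_cost + put_cost,
    }


if __name__ == "__main__":
    result = kinesis_monthly_cost(records_per_sec=100, record_kb=35)
    print(f"{result['shards']} shards, ${result['total']:.2f} per month")
```

  With 100 records per second at 35KB each, this gives four shards, 535,680,000 PUT Payload Units, and a total of about $52.14 for the month, matching the figures in the notes.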
  22. Scenario1: http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-F6B3AD98-1404-4770-BAB0-1F5397F445A7 Scenario 2: http://calculator.s3.amazonaws.com/index.html#r=IAD&key=calc-2440EC2A-1C16-4BCE-B5CE-5075887F4A47
  23. 2 x 2 Matrix Structured Level of query (from none to complex) Draw down the slide
  24. More : https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/
  25. AWS helps customers maximize the value of Big Data investments while reducing overall IT costs. Amazon S3 provides secure, highly durable storage for as low as $28.16 per terabyte. With Amazon Glacier, AWS provides a low-cost data archive platform that starts at only $7.17 per terabyte. That’s why customers like Netflix, Nasdaq and Pinterest store and process petabytes of data for analytics in S3. AWS also provides a broad range of analytic options that give customers enterprise capabilities and performance without the typical high price and up-front investment of traditional enterprise software: AWS provides a managed petabyte-scale data warehouse and a super-fast business intelligence and visualization service at 1/10th the cost of traditional software solutions. With Amazon Redshift you can analyze a petabyte of data for only $0.25/hour and then use Amazon QuickSight to explore that data for only $10 per user per month. For streaming data, you can load a terabyte of streaming data with Amazon Kinesis Firehose for only $0.035 per GB. You can spin up a 10-node managed Spark cluster to aggregate data with Amazon EMR for only $0.15 per hour. https://na32.salesforce.com/06938000001bpTh
  26. Will this customer leave us?
  27. Add connector. Directed Acyclic Graphs? Exactly-once processing & DAG? – how do you do this?? https://storm.apache.org/documentation/Rationale.html http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
  28. Cost: Redshift – Moderate Impala - Presto – Low S3A* is an open source connector. It is not in EMR 1.2.1 – using bootstrap you can install 2.2 ( we have a bootstrap action) Query Speed Redshift – Extremely fast SQL queries Spark, Impala – Extremely Fast to Fast Hive QL Hive, Tez – Moderately Fast to Slow Hive QL Data Volume? UDFs? Manageability? http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn https://amplab.cs.berkeley.edu/benchmark/ http://nerds.airbnb.com/redshift-performance-cost/
  29. Applications & API Analysis and Visualization Notebooks IDE
  30. "Over the next two years as we move to our optimised platform, we'll be able to extract ... benefits of $170 million," in addition to benefits already realised from the transformation process begun in 2010, Snowball said. Suncorp's vision for its "optimised platform" is digitally enabled customer-facing systems sitting atop simplified core administration systems that feed into a data lake that can drive predictive analytics and business intelligence across the group. "Increasingly our customers want to connect digitally and we're living in a world of both mobility and technological disruption," Snowball said. "To ensure that we stay ahead of the competition, we've been investing in systems that are digitally enabled to allow our customers and business partners to access us, how and where they want. "Standing behind our digital frontend we are completing the development of four core administration systems: One policy and one claim system for all our general insurance businesses both here and in New Zealand, a world-class banking system and a new life administration system." "These core systems will feed our customer, policy and claims data along with HR, finance and management data, into our single, centralised data lake," the group CEO said. "This will allow us to establish a best in class business intelligence function providing forward-looking, predictive analytics to deliver better solutions and outcomes for our customers. "All of this will sit in a secure and flexible cloud environment where our lean and agile capabilities will enable us to deliver new services at high speed and lower cost," Snowball said.
  31. When you leave here and go back to your office, hopefully some of the things you've seen today will spark ideas of how you can build systems that will better enable the business. As there is increasingly a recognition that businesses need to be more data-driven to enable automation and to enhance the decision making of the business, now is a good time to really rethink how to go about that. More and more, we see our customers moving on from old, legacy approaches of buying large, expensive data infrastructure, which takes months or years to start getting results from. To be truly business focused, there are three ways they are thinking differently. First is to start projects with specific business outcomes in mind. Start from the insights and actions you want, then work backwards to a streamlined design. Second is experimentation. Start with a lean design. Use just enough data to test your ideas. Use just enough services to test those ideas. The design of the system is to scale up capacity as and when you need it. So, if you hit on a great result, you scale that one up, and the ones that didn't work out can just be turned off. Think... win quick and fail cheap. Finally, speed. Our best customers are changing their markets; they're redefining what service levels and customer experience mean in those markets. Much of this is that they move quickly. When an opportunity presents itself, and the business wants to move on that opportunity, they think in terms of weeks to design, and minutes to deploy. This gives a material advantage over businesses that will wait for 6 months to get approvals to buy an appliance and more storage. We've had a lot of customers really succeeding here in Southeast Asia. In Singapore, we have some great customers, like Redmart and Grab, and a number of others that are fundamentally changing aspects of our daily lives, as consumers.
There are people in this room today who are going to be the ones we're talking about this time next year, and I'm really looking forward to sharing your success at that time. I want to thank you very much. I hope the rest of the conference is great for you guys.