Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges that IT teams struggle with. This session will show how to build business intelligence applications on AWS, from raw data import, consumption and storage through to the production of information. We will also cover best practices for services such as Amazon Redshift or Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft and others.
6. Data has gravity
Diagram: Data, App, App, Compute, Storage, Big Data
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
7. …and inertia at volume…
Diagram: Data, App, App, Compute, Storage, Big Data; labels: latency, throughput
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
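To make the inertia of data at volume concrete, here is a small back-of-the-envelope sketch in Python. The link speed and the 80% utilization figure are illustrative assumptions, not numbers from the session.

```python
def transfer_days(data_terabytes: float,
                  link_megabits_per_s: float,
                  utilization: float = 0.8) -> float:
    """Days needed to move a data set over a network link.

    utilization is the assumed fraction of the link that the transfer
    can really use (0.8 is a hypothetical default).
    """
    if data_terabytes < 0 or link_megabits_per_s <= 0:
        raise ValueError("size must be >= 0 and link speed must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_terabytes * 1e12 * 8                  # decimal terabytes -> bits
    usable_bits_per_s = link_megabits_per_s * 1e6 * utilization
    seconds = bits / usable_bits_per_s
    return seconds / 86_400                           # seconds -> days


if __name__ == "__main__":
    # Hypothetical 100 Mbit/s corporate uplink.
    for terabytes in (1, 10, 100):
        days = transfer_days(terabytes, 100)
        print(f"{terabytes:>4} TB over 100 Mbit/s: {days:.1f} days")
```

The time grows linearly with volume, which is why applications and compute tend to move toward large data sets rather than the other way round.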
10. Getting your Data into AWS
From the corporate data center to Amazon S3:
• Console Upload
• FTP
• AWS Import/Export
• S3 API
• Direct Connect
• Storage Gateway
• 3rd Party Commercial Apps
• Tsunami UDP
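Of the options above, the S3 API is the one most easily scripted. The following is a minimal sketch that assumes the boto3 SDK is installed and AWS credentials are already configured; the bucket name, the key prefix and the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def object_key_for(local_path: str, prefix: str) -> str:
    """Build the S3 object key for a local file under a key prefix."""
    name = PurePosixPath(local_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_to_s3(local_path: str, bucket: str, prefix: str = "") -> str:
    """Upload one file through the S3 API and return the key it was stored under."""
    # Imported lazily so that the key helper above works without the AWS SDK.
    import boto3

    key = object_key_for(local_path, prefix)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key


# Example call (hypothetical bucket and prefix, needs valid credentials):
# upload_to_s3("/var/log/app/events.log", "my-bi-raw-data", "raw/2014")
```

For very large data sets, the other options in the list (AWS Import/Export, Direct Connect) may be a better fit than uploading over the public internet.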
11. Write directly to a data source
Diagram: your application; Amazon EC2; Amazon S3; DynamoDB; any other data store
12. Queue, pre-process and then write
Diagram: Amazon Simple Queue Service (SQS); Amazon S3; DynamoDB; any other data store
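The queue, pre-process, write pattern can be sketched locally. In this illustration Python's in-memory queue.Queue stands in for Amazon SQS and a plain list stands in for the data store; the message fields (user_id, action) are hypothetical.

```python
import json
import queue
from typing import Optional


def preprocess(raw: str) -> Optional[dict]:
    """Parse one raw message; return None for messages that should be dropped."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None                      # not valid JSON
    if not isinstance(event, dict) or "user_id" not in event:
        return None                      # missing the field we need
    event["action"] = str(event.get("action", "")).strip().lower()
    return event


def drain(messages: queue.Queue, store: list) -> int:
    """Pull every queued message, pre-process it, and write the survivors."""
    written = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            return written
        event = preprocess(raw)
        if event is not None:
            store.append(event)          # stand-in for the write to the data store
            written += 1
```

The queue decouples producers from the writer, so a burst of events or a slow data store does not block the application that generates them.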
13. Choose depending upon design
Diagram: Amazon SQS; Amazon S3; DynamoDB; any SQL or NoSQL store; log aggregation tools
16. EMR is Hadoop in the Cloud
Amazon Elastic MapReduce (EMR)?
17. How does EMR work?
• Put the data into S3
• Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.
• Launch the cluster using the EMR console, CLI, SDK, or APIs
• Get the output from S3
• You can also store everything in HDFS
Diagram: EMR cluster, S3
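Since EMR runs Hadoop, the jobs a cluster executes follow the map, shuffle, reduce model. The sketch below shows that model in plain Python on a few in-memory lines; it is an illustration of the programming model only, not of the EMR API, and the "user action" line format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (action, 1) for every well-formed 'user action' line."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_actions(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))
```

On a cluster the map and reduce phases run in parallel across the nodes, with the input and output typically read from and written to S3.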
23. When you turn off your cloud resources, you actually stop paying for them
24. SQL based processing
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR (pre-processing framework); Amazon Redshift (petabyte-scale columnar data warehouse)
25. What is Amazon Redshift?
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud
• Easy to provision and scale
• No upfront costs, pay as you go
• High performance at a low price
• Open and flexible with support for popular BI tools
29. Your choice of BI Tools
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR (pre-processing framework); Amazon Redshift
33. Sharing results and visualizations
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR; Amazon Redshift; Web App Server; Visualization tools
34. Sharing results and visualizations
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR; Amazon Redshift; Business Intelligence Tools
35. Geospatial Visualizations
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR; Amazon Redshift; Business Intelligence Tools; GIS tools on Hadoop; GIS tools; Visualization tools
36. Rinse and Repeat
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR; Amazon Redshift; Visualization tools; Business Intelligence Tools; GIS tools on Hadoop; GIS tools; AWS Data Pipeline
37. The complete architecture
Diagram: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools; Amazon S3; Amazon EMR; Amazon Redshift; Visualization tools; Business Intelligence Tools; GIS tools on Hadoop; GIS tools; AWS Data Pipeline
39. Amazon Kinesis
• Real-time processing
• Massive scale
• Integrated
• Use cases:
• Real-time log analysis
• Real-time data analytics
• Social media monitoring
• Financial transactions
• Online machine learning
40. Amazon Kinesis Data Flow
Diagram: Data Sources; AWS Endpoint; Shard 1, Shard 2, … Shard N across three Availability Zones; S3, DynamoDB, Redshift
• App.1 [Aggregate & De-Duplicate]
• App.2 [Metric Extraction]
• App.3 [Sliding Window Analysis]
• App.4 [Machine Learning]
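Kinesis decides which shard receives a record by taking the MD5 hash of the record's partition key and finding the shard whose hash key range contains it. The sketch below reproduces that idea locally; it assumes the shards split the 128-bit hash space evenly, which is a simplification, and the function name is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the 0-based index of the shard whose hash range holds the key.

    Assumes shard i owns an equal slice of the hash space (a simplification:
    real streams can have uneven ranges after shards are split or merged).
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard", shard_for(key, 4))
```

Records with the same partition key always land on the same shard, so an application such as the de-duplication or sliding-window consumer sees all events for a given key in order.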
43. Who they are
What they can do
Your real life connections to them
Examples of what they can do
44. Data Architecture
Diagram: Data Analyst; Raw Data; Get Data; user actions (Join via Facebook, Add a Skill Page, Invite Friends); Web Servers; Amazon S3 (User Action Trace Events); EMR; Amazon S3 (Aggregated Data, Raw Events); Amazon Redshift; Internal Web, Excel, Tableau
Hive Scripts Process Content
• Process log files with regular expressions to parse out the info we need.
• Processes cookies into useful searchable data such as Session, UserId, API Security token.
• Filters surplus info like internal varnish logging.
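The three processing steps on this slide run as Hive scripts on EMR. As a rough illustration of the same logic, here is a Python sketch; the log line layout, the cookie names and the "varnish:" prefix are all hypothetical, since the slide does not show the real format.

```python
import re
from typing import Optional

# Hypothetical layout: <timestamp> <method> <path> <status> "<cookie header>"
LOG_LINE = re.compile(
    r'^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) '
    r'"(?P<cookies>[^"]*)"$'
)


def parse_cookies(header: str) -> dict:
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def parse_line(line: str) -> Optional[dict]:
    """Extract the searchable fields from one log line, or None to drop it."""
    if line.startswith("varnish:"):      # assumed marker for internal proxy logging
        return None
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None                      # line does not follow the expected layout
    cookies = parse_cookies(match["cookies"])
    return {
        "timestamp": match["ts"],
        "path": match["path"],
        "status": int(match["status"]),
        "session": cookies.get("session"),
        "user_id": cookies.get("userid"),
    }
```

Rows produced this way correspond to the aggregated, searchable data that the slide shows being loaded into Amazon S3 and Amazon Redshift.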
45. We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution. With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.
Jon Hoffman, Software Engineer, Foursquare
Charts (Foursquare): Gender (Female, Male); Age; When do people go to a place? (Gorilla Coffee, Gray's Papaya, Amorino)