AWS Roadshow Fall 2013: Data Analysis and Business Intelligence
1. AWS Roadshow 2013
Above the Clouds: Free Your IT
Data Analysis and Business Intelligence
Michael Hanisch
Mgr. Solutions Architecture
Matthias Jung
Solutions Architect
Constantin Gonzalez
Solutions Architect
9. Data volume
Generated data
Available for analysis
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
10. Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
32. How does it work?
Diagram: S3, EMR cluster
1. Put the data into S3 (or HDFS)
2. Launch your cluster. Choose:
• Hadoop distribution
• How many nodes
• Node type (hi-CPU, hi-memory, etc.)
• Hadoop apps (Hive, Pig, HBase)
3. Get the results
33. How does it work?
Diagram: S3, EMR cluster
You can easily resize the cluster
34. How does it work?
Diagram: S3, EMR cluster
Use Spot nodes to save time and money
35. How does it work?
Diagram: S3, EMR cluster
Launch parallel clusters against the same data source (tune for the workload)
36. How does it work?
Diagram: S3, EMR cluster
When the work is complete, you can terminate the cluster (and stop paying)
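The pay-per-use points on these slides (resize the cluster, terminate it when the work is done, stop paying) come down to counting node-hours. The following is a minimal sketch, assuming a flat hypothetical price per node-hour and a job that scales linearly with the number of nodes. Neither assumption comes from the slides, and real billing granularity and scaling behavior will differ.

```python
def cluster_cost(num_nodes, hours, price_per_node_hour):
    """Cost of a cluster that runs for `hours` and is then terminated."""
    return num_nodes * hours * price_per_node_hour


# Hypothetical price in USD per node-hour, for illustration only.
HYPOTHETICAL_PRICE = 0.50

# Under the linear-scaling assumption, the same job costs the same
# whether it runs on a small cluster for a long time or on a large
# cluster for a short time; the large cluster just finishes sooner.
small_and_slow = cluster_cost(10, 10, HYPOTHETICAL_PRICE)
large_and_fast = cluster_cost(100, 1, HYPOTHETICAL_PRICE)

print(small_and_slow, large_and_fast)
```

Under these assumptions both runs use 100 node-hours, so the only difference is how long you wait for the results.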
37. How does it work?
Diagram: EMR cluster
You can store everything in HDFS (local disk)
High Storage nodes = 48 TB/node
38. How does it work?
Diagram: EMR cluster
Launch in a Virtual Private Cloud for extra security
45. Customers asked us for a data warehouse the AWS way:
Easy to provision and scale up massively
No upfront costs, pay as you go
Really fast performance at a really low price
Open and flexible with support for popular tools
46. Amazon Redshift Is:
A fast and powerful, petabyte-scale data warehouse that is
A Lot Faster
A Lot Cheaper
A Whole Lot Simpler
48. Amazon Redshift parallelizes and distributes everything
Query
Load
Backup
Restore
Resize
Diagram: Common BI Tools, JDBC/ODBC, Leader Node, 10 GigE Mesh, Compute Nodes
49. Amazon Redshift Runs on Optimized Hardware
HS1.8XL:
128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate
HS1.XL:
16GB RAM, 2 Cores, 3 Spindles, 2TB Storage
Optimized for I/O intensive workloads
High disk density
Runs in HPC - fast network
HS1.8XL available on Amazon EC2
50. Redshift lets you start small and grow big
Extra Large Node (XL)
3 spindles, 2TB, 15GiB RAM
2 virtual cores, 10GigE
Single Node (2TB)
Cluster 2-32 Nodes (4TB – 64TB)
8 Extra Large Node (8XL)
24 spindles, 16TB, 120GiB RAM
16 virtual cores, 10GigE
Cluster 2-100 Nodes (32TB – 1.6PB)
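The cluster capacity ranges on this slide follow directly from storage per node times node count. A minimal sketch that reproduces them, using only the figures listed on the slide; the function and dictionary names are made up for illustration, and the single-node XL option (2TB) is not modeled:

```python
# Per-node storage and allowed cluster sizes, as listed on the slide.
NODE_TYPES = {
    "XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, num_nodes):
    """Return total cluster storage in TB for a multi-node cluster."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= num_nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters need between {spec['min_nodes']} "
            f"and {spec['max_nodes']} nodes"
        )
    return spec["tb_per_node"] * num_nodes


for node_type, spec in NODE_TYPES.items():
    low = cluster_capacity_tb(node_type, spec["min_nodes"])
    high = cluster_capacity_tb(node_type, spec["max_nodes"])
    print(f"{node_type}: {low} TB to {high} TB")
```

The largest 8XL cluster comes out at 1600 TB, which is the 1.6PB quoted on the slide.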
51. Priced to Analyze All the Customer’s Data
Option | Price Per Hour for HS1.XL Single Node | Effective Hourly Price Per TB | Effective Annual Price per TB
On-Demand | $0.850 | $0.425 | $3,723
1 Year Reservation | $0.500 | $0.250 | $2,190
3 Year Reservation | $0.228 | $0.114 | $999
Simple Pricing: Number of Nodes x Cost per Hour
No charge for Leader Node
Pay as you grow
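The per-TB figures on this slide can be reproduced from the hourly node price. A minimal sketch, assuming 2TB per HS1.XL node (as stated on the hardware slide) and 8,760 hours per year; the function names are illustrative, and the prices are the 2013 figures from the slide, not current ones:

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours, ignoring leap years
TB_PER_HS1_XL_NODE = 2      # storage per HS1.XL node, from the hardware slide

# Hourly price of a single HS1.XL node, as listed on the slide (USD).
HOURLY_NODE_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_price_per_tb(node_price_per_hour):
    """Return (hourly, annual) price per TB for one HS1.XL node."""
    per_tb_hour = node_price_per_hour / TB_PER_HS1_XL_NODE
    return per_tb_hour, per_tb_hour * HOURS_PER_YEAR


def cluster_cost_per_hour(num_nodes, node_price_per_hour):
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return num_nodes * node_price_per_hour


for option, price in HOURLY_NODE_PRICE.items():
    per_hour, per_year = effective_price_per_tb(price)
    print(f"{option}: ${per_hour:.3f} per TB-hour, ${per_year:,.0f} per TB-year")
```

Rounded to whole dollars, the annual figures match the table: $3,723, $2,190, and $999 per TB.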
52. Amazon Redshift Simplifies Provisioning
• Create a cluster in minutes
• Automatically patch your OS and data warehouse software
• Scale up to 1.6PB with a few clicks and no downtime
54. Initial Pilot Results
Current production environment
32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk
Tested 2B row data set, 6 representative queries on a
2-node Amazon Redshift cluster
queries ran > 10x faster
55. Amazon Redshift Integrates With All Data Sources
Amazon EC2
Amazon DynamoDB
Amazon Relational Database Service (RDS)
Amazon Redshift
Corporate Data Center
Amazon Elastic MapReduce
Amazon Simple Storage Service (S3)
AWS Storage Gateway Service
56. Integrates With Existing BI Tools
Diagram: BI tools, JDBC/ODBC, Amazon Redshift
Connect your tools to Amazon Redshift using standard drivers from PostgreSQL.org
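Because the connection goes through the standard PostgreSQL drivers, a BI tool is pointed at the cluster with an ordinary PostgreSQL-style JDBC URL. A minimal sketch of how such a URL is put together; the endpoint and database name below are hypothetical placeholders, and 5439 is assumed as the port because it is the Redshift default:

```python
def redshift_jdbc_url(host, database, port=5439):
    """Build a PostgreSQL-style JDBC URL for a Redshift cluster endpoint."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    host="examplecluster.example.eu-west-1.redshift.amazonaws.com",
    database="analytics",
)
print(url)
```

The same host, port, and database values are what an ODBC data source definition would ask for.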
58. Cloud ETL for Big Data
Diagram: S3, Elastic MapReduce, Redshift, Reporting and BI
• Maintain online SQL access to your historical data
• Transformation and enrichment with EMR
• Longer history ensures better insight
61. AWS Data Pipeline
Data-intensive orchestration and automation
Reliable and scheduled
Easy to use, drag and drop
Execution and retry logic
Map data dependencies
Create and manage temporary compute resources
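Two of the ideas listed here, mapping data dependencies and execution with retry logic, can be illustrated in a few lines. This is a generic sketch of the concepts, not the AWS Data Pipeline API; the task names, the `run_pipeline` helper, and the retry count are all hypothetical:

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, tasks, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    `dependencies` maps a task name to the set of tasks it depends on.
    `tasks` maps a task name to a zero-argument callable.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
    return order


# Hypothetical three-step flow: stage data in S3, transform with EMR,
# then load the result into Redshift.
dependencies = {
    "transform_with_emr": {"copy_to_s3"},
    "load_into_redshift": {"transform_with_emr"},
}
tasks = {name: (lambda n=name: print("running", n))
         for name in ("copy_to_s3", "transform_with_emr", "load_into_redshift")}

print(run_pipeline(dependencies, tasks))
```

A real pipeline service would also handle scheduling and the creation and teardown of the compute resources, which this sketch leaves out.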