1. Analytical Compute Grid (ACG)
Elastic “Big Data” Infrastructure
Rackspace® Private Cloud powered by OpenStack® Use Case
by Natasha Gajic
October 17, 2012
2. Rackspace’s EBI Environment
Current Environment
• Windows and Linux operating systems
• Oracle and Microsoft database solutions
• Microsoft and Oracle replication technology
• SSIS
• Informatica
• Dedicated servers
“Big Data” Problem
• Cost of purchasing additional licenses
• Time required to set up new hardware
• Increased demand for DBA resources
• System performance
• System scalability
• Capacity
• Rapid data set growth
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
3. Analytical Compute Grid (ACG) Features
• Host an ever-growing set of data
• Quick data collection and retrieval
• Rapid scalability
• Ease of maintenance
• Provide standard data access API
4. Analytical Compute Grid (ACG) Features
• Ability to provide a variety of storage types:
  • Columnar
  • Relational
  • HDFS
• Enable users to select the optimal storage type for the information collected
• Leverage Rackspace® Private Cloud powered by OpenStack® and open source technology
10. ACG on Rackspace® Private Cloud powered by OpenStack®
Node
14. ACG on Rackspace® Private Cloud powered by OpenStack®
Controller
17. ACG on Rackspace® Private Cloud powered by OpenStack®
API
18. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
22. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
• ACG Indexing Structure:
  • Resides on a set of Rackspace® Private Cloud powered by OpenStack® instances
  • Is a set of pointers ultimately addressing database entities
  • Is managed by the ACG Controller
  • Dynamically expands vertically and horizontally to address a growing data set
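One way to picture "a set of pointers ultimately addressing database entities" is a mapping from a data set name to the instances and entities that hold its pieces. The sketch below is a hypothetical Python model, not the ACG implementation: the names `IndexEntry`, `IndexingStructure`, `register`, and `lookup`, along with the host and entity names, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical pointer to a database entity on one instance."""
    instance: str      # assumed instance host name
    store_type: str    # e.g. "columnar", "relational", "hdfs"
    entity: str        # table, column family, or path on that instance


@dataclass
class IndexingStructure:
    """Hypothetical index: data set name -> pointers to its partitions."""
    entries: Dict[str, List[IndexEntry]] = field(default_factory=dict)

    def register(self, data_set: str, entry: IndexEntry) -> None:
        # Growing the index means adding another pointer for the data set.
        self.entries.setdefault(data_set, []).append(entry)

    def lookup(self, data_set: str) -> List[IndexEntry]:
        # Return every pointer for the data set, or an empty list if unknown.
        return list(self.entries.get(data_set, []))


index = IndexingStructure()
index.register("monitoring", IndexEntry("acg-node-1", "columnar", "polls_2012_q3"))
index.register("monitoring", IndexEntry("acg-node-2", "columnar", "polls_2012_q3"))
```

In this model, horizontal expansion corresponds to registering pointers to additional instances for the same data set.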
27. ACG on Rackspace® Private Cloud powered by OpenStack®
Indexing Structure
• ACG Indexing Structure Enables:
  • Distribution of databases across many instances
  • Splitting large data sets across many instances
  • Parallelization of large data set queries
  • Deploying data stores with optimal configuration, minimizing maintenance
  • Accessing data residing in a variety of storage types via a uniform interface
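Splitting a data set across instances and running a query on all of them at once is commonly described as scatter/gather. The following is a minimal sketch of that pattern under stated assumptions: the partitions are in-memory lists standing in for instances, and `PARTITIONS`, `query_partition`, and `parallel_query` are hypothetical names, not ACG APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical partitions: each list stands in for the rows held by one instance.
PARTITIONS: Dict[str, List[int]] = {
    "acg-node-1": [3, 8, 15],
    "acg-node-2": [4, 16, 23],
    "acg-node-3": [42, 1],
}


def query_partition(rows: List[int], predicate: Callable[[int], bool]) -> List[int]:
    # Stand-in for running the same query against one instance's data store.
    return [row for row in rows if predicate(row)]


def parallel_query(predicate: Callable[[int], bool]) -> List[int]:
    # Fan the query out to every partition, then gather the partial results.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = pool.map(
            lambda rows: query_partition(rows, predicate),
            PARTITIONS.values(),
        )
        return [row for partial in partials for row in partial]


matches = parallel_query(lambda value: value > 10)
```

Because each partition holds a bounded slice of the data, the slowest partition rather than the total data volume determines how long the query takes in this model.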
30. ACG on Rackspace® Private Cloud powered by OpenStack®
Sorter & Aggregator
• ACG Sorter & Aggregator Enables:
  • Joining the results from multiple ACG nodes
  • Result sorting and aggregation
  • Together with the temporary segment, it will support joining heterogeneous data sets
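The sorting and aggregation step can be illustrated with a merge of per-node results followed by a per-key sum. This is a sketch only: it assumes each node returns rows that are already sorted, and the `Row` shape (device, downtime minutes) and the function names `merge_sorted` and `aggregate` are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical (device, downtime_minutes) result row


def merge_sorted(partials: Iterable[List[Row]]) -> List[Row]:
    # Each node is assumed to return rows already sorted; merge them in one pass.
    return list(heapq.merge(*partials))


def aggregate(rows: Iterable[Row]) -> Dict[str, int]:
    # Sum the numeric column per key across all nodes' rows.
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


node_a = [("db01", 5), ("web01", 2)]
node_b = [("db01", 1), ("web02", 7)]
merged = merge_sorted([node_a, node_b])
totals = aggregate(merged)
```

Merging pre-sorted partial results avoids re-sorting the combined set, which matters when many nodes contribute rows.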
31. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes
38. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes - Performance
Rackspace® Private Cloud powered by OpenStack®
• Creates ACG node in 30 seconds
• Creates ACG nodes concurrently
ACG
• Controlled data set size resulting in:
  • Quick data distribution
  • Query parallelization
  • Fast data retrieval
43. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Scalability
Rackspace® Private Cloud powered by OpenStack®
• Quick and concurrent ACG node creation
• Ability to re-size existing nodes
• Ability to remove nodes
ACG
• Indexing structure and controlled data set size allow ACG to stabilize quickly as it expands or contracts
46. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Availability
Rackspace® Private Cloud powered by OpenStack®
• Rapidly replace failed ACG nodes
ACG
• Deploys data store native availability mechanisms (replication, data distribution…)
56. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Maintainability
Rackspace® Private Cloud powered by OpenStack®
• Adding ACG nodes expands:
  • Storage capacity
  • CPU power
  • RAM
• No DBA or system administrator activity required
ACG
• Controlled data set size enables:
  • Optimal and stable data store configuration
  • Reduced demand for managing data store objects
  • Stable query execution plans
62. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Flexibility
ACG
• Variety of storage types:
  • Columnar – Cassandra: time series data
  • Relational – PostgreSQL: relational data
  • HDFS – Hadoop: unstructured data
• Ability to select optimal storage type for individual use case
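The pairing of data shape and storage type listed above can be written down as a simple lookup. The sketch below is illustrative only: the mapping comes from the slide, while `StoreType`, `STORE_FOR_SHAPE`, `select_store`, and the data-shape labels are assumed names rather than part of ACG.

```python
from enum import Enum


class StoreType(Enum):
    COLUMNAR = "Cassandra"
    RELATIONAL = "PostgreSQL"
    HDFS = "Hadoop"


# Mapping taken from the slide; the data-shape labels are assumed names.
STORE_FOR_SHAPE = {
    "time_series": StoreType.COLUMNAR,
    "relational": StoreType.RELATIONAL,
    "unstructured": StoreType.HDFS,
}


def select_store(data_shape: str) -> StoreType:
    # Fail loudly on an unknown shape rather than guessing a store.
    try:
        return STORE_FOR_SHAPE[data_shape]
    except KeyError:
        raise ValueError(f"no storage type registered for {data_shape!r}") from None


chosen = select_store("time_series")
```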
69. ACG on Rackspace® Private Cloud powered by OpenStack®
Quality Attributes – Usability
ACG
• Standard interfaces:
  • SQL language
  • JDBC API
  • Data store native calls
• Native bulk loader utility
• ACG will support joining heterogeneous data sets
70. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
71. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Subject:
  • Complex availability calculation sourcing 3 months of monitoring data and creating 1 billion records in the initial calculation
72. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 1
  • Data warehouse: Microsoft SQL Server database
  • SSIS data loading
  • A SQL Server with 24 CPUs and 250 GB RAM was dedicated to the initial calculation
  • A SQL Server stored procedure performed the calculation
  • Source and result are stored in a traditional data warehouse structure
73. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case
• Environment 2
  • In 30 seconds, the ACG Node Manager instantiated a new columnar data store consisting of 4 Cassandra nodes and registered it in the ACG Indexing Structure
  • Each ACG node has 2 CPUs and 8 GB RAM
  • Informatica data loading
  • Calculation developed in Java
  • Source and result are stored in a columnar structure suitable for time series data
74. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case - Result
• Calculation Duration
  • Microsoft SQL Server calculation lasted 5 days
  • ACG calculation completed in 3.5 hours
• Storage Size
  • Microsoft SQL Server: 500 GB
  • ACG: 20 GB
• Complexity of the calculation
  • A columnar data store is optimal for time series data. Sourcing from the columnar data store resulted in a relatively simple Java calculation process compared with the SQL Server stored procedure
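The ratios implied by these figures can be worked out directly. The arithmetic below uses only the numbers quoted on the use case slides; it assumes that "5 days" means 120 hours of continuous run time, which the slides do not state explicitly.

```python
# Figures quoted on the use case slides.
sql_server_hours = 5 * 24        # "lasted 5 days", assumed to run continuously
acg_hours = 3.5                  # "completed in 3.5 hours"
sql_server_storage_gb = 500
acg_storage_gb = 20

speedup = sql_server_hours / acg_hours                    # about 34x faster
storage_ratio = sql_server_storage_gb / acg_storage_gb    # 25x less storage

# Compute resources: one SQL Server host versus 4 ACG nodes.
acg_cpus = 4 * 2                 # 4 nodes x 2 CPUs
acg_ram_gb = 4 * 8               # 4 nodes x 8 GB
cpu_ratio = 24 / acg_cpus        # SQL Server had 3x the CPUs
ram_ratio = 250 / acg_ram_gb     # and roughly 7.8x the RAM

print(f"speedup ~{speedup:.1f}x, storage {storage_ratio:.0f}x smaller")
```

Under that assumption, the ACG run was roughly 34 times faster and used 25 times less storage, on a third of the CPUs and about an eighth of the RAM.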
80. ACG on Rackspace® Private Cloud powered by OpenStack®
Rackspace Use Case - Conclusion
• Selecting the optimal data store for the use case resulted in:
  • Substantial performance improvement
  • Reduced storage demand
  • Simplified processes
  • Ability to process terabytes of data per day close to real-time and on-demand
  • Improved trending and reporting:
    • Enhanced support capabilities
    • Improved Rackspace customer experience
  • Significant cost reduction