2. Who am I
• Schubert Zhang (张松波)
• Chief Architect and Director of Big Data Engineering
and Cloud
• Research Cloud Tech., Develop Cloud Projects and
Products from 2007
• Led the core development team of CMCC “Big Cloud”.
@Hanborq
• 10 years of telecom product development and tech management. @UTStarcom
3. Agenda
• Introduction of Cloud Storage and Computing
• Big Data and Cloud
• Our Big-Data/Cloud Products and Solutions
• Anything for Discussion …
5. A Popular Definition of Cloud …
• Cloud computing is a model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and released
with minimal management effort or service provider interaction.
• Cloud storage is a model of networked online storage where data is stored on
multiple servers. Hosting companies operate large data centers, which provide
the resources according to the requirements of the customer and expose them as
storage pools, which the customers can themselves use to store files or data
objects. Physically, the resource may span multiple servers and/or data
centers.
• It promotes availability and is composed of five essential characteristics, three
service models, and four deployment models.
6. A Popular Definition of Cloud …
• Deployment Models
– Private Cloud
– Community Cloud
– Public Cloud
– Hybrid Clouds
• Service Models
– Software as a Service (SaaS)
– Platform as a Service (PaaS)
– Infrastructure as a Service (IaaS)
• Essential Characteristics
– On Demand Self-Service
– Broad Network Access
– Rapid Elasticity
– Resource Pooling
– Measured Service
• Common Characteristics
– Massive Scale
– Elastic Computing
– Homogeneity
– Geographic Distribution
– Virtualization
– Service Orientation
– Low Cost Software
– Advanced Security
7. Examples of Famous Cloud Products
• Google
– Google AppEngine (Storage for Database, etc.)
– Google Storage (Storage for Objects)
– Techs: GFS2/Bigtable/MapReduce/Megastore/Spanner/Pregel/Dremel…
• Amazon AWS
– Simple Storage Service – S3 (Storage for Objects)
– Cloud Drive (Online Storage for Individuals)
– SimpleDB (Storage for Database)
– Elastic Compute Cloud – EC2 (Compute)
– …
– Techs: Web-Service-Protocol/Bitstore/Keymap/Dynamo
• Rackspace
– Cloud Servers (Compute)
– Cloud Files (Storage for Objects)
– Techs: Open Stack …
• Facebook
– Messages
– Photo Storage
– …
– Techs: Hive/Scribe/Haystack/Hadoop
• Cloudera
– Hadoop …
8. We focus on
The Technologies Behind the Cloud
Storage
• High Scalability
– Shared-Nothing
– Object-Oriented
– NoSQL
– …
• High Availability
– Failure-Detecting
– Server Clustering
– Replication
– Eventual Consistency
– …
• Big Data
– PB level storage
– Structured or non-structured
– Information Retrieval
– Indexing
– Automatic re-sharding/re-partitioning
– Automatic load balancing
– …
• High Throughput, Low Latency
– Optimized IO and data write/read models.
Computing
• High Scalability
• Parallel Computing Framework
– MR - MapReduce
– BSP - Bulk Synchronous Parallel
• Job/Task scheduler
• Failure rework
• PDM - Parallel Data Analysis/Mining Algorithms
– Simple Statistic/Analysis
– Classification/Clustering …
– For Recommendation and AD
– …
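BSP, listed on this slide next to MapReduce, organizes a computation into supersteps: each worker computes on local data, sends messages, and then waits at a barrier before the next superstep starts. The sketch below is only an illustration of that model, not the framework described in this deck; the function name `bsp_max` and the graph layout are assumptions made for the example. It spreads the largest value through a small graph in a single process.

```python
from collections import defaultdict

def bsp_max(graph, values):
    """Propagate the maximum value through a graph in BSP supersteps.

    graph:  dict mapping each vertex to a list of neighbour vertices
    values: dict mapping each vertex to its initial number
    Returns (final values, number of supersteps executed).
    Hypothetical helper for illustration only.
    """
    current = dict(values)
    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(current[vertex])
    supersteps = 1
    while inbox:
        outbox = defaultdict(list)
        # Local computation: each vertex looks only at its own inbox.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > current[vertex]:
                current[vertex] = best
                # Communication: queue messages for the next superstep.
                for n in graph[vertex]:
                    outbox[n].append(best)
        # Barrier: queued messages become visible only now.
        inbox = outbox
        supersteps += 1
    return current, supersteps

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = bsp_max(chain, {"a": 3, "b": 6, "c": 2, "d": 1})
```

The loop ends when a superstep produces no messages, which is the usual halting condition in this style of computation. A real framework would run the per-vertex work on many machines and implement the barrier over the network.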
10. Big Data
• Immutable Law of Big Data
– Volume
– Variety
– Velocity
• Need ….
– Distributed System
• Many commodity machines
– Scale-out vs. Scale-up
• Scale-out: automatic vs. manual
11. Big Data, Big Business
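[Figure: dollar amounts shown next to company logos that did not survive extraction: $2.25B, $400M, $1.7B, $250M, $263M, $2.35B, >>$30.5M (VC). Category labels on the slide: Storage Products/Solutions, NAS (Limited Scale-out); Data Warehouse (MPP).]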
12. The Next Decade in Data Management
A stable system capable of supporting a variety of apps is necessary.
Innovations in databases are a requirement.
New data stores are necessary.
Differentiation between programs will continue until key innovations in data management
platforms become uniform.
17. Products and Features
Architecture stack (top to bottom)
• Cloud API
• Cloud Services: DataStore Cloud, ObjectStorage Cloud, MapReduce Cloud, Compute Cloud
• CloudOS Stack: SandStor, PebStor, MapReduce, vCompute
• Hardware & OS
CloudOS
• Distributed Cloud Platform
• Commodity Hardware and Cluster Management
• High Scalability
• High Reliability (Data Replication)
• High Availability
• Strong Consistency
• High Throughput
• Load Balancing
• Global Data Access
• Global File system
• Simplify Complexity of Apps
SandStor
• Distributed Structured Data
• Common features of CloudOS
• High efficiency Indexing
• Multi-level Cache
• Compression
• Fast random access, Low Latency
• Flexible Schema
• High Durability, no data loss
PebStor
• Distributed Blob Data Management
• Common features of CloudOS
• Efficiency indexes and meta mgmt
• Efficiency storage space mgmt
• De-duplicating
• Unlimited blob size
MapReduce
• Flexible Parallel Data Processing and Computing Framework
• Common features of CloudOS
• Large-scale
• High parallelized
• Locality computing
• Simple model for programming
• Abundant high-level languages and toolkits
• Seamlessly integrated with storage system
vCompute
• Virtual Machines Resources mgmt
• Multi VMs support
• Elastic VMs provisioning
• Auto-scale
July 3, 2012
18. Cloud Service Platform
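Cloud Services and similar products or services
• ObjectStorage Cloud Service
– Amazon S3
– Google Storage for Developer
– Rackspace Files/OpenStack Swift
– Google BlobStore
• DataStore Cloud Service
– Amazon SimpleDB
– Google DataStore
• MapReduce Cloud Service
– Amazon MapReduce
– Hadoop
• Video Media Cloud Service …
– Video Delivery/Streaming/Transcoding/Time-shifting/Analytics
• Cloud Services API
– Web-based, accessible from anywhere
– RESTful style, simple and easy to use
– SDKs provided for development in programming languages
• Characteristics of Cloud Services
– Users do not need to care about the implementation
– Accessible from anywhere
– High data reliability
– Strong scalability
– High availability (99.9%)
– Pay for actual usage
– Simple and easy to use
– APIs follow industry standards/conventions
– Rich management and monitoring tools
– Strict yet flexible security policies
– AAA service integrated across multiple cloud services
• Multi-Level Cloud Services:
– Infrastructure
– Platform
– Applications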
19. Object Storage Platform
Build another S3
RockStor Object Storage system provides object storage infrastructure
services that guarantee efficiency, robustness, and load balancing.
• Object Access Layer
– Providing Client Lib
• MetaStore Layer
– DHT-based Consistent Overlay Network
• Data Chunk Store Layer
– Autonomous Overlay Network
– Clustered storage nodes
• Properties: Object-Oriented, High Availability, High Scalability, Huge Capacity
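One common way to build a DHT-style metadata layer is consistent hashing: nodes and keys are hashed onto the same ring, and a key belongs to the first nodes found clockwise from its position. The sketch below illustrates that general technique only. It is not RockStor's actual design, and the class name `HashRing`, the node names, and the virtual-node count are assumptions made for the example.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (illustrative, hypothetical API)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash position, node name)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add_node(self, node, vnodes=64):
        # Several virtual positions per node smooth out the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key, replicas=1):
        """Return up to `replicas` distinct nodes responsible for `key`."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._ring, (self._hash(key), ""))
        owners = []
        # Walk clockwise, wrapping around, collecting distinct nodes.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners

ring = HashRing(["meta-1", "meta-2", "meta-3"])
owners = ring.lookup("bucket/photo.jpg", replicas=2)
```

The property that matters for scalability is that adding or removing a node only moves the keys that node owned; every other key keeps its owners, so the cluster can grow without a full reshuffle.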
24. CloudNAS+MagicBox Enterprise Solution
[Diagram labels: Office/SOHO network; Company LAN or WAN; BigdataCloud NAS Proxy; Access files via CIFS/NFS/FTP; Web Service RESTful API; Enterprise Private BigdataCloud; MagicBox Service; MagicBox Client]
• CloudNAS: NAS Proxy + NAS in BigdataCloud
– File Server
– Archive Server
– Backup Server
• MagicBox: Backup/Sync/Sharing/Versioning
– Documents Backup
– Collaboration
25. Parallel Computing Platform
[Diagram: MapReduce job flow]
• Applications: dataset as input; job launch
• Partition/Split according to a user-defined policy
• MapReduce JobTracker: assigns map tasks and reduce tasks
• Data Split-1 … Data Split-5 are processed by Map-1 … Map-5
• Map output goes to Reduce-1 and Reduce-2, which produce Output-1 and Output-2
MapReduce
BSP
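The job flow on this slide (input splits, one map task per split, map output routed to reducers, one output per reducer) can be mimicked in a few lines of single-process Python. This is a teaching sketch of the MapReduce model using word count, not the platform's API: all function names here are hypothetical, and the round-robin split policy and CRC32 partitioner are assumptions chosen for the example.

```python
import zlib
from collections import defaultdict

def split_dataset(records, num_splits):
    """Partition the input into splits (round-robin policy, an assumption)."""
    return [records[i::num_splits] for i in range(num_splits)]

def map_task(split):
    """Map phase: emit (word, 1) for every word in the split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1

def partition(key, num_reducers):
    """Choose the reducer for a key; crc32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reduce_task(grouped):
    """Reduce phase: sum the counts collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def run_job(records, num_splits=5, num_reducers=2):
    """Play the scheduler's role: run map tasks, shuffle, then reduce."""
    splits = split_dataset(records, num_splits)
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_task(split):
            shuffled[partition(key, num_reducers)][key].append(value)
    # One output per reducer, as in Output-1 and Output-2 on the slide.
    return [reduce_task(grouped) for grouped in shuffled]

records = ["big data big cloud", "cloud storage", "big compute"]
outputs = run_job(records)
```

Each key lands in exactly one reducer's output, so merging the per-reducer dictionaries gives the full word count. A real cluster would run the map and reduce tasks on different machines and rerun any task that fails.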