Data platform at Samsung (Big Learning)
1. SRA-SV | Cloud Research Lab Slide 1
Guangdeng Liao
Zhan Zhang
Samsung Cloud Research Lab
Data Platform at Samsung
2. SRA-SV | Cloud Research Lab Slide 2
Our Mission: provide scalable, reliable, and secure storage and
computation for Samsung R&D
Samsung Data Platform
Resources:
• Hundreds of machines
• Petabytes of storage
• And still growing
3. SRA-SV | Cloud Research Lab Slide 3
What we have in our platform
Offline:
• Distributed MR processing
• Data warehousing with Hive/Pig
• In-house web-based ETL portal
• Many more
Online:
• K-V store: HBase
• In-house blob store
• Storm (online processing)
• Many more
Dev. & management tools:
• Apache Mahout
• ElasticSearch
• In-house unified web portal
• In-house single sign-on
• Visualization
• Many more
Using the platform, we have already significantly improved the ETL process, data
management, and data processing for other teams.
4. SRA-SV | Cloud Research Lab Slide 4
So, are we done?
No. Many more complex challenges.
5. SRA-SV | Cloud Research Lab Slide 5
Challenge #1: How to build scalable and efficient machine
learning over Big Data?
6. SRA-SV | Cloud Research Lab Slide 6
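MR-based Mahout is good, but...
It is not good at expressing data dependencies and iterative algorithms such as PageRank:
• Map: distribute rank to link targets
• Reduce: collect ranks from multiple sources
• Iterate
$$PR(x) = \alpha \frac{1}{N} + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$
where t_1, ..., t_n are the pages linking to x, C(t_i) is the number of outgoing links of t_i, N is the total number of pages, and \alpha is the random-jump probability.
One job per iteration: a startup penalty and an I/O penalty on every pass.
Unfortunately, many machine learning and data mining (MLDM) algorithms are iterative.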
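The Map/Reduce/Iterate pattern for PageRank can be sketched in a few lines of Python. This is a single-process illustration, not Mahout or Hadoop code: the function names, the toy graph, and the value alpha = 0.15 are assumptions, and the sketch assumes every page has at least one outgoing link.

```python
from collections import defaultdict

def pagerank_iteration(graph, ranks, alpha=0.15):
    """One MapReduce-style pass; graph maps each page to its link targets."""
    n = len(graph)
    # Map: each page distributes its rank evenly to its link targets.
    emitted = []
    for page, targets in graph.items():
        for target in targets:
            emitted.append((target, ranks[page] / len(targets)))
    # Reduce: collect the rank shares arriving from multiple sources.
    collected = defaultdict(float)
    for target, share in emitted:
        collected[target] += share
    # Combine the collected rank with the random-jump term.
    return {page: alpha / n + (1 - alpha) * collected[page] for page in graph}

def pagerank(graph, iterations=20, alpha=0.15):
    ranks = {page: 1.0 / len(graph) for page in graph}
    # Iterate: on MapReduce, every pass here would be a separate job.
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks, alpha)
    return ranks

# Hypothetical three-page graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

On a real cluster each call to `pagerank_iteration` would write its output to distributed storage and the next job would read it back, which is where the per-iteration startup and I/O penalties come from.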
7. SRA-SV | Cloud Research Lab Slide 7
Graph naturally represents data dependency
8. SRA-SV | Cloud Research Lab Slide 8
Graph-based Processing: Think like a Vertex
In-memory data graph over a cluster
Vertex abstraction
– Think like a vertex
– In-memory processing
Communication
– Message-based
– Shared-memory-based
Execution engine
– Bulk synchronous parallel
– Asynchronous parallel
Scheduling
Popular frameworks:
– Giraph
– GraphLab
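A minimal sketch of the "think like a vertex" model with message-based communication and bulk synchronous parallel execution. It is plain single-process Python, not the Giraph or GraphLab API; `run_bsp`, `propagate_max`, and the toy ring graph are hypothetical names chosen for illustration.

```python
def run_bsp(values, edges, compute, max_supersteps=30):
    """values: vertex id -> value; edges: vertex id -> list of neighbor ids."""
    inbox = {v: [] for v in values}
    active = set(values)  # every vertex is active in superstep 0
    for superstep in range(max_supersteps):
        if not active:
            break  # all vertices halted and no messages are in flight
        outbox = {v: [] for v in values}
        for v in active:
            # The vertex program sees only its own value, inbox, and neighbors.
            new_value, messages = compute(superstep, values[v], inbox[v], edges[v])
            values[v] = new_value
            for target, payload in messages:
                outbox[target].append(payload)
        # Barrier: messages sent in this superstep are delivered in the next.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
    return values

def propagate_max(superstep, value, messages, neighbors):
    """Vertex program: spread the largest value in the graph to every vertex."""
    best = max([value] + messages)
    if superstep == 0 or best > value:
        return best, [(n, best) for n in neighbors]
    return best, []  # nothing new learned: send nothing and halt

# Hypothetical directed ring of three vertices.
result = run_bsp({1: 3, 2: 6, 3: 2}, {1: [2], 2: [3], 3: [1]}, propagate_max)
```

The graph and all vertex values stay in memory across supersteps, so an iterative algorithm pays no per-iteration job startup or disk I/O.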
9. SRA-SV | Cloud Research Lab Slide 9
Graph-based Machine Learning
We used Apache Giraph 1.0 and developed a machine learning library on top of it:
Recommendation:
– Alternating Least Squares (ALS)
– Weighted ALS
– SGD (matrix factorization)
– Biased SGD
Graphical model:
– Belief Propagation
Clustering:
– KMeans
– KMeans++
– Fuzzy clustering
We see an order-of-magnitude speedup over the MR-based approach in our cluster.
10. SRA-SV | Cloud Research Lab Slide 10
Challenge #2: How to make Big Model + Big Data like Deep
Learning scalable and efficient?
11. SRA-SV | Cloud Research Lab Slide 11
One example: Deep Learning [1]
Many more examples (millions to billions of parameters) in speech
recognition, image processing, and NLP
[1] "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012
12. SRA-SV | Cloud Research Lab Slide 12
Model-Parallel Framework
Parallelize a big machine learning model over a cluster:
• User-defined model
• Auto-generation of model topology
• Auto-partition of topology over cluster (c1, c2, c3)
• Auto-deployment of topology (in-memory)
• Neuron-like programming
• Message-based communication
• Message-driven computation
13. SRA-SV | Cloud Research Lab Slide 13
Architecture over Yarn
• Controller: partitions and deploys the topology
• Application Master
• Node Managers hosting Containers
• Data communication (node-level and group-level) based on Netty
• Control communication based on Thrift
14. SRA-SV | Cloud Research Lab Slide 14
Execution Engine
• Execution Engine (Deep Neural Net)
– Training proceeds layer by layer, controlled by the Execution Engine
– Progress reporting
– Process control: the end user can control the training process, and even restart it from a certain point
– System snapshot for fault tolerance
[Figure: Input, RBM, RBM, and Softmax layers, fully connected]
• Generic Execution Engine
– Abstracts the common design pattern from our experience developing deep neural net algorithms
– Generalized to support various other algorithms
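The layer-by-layer control flow with progress reporting, snapshots, and restart from a certain point might look roughly like the sketch below. It is not the platform's implementation: the class name, the `train_layer` callback, and the toy training rule are all hypothetical stand-ins for real RBM or softmax training.

```python
import copy

class ExecutionEngine:
    """Trains layers in order and keeps one snapshot per finished layer."""

    def __init__(self, layer_names, train_layer):
        self.layer_names = layer_names
        self.train_layer = train_layer  # hypothetical: (name, inputs) -> (params, outputs)
        self.snapshots = []

    def run(self, data=None, start_layer=0):
        if start_layer > 0:
            # Restart from the snapshot taken after the previous layer.
            state = copy.deepcopy(self.snapshots[start_layer - 1])
        else:
            state = {"params": {}, "activations": data}
        self.snapshots = self.snapshots[:start_layer]
        for index in range(start_layer, len(self.layer_names)):
            name = self.layer_names[index]
            params, outputs = self.train_layer(name, state["activations"])
            state = {"params": {**state["params"], name: params},
                     "activations": outputs}
            self.snapshots.append(copy.deepcopy(state))  # snapshot for fault tolerance
            print(f"progress: {index + 1}/{len(self.layer_names)} layers trained")
        return state["params"]

def toy_train_layer(name, inputs):
    """Stand-in for real layer training: learn a scale and normalize by it."""
    scale = max(abs(x) for x in inputs) or 1.0
    return {"scale": scale}, [x / scale for x in inputs]

engine = ExecutionEngine(["rbm1", "rbm2", "softmax"], toy_train_layer)
params = engine.run([2.0, 4.0])
resumed = engine.run(start_layer=2)  # retrain only the last layer
```

Because the engine only depends on the `train_layer` callback, the same loop could drive other layered algorithms, which is the idea behind a generic execution engine.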
15. SRA-SV | Cloud Research Lab Slide 15
Model parallelism alone is still not scalable enough for Big Data
16. SRA-SV | Cloud Research Lab Slide 16
Deep Learning Platform: Hybrid of Data-parallelism and Model-parallelism
• Data-parallelism: lots of model instances, each one model-parallel and trained on its own data chunk
• Parameter servers (1 to n) coordinate parameters
• Parameter servers help the model instances learn from each other
17. SRA-SV | Cloud Research Lab Slide 17
Distributed Parameter Servers
• Clients: pull, push, and sync parameters
• Netty communication layer
• Servers 1, 2, 3, each with in-memory cache/storage
• HBase/HDFS
• Currently we support asynchronous parameter pulls and pushes
• A synchronized version is also supported
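Asynchronous pull and push against sharded parameter servers can be illustrated with threads standing in for remote clients. This is a toy sketch, not the platform's Netty-based service: the class names, the CRC32 key-to-server mapping, the learning rate, and the one-parameter model are assumptions.

```python
import threading
import zlib

class ParameterServer:
    """One shard: an in-memory key-value store guarded by a lock."""

    def __init__(self):
        self._params = {}
        self._lock = threading.Lock()

    def pull(self, key):
        with self._lock:
            return self._params.get(key, 0.0)

    def push(self, key, gradient, learning_rate):
        with self._lock:
            self._params[key] = self._params.get(key, 0.0) - learning_rate * gradient

class Client:
    """Routes each key to one server (hypothetical CRC32-based sharding)."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]

    def pull(self, key):
        return self._server_for(key).pull(key)

    def push(self, key, gradient, learning_rate):
        self._server_for(key).push(key, gradient, learning_rate)

def worker(client, data_chunk, steps, learning_rate=0.05):
    """One model instance fitting w to its own data chunk, with no barrier."""
    for step in range(steps):
        target = data_chunk[step % len(data_chunk)]
        w = client.pull("w")  # may already be stale by the time we push
        client.push("w", w - target, learning_rate)  # gradient of 0.5 * (w - target)**2

servers = [ParameterServer() for _ in range(3)]
client = Client(servers)
chunks = [[1.0, 3.0], [3.0, 1.0]]  # toy data chunks, one per worker
threads = [threading.Thread(target=worker, args=(client, chunk, 300)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_w = client.pull("w")  # settles near 2.0, the mean of the targets
```

Workers never wait for one another, so a pushed gradient may be computed from a slightly stale value. A synchronized variant would add a barrier so that all workers pull the same parameter version in each round.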
18. SRA-SV | Cloud Research Lab Slide 18
Deep Learning Algorithms
Aim at three major application fields: speech recognition, image
processing and NLP
What we have developed / Our roadmap:
Feed Forward Neural Network
Restricted Boltzmann Machine
Deep Belief Network
Sparse Auto-encoder
Convolutional Neural Network
Recurrent Neural Network
19. SRA-SV | Cloud Research Lab Slide 19
Summary
• We provide a Hadoop-based data platform
– Hundreds of machines, petabytes of storage
– Hadoop ecosystem (MapReduce, HBase, Yarn, HDFS, Zookeeper, Oozie, Lipstick, Mahout etc.)
– In-house ETL pipeline
– In-house unified web portal with SSO
• We are working hard on big learning to make our platform intelligent
– Large-scale graph-based machine learning
– Large-scale deep learning
– And many more in progress