BigData Clusters Redefined
- 1. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
Samuel Kommu
Technical Marketing Engineer
sakommu@cisco.com
Hadoop Summit 2014
V3
- 2.
• Hadoop Overview
• Scheduling and Prioritizing
Compute
Network
• Visibility Plugins
• Hadoop Optimization and Tuning
• Recommendations
• Q & A
- 3.
Big Data @ Cisco - www.cisco.com/go/bigdata
Multi-year network and compute analysis testing
(In conjunction with partners)
Hadoop World
2011, 2012 & 2013
Hadoop Summit
2012 & 2013
Certifications and Solutions with UCS C-Series and
Nexus Series Switches
Cloudera Hadoop Certified Technology
Oracle NoSQL Validated Solution
Visibility & Monitoring
- 5.
100 terabytes of data could take several days just to read.
In one test it took 11 days.
- 6.
100 terabytes of data on NAS/SAN:
NAS/SAN does not remove the bottleneck, since you are still limited by the throughput of the storage systems.
- 7.
100 terabytes of direct-attached storage in the Hadoop Distributed File System:
the same job took 15 minutes once it was run in parallel, spreading the load across 1000 nodes.
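The numbers on these slides imply a rough throughput picture. The following is a minimal sketch of that arithmetic, assuming decimal terabytes (10^12 bytes) and a perfectly even spread across the 1000 nodes; neither assumption is stated on the slides.

```python
# Back-of-the-envelope arithmetic for the 100 TB example.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, load spread evenly.
DATA_BYTES = 100 * 10**12
SERIAL_SECONDS = 11 * 24 * 3600   # 11 days
PARALLEL_SECONDS = 15 * 60        # 15 minutes
NODES = 1000

serial_mb_per_s = DATA_BYTES / SERIAL_SECONDS / 1e6
aggregate_gb_per_s = DATA_BYTES / PARALLEL_SECONDS / 1e9
per_node_mb_per_s = DATA_BYTES / PARALLEL_SECONDS / NODES / 1e6
speedup = SERIAL_SECONDS / PARALLEL_SECONDS

print(f"serial read:        {serial_mb_per_s:.1f} MB/s")
print(f"parallel aggregate: {aggregate_gb_per_s:.1f} GB/s")
print(f"parallel per node:  {per_node_mb_per_s:.1f} MB/s")
print(f"speedup:            {speedup:.0f}x")
```

Under these assumptions each node sustains roughly the same rate as the single serial reader, so the gain comes from parallelism rather than from faster individual disks.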
- 8.
• Data Ingest & Replication
External connectivity
East-west traffic (replication of data blocks)
• Map Phase - Raw data is analyzed and converted to name/value pairs.
The workload translates to multiple batches of Map tasks
A reducer can start the reduce phase ONLY after the entire Map set is complete
• Mostly an IO/compute function
[Figure: MapReduce data flow. Unstructured data in the Hadoop Distributed File System feeds Map tasks; the Shuffle Phase groups their output by key (Key 1 to Key 4); Reduce tasks produce the Result/Output.]
- 9.
• Shuffle Phase - All name/value pairs are sorted and grouped by their keys.
• Reducers PULL the data from the Mapper nodes
• High network activity
• Reduce Phase - All values associated with a key are processed for results, in three phases:
Copy - get intermediate results from each data node's local disk
Merge - to reduce the number of files
Reduce method
• Output Replication Phase - Reducers replicate results to multiple nodes
Highest network activity
• Network activity depends on workload behavior
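The Map, Shuffle and Reduce phases described on these two slides can be sketched in a few lines. This is a single-process illustration using word count, not Hadoop code; the function names and the sample text are made up for the example, and no network transfer or replication takes place.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: convert raw text into (name, value) pairs, here (word, 1)."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: sort and group all values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: process all values associated with each key."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["to be or not to be", "to thine own self be true"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In a real cluster the shuffle step is where reducers pull intermediate data from mapper nodes, which is why the slides associate it with high network activity.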
[Figure: MapReduce data flow. Unstructured data in the Hadoop Distributed File System, Map tasks, Shuffle Phase grouping by key, Reduce tasks, Result/Output.]
- 10.
Ingress vs. egress data set, by job pattern:
Analyze (Search - Count): 1:0.3
Extract Transform Load (ETL - TeraSort): 1:1
Explode (Normalizing Data): 1:2
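The ingress-to-egress ratios above translate directly into output volume, and from there into replication traffic. The sketch below is illustrative only: the 1 TB input, the replication factor of 3, and the rule that the first replica stays local while the remaining copies cross the network are assumptions, not figures from the slides.

```python
# Ingress-to-egress ratios as listed on the slide.
RATIOS = {"analyze": 0.3, "etl": 1.0, "explode": 2.0}

def egress_bytes(ingress_bytes, pattern):
    """Output volume for a job pattern, given its input volume."""
    return ingress_bytes * RATIOS[pattern]

def replication_network_bytes(output_bytes, replication_factor=3):
    """Assumption: the first replica is written locally, the rest cross the network."""
    return output_bytes * (replication_factor - 1)

ingress = 10**12  # hypothetical 1 TB input
for pattern in RATIOS:
    out = egress_bytes(ingress, pattern)
    net = replication_network_bytes(out)
    print(f"{pattern:8s} output {out / 1e12:.1f} TB, replication traffic {net / 1e12:.1f} TB")
```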
- 11.
Analyze: simulated with Shakespeare Wordcount
Extract Transform Load (ETL): simulated with Yahoo TeraSort
Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication
Job patterns have varying impact on network utilization
- 12.
Small Flows/Messaging (admin related, heartbeats, keep-alives, delay-sensitive application messaging)
Small - Medium Incast (Hadoop Shuffle)
Large Flows (HDFS Ingest)
Large Incast (Hadoop Replication)
- 14. © 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 15
[Figure: three panels (Scheduling, Rescheduling, Prioritize/De-Prioritize), each showing Job 1, Job 2 and Job 3 placed on Core 1 through Core 4]
- 15.
[Figure: Client 1 and Client 2 talk to the Resource Manager; three Node Managers host Containers and App Masters. Arrow legend: Job Submission, Resource Request, Node Status, MapReduce Status.]
- 16.
Switch buffer usage with a network QoS policy to prioritize HBase update/read operations
[Figure: Latency (us) over time. Series: UPDATE - Average Latency (us), READ - Average Latency (us), QoS - UPDATE - Average Latency (us), QoS - READ - Average Latency (us)]
[Figure: Buffer used over a timeline, for Hadoop TeraSort and HBase]
Read/update latency comparison of non-QoS vs. QoS policy: ~60% improvement for read
- 18.
Traditional networking: switching/routing decisions are per flow
What if we could divide a flow into multiple parts that could take independent network paths?
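To make the contrast concrete, here is a minimal sketch of a per-flow decision next to one way of dividing a flow into parts. The slides do not say how the flow is divided; splitting at idle gaps between bursts (often called flowlets) is an assumption made for illustration, as are the 5-tuple, the timestamps and the 500 us gap.

```python
import zlib

def ecmp_path(flow, num_paths):
    """Per-flow decision: hash the 5-tuple, so every packet of a flow takes one path."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths

def split_into_flowlets(timestamps_us, gap_us):
    """Group packet timestamps into bursts separated by idle gaps longer than gap_us."""
    flowlets = []
    for ts in timestamps_us:
        if flowlets and ts - flowlets[-1][-1] <= gap_us:
            flowlets[-1].append(ts)   # still inside the current burst
        else:
            flowlets.append([ts])     # idle gap seen: a new part that may take another path
    return flowlets

# Hypothetical flow (src IP, dst IP, protocol, src port, dst port) and packet times.
flow = ("10.0.0.17", "10.0.0.21", 6, 41532, 50010)
packets_us = [0, 10, 20, 700, 710, 1500, 1510, 1520]

paths = {ecmp_path(flow, 2) for _ in packets_us}
flowlets = split_into_flowlets(packets_us, gap_us=500)
print("distinct paths with per-flow hashing:", len(paths))
print("packets per flowlet:", [len(f) for f in flowlets])
```

Each flowlet could then be steered independently, while per-flow hashing pins all eight packets to a single path.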
- 19.
In traditional networks available today, Leaf 1 is not aware of the congestion between Spine 2 and Leaf 2.
Congestion awareness across the network is essential for optimal traffic distribution.
[Figure: leaf-spine topology (Leaf 1, Leaf 2, Spine 1, Spine 2) with links marked Normal or Congested]
- 20.
Impact of large flows on small flows without Dynamic Packet Prioritization
Large flows tend to use up resources:
• Bandwidth
• Buffer
Unless smaller flows are prioritized, large flows could potentially have an adverse impact on the smaller flows
[Figure: Small Flows and Large Flows]
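The effect can be illustrated with a toy queue. This sketch is not the switch's Dynamic Packet Prioritization algorithm; it assumes a simple rule in which the first few packets of every flow are served ahead of the rest, and the flow sizes and the threshold of 10 packets are made up.

```python
def completion_slot(queue, flow_id):
    """1-based transmit slot in which the last packet of flow_id leaves the queue."""
    return max(i for i, pkt in enumerate(queue, start=1) if pkt == flow_id)

def prioritize_first_packets(queue, first_n):
    """Serve the first first_n packets of every flow ahead of all remaining packets."""
    seen = {}
    high, low = [], []
    for pkt in queue:
        seen[pkt] = seen.get(pkt, 0) + 1
        (high if seen[pkt] <= first_n else low).append(pkt)
    return high + low

# A 1000-packet large flow arrives just before a 3-packet small flow.
fifo = ["large"] * 1000 + ["small"] * 3
prio = prioritize_first_packets(fifo, first_n=10)

print("small flow done at slot (FIFO):       ", completion_slot(fifo, "small"))
print("small flow done at slot (prioritized):", completion_slot(prio, "small"))
print("large flow done at slot (prioritized):", completion_slot(prio, "large"))
```

In this toy model the small flow finishes far earlier while the large flow finishes in the same slot as before, since the total amount of work is unchanged.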
- 21.
ECMP: decisions are per leaf; does not consider end-to-end connectivity/congestion
DLB: load balancing decisions are made based on end-to-end connectivity/congestion
[Figure: two leaf-spine topologies (Leaf 1, Leaf 2, Spine 1, Spine 2), one for ECMP and one for DLB, with link labels 50% Usage, Congested Link, 66% Usage and 33% Usage]
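One way to read the usage labels is as a static equal split versus a split weighted by end-to-end headroom. The sketch below assumes that reading; the weighting rule and the capacity values are illustrative and are not taken from the switch implementation.

```python
def ecmp_split(num_paths):
    """Static equal-cost split: every path gets the same share of traffic."""
    return [1.0 / num_paths] * num_paths

def congestion_aware_split(available_capacity):
    """Weight each path by the capacity still available end to end."""
    total = sum(available_capacity)
    return [capacity / total for capacity in available_capacity]

# Hypothetical: the path via Spine 1 is fully available, while the path via
# Spine 2 has only half its capacity left because of downstream congestion.
available = [1.0, 0.5]

print("ECMP shares:            ", [f"{s:.0%}" for s in ecmp_split(2)])
print("congestion-aware shares:", [f"{s:.0%}" for s in congestion_aware_split(available)])
```

With these inputs the congestion-aware split sends about two thirds of the traffic over the uncongested path and one third over the congested one.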
- 22.
Buffer usage is mostly on the leafs, not on the spines
[Figure: chart with series Map Progress and Reduce Progress]
- 24.
OS: Red Hat 6.2
Hadoop: Cloudera 3 U5
Memcached
Symmetric - No link failures
Asymmetric - One failed link
- 25.
Tests with Memcached
Dynamic Packet Prioritization helps improve memcached performance
Note: These tests were disk bound. Further testing is planned with servers containing a higher number of drives.
- 26.
ECMP
DLB
- 28.
• Hardware
1 x Nexus 9508 (Spine)
2 x Nexus 9396 (Leaf)
8 x UCS C240 M3
2 x Xeon(R) CPU E5-2690 0 @ 2.90GHz
100G RAM
4 x 1TB Drives
UCS VIC 1225
• Software
Red Hat 6.2
Cloudera 5
NXOS 6.1(2)I2(1)
- 29.
• Open Framework
Modules will be on GitHub
• Based on
Apache Hadoop REST APIs
NXOS Python APIs
- 30.
leaf-001# python bootflash:hadoopModule/vpmHadoop.py localNodes
---------------------------------------------------------------
Node Name     Port     Speed   InRate       OutRate      Buffer
---------------------------------------------------------------
c240-m3-017   Eth1/1   10      1385580232   1128094464   0
c240-m3-018   Eth1/2   10      1882651664   1273181224   50
c240-m3-020   Eth1/3   10      1236743432   1238902664   0
c240-m3-021   Eth1/4   10      675482600    1292790704   0
---------------------------------------------------------------
In/Out Rate is the avg of last 30 seconds in Gbits/sec
NXAPI©
• Open Framework
• Based on
Apache Hadoop REST APIs
NXOS Python APIs
Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
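Output like the localNodes table above is easy to post-process. The sketch below parses the data rows and derives link utilization. It assumes the raw InRate/OutRate values are bits per second (they are far too large to be Gbit/s on a 10G port, despite the footer) and that Speed is in Gbit/s; the function names are hypothetical and not part of the plugin.

```python
SAMPLE = """\
c240-m3-017 Eth1/1 10 1385580232 1128094464 0
c240-m3-018 Eth1/2 10 1882651664 1273181224 50
c240-m3-020 Eth1/3 10 1236743432 1238902664 0
c240-m3-021 Eth1/4 10 675482600 1292790704 0
"""

def parse_rows(text):
    """Parse data rows of the form: node port speed in_rate out_rate buffer."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers, rulers and notes
        node, port, speed, in_rate, out_rate, buf = fields
        rows.append({
            "node": node, "port": port, "speed_gbps": int(speed),
            "in_bps": int(in_rate), "out_bps": int(out_rate), "buffer": int(buf),
        })
    return rows

def utilization_pct(rate_bps, speed_gbps):
    """Utilization of a link, assuming rate in bit/s and speed in Gbit/s."""
    return 100.0 * rate_bps / (speed_gbps * 1e9)

rows = parse_rows(SAMPLE)
for row in rows:
    in_pct = utilization_pct(row["in_bps"], row["speed_gbps"])
    out_pct = utilization_pct(row["out_bps"], row["speed_gbps"])
    print(f'{row["node"]} {row["port"]} in {in_pct:.1f}% out {out_pct:.1f}% buffer {row["buffer"]}')
```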
- 31.
leaf-001# python bootflash:hadoopModule/vpmHadoop.py allNodesJobs
Node Name     Port     Neighbor    Speed
--------------------------------------
c240-m3-001   Eth2/2   spine-001   40
c240-m3-002   Eth2/2   spine-001   40
c240-m3-004   Eth2/2   spine-001   40
c240-m3-005   Eth2/2   spine-001   40
c240-m3-017   Eth1/1   local       10
c240-m3-018   Eth1/2   local       10
c240-m3-020   Eth1/3   local       10
c240-m3-021   Eth1/4   local       10
Job ID   Job Name   Host Name     Application   Progress %
---------------------------------------------------------------------
0887     TeraGen    c240-m3-001   MAPREDUCE     37.7623
---------------------------------------------------------------------
Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
NXAPI©
• Another example
With Job info
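Job information of this kind is available from the Hadoop REST APIs the slides mention. The following sketch shows one possible way to read it from the YARN ResourceManager's cluster applications endpoint; it is not the plugin's actual code. The ResourceManager URL passed to `fetch_apps` is whatever your cluster uses, and the sample payload (including the application id) is made up to mirror the row shown on the slide.

```python
import json
from urllib.request import urlopen

def fetch_apps(rm_url):
    """Fetch running applications from the ResourceManager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=RUNNING", timeout=5) as resp:
        return json.load(resp)

def summarize(payload):
    """Extract (id, name, type, progress) tuples; 'apps' is null when nothing runs."""
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["applicationType"], a["progress"]) for a in apps]

# Illustrative payload only; a live cluster would be queried with fetch_apps().
SAMPLE = json.loads(
    '{"apps": {"app": [{"id": "application_0000000000000_0887", "name": "TeraGen",'
    ' "applicationType": "MAPREDUCE", "progress": 37.7623}]}}'
)

for app_id, name, app_type, progress in summarize(SAMPLE):
    print(f"{app_id[-4:]}  {name:<10} {app_type:<10} {progress:.4f}")
```

Keeping the HTTP call separate from the parsing makes the parsing testable without a cluster.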
- 32.
Instantaneous Results
Historic Data using Graphite
Topology
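Historic data can be fed to Graphite through its plaintext protocol, where each line is a metric path, a value and a Unix timestamp. The sketch below is a minimal example under assumptions: the metric naming scheme is hypothetical, port 2003 is Graphite's usual plaintext listener, and nothing here is taken from the plugin itself.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext form: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"

def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted metric lines to a Graphite plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

# Hypothetical metric path: '/' in the port name is replaced because '.' and '/'
# have special meaning in metric hierarchies.
line = graphite_line("hadoop.leaf-001.Eth1_2.buffer", 50, timestamp=1400000000)
print(line, end="")
```

Calling `send_to_graphite([line], host)` with the address of a Graphite server would record the sample.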
- 34.
[Figure: Time taken for 1TB sort, SSD vs. non-SSD drives. Y axis: time taken in seconds, 0 to 2500]
- 35.
[Figure: Time taken for 1TB sort, THP Always vs. THP Never. Y axis: time taken in seconds, 0 to 3000]
To disable:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
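Before and after changing the setting, it helps to confirm which mode is active. The kernel marks the active mode with square brackets in the same sysfs file. A minimal sketch follows, assuming either the Red Hat 6 path from the slide or the mainline kernel path; the helper names are hypothetical.

```python
from pathlib import Path

THP_PATHS = [
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # path used on the slide
    "/sys/kernel/mm/transparent_hugepage/enabled",         # mainline kernels
]

def parse_thp_state(text):
    """Return the active mode, which the kernel wraps in square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def current_thp_state():
    """Read the first THP control file that exists, or None if there is none."""
    for path in THP_PATHS:
        candidate = Path(path)
        if candidate.exists():
            return parse_thp_state(candidate.read_text())
    return None

# Example file contents after 'echo never' has been applied.
print(parse_thp_state("always madvise [never]"))
```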
- 37.
Network Attributes
Architecture
Availability
Capacity, Scale & Oversubscription
Flexibility
Management & Visibility
Integration Considerations
- 38.
Availability
• A single NIC failure doubles the job completion time.
• With dual NICs there is no impact on job completion time.
• Traffic flows are load-shared effectively across two NICs, with NIC bonding configured in Linux using the LACP mode of bonding.
• Recommended: change the hashing to src-dst-ip-port (on both the network and the NIC bonding in Linux) for optimal load-sharing.
[Figure labels: 1161 min, 286 min]
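The hashing recommendation can be illustrated with a toy model. Hadoop nodes often exchange many TCP flows with the same peer, so a hash over IP addresses alone places all of them on one link of the bond. The XOR hash below is a simplification for illustration and not the exact formula used by switches or by the Linux bonding driver (where the comparable setting is `xmit_hash_policy=layer3+4`); the addresses and ports are made up.

```python
from collections import Counter
from ipaddress import ip_address

def hash_ip_only(src_ip, dst_ip, links):
    """Pick a link from the source and destination IP addresses only."""
    return (int(ip_address(src_ip)) ^ int(ip_address(dst_ip))) % links

def hash_ip_port(src_ip, dst_ip, src_port, dst_port, links):
    """Pick a link from IP addresses and TCP/UDP ports (src-dst-ip-port)."""
    ip_bits = int(ip_address(src_ip)) ^ int(ip_address(dst_ip))
    return (ip_bits ^ src_port ^ dst_port) % links

# Eight hypothetical flows between the same two hosts, differing only in source port.
flows = [("10.0.0.17", "10.0.0.21", 40000 + i, 50010) for i in range(8)]

ip_only = Counter(hash_ip_only(s, d, 2) for s, d, _, _ in flows)
ip_port = Counter(hash_ip_port(s, d, sp, dp, 2) for s, d, sp, dp in flows)
print("flows per link, IP-only hash:    ", dict(ip_only))
print("flows per link, IP and port hash:", dict(ip_port))
```

With the IP-only hash every flow lands on the same link, while including the ports spreads the eight flows evenly over both links in this example.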
- 39.
Single 1GE: 100% utilized
Dual 1GE: 75% utilized
10GE: 40% utilized
- 40.
Big Data @ Cisco - www.cisco.com/go/bigdata
Q & A
For further questions and a chance to win a Beats headset, meet us in Booth P12