1) Twitter migrated their Hadoop infrastructure from Hadoop 1 to Hadoop 2 to address scalability challenges and improve efficiency across tens of thousands of servers and hundreds of petabytes of data.
2) The migration involved extensive testing of Hadoop 2 configurations and dependent components before rolling out to production clusters supporting various use cases such as analytics, personalization, and backups.
3) Challenges in the migration process included addressing hard-coded filesystem references and enabling interoperability between Hadoop 1 and Hadoop 2 jobs, which required tools like HDFS ViewFS.
2. About this talk
Share @twitterhadoop’s efforts, experience and learnings in moving thousands of users and multi-petabyte workloads from Hadoop 1 to Hadoop 2.
3. Use cases
Personalization
Graph analysis, Recommendations, Trends, User/topic modeling
Analytics
A/B testing, user behavior analysis, API analytics
Growth
Network Digest, People Recommendations, Email
Revenue
Engagement prediction, ad targeting, ads analytics, marketplace optimization
Nielsen Twitter TV Rating
Tweet impressions processing
Backups & Scribe Logs
MySQL backups, Manhattan backups, FrontEnd scribe logs
Many more...
4. Hadoop and Data pipeline
[Pipeline diagram: TFE and services such as Search, Ads, MySQL, HBase, Manhattan, and Vertica feed a chain of clusters: hadoop realtime, hadoop processing, hadoop warehouse, hadoop cold, and hadoop backups, alongside hadoop tst test clusters, partner feeds, and code from SVN, Git, ...]
5. Elephant Scale
➔ Tens of thousands of Hadoop servers (mix of hardware)
➔ Hundreds of thousands of disk drives
➔ A few hundred PB of data stored in HDFS
➔ Hundreds of thousands of daily Hadoop jobs
➔ Tens of millions of daily Hadoop tasks
Individual Cluster Stats
➔ More than 3500 nodes
➔ 30-50+ PB of data stored in HDFS
➔ 35K RPC/second on NNs
➔ 30K+ jobs per day
➔ 10M+ tasks per day
➔ 6PB+ data crunched per day
6. Hadoop 1 Challenges (Q4-2012)
Growth: supporting Twitter's growth, requests for new features on an older branch, newer Java
Scalability: NameNode files/blocks, NN operations, GC pauses, checkpointing; JobTracker GC pauses, task assignment
Reliability: SPOF NN and JT, NameNode restart delays
Efficiency: slot utilization, QoS, multi-tenancy, new features & frameworks
Maintenance: old codebase, numerous issues already fixed in later versions, dev branch
8. Hadoop 2 Migration (Q2-Q4 2013)
Phase 1: Testing
➔ Apache 2.0.3 branch
➔ New hardware*, new OS and JVM
➔ Benchmarks and user jobs (lots of them…)
➔ Dependent component updates
➔ Data movement between different versions
➔ Metrics, alerts and tools
Phase 2: Semi-production
➔ Production use cases running in 2 clusters in parallel
➔ Tuning/parameter updates and learnings
➔ Started contributing fixes back to the community
➔ Educating users about the new version and changes
➔ Benefits of Hadoop 2
Phase 3: Production
➔ Stable Apache 2.0.5 release with many fixes and backports
➔ Multiple internal releases
➔ Template for new clusters
➔ Ready to roll Apache 2.3 release
*http://www.slideshare.net/Hadoop_Summit/hadoop-hardware-twitter-size-does-matter
9. CPU Utilization
[Charts: Hadoop 1 CPU utilization over one day, with 45% peaks; Hadoop 2 CPU utilization over one day, with 85% peaks.]
11. Migration Challenge: web-based FS
Need a web-based FS to deal with H1/H2 interactions
● Chose Hftp, based on cross-DC LogMover experience (sketch below)
● Apps broke because no FileNotFoundException was raised for non-existent paths (HDFS-6143)
● Faced challenges with cross-version checksums
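A minimal sketch of an Hftp read, assuming a Hadoop 2 client pulling data that still lives on a Hadoop 1 cluster; the NameNode host and path are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HftpReadSketch {
    public static void main(String[] args) throws Exception {
      // Hftp exposes a read-only, wire-version-independent view of HDFS
      // over HTTP, so a client of either Hadoop version can read the
      // other version's data.
      Path src = new Path("hftp://hadoop-cluster1-nn.dc:50070/logs/part-00000");
      FileSystem fs = src.getFileSystem(new Configuration());
      // Before HDFS-6143, opening a non-existent path here failed without
      // a FileNotFoundException, breaking apps that relied on catching it.
      try (FSDataInputStream in = fs.open(src)) {
        System.out.println(in.read()); // consume the stream
      }
    }
  }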
12. Migration Challenge: hard-coded FS
1000’s of occurrences of hdfs://${NN}/path and other absolute URIs
● For cluster1, dial the hdfs://hadoop-cluster1-nn.dc CNAME
● For cluster2, dial …
Ideal: use logical paths and viewfs as the defaultFS (sketch below)
More realistic and faster:
● HDFSCompatibleViewFS, HADOOP-9985
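A minimal sketch of the “ideal” viewfs setup, configured programmatically for illustration; the mount-table name, hosts, and paths are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ViewFsSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Make viewfs the defaultFS so jobs see logical paths, not NN hosts.
      conf.set("fs.defaultFS", "viewfs://clusters/");
      // Client-side mount table: logical paths map to physical namespaces.
      conf.set("fs.viewfs.mounttable.clusters.link./user",
          "hdfs://hadoop-cluster1-nn.dc:8020/user");
      conf.set("fs.viewfs.mounttable.clusters.link./logs",
          "hdfs://hadoop-cluster2-nn.dc:8020/logs");
      FileSystem fs = FileSystem.get(conf);
      // Resolves through the mount table; no hard-coded NN anywhere.
      System.out.println(fs.exists(new Path("/user")));
    }
  }

In practice the mount table would live in a shared core-site.xml rather than in job code.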
13. Migration Challenge: Interoperability
Migration in progress: an H1 job requires input from H2
● the hftp://OMGwhatNN/has/my/path problem: which NN to dial?
● Ideal: use viewfs on H1, resolving to the correct H2 NN
● Realistic: see “hard-coded FS” above
● Even if you know OMGwhatNN, is it the active one?
14. Standby/Active
[Diagram: an H1 client dials a cluster CNAME fronting active/standby NameNode pairs across namespaces.]
Load the client-side mounttable on the server side:
1. redirect to the right namespace
2. redirect to the active NN within the namespace
15. Migration: Tools and Ecosystem
● Port/recompile/package:
o Data Access Layer / HCatalog
o Pig
o Cascading/Scalding
o ElephantBird
o hadoop-lzo
● PIG-3913 (local mode counters)
● Analytics team fixed PIG-2888 (performance)
● hRaven fixes:
o translation between slot_millis and mb_millis
16. HadOops found and fixed
● ViewFS can’t be used for the public DistributedCache (DC)
o HADOOP-10191, YARN-1542
● getFileStatus RPC storm on the public DC:
o YARN-1771
● No user-specified progress string in the MR-AM task UI
o MAPREDUCE-5550
● Uberized jobs are great for scheduling small jobs, but …
o can you kill them? MAPREDUCE-5841
o do they size correctly for map-only jobs? YARN-1190
17. More HadOops
Incident: a job blacklists nodes by logging terabytes
● Capping is needed, but userlog.limit.kb loses the valuable log tail
● RollingFileAppender for MR-AM/tasks: MAPREDUCE-5672
18. Diagnostics improvement
App/Job/Task kill:
● DAG processors/users can say why
o MAPREDUCE-5648, YARN-1551
● MR-AM: “speculation”, “reducer preemption”
o MAPREDUCE-5692, MAPREDUCE-5825
● Thread Dumps
o On task timeout: MAPREDUCE-5044
o On demand from CLI/UI: MAPREDUCE-5784, ...
19. UX/UI improvements
● NameNode state and cluster stats
● App size in MB on the RM Apps page
● RM Scheduler UI improvements: queue descriptions, bug fixes in min/max resource calculation
● Task attempt state filtering in the MR-AM UI
HDFS-5928, YARN-1945, HDFS-5296, ...
20. YARN reliability improvements
● Unhealthy nodes / positive feedback loop
o drain containers instead of killing them: YARN-1996
o don’t rerun maps when all reducers have committed: MAPREDUCE-5817
● RM crash JIRAs, fixed either just internally or upstream
o YARN-351, YARN-502
21. MapReduce usability
● memory.mb as a single tunable: Xmx and sort.mb auto-set
o mb is optimized on a case-by-case basis
o MAPREDUCE-5785
● Users want newer artifacts like Guava: job.classloader (sketch below)
o MAPREDUCE-5146 / 5751 / 5813 / 5814
● Help users debug
o thread dump on timeout, and on demand via the UI
o educate users about heap dumps on OOM and Java profiling
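A minimal sketch of turning on the job classloader so a job’s newer Guava wins over the older copy on Hadoop’s classpath; the job setup shown is hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class JobClassloaderSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Load job classes in an isolated classloader so user artifacts
      // shadow the versions bundled with Hadoop.
      conf.setBoolean("mapreduce.job.classloader", true);
      // Classes matching these prefixes still come from the system classloader.
      conf.set("mapreduce.job.classloader.system.classes",
          "java.,javax.,org.apache.hadoop.");
      Job job = Job.getInstance(conf, "guava-isolated-job");
      // ... set mapper/reducer and I/O paths as usual; ship the newer
      // Guava jar with the job (e.g. via -libjars).
    }
  }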
22. Multi-DC environment
MR clients across latency boundaries. Submit fast:
● moving split calculation to the MR-AM: MAPREDUCE-207
DSCP bit coloring for DataXfer
● HDFS-5175
● Hftp (switched to Apache Commons HttpClient)
DataXfer throttling (client R/W)
23. YARN: Beyond Java & MapReduce
● MR-AM and other REST APIs across the stack for easy integration with non-JVM tools (sketch below)
● Vowpal Wabbit (production)
o no extra spanning-tree step
● Spark (semi-production)
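To illustrate the kind of integration this enables, a minimal sketch polling the RM’s ws/v1/cluster REST endpoint; the RM host and port are hypothetical, and any non-JVM HTTP client could do the same:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class RmRestSketch {
    public static void main(String[] args) throws Exception {
      // List applications known to the ResourceManager, returned as JSON.
      URL url = new URL("http://rm.dc:8088/ws/v1/cluster/apps");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestProperty("Accept", "application/json");
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(conn.getInputStream()))) {
        for (String line; (line = in.readLine()) != null; ) {
          System.out.println(line);
        }
      }
    }
  }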
24. Ongoing Project: Shared Cache
MapReduce is function shipping: computation -> data
● Teams have jobs with 100’s of jars uploaded via libjars
o Ideal: manage a jar repo on HDFS
o Reference jars via the DistributedCache instead of uploading (sketch below)
o Real: currently hard to coordinate
● YARN-1492: manage the artifacts cache transparently
● Measure it:
o YARN-1529: localization overhead / cache-hit NM metrics
o MAPREDUCE-5696: job localization counters
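A minimal sketch of the “ideal” pattern above: pointing a job at a jar that already lives in a shared HDFS repo instead of re-uploading it via -libjars; the repo path and jar name are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;

  public class SharedJarSketch {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "shared-jar-job");
      // The jar is already on HDFS, so tasks localize it from the shared
      // repo rather than each client uploading a private copy per job.
      job.addFileToClassPath(new Path("/repo/jars/elephant-bird-4.4.jar"));
      // ... set mapper/reducer and I/O paths as usual
    }
  }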
25. Upcoming Challenges
● Reduce ops complexity:
o grow to 10K+ node clusters
o try to avoid adding more clusters
● Scalability limits for NN, RM
● NN heap sizes: large Java heaps vs. namespace splitting
● RPC QoS issues
● NN startup: long initial block report processing
● Integrating non-MR frameworks with hRaven
26. Future Work Ideas
● Productize RM HA and work-preserving restart
● HDFS readable standby NN
● Whole DAG in a single NN namespace
● Contribute to HDFS-5477 - dedicated Block Manager service
● NN SLA: fair share for RPC queues: HADOOP-10598
● Finer lock granularity in the NN
27. Summary: Hadoop 2 @ Twitter
● No JT bottleneck: lightweight RM + MR-AM
● High compute density with flexible slots
● Reduced NN bottleneck using Federation
● HDFS HA removes the angst of trying out new NN configs
● Much closer to upstream, to consume/contribute fixes
o development on the 2.3 branch
● Adopting new frameworks on YARN
28. Conclusion
Migrating 1000+ users/use cases is anything but trivial
… however,
● Hadoop 2 made it worthwhile
● Hadoop 2 contributions:
o 40+ patches committed
o ~40 in review
29. Thank you! Questions?
@JoinTheFlock about.twitter.com/careers
@TwitterHadoop
Catch up with us in person
@LohitVijayaRenu
@GeraShegalov
Editor's Notes
With scale and growth like this, Twitter faced a different kind of challenge with Hadoop 1. The JT used to run >20K jobs per day.
The JobTracker caches a number of jobs per user and does not take job size into account, leading to frequent JT full GCs.
Reasoning behind why Twitter chose different namespaces: as of now, all DataNodes talk to all NameNodes; we have been thinking about combinations where subsets of DataNodes talk to different namespaces as well.
We decided to build new Hadoop 2 clusters instead of migrating/upgrading the Hadoop 1 clusters, which avoided huge downtime. Around phase two is when users started seeing the benefits of moving to Hadoop 2. Simple fixes went a long way in helping lots of customers.