Reverse aging has long been a subject of ambiguity and curiosity, both in Hollywood and in Fitzgerald's flights of fancy. Hadoop at Verizon Wireless has been an interesting case study in both scale and adoption. Technology adoption typically follows a steadily progressing curve over time, made up of feature additions, bug fixes, upgrades, and so on. In this case study we examine a Hadoop adoption that oscillates in a space-time continuum, exhibiting traditional growth patterns alongside reverse aging.
The use case highlights the factors, causes, and impacts that can make such an extraordinary phenomenon commonplace in any environment. The conditions leading to this phenomenon may vary across use cases, industries, and environments. This use case discusses the technical aspects that lead to the ultimate path to technical redemption, which in turn yields a well-designed, performance-tuned infrastructure for continuous productivity.
SHIVINDER SINGH, Distinguished Member Technical Staff, Verizon
2. 2
About Verizon
The best, most reliable networks in the industry
The largest U.S. wireless company with the largest 4G LTE network
The largest and fastest all-fiber network in the U.S.
One of the largest, most reliable and secure global networks
Using technology to address big challenges
Verizon Innovation Center in San Francisco, CA
3. 3
Dedicated Corporate Citizen
Creating a platform for long-term growth for our customers, shareowners and society
Using our talent and technology to address society’s biggest challenges
Focusing on finding new ways our technology can improve healthcare, education and energy management
Focusing our philanthropic resources on becoming a channel for innovation and social change
Applying innovative technology to social issues
4. 4
Big Data in the Enterprise
As the enterprise masters Big Data, it will become part of the enterprise solution framework
6. 6
Effective strategies answer three key questions:
How will we deliver value?
How will we create value?
How will we capture value?
7. 7
Unix Inode Management
[Figure: Unix inode layout. The inode holds mode, owners (2), timestamps (3), size, block count, and pointers to direct blocks plus single, double, and triple indirect blocks, which in turn lead to the data blocks.]
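To make the direct and indirect pointer scheme concrete, here is a minimal sketch of how many bytes a single inode can address. The 4 KiB block size, 4-byte block pointers, and 12 direct pointers are assumptions borrowed from classic ext2-style layouts. The slide does not specify them, and the function name is hypothetical.

```python
def max_addressable_bytes(block_size=4096, pointer_size=4, direct_pointers=12):
    """Upper bound on file size reachable from one inode.

    All defaults are illustrative assumptions (classic ext2-style layout),
    not values taken from the slides.
    """
    # Each indirect block is filled with pointers to further blocks.
    pointers_per_block = block_size // pointer_size
    blocks = (
        direct_pointers              # direct blocks
        + pointers_per_block         # single indirect
        + pointers_per_block ** 2    # double indirect
        + pointers_per_block ** 3    # triple indirect
    )
    return blocks * block_size


if __name__ == "__main__":
    size = max_addressable_bytes()
    print(f"{size:,} bytes (about {size / 2**40:.1f} TiB)")
```

Under these assumptions one inode can describe a file of roughly 4 TiB, yet a file of a few kilobytes still consumes a whole inode. Per-file metadata is a fixed cost, which is why many small files are expensive.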
8. 8
Block Size Comparison: Data Lake vs. Single Client

DATA LAKE TOP 20
DB Size (GB)  DB Name                        Total Files  Total Blocks  Average Block Size (bytes)
     328,807  /apps/hive/warehouse/prd1.db    32,461,500    30,283,722                  11,678,898
     180,361  /apps/hive/warehouse/prd2.db     7,030,688     6,568,455                  29,498,992
     114,237  /apps/hive/warehouse/prd3db      7,218,443     7,663,817                  16,004,037
     113,144  /apps/hive/warehouse/prd4.db     2,041,641     2,830,226                  42,925,340
      42,535  /apps/hive/warehouse/prd5.db       169,111       504,297                  90,567,016
      30,615  /apps/hive/warehouse/prd6.db        86,923       297,950                 110,331,894
      21,433  /apps/hive/warehouse/prd7.db       637,283       730,173                  31,520,262
      21,401  /apps/hive/warehouse/prd8.db        29,971       188,875                 121,668,441
      11,564  /apps/hive/warehouse/prd9.db        30,873       110,838                 119,432,578
      11,184  /apps/hive/warehouse/prd10.db      157,975       196,467                  61,127,078
      10,301  /apps/hive/warehouse/prd11.db    9,713,823     8,953,109                   1,236,123
       8,972  /apps/hive/warehouse/prd12.db       20,236        80,666                 119,426,068
       8,711  /apps/hive/warehouse/prd13.db      352,294       390,780                  23,994,662
       8,359  /apps/hive/warehouse/prd14.db       21,175        70,756                 126,829,445
       7,920  /apps/hive/warehouse/prd15.db    1,316,631     1,215,234                   7,017,294
       5,843  /apps/hive/warehouse/prd16.db    1,055,270       468,010                  13,406,724
       5,829  /apps/hive/warehouse/prd17.db      552,918       486,693                  12,881,117
       5,669  /apps/hive/warehouse/prd18.db        1,605        46,147                 131,925,260
       5,652  /apps/hive/warehouse/prd19.db    5,362,238     5,360,747                   1,135,249
         987  /apps/hive/warehouse/prd20.db      565,537       571,859                   1,854,672

Single Client
DB Size (GB)  DB Name                        Total Files  Total Blocks  Average Block Size (bytes)
     315,866  /apps/hive/warehouse/prd.db      2,245,257     2,574,897                 131,717,734
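One way to read the table is to compare each average block size against the configured HDFS block size. The sketch below assumes a 128 MiB block size, which is the HDFS default in Hadoop 2 and later; the slides do not state the cluster's actual setting. The 25% threshold and the function names are hypothetical. The averages are copied from a few rows of the table.

```python
# Assumption: 128 MiB blocks (HDFS default in Hadoop 2+); not stated in the slides.
ASSUMED_BLOCK_SIZE = 128 * 1024 * 1024

# Average block sizes in bytes, taken from the table above.
avg_block_bytes = {
    "prd1.db": 11_678_898,
    "prd11.db": 1_236_123,
    "prd18.db": 131_925_260,
    "prd19.db": 1_135_249,
    "single client prd.db": 131_717_734,
}


def fill_ratio(avg_bytes, block_size=ASSUMED_BLOCK_SIZE):
    """Fraction of the assumed block size that an average block occupies."""
    return avg_bytes / block_size


def underfilled(dbs, threshold=0.25):
    """Names of databases whose average block is below the (hypothetical) threshold."""
    return sorted(name for name, avg in dbs.items() if fill_ratio(avg) < threshold)


if __name__ == "__main__":
    for name, avg in avg_block_bytes.items():
        print(f"{name:<22} {fill_ratio(avg):6.1%}")
    print("underfilled:", underfilled(avg_block_bytes))
```

Under this assumption the single client's blocks are close to full, while several data lake databases average only a small fraction of a block. That pattern is the usual signature of a small-files problem.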
9. 9
Small File Namenode Impact
High GC pauses
High RPC latency, running into minutes
Cluster Unresponsive
Jobs stalled
Full downtime
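The symptoms above follow from the NameNode keeping every file and block as an object in heap memory. As a rough illustration, the sketch below applies the commonly cited rule of thumb of about 150 bytes of heap per namespace object. That figure is an assumption, not a measurement from this cluster, and the function name is hypothetical. The file and block counts come from the slide 8 table.

```python
# Assumption: ~150 bytes of NameNode heap per file or block object.
# This is a widely quoted rule of thumb, not a figure from the slides.
ASSUMED_BYTES_PER_OBJECT = 150


def estimated_heap_bytes(files, blocks, bytes_per_object=ASSUMED_BYTES_PER_OBJECT):
    """Very rough NameNode heap estimate for a set of files and blocks."""
    return (files + blocks) * bytes_per_object


if __name__ == "__main__":
    # Counts taken from the slide 8 table.
    lake = estimated_heap_bytes(files=32_461_500, blocks=30_283_722)    # prd1.db
    single = estimated_heap_bytes(files=2_245_257, blocks=2_574_897)    # single client
    print(f"data lake prd1.db: {lake / 2**30:.2f} GiB")
    print(f"single client:     {single / 2**30:.2f} GiB")
    print(f"ratio:             {lake / single:.1f}x")
```

The two databases hold a similar volume of data, but under this estimate the data lake database needs roughly thirteen times the NameNode heap. This is consistent with the long GC pauses listed above.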
10. 10
The S-curve Maps Major Transitions
[Figure: S-curve of performance against time, with phases Ferment, Takeoff, Maturity, and Reverse Aging]
12. 12
Root Cause and Fix
Deep dive across 40 data lake clients
Review of 456 Databases
Review of 373,083 tables
Review of 5K jobs
Fix
Reduce job frequency
Block size parameters for Hive and YARN
ZooKeeper tuning
15. 15
Other considerations
ZooKeeper (ZK) is the most critical component
Numerous third-party components
Znodes being written outside of HDP components
ZK image size: 10 GB
5 million znodes
Fix
Targeted purge of znodes down to 100 K
ZK image size down to 100 MB
Ongoing ZK tuning
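A targeted purge first needs to know which components own the znodes. The sketch below is one hypothetical way to do that: it groups a list of znode paths by their top-level node. The paths in the example are invented for illustration, and in practice the list would come from a ZooKeeper client or a snapshot dump. The slides do not describe the tooling actually used.

```python
from collections import Counter


def znodes_by_top_level(paths):
    """Count znodes under each top-level node, e.g. '/app_x/jobs/1' -> '/app_x'."""
    counts = Counter()
    for path in paths:
        first = path.strip("/").split("/")[0]
        counts["/" + first if first else "/"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    sample = [
        "/hiveserver2/instance-1",
        "/app_x/jobs/1",
        "/app_x/jobs/2",
        "/app_x/jobs/3",
    ]
    for top, count in znodes_by_top_level(sample).most_common():
        print(f"{top:<15} {count}")
```

Sorting by count points the purge at the heaviest writers first, which in this talk were components outside the HDP stack.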
16. 16
Stack Selection
Physical limit?
Performance is ultimately constrained by physical limits
E.g.:
Sailing ships & the power of the wind
Copper wire & transmission capability
Semiconductors & the speed of the electron
[Figure: performance plotted against time]
17. 17
Once Upon a Time There Was an Inode…
• Redemption…
Andy Dufresne: “He's a phantom, an apparition, second cousin to Harvey the Rabbit.”
The Unix kernel is fundamental!
Packaging changes, the basics remain the same
Small files are a technology limitation
Data democracy can be a boon or a bane
Issues are platform agnostic
18. 18
Q & A
You can reach us at
shivinder.singh@vzw.com
Go to www.verizon.com/about/ for more information and news about our company, social responsibility, investor relations and careers.