Gaining Support for Hadoop
in a Large Corporate
Environment
Tuesday, June 3, 2014
Hadoop for Business Apps, Hadoop Summit
©2014 Sprint. This information is subject to Sprint policies regarding use and is the property of Sprint and/or its relevant affiliates and may contain restricted,
confidential or privileged materials intended for the sole use of the intended recipient. Any review, use, distribution or disclosure is prohibited without authorization.
Overview.
• Create the team
- Who are we
• Research challenge
• Evaluate the data
- Resource Evaluation
• What did we learn?
- New Analytics
- New Benefits
- New Data
- New Infrastructure
• How did we move out of Research and into the Enterprise?
About Me.
• Jennifer Lim has over 14 years of experience in large enterprise data warehousing and
analytics. Most recently, she was a Research Scientist for the Sprint Advanced Analytics Lab
and is now acting as a Lead Technology Architect, focusing on upgrading the enterprise
analytics infrastructure in support of all those great use cases being discovered in the research
lab. She has an MBA from Avila University, with a BS from Iowa State University.
Jennifer.Lim@sprint.com
About Sprint.
• Sprint is widely recognized for developing, engineering and deploying innovative technologies,
including the first wireless 4G service from a national carrier in the United States; offering
industry-leading mobile data services, leading prepaid brands including Virgin Mobile USA,
Boost Mobile, and Assurance Wireless; instant national and international push-to-talk
capabilities; and a global Tier 1 Internet backbone. www.sprint.com
The Team.
Advanced Analytics Lab
• The CTO took a team focused on Network Technology Research and refocused them onto the
new “gold”: Data.
• Data Research Scientists and RF Engineers engaged in
- Mobile Internet Research
• Security & Privacy
• Location: location accuracy, population estimation
• Social Connection: social networks, influence, churn
- Network Research
• Wireless and IP Networks
• Wireless and wireline security: fraud prevention
- Architecture Research
• Performing data platform & tool evaluations
- Prototype Development
• Use Case Development
• Demonstration of new technologies & capabilities
Summer 2011
Our Journey…
• from data optimization
• to a research idea
• to a realization: was our data in the right place?
• to developing a Hadoop-based analysis environment
• to enhancing the technical capabilities of the enterprise data warehouse
…to create Actionable Insights.
Historically –
Data utilized for Optimization Tasks.
The Research Challenge.
• XDRs: Voice, Texting, IP, Video
• Websites Visited, Location, Applications Used, Social Networks, Calls & Texts
• Find Insights Available Nowhere Else
• Find New Use Cases
Proof of Concept.
Transition: from optimizing to asking questions about the data
October 2011
Prototype Infrastructure.
• Current Enterprise infrastructure couldn’t be used to build the prototypes
- No formal IT project, so we couldn’t use IT resources.
- We didn’t have the funding to buy the latest & greatest.
- We needed something that could store a lot of data without a lot of prep.
- We wanted to experiment.
• Current Lab infrastructure couldn’t be used to build the prototypes
- Network focused
- File based, focused on finding specific traffic in the same geo-location
• Looked around, found some servers, dusted them off, and grabbed open source Hadoop.
- 5 TB; our servers were all memory & no disk
- 5 data nodes & 1 manager node
What Did We Learn?
New Analytics
New Benefits
New Data
New Infrastructure
New Analytics.
Finding ways to take network events…
and creating this…
Using Network data to create new Products,
Increase Customer Satisfaction, Attract new
Customers by providing actionable insights to
Customers and Enterprise decision makers
New Benefits. New Data.
• Incorporation of new, insightful data sets
• Incorporation of new, specialized business rules
• Geospatial techniques
• Examination of new Business Intelligence and Visualization tools
Becoming the Advocates & Demonstrators for new analytics
New Infrastructure.
Lab Cluster.
• Trials of distributions & server setups.
• Training of internal resources. Big Data User Group.
• Expansion of teams able to run Prototypes on the cluster.
- Usage Based Cost / Finance
- Application data transforms / Product
- Location Accuracy Improvement / Network
- Pathing Analysis / Marketing
- Device Behavior Analysis / Device
- Customer Text Analytics / Care
- ….
- Approximately 1 petabyte; our servers have 4 TB data drives and 256 GB RAM
- 30 nodes: 23 data nodes, plus management nodes & visualization nodes
June 2013
New Infrastructure.
Production Cluster.
• Standard Visualizations and Analytics Tools Integrated.
• Funding Proven Use Cases.
• IT process & controls related to continuous data loading,
transformations, and reliability.
• Standards established.
• Resources scaled – from a team of 5 supporting the lab cluster to more than 5 teams responsible for the system.
- Over 2 petabytes; our servers have 4 TB data drives and 256 GB RAM (same as the lab cluster)
- 52 data nodes, plus management nodes & visualization nodes
May 2014
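For readers who want to sanity-check the cluster sizes quoted across the prototype, lab, and production slides, here is a minimal back-of-the-envelope sketch. It is not from the talk. It assumes that the quoted capacities are raw disk, that 1 petabyte is taken as 1000 TB, that production is exactly 2 PB (the slide says "over 2"), and that the stock HDFS replication factor of 3 is in use. The `Cluster` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Stock HDFS default for dfs.replication; the talk does not state what was used.
HDFS_DEFAULT_REPLICATION = 3


@dataclass(frozen=True)
class Cluster:
    """Hypothetical container for the figures quoted on the slides."""
    name: str
    data_nodes: int
    quoted_tb: float                  # capacity as quoted, assumed to be raw disk
    drive_tb: Optional[float] = None  # data drive size, where the slide gives one

    def tb_per_data_node(self) -> float:
        # Quoted capacity spread evenly over the data nodes.
        return self.quoted_tb / self.data_nodes

    def drives_per_data_node(self) -> Optional[float]:
        # Implied number of data drives per node; None if drive size is unknown.
        if self.drive_tb is None:
            return None
        return self.tb_per_data_node() / self.drive_tb

    def usable_tb(self, replication: int = HDFS_DEFAULT_REPLICATION) -> float:
        # Each block is stored `replication` times, so usable space shrinks accordingly.
        return self.quoted_tb / replication


CLUSTERS = [
    Cluster("prototype", data_nodes=5, quoted_tb=5),
    Cluster("lab", data_nodes=23, quoted_tb=1000, drive_tb=4),
    Cluster("production", data_nodes=52, quoted_tb=2000, drive_tb=4),
]

if __name__ == "__main__":
    for c in CLUSTERS:
        drives = c.drives_per_data_node()
        drives_text = "n/a" if drives is None else f"{drives:.1f}"
        print(
            f"{c.name}: {c.tb_per_data_node():.1f} TB per data node, "
            f"{drives_text} drives per node, "
            f"{c.usable_tb():.1f} TB usable at replication {HDFS_DEFAULT_REPLICATION}"
        )
```

Under these assumptions the lab and production clusters both work out to roughly ten 4 TB data drives per data node, which fits the slide's note that the production servers match the lab servers. If the quoted capacities were already net of replication, the per-node figures would be about three times larger.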
Enterprise Analytics Architecture.
Changes.
Agile:
• Enable faster development cycle
• Deal with structured & unstructured data
Scalable Hadoop environment:
• Billions of objects, high read/write volume, terabytes / petabytes
• Distribution model & consistency
Partnering Across the Enterprise. Big Data User Group.
• Marketing – Loyalty & Retention
• Network Development & Engineering
• Network Planning & Forecasting
• Finance Accounting
• Product – Consumer Apps & Entertainment
• Product – Messaging & Instant Communications
• Enterprise Architecture
• IT Application Development & Operations
• IT Data Management…
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 

Gaining Support for Hadoop in a Large Corporate Environment

  • 1. Gaining Support for Hadoop in a Large Corporate Environment Tuesday, June 3, 2014 Hadoop for Business Apps, Hadoop Summit
  • 2. Overview.
    • Create the team
      - Who are We
    • Research challenge.
    • Evaluate the data
      - Resource Evaluation
    • What did we learn?
      - New Analytics
      - New Benefits
      - New Data
      - New Infrastructure
    • How did we move out of Research and into the Enterprise?
    ©2014 Sprint. This information is subject to Sprint policies regarding use and is the property of Sprint and/or its relevant affiliates and may contain restricted, confidential or privileged materials intended for the sole use of the intended recipient. Any review, use, distribution or disclosure is prohibited without authorization.
  • 3. About Me.
    • Jennifer Lim has over 14 years of experience in large enterprise data warehousing and analytics. Most recently, she was a Research Scientist for the Sprint Advanced Analytics Lab and is now acting as a Lead Technology Architect, focusing on upgrading the enterprise analytics infrastructure in support of all those great use cases being discovered in the research lab. She has an MBA from Avila University, with a BS from Iowa State University. Jennifer.Lim@sprint.com
    About Sprint.
    • Sprint is widely recognized for developing, engineering and deploying innovative technologies, including the first wireless 4G service from a national carrier in the United States; offering industry-leading mobile data services, leading prepaid brands including Virgin Mobile USA, Boost Mobile, and Assurance Wireless; instant national and international push-to-talk capabilities; and a global Tier 1 Internet backbone. www.sprint.com
  • 4. The Team. Advanced Analytics Lab
    • The CTO took a team focused on Network Technology Research and refocused them onto the new “gold”: Data.
    • Data Research Scientists and RF Engineers engaged in
      - Mobile Internet Research
        • Security & Privacy
        • Location: location accuracy, population estimation
        • Social Connection: social networks, influence, churn
      - Network Research
        • Wireless and IP Networks
        • Wireless and wireline security: fraud prevention
      - Architecture Research
        • Performing data platform & tool evaluations
      - Prototype Development
        • Use Case Development
        • Demonstration of new technologies & capabilities
    Summer 2011
  • 5. Our Journey…
    • from data optimization
    • to a research idea
    • to a realization - was our data in the right place?
    • to developing a Hadoop-based analysis environment
    • to enhancing the technical capabilities of the enterprise data warehouse
    …to create Actionable Insights.
  • 6. Historically – Data utilized for Optimization Tasks.
  • 7. The Research Challenge.
    XDRs Voice Texting IP Video Websites Visited Location Applications Used Social Networks Calls & Texts
    • Find Insights Available Nowhere Else
    • Find New Use Cases
  • 8. Proof of Concept.
    Transition: from optimizing to asking questions about the data
    October 2011
  • 9. Prototype Infrastructure.
    • Current Enterprise infrastructure couldn’t be used to build the prototypes
      - No formal IT project, so we couldn’t use IT resources.
      - We didn’t have the funding to buy the latest & greatest.
      - We needed something that could store a lot of data without a lot of prep.
      - We wanted to experiment.
    • Current Lab infrastructure couldn’t be used to build the prototypes
      - Network focused
      - File based, focused on finding specific traffic in same geo-location
    • Looked around, found some servers, dusted them off… grabbed open source Hadoop.
      - 5 TB, our servers were all memory & no disk
      - 5 data nodes & 1 manager node
  • 10. What Did We Learn?
    • New Analytics
    • New Benefits
    • New Data
    • New Infrastructure
  • 11. New Analytics.
    Finding ways to take network events… and creating this…
    Using Network data to create new Products, Increase Customer Satisfaction, and Attract new Customers by providing actionable insights to Customers and Enterprise decision makers
  • 12. New Benefits. New Data.
    • Incorporation of new, insightful data sets
    • Incorporation of new, specialized business rules
    • Geospatial! Techniques
    • Examination of new Business Intelligence and Visualization tools
    Becoming the Advocates & Demonstrators for new analytics
  • 13. New Infrastructure. Lab Cluster.
    • Trials of distributions & server setups.
    • Training of internal resources. Big Data User Group.
    • Expansion of teams able to run Prototypes on the cluster.
      - Usage Based Cost / Finance
      - Application data transforms / Product
      - Location Accuracy Improvement / Network
      - Pathing Analysis / Marketing
      - Device Behavior Analysis / Device
      - Customer Text Analytics / Care
      - ….
    • Cluster size
      - approximately 1 Petabyte, our servers have 4 TB data drives and 256 GB RAM
      - 30 nodes… 23 data nodes, with management nodes & visualization nodes
    June 2013
  • 14. New Infrastructure. Production Cluster.
    • Standard Visualizations and Analytics Tools Integrated.
    • Funding Proven Use Cases.
    • IT process & controls related to continuous data loading, transformations, and reliability.
    • Standards established.
    • Resources scaled – from a team of 5 supporting the lab cluster to more than 5 teams responsible for the system.
    • Cluster size
      - Over 2 Petabytes, our servers have 4 TB data drives and 256 GB RAM (same as the lab cluster)
      - 52 data nodes, with management nodes & visualization nodes
    May 2014
  • 15. Enterprise Analytics Architecture. Changes.
    • Agile:
      - Enable faster development cycle
      - Deal with structured & unstructured data
    • Scalable Hadoop environment:
      - Billions of objects, high read/write volume, terabytes / petabytes
      - Distribution model & consistency
    Partnering Across the Enterprise. Big Data User Group.
    • Marketing – Loyalty & Retention
    • Network Development & Engineering
    • Network Planning & Forecasting
    • Finance Accounting
    • Product – Consumer Apps & Entertainment
    • Product – Messaging & Instant Communications
    • Enterprise Architecture
    • IT Application Development & Operations
    • IT Data Management…
  • 16. 16
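
The cluster sizes quoted on slides 9, 13 and 14 can be sanity-checked with a little arithmetic. The sketch below is illustrative only. It assumes the quoted totals are raw capacity in decimal units spread evenly across the data nodes, and it assumes HDFS's default replication factor of 3; the slides state none of this. The function names are hypothetical.

```python
import math

TB = 10**12  # decimal terabyte, an assumption; the slides do not say TB vs TiB
PB = 10**15  # decimal petabyte


def per_node_tb(total_bytes, data_nodes):
    """Average storage per data node, in TB, assuming an even spread."""
    return total_bytes / data_nodes / TB


def drives_per_node(total_bytes, data_nodes, drive_tb=4):
    """Number of drives of size drive_tb needed per node to reach the total."""
    return math.ceil(per_node_tb(total_bytes, data_nodes) / drive_tb)


def usable_tb(total_bytes, replication=3):
    """Rough usable space if the total is raw and every block is stored
    `replication` times (3 is the HDFS default, assumed here)."""
    return total_bytes / replication / TB


# Figures taken from the slides; "approximately" and "over" are rounded here.
clusters = {
    "prototype (2011)": (5 * TB, 5),
    "lab (June 2013)": (1 * PB, 23),
    "production (May 2014)": (2 * PB, 52),
}

for name, (total, nodes) in clusters.items():
    print(
        f"{name}: {per_node_tb(total, nodes):.1f} TB per data node, "
        f"{drives_per_node(total, nodes)} x 4 TB drives, "
        f"about {usable_tb(total):.0f} TB usable at 3x replication"
    )
```

Under these assumptions the lab cluster works out to roughly eleven 4 TB drives per data node and the production cluster to roughly ten, which is consistent with the slides' remark that both clusters use the same server configuration.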

Editor's notes

  1. As a presenter of advanced analytics proofs of concept to other corporations, I am questioned most frequently on the “how” by my audiences. Not the “how” about the technology or the data we used, but “how” we were able to gain momentum and support in a large corporate enterprise to incorporate new technology and practices in analytics. I will share with you how a major telecommunications company, Sprint, created a research team of just 8 people who were able to infect the Enterprise with new infrastructure, new data, and new analytics and transform them into new business benefits. When I speak with other companies on advanced analytics proofs of concept, the focus of their questions skips quickly past the “what” onto the “how”: how did we gain support, how did we find success, how did we decide which technology to select. I will share with you some of the lessons we learned as well as answer many of these questions. This discussion will showcase how Sprint, a major telecommunications company, went from issuing a research challenge to enabling the entire enterprise in the area of analytics. I’ll walk you through how we repurposed an existing team and started with our first Proof of Concept on Hadoop. We are now in the midst of setting up a multi-petabyte, enterprise-supported Hadoop system with multiple funded projects, are augmenting our research facilities, and have a long list of use case trials in the works.
  2. Design principles:
     • Capture data “before” it is processed by the Enterprise databases
     • Merge streaming data with static data from existing databases
     • Include geospatial tools from the start
     • Support a standard query language so that anyone can access & use the data
     • Make it easy to create UDFs
     • Use off-the-shelf hardware and open source where possible
     • Use off-the-shelf visualization tools