Leveraging a Hadoop Cluster for Carrier-Grade Applications




                             Copyright © 2011 Flytxt B.V. All rights reserved.   1/17/2012
No Personalization


Service
discovery




   600-800 GB of CDR per day
    ◦ GPRS signaling: 50 GB/day
    ◦ 3G signaling: 300 GB/day
    ◦ Voice: 100 GB/day
    ◦ SMS: 200 GB/day
   100-200 GB/day of web data
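
As a quick sanity check on the figures above, the per-service CDR volumes can be summed and compared with the stated daily range. The snippet below is only an arithmetic sketch; the dictionary keys are labels taken from the slide, and the variable names are illustrative.

```python
# Daily CDR volumes in GB, copied from the list above.
cdr_gb_per_day = {
    "GPRS signaling": 50,
    "3G signaling": 300,
    "Voice": 100,
    "SMS": 200,
}
# Stated range for web data, in GB/day.
web_gb_per_day = (100, 200)

# Sum of the individual CDR feeds.
cdr_total = sum(cdr_gb_per_day.values())

# Combined daily volume once web data is added.
combined_low = cdr_total + web_gb_per_day[0]
combined_high = cdr_total + web_gb_per_day[1]

print(f"CDR total: {cdr_total} GB/day")
print(f"CDR + web: {combined_low}-{combined_high} GB/day")
```

The itemized feeds add up to a value inside the quoted 600-800 GB range, so the breakdown and the headline figure are consistent.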



Mammoth Data
                                         Data Analysis




   Framework for distributed processing of large data sets
    across clusters
   Consists of
    ◦ Hadoop Distributed File System, aka HDFS (file system)
    ◦ Hadoop MapReduce (programming model)
   Characteristics
    ◦ Performance shall scale linearly
    ◦ Compute should move to data
    ◦ Simple core, Modular and Extensible
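
To make the MapReduce programming model concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the comma-separated record layout are assumptions made for illustration. In a real cluster the map and reduce functions run in parallel on the nodes that hold the data, and the framework performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (service_type, 1) for one record; the record layout is hypothetical."""
    service_type = record.split(",")[0]
    yield service_type, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Made-up input records of the form "service_type,subscriber".
records = ["sms,alice", "voice,bob", "sms,carol"]
result = run_job(records)
print(result)
```

Because each call to `map_phase` looks at a single record and each call to `reduce_phase` looks at a single key, both phases can be spread over many machines, which is what allows throughput to grow as nodes are added.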



   Current Bottlenecks

    ◦ Data resides on multiple nodes, zones and VM instances, with no elegant,
      reliable and efficient way of extracting it

    ◦ Loading terabytes of data into a database is slow

    ◦ Parallel computing is not possible in conventional BI ETL

    ◦ User profile and application data reside in a DB that can scale
      only vertically




   Structured Data



         sqoop --connect jdbc:mysql://db.example.com/website --table USERS --as-sequencefile



   Unstructured Data




   A distributed data collection server
    ◦   Scalable
    ◦   Configurable
    ◦   Extensible
    ◦   Manageable


   Built around the concept of flows
    ◦ A single flow corresponds to a type of data source
    ◦ Supports compression, batching & reliability setups per flow


   Data comes in through a source
    ◦ Optionally processed by one or more decorators
    ◦ And transmitted out via a sink
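
The source, decorator, sink chain described above can be sketched with Python generators. This is only a conceptual model and not the API of any real collection server: `source`, `batch`, `compress` and `run_flow` are hypothetical names, and the batching and compression decorators stand in for the per-flow options mentioned in the list.

```python
import gzip
from typing import Callable, Iterable, Iterator, List

Event = bytes
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def source(lines: Iterable[str]) -> Iterator[Event]:
    """Turn incoming text lines into events (hypothetical source)."""
    for line in lines:
        yield line.encode("utf-8")


def batch(size: int) -> Decorator:
    """Build a decorator that joins up to `size` events into one event."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        buffer: List[Event] = []
        for event in events:
            buffer.append(event)
            if len(buffer) == size:
                yield b"\n".join(buffer)
                buffer = []
        if buffer:
            # Flush the final, possibly smaller, batch.
            yield b"\n".join(buffer)
    return decorate


def compress(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator that gzip-compresses each event."""
    for event in events:
        yield gzip.compress(event)


def run_flow(lines: Iterable[str],
             decorators: List[Decorator],
             sink: Callable[[Event], None]) -> None:
    """Wire source -> decorators -> sink for a single flow."""
    events = source(lines)
    for decorate in decorators:
        events = decorate(events)
    for event in events:
        sink(event)


# Use a plain list as the sink so the output can be inspected.
collected: List[Event] = []
run_flow(["a", "b", "c"], [batch(2), compress], collected.append)
print(len(collected), gzip.decompress(collected[0]))
```

Each flow gets its own decorator list, which mirrors the idea that compression, batching and reliability are configured per flow rather than globally.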




   MapReduce is very powerful, but:
    ◦ It requires a Java programmer
    ◦ The user has to re-invent common
      functionality (join, filter, etc.)

   Execution engine atop Hadoop

   Pig provides a higher-level language, Pig Latin

   Opens the system to non-Java programmers

   Provides common operations like join, group, filter, sort




   Web log processing.
   Data processing for web search platforms.
   Ad hoc queries across large data sets.
   Rapid prototyping of algorithms for processing large data
    sets.
   Pig runs on the local machine and the job gets executed in the Hadoop
    cluster (the -x local flag below runs everything in local mode instead)
       $ cd /usr/share/cloudera/pig/
       $ bin/pig -x local
       grunt>
           log = LOAD 'excite-small.log' AS (user, timestamp, query);
           grpd = GROUP log BY user;
           cntd = FOREACH grpd GENERATE group, COUNT(log);
           STORE cntd INTO 'output';
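
For readers who do not know Pig Latin, the script above does roughly what the following Python sketch does: load tab-separated (user, timestamp, query) records, group them by user and count the rows in each group. The sample rows are made up for illustration and are not taken from excite-small.log; `count_queries_per_user` is a hypothetical helper name.

```python
import io
from collections import Counter
from typing import Iterable


def count_queries_per_user(lines: Iterable[str]) -> Counter:
    """Count records per user, mirroring GROUP ... BY user plus COUNT."""
    counts: Counter = Counter()
    for line in lines:
        # Fields are tab-separated, matching Pig's default loader.
        user, timestamp, query = line.rstrip("\n").split("\t", 2)
        counts[user] += 1
    return counts


# Made-up sample input in the same three-column layout.
sample = io.StringIO(
    "u1\t970916105432\tyahoo chat\n"
    "u2\t970916001954\tnews\n"
    "u1\t970916105500\tweather\n"
)
counts = count_queries_per_user(sample)

# Print one "user<TAB>count" line per group, like the STORE output.
for user, n in sorted(counts.items()):
    print(f"{user}\t{n}")
```

The Pig version is four statements and runs unchanged on a cluster, whereas this loop would have to be rewritten as map and reduce functions to scale out, which is the productivity argument the previous slide makes.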




   System for querying and managing structured data
   Built on top of Hadoop
   Uses MapReduce for execution
   SQL-like syntax; supports
    ◦   FROM clause subquery
    ◦   ANSI join (equi-join)
    ◦   Multi-table insert
    ◦   Multi group-by
    ◦   Sampling
    ◦   Object traversal
   Engagement
    ◦ Summarization
    ◦ Ad hoc analysis
    ◦ Spam detection
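
Two of the query features listed above, a subquery in the FROM clause and an equi-join, can be illustrated with a small summarization query. The sketch below uses Python's built-in SQLite module purely as a stand-in so that it runs anywhere; it is not executed by the system described on this slide, and the table names, columns and rows are all hypothetical.

```python
import sqlite3

# In-memory database with two made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, plan TEXT);
    CREATE TABLE sms_events (user_id INTEGER, sent_on TEXT);
    INSERT INTO users VALUES (1, 'prepaid'), (2, 'postpaid');
    INSERT INTO sms_events VALUES
        (1, '2012-01-16'), (1, '2012-01-17'), (2, '2012-01-17');
""")

# The inner SELECT is the FROM-clause subquery (per-user SMS counts);
# the JOIN condition is a plain equality, i.e. an equi-join.
rows = conn.execute("""
    SELECT u.plan, t.sms_count
    FROM (SELECT user_id, COUNT(*) AS sms_count
          FROM sms_events
          GROUP BY user_id) t
    JOIN users u ON (u.user_id = t.user_id)
    ORDER BY u.plan
""").fetchall()
conn.close()

print(rows)
```

On a Hadoop-backed engine the same shape of query would be compiled into MapReduce jobs, so the exact syntax and supported join conditions should be checked against that engine's own language manual.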



Feature                          Hive                              Pig
Language                         SQL-like                          Pig Latin
Schemas/Types                    Yes (explicit)                    Yes (implicit)
Partitions                       Yes                               No
Server                           Optional (Thrift)                 No
User Defined Functions           Yes                               Yes
Custom Serializer/Deserializer   Yes                               Yes
DFS Direct Access                Yes (implicit)                    Yes (explicit)
Join/Order/Sort                  Yes                               Yes
Shell                            Yes                               Yes
Streaming                        Yes                               No
Web Interface                    Yes                               No
JDBC/ODBC                        Yes (limited)                     No




