Big Data CDR Analyzer

Project Supervisors
Mr. Thilina Anjitha – hSenid
Dr. Shahani Markus Weerawarana

Project Team
080201N – M.K.P.R. Jayawardhana
080254D – P.K.A.M. Kumara
080331L – W.D.A.I. Paranawithana
080357V – T.D.K. Perera
Overview
•   Background
•   Current Situation
•   Scope and Assumptions
•   Kanthaka – Big Data CDR Analyzer System
•   Technology Comparison
       - MapReduce
       - NoSQL Databases
•   Architecture
•   Project Plan
•   Risks and Possible Remedies
•   References
Background
Mobile Promotions
Current Situation
• Promotions are based only on subscribers' network usage
• Only the active call switch is used to trigger
  promotions
• No way to analyze and process a high
  volume of CDRs
• No efficient CDR analysis method
• No access to historical data
• Complex rules are not supported
Kanthaka to the rescue
• Selects eligible users for both commercial-organization-based
  and network-usage-based promotions.
  E.g., giving a 20% discount to pizza lovers aged 16-40 who
     have called Pizza Hut more than 5 times a month
• High-volume CDR analysis.
• Near-real-time selection of eligible users for
  promotions.
• A CDR analyzer system which
 ▫ can process 30 million records per day
 ▫ can produce results within 10-15 seconds
 ▫ provides a GUI to define dynamic rules
 ▫ can be used to offer real-time sales promotions
   to mobile subscribers
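
The pizza promotion rule quoted above can be sketched as a small eligibility check. This is a minimal Python illustration, not Kanthaka's implementation: the record fields (`caller`, `callee`, `event_type`), the `PIZZA_HUT_NUMBER` value, and the `ages` profile lookup are all hypothetical, and the records are assumed to cover a single month already.

```python
from collections import Counter

# Hypothetical values for illustration only.
PIZZA_HUT_NUMBER = "0110000000"
MIN_CALLS = 5          # rule: more than 5 calls in the month
AGE_RANGE = (16, 40)   # rule: age group 16-40, inclusive


def eligible_subscribers(cdrs, ages):
    """Return the set of callers who satisfy the pizza promotion rule.

    cdrs: iterable of dicts with 'caller', 'callee', 'event_type' keys,
          assumed to cover a single month already.
    ages: dict mapping subscriber number -> age (assumed profile data).
    """
    # Count voice calls to the target number per caller.
    calls = Counter(
        r["caller"]
        for r in cdrs
        if r["event_type"] == "VOICE" and r["callee"] == PIZZA_HUT_NUMBER
    )
    low, high = AGE_RANGE
    # Keep callers above the call threshold whose age is known and in range.
    return {
        caller
        for caller, n in calls.items()
        if n > MIN_CALLS and low <= ages.get(caller, -1) <= high
    }


# Usage: subscriber A calls 6 times (eligible), B calls 5 times (not eligible).
sample = (
    [{"caller": "A", "callee": PIZZA_HUT_NUMBER, "event_type": "VOICE"}] * 6
    + [{"caller": "B", "callee": PIZZA_HUT_NUMBER, "event_type": "VOICE"}] * 5
)
print(eligible_subscribers(sample, {"A": 25, "B": 30}))  # {'A'}
```

For a sense of scale, 30 million records per day averages about 347 records per second (30,000,000 / 86,400), so a full scan on every rule check would be hard to fit into a 10-15 second response window.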
Scope and Assumptions
Scope

  Real system operation    Operation expected by Kanthaka
  30 M records             30 M records
  Multiple rules           Single rule
  Offer promotion          Select eligible users for promotion only
Assumptions

• CDRs arrive only in CSV format.
• Event types include SMS, voice call, MMS, USSD,
  top-up, GPRS, and LBS.
• CDRs can arrive at the system asynchronously,
  in batches.
• Only 6 of the many CDR attributes are
  considered during processing.
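
Under these assumptions, loading a batch amounts to reading CSV rows and keeping a fixed subset of columns. The sketch below shows one way this could look in Python. The slides do not name the six attributes, so the column names in `KEPT_FIELDS`, the event-type codes, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical column names; the slides do not list the six attributes.
KEPT_FIELDS = ("timestamp", "caller", "callee", "event_type", "duration", "cell_id")
# Hypothetical codes for the event types named in the assumptions.
EVENT_TYPES = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}


def parse_batch(csv_text):
    """Parse one CSV batch, keeping only the six fields of interest.

    Rows with an unknown event type are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        if row.get("event_type") not in EVENT_TYPES:
            continue
        records.append({field: row[field] for field in KEPT_FIELDS})
    return records


# A made-up batch with one extra column (imei) and one unknown event type.
batch = (
    "timestamp,caller,callee,event_type,duration,cell_id,imei\n"
    "2012-06-01T10:00:00,0711111111,0722222222,VOICE,65,C12,X1\n"
    "2012-06-01T10:00:05,0711111111,0722222222,FAX,0,C12,X1\n"
)
print(parse_batch(batch))
```

Dropping unused columns at load time keeps each stored record small, which matters when memory is the main cost of the data store.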
Technology Comparison
Large data volume + high speed
                  --> scale-out system
MapReduce
  Hadoop MapReduce
 • Can handle large volumes of data
 • Latency is too high for cases where results are expected in near real time

Word count on a 100 KB file:
          Start time                 = 01:04:44
          End time                   = 01:05:12
          Total time                 = 28 sec
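
The word-count job timed above follows the usual MapReduce pattern of map, shuffle, and reduce. The sketch below runs those three phases in a single Python process purely to illustrate the model. It is not Hadoop code, and the function names are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big CDR data"]))  # {'big': 2, 'data': 2, 'cdr': 1}
```

On a cluster these phases run as separately scheduled tasks, so for an input as small as 100 KB much of the 28 seconds is likely job setup rather than counting.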
DB Technology Comparison

• RDBMS
 ▫   Provides ACID properties
 ▫   Uses sharding to scale out
 ▫   Management overhead is large when scaling out
 ▫   Performance degrades under higher data load
 ▫   Less partition tolerant
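
To make the sharding point concrete, here is a minimal sketch of application-level shard routing by hashing a subscriber number. The shard names and the modulo scheme are assumptions chosen for illustration. They are not taken from the slides or from any particular RDBMS.

```python
import hashlib

# Hypothetical shard names for illustration.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(subscriber, shards=SHARDS):
    """Pick a shard from a stable hash of the subscriber number."""
    digest = hashlib.sha256(subscriber.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always routes to the same shard.
assert shard_for("0711111111") == shard_for("0711111111")

# Adding a shard changes the modulus, so many keys map to a different
# shard and their rows would have to be migrated.
keys = [f"07{n:08d}" for n in range(1000)]
moved = sum(shard_for(k) != shard_for(k, SHARDS + ["db-shard-3"]) for k in keys)
print(f"{moved} of {len(keys)} keys change shard when a fourth shard is added")
```

With plain modulo routing, growing the cluster forces a large share of rows to move, which is one concrete form of the management overhead listed above.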
DB Technology Comparison Ctd.

• NoSQL
 ▫ Many options available (Cassandra, HBase,
   MongoDB, Hive)
 ▫ Promises easy scale-out (many large users, e.g.
   Facebook, Twitter)
 ▫ Provides BASE properties, in line with the CAP theorem
 ▫ Hard to fit the system into a limited data model
 ▫ Partition tolerant
 ▫ More memory --> higher performance
DB Technology Comparison Ctd.
• NewSQL
 ▫   Provides ACID properties
 ▫   Familiar relational data model
 ▫   Options available (ScaleDB, VoltDB)
 ▫   Runs entirely in memory, hence needs a lot of memory
 ▫   Promises speed
 ▫   Persistence achieved by replaying logs
With persistence, less restrictive hardware requirements,
and proven performance,
NoSQL is the best option to try.

• Cassandra – a key-value column-family
  store (used at Facebook, Twitter, eBay)
• HBase – a key-value column-family store
  (Facebook)
• MongoDB – a document store (Adobe)
• Hive – an HDFS-based data warehouse
YCSB Benchmarks




• With more large users, active mailing lists, and the
  most promising features (secondary indexes,
  counters), Cassandra is the best option to try.
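
One way counters could serve a rule such as "more than 5 calls a month" is to keep a running count per subscriber and destination as CDRs are loaded, so that checking the rule becomes a lookup rather than a scan. The slides do not describe Kanthaka's schema. The class below is an in-memory stand-in that only imitates a counter column family's layout (row key, column name, count). It is not Cassandra client code, and all names and numbers are hypothetical.

```python
from collections import defaultdict


class CounterColumnFamily:
    """In-memory stand-in for a counter column family.

    Row key -> column name -> running count. This only imitates the data
    layout; it is not Cassandra client code.
    """

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(int))

    def increment(self, row_key, column, delta=1):
        """Add delta to the counter stored at (row_key, column)."""
        self._rows[row_key][column] += delta

    def get(self, row_key, column):
        """Return the current count, or 0 if nothing was recorded."""
        return self._rows.get(row_key, {}).get(column, 0)

    def rows_where(self, column, min_count):
        """Row keys whose counter for `column` exceeds `min_count`."""
        return sorted(
            key for key, cols in self._rows.items() if cols.get(column, 0) > min_count
        )


# Row key: subscriber; column: (destination number, month).
calls = CounterColumnFamily()
for _ in range(6):
    calls.increment("0711111111", ("0110000000", "2012-06"))
calls.increment("0722222222", ("0110000000", "2012-06"))

print(calls.rows_where(("0110000000", "2012-06"), 5))  # ['0711111111']
```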
Technology selection
Technologies left behind
• Complex Event Processing (CEP) engines
  ▫ No persistence
• Rules engine
  ▫ More layers --> more latency
• Hadoop
• NoSQL DBs – HBase, MongoDB, Hive

Technologies selected
• NoSQL DB – Cassandra
Architecture
Project Plan
Milestones                              Target date   Status
First chapters of final report                -       Done
ERU abstracts                                 -       Accepted
ERU Paper                               31/07/2012    Due
Architecture                            06/06/2012    Done
Setting up the Cassandra cluster        06/06/2012    Done
GUI for rule definition                 15/06/2012    Ongoing
Bulk data load to Cassandra             15/06/2012    Ongoing
System Requirements Specification       20/06/2012    Due
Query data from database periodically   26/06/2012    Due
Initial Design Document                 27/06/2012    Due
Algorithm for Pre-processing            10/07/2012    Due
Testing                                 10/07/2012    Due
Final report                            10/08/2012    Due
Risks and Possible Remedies

• NoSQL databases
  High performance requires more memory
  Remedy: use an external cluster with decent memory

• In the long run
  Performance degrades as data grows
  Remedy: archiving

• Concurrency issue handling
  Locking the database lowers speed
  Remedy: use a shadow copy

• NoSQL fails to achieve requirements
  Options:
  NewSQL – VoltDB (runs entirely in memory)
  CEP (needs extra measures to provide persistence)

• Handling sudden peaks
  Should have an auto-balancing mechanism ready
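
The shadow-copy remedy listed for concurrency issues can be read in more than one way, and the slides do not spell it out. The sketch below shows one common interpretation, offered as an assumption: readers always work on a published snapshot without locking, while a writer updates a private copy and then swaps it in. The class and method names are hypothetical.

```python
import threading


class ShadowCopyStore:
    """Readers see a published snapshot; writers build a copy and swap it in."""

    def __init__(self, initial=None):
        self._snapshot = dict(initial or {})
        self._write_lock = threading.Lock()  # serialises writers only

    def read(self, key, default=None):
        # No lock: a reader works on whichever snapshot is current.
        return self._snapshot.get(key, default)

    def apply_batch(self, updates):
        with self._write_lock:
            shadow = dict(self._snapshot)  # copy the current state
            shadow.update(updates)         # modify the copy only
            self._snapshot = shadow        # publish with a single reference swap


store = ShadowCopyStore({"0711111111": 5})
store.apply_batch({"0711111111": 6, "0722222222": 1})
print(store.read("0711111111"))  # 6
```

The trade-off is extra memory for the copy in exchange for reads that never wait on a writer, which fits a workload where rule queries must return within seconds while batches are still loading.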
Final Deliverables
• Big Data CDR Analyzer system
• Research Paper
• Final Report
References

• http://www.slideshare.net/gvdinesh/cap-and-base-8169489
• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears,
  “Benchmarking cloud serving systems with YCSB,” in Proc. 1st ACM
  Symposium on Cloud Computing (SoCC), 2010, pp. 143–154.

Visit us at Kanthaka
Thank You!
