Hadoop for carrier


Flytxt

Harnessing Hadoop for Big Data, Series II


Leveraging a Hadoop Cluster for Carrier-Grade Applications

Copyright © 2011 Flytxt B.V. All rights reserved. 1/17/2012

No Personalization

Service discovery


• 600-800 GB of call detail records (CDRs) per day
◦ GPRS signaling: 50 GB/day
◦ 3G signaling: 300 GB/day
◦ Voice: 100 GB/day
◦ SMS: 200 GB/day
• 100-200 GB/day of web data
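
As a quick sanity check on the figures above, the listed CDR streams can be summed and combined with the web data range. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 30-day month and 1 TB = 1000 GB conversion are assumptions, not figures from the slides.

```python
# Daily volumes in GB, taken from the list above.
cdr_streams_gb = {"GPRS signaling": 50, "3G signaling": 300, "Voice": 100, "SMS": 200}
cdr_range_gb = (600, 800)   # stated CDR range per day
web_range_gb = (100, 200)   # stated web data range per day

# Sum of the individually listed CDR streams.
listed_cdr_total = sum(cdr_streams_gb.values())

# Combined daily ingest: add the low ends and the high ends of both ranges.
daily_low = cdr_range_gb[0] + web_range_gb[0]
daily_high = cdr_range_gb[1] + web_range_gb[1]

# Rough monthly volume in TB (assumes a 30-day month and 1 TB = 1000 GB).
monthly_tb = (daily_low * 30 / 1000, daily_high * 30 / 1000)

print(listed_cdr_total, (daily_low, daily_high), monthly_tb)
```

The listed streams add up to a value inside the stated 600-800 GB range, and the combined ingest lands in the tens of terabytes per month, which is the scale the rest of the deck is addressing.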

Mammoth Data
Data Analysis


• Framework for distributed processing of large data sets across clusters
• Consists of
◦ Hadoop Distributed File System, a.k.a. HDFS (file system)
◦ Hadoop MapReduce (programming model)
• Characteristics
◦ Performance shall scale linearly
◦ Compute should move to data
◦ Simple core, modular and extensible
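
The MapReduce programming model mentioned above can be illustrated with a small in-memory sketch. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and everything runs in one process, whereas Hadoop distributes the same three steps across the cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


word_counts = run_job(["voice sms voice", "sms gprs"])
print(word_counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can run in parallel on the nodes that already hold the data, which is what "compute should move to data" refers to.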


• Structured Data

sqoop --connect jdbc:mysql://db.example.com/website --table USERS --as-sequencefile

• Unstructured Data


• A distributed data collection server
◦ Scalable
◦ Configurable
◦ Extensible
◦ Manageable

• Built around the concept of flows
◦ A single flow corresponds to a type of data source
◦ Supports compression, batching, and reliability settings per flow

• Data comes in through a source
◦ Optionally processed by one or more decorators
◦ And transmitted out via a sink
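
The source, decorator, and sink chain described above can be sketched as a small pipeline. This is a conceptual illustration only, not the collection server's real configuration or API: `list_source`, `batch`, `compress`, and `run_flow` are hypothetical names, and a plain Python list stands in for the sink.

```python
import gzip
from typing import Iterable, Iterator


def list_source(lines: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical source: turns in-memory lines into byte events."""
    for line in lines:
        yield line.encode("utf-8")


def batch(size: int):
    """Decorator factory: joins up to `size` events into one newline-separated event."""
    def apply(events: Iterator[bytes]) -> Iterator[bytes]:
        buffer = []
        for event in events:
            buffer.append(event)
            if len(buffer) == size:
                yield b"\n".join(buffer)
                buffer = []
        if buffer:  # flush the final, possibly smaller, batch
            yield b"\n".join(buffer)
    return apply


def compress(events: Iterator[bytes]) -> Iterator[bytes]:
    """Decorator: gzip-compresses each event."""
    for event in events:
        yield gzip.compress(event)


def run_flow(source, decorators, sink):
    """Pass events from the source through each decorator in order, then into the sink."""
    stream = source
    for decorate in decorators:
        stream = decorate(stream)
    for event in stream:
        sink.append(event)


collected = []  # stand-in sink
run_flow(list_source(["cdr-1", "cdr-2", "cdr-3"]), [batch(2), compress], collected)
decoded = [gzip.decompress(event).decode("utf-8") for event in collected]
print(decoded)
```

Keeping batching and compression as separate decorators mirrors the per-flow setup in the list above: each flow can pick its own chain without changing the source or the sink.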


• MapReduce is very powerful, but:
◦ It requires a Java programmer
◦ The user has to reinvent common functionality (join, filter, etc.)

• Execution engine atop Hadoop

• Pig provides a higher-level language, Pig Latin

• Opens the system to non-Java programmers

• Provides common operations like join, group, filter, sort


• Web log processing
• Data processing for web search platforms
• Ad hoc queries across large data sets
• Rapid prototyping of algorithms for processing large data sets
• The Pig client runs on the local machine and the job is executed on the Hadoop cluster; the -x local flag used below instead runs the script in local mode, which is convenient for trying it out

$ cd /usr/share/cloudera/pig/
$ bin/pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'output';
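
For readers unfamiliar with Pig Latin, the script above amounts to counting rows per user. The following is a rough single-machine equivalent, assuming the log is tab-separated (Pig's default delimiter when LOAD has no USING clause). The sample rows and the function name `count_queries_per_user` are made up for illustration and are not taken from excite-small.log.

```python
from collections import Counter

# Hypothetical sample rows in the tab-separated (user, timestamp, query) layout.
sample_log = [
    "u1\t970916105432\thadoop",
    "u2\t970916105433\tpig latin",
    "u1\t970916105500\thive",
]


def count_queries_per_user(lines):
    """Mirror GROUP log BY user followed by COUNT(log): one count per user."""
    counts = Counter()
    for line in lines:
        # Split into the three declared fields; only the user field is needed.
        user, _timestamp, _query = line.rstrip("\n").split("\t", 2)
        counts[user] += 1
    return dict(counts)


per_user = count_queries_per_user(sample_log)
print(per_user)
```

The Pig version expresses the same logic in four statements and is compiled into MapReduce jobs, so it keeps working when the log no longer fits on one machine.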


• System for querying and managing structured data
• Built on top of Hadoop
• Uses MapReduce for execution
• SQL-like syntax; supports
◦ FROM-clause subqueries
◦ ANSI join (equi-join only)
◦ Multi-table insert
◦ Multi group-by
◦ Sampling
◦ Object traversal
• Engagement
◦ Summarization
◦ Ad hoc analysis
◦ Spam detection
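
To make the link between an equi-join and MapReduce execution concrete, here is a sketch of a reduce-side join, one common way to run such a join on MapReduce. It is an in-memory illustration under stated assumptions, not the query planner of the system described above: the function name `reduce_side_join` and the `users` and `calls` tables are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(left, right, left_key, right_key):
    """Equi-join two lists of dicts: tag rows, group them by join key, pair them up."""
    # Map step: emit (join key, (source tag, row)) for every row of both inputs.
    tagged = [(row[left_key], ("L", row)) for row in left]
    tagged += [(row[right_key], ("R", row)) for row in right]

    # Shuffle step: bring all rows that share a join key together.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, row) in tagged:
        groups[key][tag].append(row)

    # Reduce step: within each key, combine every left row with every right row.
    joined = []
    for key in sorted(groups):
        for left_row in groups[key]["L"]:
            for right_row in groups[key]["R"]:
                joined.append({**left_row, **right_row})
    return joined


users = [{"user_id": 1, "plan": "prepaid"}, {"user_id": 2, "plan": "postpaid"}]
calls = [
    {"caller": 1, "minutes": 5},
    {"caller": 1, "minutes": 7},
    {"caller": 3, "minutes": 2},
]
result = reduce_side_join(users, calls, "user_id", "caller")
print(result)
```

Grouping on the join key works only when the join condition is an equality, which is why equi-joins map naturally onto the shuffle step.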


Feature                          Hive                Pig
Language                         SQL-like            Pig Latin
Schemas/Types                    Yes (explicit)      Yes (implicit)
Partitions                       Yes                 No
Server                           Optional (Thrift)   No
User-Defined Functions           Yes                 Yes
Custom Serializer/Deserializer   Yes                 Yes
DFS Direct Access                Yes (implicit)      Yes (explicit)
Join/Order/Sort                  Yes                 Yes
Shell                            Yes                 Yes
Streaming                        Yes                 No
Web Interface                    Yes                 No
JDBC/ODBC                        Yes (limited)       No



• Current Bottleneck
◦ Data resides in multiple nodes/zones/VM instances, with no elegant, reliable, and efficient way of extracting it
◦ Loading terabytes of data into a database is slow
◦ Parallel computing is not possible in conventional BI ETL
◦ User profile and application data reside in a database that can scale only vertically
