SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
August
                                              31,
                                             2012



•  Why Hadoop and HBase?                     2


•  Social Media Monitoring
   •  Prospective Search and Coprocessors
•  Challenges & Lessons Learned
•  Resources to get started




Agenda
August
                                             31,
                                            2012



•  Spin-off of MeMo News AG, the             3

   leading provider for Social Media
   Monitoring & Analytics in Switzerland
•  Big Data expert, focused on Hadoop,
   HBase and Solr
•  Objective: Transforming data into
   insights




About Sentric
CC 2.0 by Editor B| h"p://flic.kr/p/bcU5aD	
  
August
                                                                        31,
                                                                       2012


                                                                       5




     Information        Information     Analysis &        Insight
      Gathering          Processing   Interpretation   Presentation




Why Hadoop and HBase?

Social Media Monitoring Process
August
                                                                  31,
                                                                 2012


                                                                 6
                                      Cost
                                    effective



                                                     High
                    Freshness
                                                   scalable




                                   SMM
                        Reliable                  RT Alerting



                                    Analytical
                                   capabilities




Why Hadoop and HBase?

Requirements
August
                                           31,
                                          2012



•  HDFS + MapReduce                       7


•  Based on Google Papers
•  Distributed Storage and Computation
   Framework
•  Affordable Hardware, Free Software
•  Significant Adoption




Why Hadoop and HBase?

Hadoop
August
                                               31,
                                              2012



•    Non-Relational, Distributed Database     8


•    Column-Oriented
•    Multi-Dimensional
•    High Availability
•    High Performance
•    Build on top of HDFS as storage layer




Why Hadoop and HBase?

HBase
August
                                                      31,
                                                     2012


                                                     9

Storage                      HBase /HDFS



Search                           Solr



Analytics               Hadoop             Mahout



Event mechanism (MQ)         HBase RowLog



Real-time alerting         Prospective search



Why Hadoop and HBase?

Technology Stack
CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
August
                                                                                   31,
                                                                                  2012


                                                                                 11
Downloaded Articles




                                   match?


Search Agents




Output

                          Web-UI     Reports   RT Alerts

                                                  Icons by http://dryicons.com

Social Media Monitoring

Overview
August
                                                                          31,
                                                                         2012


                                                                        12
                                     n News Agents




                            REST
                                        HBase

                                       Coprocessor
                   Web-UI




                   MySQL      Solr      RT Alerts

                                         Icons by http://dryicons.com

Social Media Monitoring

Solution Architecture
August
                                                                                  31,
                                                                                 2012


                                                                                13




                                           Processing

 Put operations




                             Prospective
                               Search
               HRegion                           RT Alerts
             HRegionServer

                                                 Icons by http://dryicons.com

Social Media Monitoring

Prospective Search with Coprocessors
August
                                                    31,
                                                   2012




•  Monthly growth                                 14


      •  Index: 200GB
           •  50 Mio. docs/month
      •  HBase: 600 GB
           •  Raw data, meta data and extracted
              data
•  A few 1000 map-reduce jobs/
   month

Social Media Monitoring

Key Figures
CC 2.0 by saebaryo | h"p://flic.kr/p/5T4t5L
Augus
                                                                                         t 31,
                                                                                        2012
1     Benchmarks - workloads
2     Supervision
                                                                                        16
3     Keys and shards – Schema design /LG
4     Timestamps, the 4th dimension
5     Short ColumnFamily names->
6     File handles. OS
7     JVM Tuning, GC !!!
8     Scaling region servers, data locality!
9     Automatic vs manual splits, compaction
10    Do not use HBase as rock solid in prod
11    Forget feuerwehr aktionen, it takes some time
12    Use Hbase for a apropriate use case
13    Tune and tweak – it‘s not a project – it‘s a process
14    You need devops in production
15    Huge know-how curve, you need to know the hole ecosystem
16    Use a distribution, ist packed, tested and supports migration, enterprise grade
17    Virtualisierung, Hardware
18    Dont struggle to much, there is a good community
19    Share your knowledge
20    It‘s early state, many tools around, a few still missing




Challenges & Lessons Learned
August
                                                     31,
                                                    2012



•  Everyone is still learning                      17

•  Some issues only appear at scale
      •  At scale, nothing works as advertised
•  Production cluster configuration
      •  Hardware issues
      •  Tuning cluster configuration to our work
         loads
•  HBase stability
•  Monitoring health of HBase

Challenges & Lessons Learned

Challenges
August
                                                31,
                                               2012



•  Do not rely on HBase as frontend           18

   storage layer. It’s not going to be rock
   solid
•  Don’t struggle to much, there is a
   good community
•  Share your knowledge
•  It‘s early stage, many tools around, a
   few still missing


Challenges & Lessons Learned

Lessons - General
August
                                                        31,
                                                       2012


•  Use HBase for an appropriate use case              19

•  Use a distribution, its packed, tested and
   supports migration, enterprise grade
•  Benchmarks – know your workloads &
   query patterns
      •    YCSB
•  Schema & Key Design
      •    What’s queried together should be stored
           together
•  Scaling region servers, data locality!
•  Virtualization vs. Real Hardware

Challenges & Lessons Learned

Lessons - Planning
August
                                                                       31,
                                                                      2012

•      Number of CF < 10                                             20
      •    Compaction + Flushing I/O intensive
•      Short ColumnFamily names
      •    HFile index size occupying aloc RAM (storefileindexSize)
•      OS file handles
      •    ulimit –n 32768
•      JVM Tuning, GC !!!
      •    HMaster 1024 MB
      •    RegionServer 8192 MB
      •    -XX:+UseConcMarkSweepGC
      •    -XX:+CMSIncrementalMode
•      Automatic vs. manual splits
•      Be careful with expensive operations in coprocessors
•      Play with all the configurations and benchmark for tuning


Challenges & Lessons Learned

Lessons - Performance Tuning
August
                                                  31,
                                                 2012


•  Monitoring/Operational tooling is most       21

   important
•  Forget “emergency actions”, it takes
   some time
•  Tune and tweak – it‘s not a project – it‘s
   a process
•  You need DevOps in production
•  Huge know-how curve, you need to
   know the whole ecosystem
      •    Hadoop, HDFS, MapRed


Challenges & Lessons Learned

Lessons - Operation
August
                                          31,
                                         2012


•  http://hbase.apache.org/book.html    22

•  http://www.sentric.ch/blog/best-
   practice-why-monitoring-hbase-is-
   important
•  http://www.sentric.ch/blog/hadoop-
   overview-of-top-3-distributions
•  http://www.sentric.ch/blog/hadoop-
   best-practice-cluster-checklist
•  http://outerthought.org/blog/465-
   ot.html



Resources to get started
August
                                                            31,
                                                           2012


                                                          23




                       Questions?
         Christian Gügi, christian.guegi@sentric.ch
       Jean-Pierre König, jean-pierre.koenig@sentric.ch




NoSQL Roadshow Basel

Thank you!
Augus
           t 31,
          2012

          24




Masters

Cluster
Augus
           t 31,
          2012

          25




Worker

Cluster

Más contenido relacionado

Destacado (14)

Model-driven prototyping for corporate software specification
Model-driven prototyping for corporate software specification Model-driven prototyping for corporate software specification
Model-driven prototyping for corporate software specification
 
LEARN SPANISH AT ILERI SPANISH SCHOOL
LEARN SPANISH AT ILERI SPANISH SCHOOLLEARN SPANISH AT ILERI SPANISH SCHOOL
LEARN SPANISH AT ILERI SPANISH SCHOOL
 
a decisão judicial
a decisão judiciala decisão judicial
a decisão judicial
 
Tp EducacióN FíSica.Profesora Denise
Tp EducacióN FíSica.Profesora DeniseTp EducacióN FíSica.Profesora Denise
Tp EducacióN FíSica.Profesora Denise
 
Meran-o Magazine 2010
Meran-o Magazine 2010Meran-o Magazine 2010
Meran-o Magazine 2010
 
Web 2.0, una introducción
Web 2.0, una introducciónWeb 2.0, una introducción
Web 2.0, una introducción
 
Venom
VenomVenom
Venom
 
Circuito turistico
Circuito turisticoCircuito turistico
Circuito turistico
 
Antes e depois NEWZINC
Antes e depois NEWZINCAntes e depois NEWZINC
Antes e depois NEWZINC
 
Presentación Foro Contratación Socialmente Responsabler 2014.03
Presentación Foro Contratación Socialmente Responsabler 2014.03Presentación Foro Contratación Socialmente Responsabler 2014.03
Presentación Foro Contratación Socialmente Responsabler 2014.03
 
Oportunidades de financiación a empresas en el sector agroalimentario
Oportunidades de financiación a empresas en el sector agroalimentarioOportunidades de financiación a empresas en el sector agroalimentario
Oportunidades de financiación a empresas en el sector agroalimentario
 
CV Vibert Aout 2015
CV Vibert Aout 2015CV Vibert Aout 2015
CV Vibert Aout 2015
 
A passagem muloji
A passagem mulojiA passagem muloji
A passagem muloji
 
Manual de Perguntas e Respostas sobre Empresas Familiares
Manual de Perguntas e Respostas sobre Empresas FamiliaresManual de Perguntas e Respostas sobre Empresas Familiares
Manual de Perguntas e Respostas sobre Empresas Familiares
 

Similar a Near Real Time Processing of Social Media Data with HBase

Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...Christian Gügi
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesHortonworks
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopHortonworks
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesVishy Poosala
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project GuidanceVarad Meru
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxData Science London
 
UK - Agile Data Applications on Hadoop
UK - Agile Data Applications on HadoopUK - Agile Data Applications on Hadoop
UK - Agile Data Applications on HadoopHortonworks
 
제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata 제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata Gruter
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012m_hepburn
 
The Big Picture on Big Data and Cognos
The Big Picture on Big Data and CognosThe Big Picture on Big Data and Cognos
The Big Picture on Big Data and CognosSenturus
 
Big data and you
Big data and you Big data and you
Big data and you IBM
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...AlmereDataCapital
 
Unlocking value in your (big) data
Unlocking value in your (big) dataUnlocking value in your (big) data
Unlocking value in your (big) dataOscar Renalias
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopCloudera, Inc.
 

Similar a Near Real Time Processing of Social Media Data with HBase (20)

Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords -...
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, Opportunities
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project Guidance
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists Toolbox
 
UK - Agile Data Applications on Hadoop
UK - Agile Data Applications on HadoopUK - Agile Data Applications on Hadoop
UK - Agile Data Applications on Hadoop
 
제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata 제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012The Forrester Wave Enterprise Hadoop Solutions Q1 2012
The Forrester Wave Enterprise Hadoop Solutions Q1 2012
 
The Big Picture on Big Data and Cognos
The Big Picture on Big Data and CognosThe Big Picture on Big Data and Cognos
The Big Picture on Big Data and Cognos
 
Big data and you
Big data and you Big data and you
Big data and you
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
 
Unlocking value in your (big) data
Unlocking value in your (big) dataUnlocking value in your (big) data
Unlocking value in your (big) data
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
 

Más de Christian Gügi

Real-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment TransactionsReal-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment TransactionsChristian Gügi
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data PipelinesChristian Gügi
 
Case Study: In-Store Analysis
Case Study: In-Store AnalysisCase Study: In-Store Analysis
Case Study: In-Store AnalysisChristian Gügi
 
Apache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeApache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeChristian Gügi
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowChristian Gügi
 
Online Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaOnline Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaChristian Gügi
 

Más de Christian Gügi (6)

Real-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment TransactionsReal-Time Fraud Detection in Payment Transactions
Real-Time Fraud Detection in Payment Transactions
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
 
Case Study: In-Store Analysis
Case Study: In-Store AnalysisCase Study: In-Store Analysis
Case Study: In-Store Analysis
 
Apache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data storeApache HBase: Introduction to a column-oriented data store
Apache HBase: Introduction to a column-oriented data store
 
Apachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to knowApachecon Europe 2012: Operating HBase - Things you need to know
Apachecon Europe 2012: Operating HBase - Things you need to know
 
Online Media Data Stream Processing with Kafka
Online Media Data Stream Processing with KafkaOnline Media Data Stream Processing with Kafka
Online Media Data Stream Processing with Kafka
 

Último

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Near Real Time Processing of Social Media Data with HBase

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. August 31, 2012 •  Why Hadoop and HBase? 2 •  Social Media Monitoring •  Prospective Search and Coprocessors •  Challenges & Lessons Learned •  Resources to get started Agenda
  • 3. August 31, 2012 •  Spin-off of MeMo News AG, the 3 leading provider for Social Media Monitoring & Analytics in Switzerland •  Big Data expert, focused on Hadoop, HBase and Solr •  Objective: Transforming data into insights About Sentric
  • 4. CC 2.0 by Editor B| h"p://flic.kr/p/bcU5aD  
  • 5. August 31, 2012 5 Information Information Analysis & Insight Gathering Processing Interpretation Presentation Why Hadoop and HBase? Social Media Monitoring Process
  • 6. August 31, 2012 6 Cost effective High Freshness scalable SMM Reliable RT Alerting Analytical capabilities Why Hadoop and HBase? Requirements
  • 7. August 31, 2012 •  HDFS + MapReduce 7 •  Based on Google Papers •  Distributed Storage and Computation Framework •  Affordable Hardware, Free Software •  Significant Adoption Why Hadoop and HBase? Hadoop
  • 8. August 31, 2012 •  Non-Relational, Distributed Database 8 •  Column-Oriented •  Multi-Dimensional •  High Availability •  High Performance •  Build on top of HDFS as storage layer Why Hadoop and HBase? HBase
  • 9. August 31, 2012 9 Storage HBase /HDFS Search Solr Analytics Hadoop Mahout Event mechanism (MQ) HBase RowLog Real-time alerting Prospective search Why Hadoop and HBase? Technology Stack
  • 10. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 11. August 31, 2012 11 Downloaded Articles match? Search Agents Output Web-UI Reports RT Alerts Icons by http://dryicons.com Social Media Monitoring Overview
  • 12. August 31, 2012 12 n News Agents REST HBase Coprocessor Web-UI MySQL Solr RT Alerts Icons by http://dryicons.com Social Media Monitoring Solution Architecture
  • 13. August 31, 2012 13 Processing Put operations Prospective Search HRegion RT Alerts HRegionServer Icons by http://dryicons.com Social Media Monitoring Prospective Search with Coprocessors
  • 14. August 31, 2012 •  Monthly growth 14 •  Index: 200GB •  50 Mio. docs/month •  HBase: 600 GB •  Raw data, meta data and extracted data •  A few 1000 map-reduce jobs/ month Social Media Monitoring Key Figures
  • 15. CC 2.0 by saebaryo | h"p://flic.kr/p/5T4t5L
  • 16. Augus t 31, 2012 1  Benchmarks - workloads 2  Supervision 16 3  Keys and shards – Schema design /LG 4  Timestamps, the 4th dimension 5  Short ColumnFamily names-> 6  File handles. OS 7  JVM Tuning, GC !!! 8  Scaling region servers, data locality! 9  Automatic vs manual splits, compaction 10  Do not use HBase as rock solid in prod 11  Forget feuerwehr aktionen, it takes some time 12  Use Hbase for a apropriate use case 13  Tune and tweak – it‘s not a project – it‘s a process 14  You need devops in production 15  Huge know-how curve, you need to know the hole ecosystem 16  Use a distribution, ist packed, tested and supports migration, enterprise grade 17  Virtualisierung, Hardware 18  Dont struggle to much, there is a good community 19  Share your knowledge 20  It‘s early state, many tools around, a few still missing Challenges & Lessons Learned
  • 17. August 31, 2012 •  Everyone is still learning 17 •  Some issues only appear at scale •  At scale, nothing works as advertised •  Production cluster configuration •  Hardware issues •  Tuning cluster configuration to our work loads •  HBase stability •  Monitoring health of HBase Challenges & Lessons Learned Challenges
  • 18. August 31, 2012 •  Do not rely on HBase as frontend 18 storage layer. It’s not going to be rock solid •  Don’t struggle to much, there is a good community •  Share your knowledge •  It‘s early stage, many tools around, a few still missing Challenges & Lessons Learned Lessons - General
  • 19. August 31, 2012 •  Use HBase for an appropriate use case 19 •  Use a distribution, its packed, tested and supports migration, enterprise grade •  Benchmarks – know your workloads & query patterns •  YCSB •  Schema & Key Design •  What’s queried together should be stored together •  Scaling region servers, data locality! •  Virtualization vs. Real Hardware Challenges & Lessons Learned Lessons - Planning
  • 20. August 31, 2012 •  Number of CF < 10 20 •  Compaction + Flushing I/O intensive •  Short ColumnFamily names •  HFile index size occupying aloc RAM (storefileindexSize) •  OS file handles •  ulimit –n 32768 •  JVM Tuning, GC !!! •  HMaster 1024 MB •  RegionServer 8192 MB •  -XX:+UseConcMarkSweepGC •  -XX:+CMSIncrementalMode •  Automatic vs. manual splits •  Be careful with expensive operations in coprocessors •  Play with all the configurations and benchmark for tuning Challenges & Lessons Learned Lessons - Performance Tuning
  • 21. August 31, 2012 •  Monitoring/Operational tooling is most 21 important •  Forget “emergency actions”, it takes some time •  Tune and tweak – it‘s not a project – it‘s a process •  You need DevOps in production •  Huge know-how curve, you need to know the whole ecosystem •  Hadoop, HDFS, MapRed Challenges & Lessons Learned Lessons - Operation
  • 22. August 31, 2012 •  http://hbase.apache.org/book.html 22 •  http://www.sentric.ch/blog/best- practice-why-monitoring-hbase-is- important •  http://www.sentric.ch/blog/hadoop- overview-of-top-3-distributions •  http://www.sentric.ch/blog/hadoop- best-practice-cluster-checklist •  http://outerthought.org/blog/465- ot.html Resources to get started
  • 23. August 31, 2012 23 Questions? Christian Gügi, christian.guegi@sentric.ch Jean-Pierre König, jean-pierre.koenig@sentric.ch NoSQL Roadshow Basel Thank you!
  • 24. Augus t 31, 2012 24 Masters Cluster
  • 25. Augus t 31, 2012 25 Worker Cluster