Log Analysis Platform for Ameba Services (Amebaサービスのログ解析基盤)




NeutralTechnologyGroup
•               (                )

•   27

•
    NeutralTechnologyGroup(              4       )



•

•     twitter       @brfrn169   hatena       brfrn169
•   Hadoop/Hive     Patriot



•   Flume + HBase
    Stinger
Hadoop/Hive
              Patriot
Patriot
[2009   ]
11
        →

                →   GO
[2010   ]
3
7
11      WebUI
•
•
    -
•
    -
    -
Patriot
•   Ameba



•
•
•
    -   HDFS



•
    -   Hive(Map/Reduce)



•
    -   Patriot WebUI



•
    -   Hue
Patriot
•
•

•
Hadoop
•   HDFS
    -                           (            64M)

    -


•   MapReduce
    -
    -
    -   map   (=   )   reduce       (=   )
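
The map and reduce phases named above can be illustrated with a small in-process sketch. This is not Hadoop code: it only imitates the map, shuffle, and reduce steps in plain Python, and the job itself (counting logins per user from tab-separated lines) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper on every input record and collect the (key, value) pairs it emits."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over all values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: count logins per user from "timestamp<TAB>user" lines.
def login_mapper(line):
    _timestamp, user = line.split("\t")
    return [(user, 1)]


def sum_reducer(_key, values):
    return sum(values)


lines = [
    "2011-05-13 00:12:34\tyamada_taro",
    "2011-05-13 02:23:45\tsuzuki_ichiro",
    "2011-05-13 04:56:34\tyamada_taro",
]
counts = reduce_phase(shuffle(map_phase(lines, login_mapper)), sum_reducer)
print(counts)
```

In a real cluster the mapper and reducer run as tasks on many TaskTrackers and the shuffle moves data between nodes; the sketch keeps only the data flow.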
Hadoop (architecture diagram)
•   HDFS: NameNode, Secondary NameNode, and a DataNode on each worker
•   MapReduce: JobTracker, and a TaskTracker on each worker running map/reduce tasks
•   Clients: HDFS API (for HDFS), JobClient (for MapReduce)
Hive
•   Hadoop

•   Facebook



•   HiveQL       SQL        MapReduce

•                Pig(   )
Hive
•
    ‣   HiveQL
        -  SQL                   MapReduce



    ‣
        -
        -                Derby
        -   Patriot     MySQL



    ‣
        -   Partition
Hive
•              pigg_login.log


       2011-05-13 00:12:34	

   yamada_taro
       2011-05-13 02:23:45	

   suzuki_ichiro
       2011-05-13 03:34:56	

   brfrn169
       2011-05-13 04:56:34	

   yamada_taro
       2011-05-13 05:23:45	

   suzuki_ichiro
       2011-05-13 06:45:56	

   yamada_taro
       2011-05-13 07:56:23	

   yamada_hanako
       2011-05-13 08:45:56	

   yamada_taro
       2011-05-13 09:12:34	

   yamada_hanako
Hive
•   DDL
      CREATE TABLE pigg_login (
        time STRING,
        ameba_id STRING)
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE;




•
      LOAD DATA INPATH '/path/pigg_login.log'
      INTO TABLE pigg_login
      PARTITION (dt='2011-05-13');
Hive
•   HiveQL

    ‣                   UU(                         )
        -   SELECT count(distinct ameba_id) FROM pigg_login WHERE
            dt='2011-05-13';

        -   SELECT count(distinct ameba_id) FROM pigg_login WHERE dt LIKE
            '2011-05-__';


    ‣                             UU (JOIN, GROUP BY)
        -   SELECT p.age, count(distinct l.ameba_id) FROM pigg_login l JOIN profile p
            ON (l.ameba_id=p.ameba_id) WHERE l.dt='2011-05-13' GROUP BY p.age;
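
To make the UU (unique user) queries concrete, the sketch below applies the same logic to the sample rows from the pigg_login.log slide in plain Python. It assumes the log is tab-separated and that every sample row belongs to partition dt='2011-05-13'; the helper name `parse` is made up for this example.

```python
import re

# Sample rows copied from the pigg_login.log slide (time<TAB>ameba_id).
LOG = """\
2011-05-13 00:12:34\tyamada_taro
2011-05-13 02:23:45\tsuzuki_ichiro
2011-05-13 03:34:56\tbrfrn169
2011-05-13 04:56:34\tyamada_taro
2011-05-13 05:23:45\tsuzuki_ichiro
2011-05-13 06:45:56\tyamada_taro
2011-05-13 07:56:23\tyamada_hanako
2011-05-13 08:45:56\tyamada_taro
2011-05-13 09:12:34\tyamada_hanako
"""


def parse(text, dt):
    """Turn log text into rows tagged with the partition value dt."""
    rows = []
    for line in text.splitlines():
        time, ameba_id = line.split("\t")
        rows.append({"time": time, "ameba_id": ameba_id, "dt": dt})
    return rows


rows = parse(LOG, "2011-05-13")

# SELECT count(distinct ameba_id) FROM pigg_login WHERE dt='2011-05-13';
daily_uu = len({r["ameba_id"] for r in rows if r["dt"] == "2011-05-13"})

# ... WHERE dt LIKE '2011-05-__';  (each underscore matches exactly one character)
monthly_uu = len({r["ameba_id"] for r in rows if re.fullmatch(r"2011-05-..", r["dt"])})

print(daily_uu, monthly_uu)
```

Nine log lines collapse to four distinct users (yamada_taro, suzuki_ichiro, brfrn169, yamada_hanako), which is what `count(distinct ameba_id)` would report for this partition.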
Patriot

•
    -     4



    -

    -
    -
Patriot

•

    (Diagram components: Hive, Hive Job, Hadoop, View, DB (MySQL))
Patriot

    •
        Role                        Count   CPU      RAM     Disk
        namenode                    1       2-core   16GB
        secondary namenode          1       2-core   16GB
        jobtracker                  1       4-core   24GB
        datanode/tasktracker        18      4-core   16GB    1TB HDD × 4
Patriot

•
    -   CDH3u0(Hadoop0.20, Hive0.7, Hue1.2.0)

    -   Puppet

    -   Nagios, Ganglia


    -   ExtJS3.2.1

    -   Hinemos 3.2
Patriot

•
    DB


         SCP

                                    Hadoop



          •
          • gzip, SequenceFile HDFS
          • Hive
Patriot

•                             DSL (1)

    import {
     service "gyaos"
     backup_dir "/data/log/gyaos"
     data {
      type "scp"   # ← mysql, hdfs
      servers ["172.xxx.yyy.zzz", "172.xxx.yyy.zzz"]
         user "cy_httpd"
         path "/home/cy_httpd/logs/tomcat/lifelog/*.#{$dt}*"
         limit 10000
     }
Patriot

•                          DSL (2)
    load {
      type "hive"   # ← mysql
      table {
        name "game_login"
        regexp "^[^\t]*\t([^\t]*)\tlogin"
        output "$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
      table {
        name "game_user"
        regexp "^([^\t]*)\t([^\t]*)\tregist_game"
        output "$2\t$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
    }
    }
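
The `regexp` and `output` pairs in the load block can be read as routing rules: a log line that matches a table's pattern is rewritten through the output template and loaded into that table. The Python sketch below imitates that behavior under two assumptions: the bare `t` in the extracted patterns stands for a tab escape (`\t`), and input lines look like `timestamp<TAB>user<TAB>action`. The sample lines and the `route` function are hypothetical, and Python's `\1` replaces the DSL's `$1`.

```python
import re

# Rules modelled on the slide's two table entries (tab escapes are an assumption).
RULES = [
    {"table": "game_login", "regexp": r"^[^\t]*\t([^\t]*)\tlogin", "output": r"\1"},
    {"table": "game_user", "regexp": r"^([^\t]*)\t([^\t]*)\tregist_game", "output": r"\2\t\1"},
]


def route(lines, rules):
    """Return {table: [row, ...]} for every line matched by a rule's regexp."""
    tables = {rule["table"]: [] for rule in rules}
    for line in lines:
        for rule in rules:
            match = re.search(rule["regexp"], line)
            if match:
                # expand() substitutes the captured groups into the output template.
                tables[rule["table"]].append(match.expand(rule["output"]))
    return tables


# Hypothetical input lines in the assumed timestamp<TAB>user<TAB>action format.
sample = [
    "2011-05-13 00:12:34\tyamada_taro\tlogin",
    "2011-05-13 01:00:00\tsuzuki_ichiro\tregist_game",
    "2011-05-13 02:00:00\tbrfrn169\tlogout",
]
tables = route(sample, RULES)
print(tables)
```

Lines that match no rule (the `logout` line here) are simply dropped, which is one plausible reading of the DSL; the slide does not say how unmatched lines are handled.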
Patriot

•                       Hadoop




             Hive Job
    Batch




                            DB
                         MySQL
Patriot

  •                                     DSL
mysql {
  host "localhost"
  port 3306
  username "patriot-batch"
  password "xxx"
  database "gyaos"
}
analyze {
  name "gyaos_new_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user where dt='#{$dt}' and service='gyaos'"
}
analyze {
  name "gyaos_unregist_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user g join ameba_member a on (g.ameba_id = a.ameba_id) where a.unregist_date <> '' and to_date(a.unregist_date)='#{$dt}' and g.service='gyaos'"
}
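
Each `analyze` entry carries a HiveQL template in which `#{$dt}` is replaced by the batch date before the query runs. A minimal Python stand-in for that substitution step is sketched below; `$dt` plays the role of the DSL's `#{$dt}`, and the dictionary layout and `render` function are assumptions for illustration, not Patriot's implementation.

```python
from string import Template

# Hypothetical Python mirror of the first analyze entry on the slide.
ANALYZE = {
    "name": "gyaos_new_user_num_daily",
    "primary": "dt",
    "hive_ql": "select count(1), '$dt' from game_user where dt='$dt' and service='gyaos'",
}


def render(analyze, dt):
    """Substitute the batch date into the query text."""
    return Template(analyze["hive_ql"]).substitute(dt=dt)


query = render(ANALYZE, "2011-05-13")
print(query)
```

Because the date is selected as a literal second column, each result row carries its own `dt` value, which fits the entry's `primary "dt"` setting if that names the key column of the MySQL result table (an assumption).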
Patriot

•
    ‣ HiveQL
      -
      -
    ‣               20GB

    ‣
Patriot

• WebUI(         )
Patriot

• WebUI(         )
Patriot

• WebUI(         )
Patriot

• Hue
 ‣ HiveQL    WebUI

 ‣
Patriot

• Hue
Patriot
•
•
    -   Flume

•   DSL

•
•
    -
Flume + HBase


      Stinger
Stinger

•
•

•   Flume + HBase
Flume

•
    ‣
    ‣ Cloudera

    ‣ Flume Agent
    ‣ Flume Collector
    ‣ Flume Master
Flume
HBase

•
    ‣   Google BigTable

    ‣   HDFS

    ‣   HDFS              /
HBase
•
    ‣        Row Key(   )

    ‣
        -
        -
        -
    ‣
        -
HBase
•
HBase

•
HBase
•
    ‣
    ‣
        -
    ‣
        -           /
Stinger

•
    (Diagram: log → flume agent → flume collector → increment → HBase; node + socket.io, polling, websocket push; flume master)
Stinger

• HBase
 ‣ Row Key
  - md5(                        ID +         )+                 ID +


  Column Family : hourly
    12 am total   12 am login   12 am male   12 am 20's
    100           35            10           12

  Column Family : daily
    total   login   male   20's
    100     35      10     12
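
The schema above pairs an md5-prefixed row key with counter columns in an `hourly` and a `daily` column family. The sketch below imitates that layout with an in-memory table instead of HBase. The row-key components did not survive extraction from the slide, so `entity_id` and `suffix` are placeholders; `CounterTable` and `record_event` are likewise hypothetical names.

```python
import hashlib
from collections import defaultdict


def row_key(entity_id, suffix):
    """md5 prefix followed by the readable key parts (both parts are placeholders)."""
    raw = f"{entity_id}{suffix}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest() + raw


class CounterTable:
    """In-memory stand-in for a table of counters addressed by family:qualifier."""

    def __init__(self):
        # row key -> "family:qualifier" -> counter value
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, key, family, qualifier, amount=1):
        column = f"{family}:{qualifier}"
        self.rows[key][column] += amount
        return self.rows[key][column]


def record_event(table, key, hour_label, counters):
    """Bump each named counter in both the hourly and the daily family."""
    for name in counters:
        table.increment(key, "hourly", f"{hour_label} {name}")
        table.increment(key, "daily", name)


table = CounterTable()
key = row_key("gyaos", "2011-05-13")
record_event(table, key, "12 am", ["total", "login", "male", "20's"])
record_event(table, key, "12 am", ["total", "login"])
print(table.rows[key]["daily:total"], table.rows[key]["hourly:12 am male"])
```

Prefixing the key with a hash spreads otherwise sequential keys across regions, while keeping the readable parts after the hash lets the key be rebuilt for point lookups.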
Stinger
•
    ‣
Stinger
•
•
•   Patriot

    •
    •   Hadoop/Hive


•   Stinger

    •
    •   Flume + HBase
•

Log Analysis Platform for Ameba Services
