Treasure Data
                      The architecture of data analytics PaaS on AWS



                                    Masahiro Nakagawa

                                   JAWS Days: 2013/03/16




Friday, April 5, 13
Who are you?
          Masahiro Nakagawa
              • @repeatedly / masa@treasure-data.com


          Treasure Data, Inc.
              • Senior Software Engineer, since 2012/11

          Open Source projects
              •   D Programming Language
              •   MessagePack: D, Python, etc...
              •   Fluentd: Core, mongo, etc...
              •   etc...

Introduction to
          Treasure Data




Company Overview
          Silicon Valley-based Company
              • All Founders are Japanese
                      • Hironobu Yoshikawa
                      • Kazuki Ohta
                      • Sadayuki Furuhashi


          OSS Enthusiasts
              • MessagePack, Fluentd, etc.




Investors
             Bill Tai
             Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
             Othman Laraki - Former VP Growth at Twitter
             James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
             Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
             Yukihiro “Matz” Matsumoto - Creator of Ruby
             Dan Scheinman - Director of Arista Networks
             Jerry Yang - Founder of Yahoo!
             + 10 more people
              • and....
Treasure Data = Cloud + Big Data
          [Chart: Cloud vs. On-Premise against Data Volume, split at 1 billion entries or 10 TB]
              • Cloud: Database-as-a-service, Big Data-as-a-Service
              • On-Premise: Lightweight RDBMS and Enterprise RDBMS such as DB2 ($34B market)
              • On-Premise: Traditional Data Warehouse ($10B market)

          © 2012 Forrester Research, Inc. Reproduction Prohibited

Why Cloud? ‘Time’ is Money
          [Chart: Customer Value over Time, starting at Sign-up or PO]
              • Ideal Expectation
              • Reality (On-Premise): HW/SW Selection, PoC, Deploy..., then Obsolete over time until Upgrade

Big Data Adoption Stages
          Stages, from most to least sophisticated (axis: Intelligence Sophistication)

          Analytics
              • Optimization: What’s the best?
              • Predictive Analysis: What’s a trend?
              • Statistical Analysis: Why?

          Reporting (Treasure Data’s FOCUS, 80% of needs)
              • Alerts: Error?
              • Drill Down Query: Where exactly?
              • Ad-hoc Reports: Where?
              • Standard Reports: What happened?

Full Stack Support for Big Data Reporting

          Our best-in-class architecture and operations team ensure the integrity and availability of your data.

          Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.

          Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.

          You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.

Vision: Single Analytics Platform for the World
         Our Customers – Fortune Global 500 leaders and
         start-ups including:




Treasure Data’s
          Service Architecture




Treasure Data = Collect + Store + Query
Example in AdTech: MobFox




           1. Europe’s largest independent mobile ad exchange.
           2. 20 billion imps/month (circa Jan. 2013)
           3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
           4. Needed Big Data Analytics infrastructure ASAP.

Two Weeks From Start to Finish!




Used AWS Products (1)
          RDS
              • Store user information, job status, etc...
              • Store metadata of our columnar database
              • Queue of worker (perfectqueue / perfectsched)


          EC2
              • API servers
              • Hadoop clusters
              • Job workers
                      • Using Chef to deploy
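
The third RDS bullet, a worker queue kept in a relational database, can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for RDS, and the table layout and the function names (`open_queue`, `submit`, `acquire`) are assumptions, not PerfectQueue's actual schema or API.

```python
import sqlite3

def open_queue():
    """Create an in-memory database with a single jobs table (hypothetical layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " owner TEXT)"  # NULL means no worker has taken the job yet
    )
    return db

def submit(db, payload):
    """Append a job to the queue."""
    with db:
        db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def acquire(db, worker):
    """Claim the oldest free job for `worker`; return its payload or None."""
    with db:  # pick and mark inside one transaction
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE owner IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The owner check makes the claim fail if another worker got there first.
        cur = db.execute(
            "UPDATE jobs SET owner = ? WHERE id = ? AND owner IS NULL",
            (worker, row[0]),
        )
        return row[1] if cur.rowcount == 1 else None

db = open_queue()
submit(db, "query-1")
submit(db, "query-2")
first = acquire(db, "worker-a")
second = acquire(db, "worker-b")
third = acquire(db, "worker-c")  # queue is empty by now
```

Keeping the queue in the same database as job status means a claim and a status change can share one transaction, which may be one reason to prefer it over a separate queue service.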


Used AWS Products (2)
          ELB
              • Load balancing of API servers
              • Load balancing of td-agents


          S3
              • Columnar storage built on top of S3
                      • MessagePack columnar format
                      • realtime / archive storage
              • Our Result feature supports S3 output.

                  No EMR, SQS, or other AWS products!
Architecture Breakdown



          Data Collection
              • Increasing variety of data sources
              • No single data schema
              • Lack of streaming data collection method
              • 60% of Big Data project resource consumed

          Data Store/Analytics
              • Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
              • Challenges in scaling data volume and expanding cost.

          Connectivity
              • Required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC.
              • Output to other services, e.g. S3, RDBMS, etc.

1) Data Collection
          60% of BI project resource is consumed here
          Most ‘underestimated’ and ‘unsexy’ but MOST important
          Fluentd: OSS lightweight but robust Log Collector
              • http://fluentd.org/




Fluentd
                      the missing log collector



                               fluentd.org

In short
             Open sourced log collector written in Ruby
             Using rubygems ecosystem for plugins



                  It’s like syslogd, but
              uses JSON for log messages
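
To make the syslogd comparison concrete, here is a minimal Python sketch of turning one raw access-log line into a structured event made of a tag, a time, and a JSON-style record. It is not Fluentd code (Fluentd is written in Ruby); the function name `parse_access_line`, the regular expression, and the sample line are assumptions chosen for illustration.

```python
import json
import re
from datetime import datetime

# Hypothetical pattern for a simplified access-log line; real Apache
# formats carry more fields (status, size, referer, ...).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

def parse_access_line(line, tag="apache.log"):
    """Turn one raw log line into a (tag, time, record) event, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    time = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S")
    record = {
        "host": m.group("host"),
        "method": m.group("method"),
        "path": m.group("path"),
    }
    return tag, time, record

event = parse_access_line('127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / HTTP/1.1"')
tag, time, record = event
print(tag, time.isoformat(), json.dumps(record))
```

Once a line is a record with named fields, downstream stores can query it without parsing the text again.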

          Apache writes access log lines
              • 127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...
              • 127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ...
              • ...

          Fluentd tails the log and turns each line into an event
              • Time: 2012-02-04 01:33:51
              • Tag: apache.log
              • Record: {"host": "127.0.0.1", "method": "GET", "path": "/", ...}

          Fluentd buffers events and inserts them into Mongo

Architecture
             Pluggable     Pluggable   Pluggable



                  Input     Buffer     Output

             > Forward     > Memory    > Forward
             > HTTP        > File      > File
             > File tail               > Amazon S3
             > dstat                   > MongoDB
             > ...                     > ...
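
The three pluggable stages above can be pictured with a small Python sketch in which any iterable acts as the input, a buffer object groups events into chunks, and any callable acts as the output. The names `MemoryBuffer` and `run_pipeline` and the chunking rule are assumptions for illustration, not Fluentd's plugin API.

```python
from typing import Callable, Iterable, List, Tuple

Event = Tuple[str, int, dict]  # (tag, unix time, record)

class MemoryBuffer:
    """Holds events in memory and hands them out in chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.events: List[Event] = []

    def push(self, event: Event) -> List[Event]:
        """Add an event; return a full chunk when one is ready, else []."""
        self.events.append(event)
        if len(self.events) >= self.chunk_size:
            return self.flush()
        return []

    def flush(self) -> List[Event]:
        """Return everything buffered so far and start a new chunk."""
        chunk, self.events = self.events, []
        return chunk

def run_pipeline(source: Iterable[Event], buffer: MemoryBuffer,
                 output: Callable[[List[Event]], None]) -> None:
    """Read from any input, buffer, and write chunks to any output."""
    for event in source:
        chunk = buffer.push(event)
        if chunk:
            output(chunk)
    rest = buffer.flush()  # write whatever is left at shutdown
    if rest:
        output(rest)

written: List[List[Event]] = []
events = [("app.access", t, {"path": "/"}) for t in range(5)]
run_pipeline(events, MemoryBuffer(chunk_size=2), written.append)
```

Because the pipeline only depends on the shape of each stage, replacing the memory buffer with a file-backed one, or the list output with an S3 or MongoDB writer, would not change `run_pipeline`.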

Before Fluentd
              Server1           Server2               Server3

          Application         Application           Application


                        ・・・               ・・・                    ・・・




                                                High Latency!
                                                must wait for a day...
                               Fluent
                              Log Server
After Fluentd
              Server1                Server2              Server3

          Application            Application             Application


               Fluentd   ・・・         Fluentd   ・・・        Fluentd   ・・・




                                                     In streaming!

                           Fluentd             Fluentd

Access logs                                   Alerting
     Apache                                        Nagios

    App logs                                      Analysis
     Frontend                                      MongoDB
     Backend
                                                   MySQL

    System logs                                    Hadoop
      syslogd
                      filter / buffer / routing
                                                  Archiving
    Databases                                      Amazon S3
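
The routing step in the middle of this picture can be sketched as a lookup from event tag to destination. The routing table, the tag names, and the use of Python's `fnmatch` wildcards are all assumptions for illustration; Fluentd's own match patterns follow their own rules.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("apache.*", "mongodb"),  # access logs go to analysis
    ("system.*", "nagios"),   # system logs go to alerting
    ("*", "s3"),              # everything else is archived
]

def route(tag):
    """Return the destination name for an event tag, or None if nothing matches."""
    for pattern, destination in ROUTES:
        if fnmatchcase(tag, pattern):
            return destination
    return None
```

For example, `route("apache.access")` picks the first rule, while a tag such as `app.backend` falls through to the catch-all archive rule.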
td-agent
             Open sourced distribution package of fluentd
             ETL part of Treasure Data
             Including useful components
                 • ruby, jemalloc, fluentd
                 • 3rd party gems: td, mongo, webhdfs, etc...
                      •   td plugin is for Treasure Data

             http://packages.treasure-data.com/



Treasure Data Service Architecture
          Collect (This!)
              • Apache, App, RDBMS, and other data sources feed td-agent
              • td-agent uploads to the Treasure Data columnar data warehouse

          Query
              • User and BI apps connect via td-command, JDBC, REST
              • Query API: HIVE, PIG (to be supported)
              • Query Processing Cluster runs MAPREDUCE JOBS on the columnar data warehouse

AWS plugins
             S3
             SNS
             SQS
             DynamoDB
             forward-aws
             RDS
             Redshift
             CloudWatch
             Yet Another Cloud Watch
             CloudWatch Lite

             http://fluentd.org/plugin/

2) Data Store / Analytics - Columnar Storage




Treasure Data Service Processing Flow
          Components: Frontend, Job Queue, Worker, Hadoop
              • Applications push metrics to Fluentd (via local Fluentd)
              • Fluentd sums up data per minute (partial aggregation)
              • Treasure Data: for historical analysis
              • Librato Metrics: for realtime analysis
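
The partial aggregation step can be sketched as summing metric values into one bucket per minute before forwarding them. The function name `aggregate_per_minute`, the tuple layout, and the metric name are assumptions; the slides do not say which metrics are collected or how Fluentd is configured to do this.

```python
from collections import defaultdict

def aggregate_per_minute(events):
    """Sum metric values per (minute, name) bucket.

    `events` is an iterable of (unix_time, name, value) tuples.
    """
    totals = defaultdict(float)
    for unix_time, name, value in events:
        minute = unix_time - unix_time % 60  # start of the minute
        totals[(minute, name)] += value
    return dict(totals)

samples = [
    (120, "jobs.finished", 1),
    (150, "jobs.finished", 2),
    (185, "jobs.finished", 4),
]
totals = aggregate_per_minute(samples)
```

Here the first two samples fall into the minute starting at 120 and the third into the minute starting at 180, so three raw events become two data points.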

Structure of Columnar Storages

               import             bulk import                     SELECT ...



            Import Storage         Bulk Import Storage


                             Realtime Storage              Archive Storage

                                                         merge (every 1 hour)

                         23c82b0ba3405d4c15aa85d2190e     2013-03-15 00:23:00 912ec80
                         6d7b1482412ab14f0332b8aee119     2013-03-16 00:01:00 277a259
                         8a7bc848b2791b8fd603c719e54f                   ...
                         0e3d402b17638477c9a7977e7dab
                                     ...
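
One way to picture the hourly merge is as a job that drains small realtime chunks into larger time-based archive partitions. The sketch below rests on several assumptions: the slides do not state how the archive is partitioned, so the hourly key, the `time` field, and the function name `merge_into_archive` are hypothetical.

```python
def merge_into_archive(realtime_chunks, archive):
    """Move rows from small realtime chunks into hourly archive partitions.

    realtime_chunks: dict of chunk id -> list of rows with a unix "time" field.
    archive: dict of hour start (unix seconds) -> list of rows.
    """
    for chunk_id in list(realtime_chunks):
        # pop() empties realtime storage as each chunk is merged
        for row in realtime_chunks.pop(chunk_id):
            hour = row["time"] - row["time"] % 3600
            archive.setdefault(hour, []).append(row)
    return archive

realtime = {
    "chunk-a": [{"time": 10, "v": 1}, {"time": 3700, "v": 2}],
    "chunk-b": [{"time": 20, "v": 3}],
}
archive = merge_into_archive(realtime, {})
```

After the merge, realtime storage is empty and rows from different chunks that belong to the same hour sit together in one partition.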



Query Language




                      Query Execution




                      Columnar Data




                      Object Storage




1/4: Compile SQL into MapReduce

                         SQL Statement
                                  SELECT COUNT(DISTINCT ip) FROM tbl;



                              Hive
                      SQL - to - MapReduce
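
As a rough picture of what compiling `SELECT COUNT(DISTINCT ip) FROM tbl` into MapReduce means, the sketch below runs the three phases in plain Python. It is not the plan Hive actually generates; the function names and the sample rows are assumptions.

```python
from itertools import groupby

def map_phase(rows):
    """Emit (ip, 1) for every input row."""
    for row in rows:
        yield row["ip"], 1

def shuffle(pairs):
    """Group emitted pairs by key, as the framework does between phases."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    """Each key reaches the reducer once, so counting keys counts distinct ips."""
    return sum(1 for _key, _values in grouped)

tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
distinct_ips = reduce_phase(shuffle(map_phase(tbl)))
```

The map phase can run on many nodes at once because each row is handled independently; only the grouping step needs rows with the same key to meet.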




2/4: MapReduce is executed in parallel

                                                           SELECT COUNT(DISTINCT ip) FROM tbl;




                      cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)



3/4: Columnar Data Access

                                                              SELECT COUNT(DISTINCT ip) FROM tbl;




                      10Gbps Network




                                       Read ONLY the Required Part of Data
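
The benefit of a columnar layout for this query can be shown with a small sketch: once rows are pivoted into one list per column, counting distinct `ip` values touches only that list. This uses in-memory Python dicts rather than the MessagePack columnar files on S3, and `to_columns` and the sample rows are assumptions.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one equally long list per column."""
    columns = {}
    for i, row in enumerate(rows):
        for name, value in row.items():
            # A column first seen at row i is padded with None for earlier rows.
            columns.setdefault(name, [None] * i).append(value)
        for values in columns.values():
            if len(values) <= i:
                values.append(None)  # this row lacked the column
    return columns

rows = [
    {"ip": "10.0.0.1", "path": "/"},
    {"ip": "10.0.0.2"},
    {"ip": "10.0.0.1", "ua": "x"},
]
columns = to_columns(rows)

# The query reads the "ip" column and nothing else.
distinct_ips = len(set(columns["ip"]))
```

With the columns stored as separate objects, the `path` and `ua` data would never have to cross the network for this query.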


4/4: Object-based Storage




Data first, Schema later


            SELECT           54 (int)    “test” (string)        120 (int)         NULL




            Schema           user:int        name:string       value:int        host:int




            Raw data(JSON)   {"user":54, "name":"test", "value":"120", "host":"local"}
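
The slide's example can be reproduced with a short sketch of applying a schema at read time: each value is cast to the column's declared type, and a value that cannot be cast becomes NULL. The function name `apply_schema` and the `CASTS` table are assumptions; the raw record and schema are the ones shown on the slide.

```python
CASTS = {"int": int, "string": str}

def apply_schema(record, schema):
    """Cast each schema column from a raw record; missing or uncastable values become None."""
    row = {}
    for column, type_name in schema.items():
        cast = CASTS[type_name]
        try:
            row[column] = cast(record[column])
        except (KeyError, ValueError, TypeError):
            row[column] = None
    return row

raw = {"user": 54, "name": "test", "value": "120", "host": "local"}
schema = {"user": "int", "name": "string", "value": "int", "host": "int"}
row = apply_schema(raw, schema)
```

The string `"120"` is read as the integer 120, while `"local"` cannot be read as an integer and comes back as NULL. The stored raw data is unchanged, so the schema can be corrected later without re-importing.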




3) Connectivity

                                   REST API
                      td-command
                                                                 Query
                                                       Query
      Query                                             API
                                                                 Processing
                                   JDBC, ODBC Driver             Cluster
                       BI apps




                       Web App
                                                           Treasure Data
         Result         MySQL                             Columnar Storage

                         S3
                        …




Multi-Tenancy
    All customers share the Hadoop clusters (Multi Data Centers)
    Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade

                                                                       Job Submission
                                                                       + Plan Change
                                     Local FairScheduler

                      datacenter A

                                     Local FairScheduler
                                                               Global
                      datacenter B
                                                              Scheduler
                                     Local FairScheduler

                      datacenter C                            On-Demand
                                                           Resource Allocation
                                     Local FairScheduler
                      datacenter D


Conclusion
          Treasure Data
              • Cloud based Big-data analytics platform
              • Provide Machete for Big data reporting

          Big Data processing
              • Collect / Store / Analytics / Visualization
                       Our focus!
          AWS products we use
              • EC2, S3, RDS, ELB
              • Building Treasure Data specific systems on AWS


Big Data for the Rest of Us

                      www.treasure-data.com | @TreasureData




Friday, April 5, 13

Más contenido relacionado

La actualidad más candente

Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
AutoGluonではじめるAutoML
AutoGluonではじめるAutoMLAutoGluonではじめるAutoML
AutoGluonではじめるAutoML西岡 賢一郎
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Columnar Databases (1).pptx
Columnar Databases (1).pptxColumnar Databases (1).pptx
Columnar Databases (1).pptxssuser55cbdb
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps]
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps] Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps]
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps] Google Cloud Platform - Japan
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 

La actualidad más candente (20)

Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
AutoGluonではじめるAutoML
AutoGluonではじめるAutoMLAutoGluonではじめるAutoML
AutoGluonではじめるAutoML
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Columnar Databases (1).pptx
Columnar Databases (1).pptxColumnar Databases (1).pptx
Columnar Databases (1).pptx
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps]
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps] Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps]
Social GAME における AI 活用事例 [第 4 回 Google Cloud INSIDE Games & Apps]
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 

Destacado

Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSAmazon Web Services
 
AWS User Group Sydney - Atlassian 5-10-16
AWS User Group Sydney - Atlassian 5-10-16AWS User Group Sydney - Atlassian 5-10-16
AWS User Group Sydney - Atlassian 5-10-16PolarSeven Pty Ltd
 
应用开发利器 IBM Bluemix平台云介绍
应用开发利器 IBM Bluemix平台云介绍应用开发利器 IBM Bluemix平台云介绍
应用开发利器 IBM Bluemix平台云介绍Hardway Hou
 
Hadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyHadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyTreasure Data, Inc.
 
PaaS is dead, Long live PaaS - Defrag 2016
PaaS is dead, Long live PaaS - Defrag 2016PaaS is dead, Long live PaaS - Defrag 2016
PaaS is dead, Long live PaaS - Defrag 2016brendandburns
 
Toyko azure meetup # 1 azure paa s overview
Toyko azure meetup # 1   azure paa s overviewToyko azure meetup # 1   azure paa s overview
Toyko azure meetup # 1 azure paa s overviewTokyo Azure Meetup
 

Treasure Data PaaS Architecture on AWS

  • 1. Treasure Data: The architecture of data analytics PaaS on AWS. Masahiro Nakagawa, JAWS Days: 2013/03/16. Friday, April 5, 13
  • 2. Who are you? Masahiro Nakagawa (@repeatedly / masa@treasure-data.com). Treasure Data, Inc.: Senior Software Engineer, since 2012/11. Open source projects: D Programming Language; MessagePack (D, Python, etc.); Fluentd (core, mongo, etc.); and others.
  • 3. Introduction to Treasure Data
  • 4. Company Overview. Silicon Valley-based company; all founders are Japanese: Hironobu Yoshikawa, Kazuki Ohta, Sadayuki Furuhashi. OSS enthusiasts: MessagePack, Fluentd, etc.
  • 5. Investors: Bill Tai; Naren Gupta (Nexus Ventures; Director of Red Hat, TIBCO); Othman Laraki (former VP Growth at Twitter); James Lindenbaum, Adam Wiggins, Orion Henry (Heroku founders); Anand Babu Periasamy, Hitesh Chellani (Gluster founders); Yukihiro “Matz” Matsumoto (creator of Ruby); Dan Scheinman (Director of Arista Networks); Jerry Yang (founder of Yahoo!); plus 10 more people, and others.
  • 6. Treasure Data = Cloud + Big Data. Figure: market map of Cloud versus On-Premise offerings against data volume (threshold: 1 billion entries or 10 TB). Labels: Big Data-as-a-Service, Database-as-a-service, Enterprise, Lightweight RDBMS, Traditional RDBMS, Data Warehouse, DB2; market sizes of $34B and $10B. © 2012 Forrester Research, Inc. Reproduction Prohibited.
  • 7. Why Cloud? ‘Time’ is Money. Figure: value plotted over time, starting at sign-up or PO. Curves: Ideal, Customer Expectation, Reality (On-Premise). Annotations: HW/SW selection, PoC, deploy...; obsolete over time; upgrade.
  • 8. Big Data Adoption Stages. Figure: stages ordered by intelligence and sophistication. Reporting: Standard Reports (What happened?); Ad-hoc Reports (Where?); Drill Down Query (Where exactly?); Alerts (Error?). Analytics: Statistical Analysis (Why?); Predictive Analysis (What’s a trend?); Optimization (What’s the best?). Annotation: Treasure Data’s FOCUS (80% of needs).
  • 9. Full Stack Support for Big Data Reporting. Our best-in-class architecture and operations team ensure the integrity and availability of your data. Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode. Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches. You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
  • 10. Vision: Single Analytics Platform for the World
  • 11. Our Customers – Fortune Global 500 leaders and start-ups including: [customer logos]
  • 12. Treasure Data’s Service Architecture
  • 13. Treasure Data = Collect + Store + Query
  • 14. Example in AdTech: MobFox. (1) Europe’s largest independent mobile ad exchange. (2) 20 billion impressions/month (circa Jan. 2013). (3) Serving ads for 15,000+ mobile apps (circa Jan. 2013). (4) Needed Big Data analytics infrastructure ASAP.
  • 15. Two Weeks From Start to Finish!
  • 16. Used AWS Products (1). RDS: stores user information, job status, etc.; stores the metadata of our columnar database; serves as the worker queue (perfectqueue / perfectsched). EC2: API servers; Hadoop clusters; job workers; deployed using Chef.
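The idea of using a relational database as a worker queue can be sketched as follows. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 in place of RDS, and the table layout and the function names (make_queue, submit, acquire) are hypothetical. It is not the perfectqueue implementation or its API.

```python
import sqlite3

def make_queue():
    # In-memory SQLite stands in for the RDS database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, payload TEXT, owner TEXT)"
    )
    return db

def submit(db, task_id, payload):
    """Insert a task; the primary key rejects duplicate submissions."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO tasks (id, payload) VALUES (?, ?)",
                (task_id, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def acquire(db, worker):
    """Claim the oldest unowned task for this worker, or return None."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM tasks WHERE owner IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The "owner IS NULL" guard means only one worker can win the claim.
        cur = db.execute(
            "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
            (worker, row[0]),
        )
        return row if cur.rowcount == 1 else None
```

The design point is that the database's own uniqueness and conditional-update guarantees do the coordination, so no separate queue service (such as SQS) is needed.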
  • 17. Used AWS Products (2). ELB: load balancing of API servers; load balancing of td-agents. S3: columnar storage built on top of S3; MessagePack columnar format; realtime / archive storage; our Result feature supports S3 output. No EMR, SQS, or other products!
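What "columnar" means for schemaless log records can be illustrated with a small sketch. It only shows the row-to-column pivot; the function name to_columns is hypothetical, and the actual on-disk layout and MessagePack serialization used by Treasure Data are not described in the slides and are not reproduced here.

```python
def to_columns(records):
    """Pivot a list of row records (dicts) into one list per column.

    Records need not share a schema: a missing key becomes None.
    """
    names = sorted({key for record in records for key in record})
    return {name: [record.get(name) for record in records] for name in names}

rows = [
    {"host": "127.0.0.1", "path": "/"},
    {"host": "10.0.0.1", "method": "GET"},
]
columns = to_columns(rows)
```

With this layout a query that touches only one field, for example host, reads a single list instead of every record.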
  • 18. Architecture Breakdown. Data Collection • Increasing variety of data sources • No single data schema • Lack of streaming data collection method • 60% of Big Data project resources consumed. Data Store/Analytics • Remaining complexity in both traditional DWH and Hadoop (very slow time to market) • Challenges in scaling data volume and expanding cost. Connectivity • Required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC • Output to other services, e.g. S3, RDBMS, etc.
  • 19. 1) Data Collection • 60% of BI project resources are consumed here • Most ‘underestimated’ and ‘unsexy’ but MOST important • Fluentd: OSS lightweight but robust Log Collector: http://fluentd.org/
  • 20. Fluentd: the missing log collector (fluentd.org)
  • 21. In short • Open-source log collector written in Ruby • Uses the RubyGems ecosystem for plugins • It’s like syslogd, but uses JSON for log messages
  • 22. Fluentd tails Apache log lines (127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ..., 127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ..., ...) and turns each into an event with Time (2012-02-04 01:33:51), Tag (apache.log) and Record ({ "host": "127.0.0.1", "method": "GET", "path": "/", ... }). Events are buffered, then written (inserted) into Mongo.
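The event model on this slide (a tag, a time and a JSON-like record per log line) can be sketched in a few lines of Python. This is only an illustration, not Fluentd's own parser: the function name `to_event`, the simplified regular expression and the assumption that timestamps are UTC are all hypothetical choices made for the example.

```python
import json
import re
from datetime import datetime, timezone

# Simplified pattern for the access-log lines shown on the slide.
# Assumption: no timezone field, as in the slide's sample lines.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def to_event(line, tag="apache.log"):
    """Turn one raw log line into a (tag, time, record) triple, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # unparseable line
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S")
    # Assumption: timestamps are UTC; the slide does not say.
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    record = {
        "host": m.group("host"),
        "method": m.group("method"),
        "path": m.group("path"),
    }
    return tag, epoch, record


event = to_event('127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / HTTP/1.1" 200 777')
print(json.dumps(event))
```

Keeping the record as a plain dictionary mirrors the "uses JSON for log messages" point: any output can serialize it without knowing the original log format.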
  • 23. Architecture. Pluggable Input: Forward, HTTP, File tail, dstat, ... Pluggable Buffer: Memory, File. Pluggable Output: Forward, File, Amazon S3, MongoDB, ...
  • 24. Before Fluentd: Server1, Server2, Server3 (each running Application ・・・), Fluent, Log Server. High latency! Must wait for a day...
  • 25. After Fluentd: Server1, Server2, Server3 (each running Application ・・・ with a local Fluentd), forwarding to Fluentd, Fluentd. In streaming!
  • 26. Inputs: Access logs (Apache), App logs (Frontend, Backend), System logs (syslogd), Databases. Fluentd: filter / buffer / routing. Outputs: Alerting (Nagios), Analysis (MongoDB, MySQL, Hadoop), Archiving (Amazon S3).
  • 27. td-agent • Open-source distribution package of fluentd • ETL part of Treasure Data • Includes useful components: ruby, jemalloc, fluentd; 3rd party gems: td, mongo, webhdfs, etc. (the td plugin is for Treasure Data) • http://packages.treasure-data.com/
  • 28. Treasure Data Service Architecture. Sources: Apache, App, RDBMS, other data sources. Collection: td-agent (This!). Storage: Treasure Data columnar data warehouse. Query Processing Cluster: MAPREDUCE JOBS; HIVE, PIG (to be supported). Query API: td-command, JDBC, REST. Consumers: User, BI apps.
  • 29. AWS plugins: S3, SNS, SQS, DynamoDB, forward-aws, RDS, RedShift, CloudWatch, Yet Another Cloud Watch, CloudWatch Lite. http://fluentd.org/plugin/
  • 30. 2) Data Store / Analytics - Columnar Storage
  • 31. Treasure Data Service Processing Flow. Components: Frontend, Job Queue, Worker, Hadoop. Applications push metrics to Fluentd (via local Fluentd). Fluentd sums up data over minutes (partial aggregation), then sends it to Treasure Data for historical analysis and to Librato Metrics for realtime analysis.
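The partial aggregation step can be illustrated with a short Python sketch. It assumes one-minute buckets and a made-up metric name (`jobs.finished`); the slide only says that Fluentd sums data up over minutes, so both the bucket size and the `aggregate_per_minute` helper are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Sum metric values per (name, minute) bucket.

    `events` is an iterable of (epoch_seconds, metric_name, value) tuples.
    """
    totals = defaultdict(int)
    for epoch, name, value in events:
        minute = epoch - (epoch % 60)  # start of the minute this event falls in
        totals[(name, minute)] += value
    return dict(totals)


# Hypothetical metrics pushed by applications to a local aggregator.
events = [
    (1363392000, "jobs.finished", 1),
    (1363392030, "jobs.finished", 2),
    (1363392061, "jobs.finished", 5),
]
summary = aggregate_per_minute(events)
print(summary)
```

Aggregating before forwarding means the downstream services receive one data point per metric per bucket instead of one per event.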
  • 33. Structure of Columnar Storages. import goes to Import Storage, bulk import goes to Bulk Import Storage; data lands in Realtime Storage and is merged into Archive Storage (every 1 hour); SELECT ... reads the stored data. Example entries: 23c82b0ba3405d4c15aa85d2190e 2013-03-15 00:23:00 912ec80; 6d7b1482412ab14f0332b8aee119 2013-03-16 00:01:00 277a259; 8a7bc848b2791b8fd603c719e54f ...; 0e3d402b17638477c9a7977e7dab ...
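A toy model of the realtime/archive split, written in Python, may help picture the flow. It is a sketch under assumptions, not Treasure Data's implementation: the class `TwoTierStore`, its method names, and the idea that a query scans both tiers are hypothetical, and the hourly merge is triggered by hand here.

```python
class TwoTierStore:
    """Toy two-tier store: new rows land in `realtime`, a merge moves them to `archive`."""

    def __init__(self):
        self.realtime = []  # recently imported rows, not yet merged
        self.archive = []   # merged rows, kept sorted by time

    def import_rows(self, rows):
        """Append newly imported rows to the realtime tier."""
        self.realtime.extend(rows)

    def merge(self):
        """Move everything from realtime into archive (run hourly on the slide)."""
        self.archive.extend(self.realtime)
        self.archive.sort(key=lambda row: row["time"])
        self.realtime = []

    def scan(self):
        """Assumption: a query sees both merged and not-yet-merged rows."""
        yield from self.archive
        yield from self.realtime


store = TwoTierStore()
store.import_rows([{"time": 20, "ip": "10.0.0.2"}, {"time": 10, "ip": "10.0.0.1"}])
store.merge()
store.import_rows([{"time": 30, "ip": "10.0.0.3"}])
rows = list(store.scan())
print(rows)
```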
  • 34. Layers: Query Language, Query Execution, Columnar Data, Object Storage
  • 35. 1/4: Compile SQL into MapReduce. SQL statement: SELECT COUNT(DISTINCT ip) FROM tbl; Hive: SQL-to-MapReduce
  • 36. 2/4: MapReduce is executed in parallel. SELECT COUNT(DISTINCT ip) FROM tbl; cc2.8xlarge cluster compute instances (up to 100 nodes * 32 threads)
  • 37. 3/4: Columnar Data Access. SELECT COUNT(DISTINCT ip) FROM tbl; 10Gbps network. Read ONLY the required part of data
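Steps 1/4 to 3/4 can be pictured with a small Python sketch of `SELECT COUNT(DISTINCT ip) FROM tbl` as a map step and a reduce step over column-oriented partitions. The sample data, the partition layout and the function names are hypothetical, and the plan Hive actually generates is more involved; the point is only that each mapper touches the `ip` column and nothing else.

```python
from functools import reduce

# Two hypothetical partitions of table `tbl`, each stored column by column.
partitions = [
    {"ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"], "path": ["/", "/a", "/b"]},
    {"ip": ["10.0.0.2", "10.0.0.3"], "path": ["/", "/"]},
]


def map_partition(partition, column):
    """Map step: read only the requested column and emit its distinct values."""
    return set(partition[column])


def count_distinct(parts, column):
    """Reduce step: union the per-partition sets and count the result."""
    partial_sets = (map_partition(p, column) for p in parts)
    return len(reduce(set.union, partial_sets, set()))


result = count_distinct(partitions, "ip")
print(result)
```

In a row-oriented layout the mappers would also have to read the `path` values; with columns stored separately that data is never fetched.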
  • 38. 4/4: Object-based Storage
  • 39. Data first, Schema later. Raw data (JSON): {"user":54, "name":"test", "value":"120", "host":"local"}. Schema: user:int, name:string, value:int, host:int. SELECT returns: 54 (int), "test" (string), 120 (int), NULL.
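The "schema later" idea, where types are applied when data is read rather than when it is stored, can be sketched in Python using the slide's own record and schema. The helpers `cast` and `apply_schema` are hypothetical names for this example, and returning NULL (`None`) on a failed cast is inferred from the slide's output rather than taken from any documented behaviour.

```python
import json


def cast(value, type_name):
    """Cast a raw value to the column type; return None (NULL) on failure."""
    if value is None:
        return None
    try:
        if type_name == "int":
            return int(value)
        if type_name == "string":
            return str(value)
    except (TypeError, ValueError):
        return None  # value cannot be represented in the requested type
    return None  # unknown type name


def apply_schema(raw_json, schema):
    """Parse one stored JSON record and project it through the schema."""
    row = json.loads(raw_json)
    return {col: cast(row.get(col), type_name) for col, type_name in schema.items()}


schema = {"user": "int", "name": "string", "value": "int", "host": "int"}
raw = '{"user": 54, "name": "test", "value": "120", "host": "local"}'
row = apply_schema(raw, schema)
print(row)
```

Because the raw record is stored untouched, changing `host` to `string` later would make the same data return "local" without re-importing anything.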
  • 40. 3) Connectivity. Clients: REST API, td-command, JDBC/ODBC Driver, BI apps, Web App. Queries go through the Query API to the Query Processing Cluster, which reads Treasure Data Columnar Storage. Result output: MySQL, S3, …
  • 41. Multi-Tenancy • All customers share the Hadoop clusters (Multi Data Centers) • Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade. Job Submission + Plan Change go to the Global Scheduler, which dispatches to a Local FairScheduler in each of datacenter A, B, C and D. On-Demand Resource Allocation.
  • 42. Conclusion. Treasure Data • Cloud-based Big Data analytics platform • Provides a Machete for Big Data reporting. Big Data processing • Collect / Store / Analytics / Visualization (Our focus!). AWS products we use • EC2, S3, RDS, ELB • Building Treasure Data specific systems on AWS
  • 43. Big Data for the Rest of Us www.treasure-data.com | @TreasureData