How to Win at Scale and its Influence on People

Philip (flip) Kromer
CTO, Infochimps.com
Big Data is Inevitable

It Demands a New Approach
There’s Another Way

You’re Going to Have to Follow It

It Might be a Better Way
The Other Way
Massive component count
Federated Truth
[Diagram: scattered data stores and services: email, spreadsheets, hipchat, salesforce, Chargify, MySQL, elasticsearch, redis, MongoDB, hubspot, ZenDesk, HBase, HDFS, s3, log files, zabbix, google docs, ADP, BC/BS]
Low Coupling
Reliable   Resilient
• Manage 100s of machines: architecture as code
• Contain system complexity: relentlessly decouple
• Maintain coherency: federated truth
• Manage true costs: optimize for people not machines
• Manage failure & change: resiliency engineering
The Other Way

Declarative, not Homogeneous
Decoupled, not Standardized
 Federated, not Centralized
    Simple, not Performant
  Resilient, not Reliable
Declarative
Architecture as Code
[Diagram: the stack, built with Ironfan + Chef. Blocks: Lightweight Dashboard (x2), HBase (x2), API, Data Transport (ESh, flume), ElasticSearch (x2), Operations (ops), Application (ics.com), Hadoop, On-Demand Hadoop]
[Diagram: the same stack, zooming in on one HBase cluster: HM (HBase master), NN (namenode), ZK (zookeeper), and six RS (regionserver) machines]
provision machine

run state

settings

standard components

cluster-specific

facet groups
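
The annotations above label the parts of a cluster definition. As a minimal sketch of the idea, assuming a made-up dictionary layout (this is not Ironfan's actual DSL, and every name and value below is hypothetical), the cluster can be described as data and expanded into the list of machines to provision:

```python
# Hypothetical declarative cluster description. The keys, names, and values
# are illustrative only; this is not Ironfan's real DSL.
CLUSTER = {
    "name": "hbase_demo",
    # provision machine + run state
    "machine": {"flavor": "m1.large", "run_state": "start"},
    # cluster-specific settings
    "settings": {"hbase_heap_mb": 4096},
    # standard components that go on every machine
    "standard": ["ssh", "nfs", "zbx", "log", "fw"],
    # facet groups: machines that share a role
    "facets": {
        "master": {"instances": 1,
                   "components": ["hb_master", "namenode", "zookeeper"]},
        "worker": {"instances": 3,
                   "components": ["regionserver", "datanode", "tasktracker"]},
    },
}


def expand(cluster):
    """Turn the declarative description into one record per machine."""
    machines = []
    for facet_name, facet in sorted(cluster["facets"].items()):
        for index in range(facet["instances"]):
            machines.append({
                "name": f"{cluster['name']}-{facet_name}-{index}",
                "components": cluster["standard"] + facet["components"],
                **cluster["machine"],
            })
    return machines


if __name__ == "__main__":
    for machine in expand(CLUSTER):
        print(machine["name"], machine["components"])
```

Because the description is plain data, changing the worker count is a one-number edit, and the same description can be expanded again to rebuild the cluster elsewhere.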
[Diagram: zooming in again, from the cluster to a single RS machine and its components: regionserver, datanode, stargate, tasktracker, zookeeper, alongside ssh, nfs, zbx, log, fw]
Wins from Declarative
Recapitulatable
Portable
Decoupled
Our Stack
Engineer : System = 1:10


• >60 distinct components
• 50-150 machines
• 1 ops + 5 hackers + 1 analyst
Self-similar
[Diagram: the stack, then one HBase cluster (HM, NN, ZK, six RS), then its individual machines:
  alpha: hb master, namenode, zookeeper; ssh, nfs, zbx, log, fw
  beta:  hb 2d mstr, 2d nn, jobtracker, zookeeper; ssh, nfs, zbx, log, fw
  gamma: regionserver, datanode, stargate, tasktracker, zookeeper; ssh, nfs, zbx, log, fw
  delta: regionserver, datanode, stargate, tasktracker; ssh, nfs, zbx, log, fw]
Example: Scraper

Scraper

  Jobs => Scraper => disk => flume (tail’er => decorator => sink) => database

  Scraper:    while true: get_job, fetch_url, dump_to_disk
  tail’er:    ensures reliable delivery
  decorator:  parse raw => objects
  sink:       store object => database
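
The stages above can be sketched as four small functions, each with one job. This is a rough illustration, not the original implementation: the disk is a plain list, the database is a dict, `fake_fetch` stands in for a real HTTP fetch, and the in-memory `tail` does not provide flume's reliable delivery. All function names are made up for the example.

```python
import json
from collections import deque


def scrape(jobs, fetch_url, disk):
    """Scraper stage: get a job, fetch it, dump the raw result to disk."""
    while jobs:                    # the slide loops forever; here we stop when jobs run out
        url = jobs.popleft()       # get_job
        raw = fetch_url(url)       # fetch_url
        disk.append(json.dumps({"url": url, "raw": raw}))  # dump_to_disk


def tail(disk):
    """Tail'er stage: hand each line written to disk downstream."""
    for line in disk:
        yield line


def decorate(lines):
    """Decorator stage: parse raw lines into objects."""
    for line in lines:
        record = json.loads(line)
        record["length"] = len(record["raw"])
        yield record


def sink(records, database):
    """Sink stage: store each object in the database."""
    for record in records:
        database[record["url"]] = record


def fake_fetch(url):
    """Stand-in for a real HTTP request, so the sketch needs no network."""
    return f"<html>{url}</html>"


if __name__ == "__main__":
    jobs = deque(["http://example.com/a", "http://example.com/b"])
    disk, database = [], {}
    scrape(jobs, fake_fetch, disk)
    sink(decorate(tail(disk)), database)
    print(sorted(database))
```

The scraper only knows how to write to disk, and the sink only knows how to write to the database, so either side can be replaced or restarted without touching the other.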
Simple
• Immediately Understandable
• Clear Interface
• Few Moving Parts
Federated
Data Stores in Production

• HBase           • MySQL
• ElasticSearch   • Redis
• Cassandra       • sqlite
• TokyoTyrant     • whisper (graphite)
• SimpleDB        • file system
• MongoDB         • S3
Programs Used for This Talk

• Emacs        • Skitch
• Keynote      • finder
• Preview      • flickr.com
• Chrome       • google image search
• ruby (pry)   • ssh
How’s my Batch Job Going?

• 1 x Job Status
• 1 x Counters & App Metrics
• N x Task Status
• M x Machine System Stats
• 1 x Cloud Status
• 1 x Chef Server
Dataflow is All
[Diagram: the stack diagram once more, captioned in turn System Diagram, Dataflow, Workflow, and Org Chart]
Robots are Cheap

People are Important
Expensive / Not Expensive
1 trillion 10 KB objects:
 • 100% in RAM:        $ 212,000 /mo
 • 10% in RAM:         $  21,000 /mo
 • On Disk:            $   3,000 /mo
 • On S3:              $   1,200 /mo
1 Intern, part-time:   $   1,500 /mo
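
To make the comparison concrete, here is a small sketch that restates each storage cost as a multiple of the part-time intern. It uses only the figures listed above; the names `STORAGE_COSTS`, `INTERN_COST`, and `in_interns` are made up for the example.

```python
# Monthly costs exactly as listed on the slide (USD per month).
STORAGE_COSTS = {
    "100% in RAM": 212_000,
    "10% in RAM": 21_000,
    "on disk": 3_000,
    "on S3": 1_200,
}
INTERN_COST = 1_500  # one part-time intern, USD per month


def in_interns(monthly_cost, intern_cost=INTERN_COST):
    """Express a monthly cost as a multiple of one part-time intern."""
    return monthly_cost / intern_cost


if __name__ == "__main__":
    for option, cost in STORAGE_COSTS.items():
        print(f"{option:>12}: ${cost:>7,} /mo = {in_interns(cost):6.1f} interns")
```

By these figures, keeping everything on disk costs about as much as two part-time interns, and S3 costs less than one.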
Scalability is People
Monolithic Software means Meetings

Meetings are Death
Decentralize. Decouple.
n^2 law of coupling

  100 things              50 + 30 + 20 things + 20 (tax)

  100^2 =                 2500 + 900 + 400 + 400 =
  10,000 things           4200 things
  to go wrong             to go wrong
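
The arithmetic can be checked with a short sketch. It assumes the squares on the slide (2500, 900, 400, 400) correspond to decoupled groups of 50, 30, and 20 things plus a 20-thing tax for the glue between them; the function name is made up for the example.

```python
def things_to_go_wrong(group_sizes):
    """n^2 law of coupling: each group of n coupled things has n^2 ways to fail."""
    return sum(n * n for n in group_sizes)


# One monolith of 100 tightly coupled things.
monolith = things_to_go_wrong([100])

# The same 100 things split into three groups, plus a 20-thing "tax".
decoupled = things_to_go_wrong([50, 30, 20, 20])

if __name__ == "__main__":
    print(monolith, decoupled)
```

Even after paying the tax, the decoupled system has fewer than half as many things to go wrong.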
Infochimps.com 2011
[Diagram: infochimps.com, with text search, API acct'g, models, A/B testing, and cloud services, next to the Planet of the APIs]
Infochimps.com 2012
[Diagram: auth & layout; datasets, API docs, content, dashboards, payment, console, blog, press, collateral; catalog API, text search, API acct'g, models, A/B testing, cloud services; Planet of the APIs]
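Infochimps.com 2012
[Diagram: the same layout with each box relabeled by project or service name: george; icsexpl, capuchin, kanzi, beergoggls, george, alphamale, WPEngine, totem, hubspot; catalog API, elasticsrch, MongoDB, MySQL, redis, cloud services; Planet of the APIs, marked (infochimps) and (saas)]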
this drawing fits in my head

[Diagram: datasets and catalog API]

this app fits in my head, and my laptop
fin.

http://infochimps.com
http://github.com/infochimps-labs

 
Real-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the AgencyReal-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the Agency
 
Ironfan: Your Foundation for Flexible Big Data Infrastructure
Ironfan: Your Foundation for Flexible Big Data InfrastructureIronfan: Your Foundation for Flexible Big Data Infrastructure
Ironfan: Your Foundation for Flexible Big Data Infrastructure
 
The Power of Elasticsearch
The Power of ElasticsearchThe Power of Elasticsearch
The Power of Elasticsearch
 
Case Study: Digital Agency Turbocharges Social Listening and Insights with t...
Case Study: Digital  Agency Turbocharges Social Listening and Insights with t...Case Study: Digital  Agency Turbocharges Social Listening and Insights with t...
Case Study: Digital Agency Turbocharges Social Listening and Insights with t...
 
Meet the Infochimps Platform
Meet the Infochimps PlatformMeet the Infochimps Platform
Meet the Infochimps Platform
 

Último

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Último (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

The Other Way of Doing Big Data

  • 1. How to Win at Scale and its Influence on People. Philip (flip) Kromer, CTO, Infochimps.com
  • 2. Big Data is Inevitable. It Demands a New Approach.
  • 4. There’s Another Way. You’re Going to Have to Follow It.
  • 5. There’s Another Way. You’re Going to Have to Follow It. It Might be a Better Way.
  • 8. Federated Truth: email, MySQL, HBase, s3, spreadsheets, elasticsearch (×2), HDFS, hipchat, redis, mongo, MongoDB, log files, salesforce, zabbix, hubspot, ADP, Chargify, BC/BS, ZenDesk, google docs
  • 10. Reliable Resilient
  • 11.
      • Manage 100s of machines: architecture as code
      • Contain system complexity: relentlessly decouple
      • Maintain coherency: federated truth
      • Manage true costs: optimize for people, not machines
      • Manage failure & change: resiliency engineering
  • 12. The Other Way
      • Declarative, not Homogeneous
      • Decoupled, not Standardized
      • Federated, not Centralized
      • Simple, not Performant
      • Resilient, not Reliable
  • 14. Architecture as Code: Ironfan + Chef [stack diagram; labels: Lightweight Dashboard (×2), HBase (×2), API, Data Transport, ESh, flume, ElasticSearch (×2), Operations, Application, ops, ics.com, Hadoop, On-Demand Hadoop]
  • 15. [same stack diagram, with machines labeled HM, NN, ZK, RS (×6)]
  • 16. provision machine run state settings standard components cluster-specific facet groups
  • 18. [stack diagram from slide 14, with machines labeled HM, NN, ZK, RS (×6); one RS machine expanded into its components: regionserver, datanode, stargate, tasktracker, zookeeper, ssh, nfs, zbx, log, fw]
  • 19. Wins from Declarative [stack diagram from slide 14]
  • 23. Our Stack [stack diagram from slide 14]
  • 26. Engineer : System = 1:10
      • >60 distinct components
      • 50-150 machines
      • 1 ops + 5 hackers + 1 analyst
  • 27. Self-similar [stack diagram from slide 14, with machines HM, NN, ZK, RS (×6) expanded into four boxes labeled alpha, beta, gamma, delta. Master-side labels on alpha and beta: hb master, hb 2d mstr, namenode, 2d nn, jobtracker, zookeeper. Worker-side labels on gamma and delta: regionserver, datanode, stargate, tasktracker, zookeeper. All four boxes also carry ssh, nfs, zbx, log, fw.]
  • 28. Example: Scraper [pipeline diagram: Jobs, Scraper, disk tail’er, decorator, sink, database]
  • 29. [same pipeline, under the headings Scraper and flume] Scraper: while true: get_job, fetch_url, dump_to_disk
  • 30. [same, plus] disk tail’er: ensures reliable delivery
  • 31. [same, plus] decorator: parse raw => objects
  • 32. [same, plus] sink: store object => database
  • 36.
      • Immediately Understandable
      • Clear Interface
      • Few Moving Parts
  • 38. Data Stores in Production
      • HBase
      • MySQL
      • ElasticSearch
      • Redis
      • Cassandra
      • sqlite
      • TokyoTyrant
      • whisper (graphite)
      • SimpleDB
      • file system
      • MongoDB
      • S3
  • 39. Programs Used for This Talk
      • Emacs
      • Skitch
      • Keynote
      • finder
      • Preview
      • flickr.com
      • Chrome
      • google image search
      • ruby (pry)
      • ssh
  • 40. How’s my Batch Job Going?
      • 1 x Job Status
      • 1 x Counters & App Metrics
      • N x Task Status
      • M x Machine System Stats
      • 1 x Cloud Status
      • 1 x Chef Server
  • 42. System Diagram, Dataflow, Workflow [over the stack diagram from slide 14]
  • 43. System Diagram, Dataflow, Workflow, Org Chart [over the stack diagram from slide 14]
  • 44. Robots are Cheap People are Important
  • 45. Expensive / Not Expensive. 1 trillion 10 kb objects:
      • 100% in RAM: $212,000 /mo
      • 10% in RAM: $21,000 /mo
      • On Disk: $3,000 /mo
      • On S3: $1,200 /mo
  • 46. Expensive / Not Expensive. 1 trillion 10 kb objects:
      • 100% in RAM: $212,000 /mo
      • 10% in RAM: $21,000 /mo
      • On Disk: $3,000 /mo
      • On S3: $1,200 /mo
      • 1 Intern, part-time: $1,500 /mo
  • 47. Scalability is People
  • 52. n^2 law of coupling: 100 things vs. 5 + 3 + 2 things + 2 (tax)
  • 53. n^2 law of coupling: 10,000 things to go wrong vs. 2500 + 900 + 400 + 400 = 4200 things to go wrong
  • 55. Infochimps.com 2011 [architecture diagram; labels: text search, Planet of the APIs, acct'g, infochimps.com, models, A/B testing, cloud services]
  • 56. Infochimps.com 2012 [architecture diagram; labels: datasets, catalog, API, API docs, text search, content, dashboards, Planet of the APIs, acct'g, auth & payment, layout, console, models, A/B testing, blog, press, cloud services, collateral]
  • 57. Infochimps.com 2012 [architecture diagram; labels: (infochimps), icsexpl, catalog, API, (saas), capuchin, elasticsrch, kanzi, beergoggls, Planet of the APIs, MongoDB, george (×2), alphamale, MySQL, redis, WPEngine, totem, cloud services, hubspot]
  • 58. This drawing fits in my head [datasets, catalog, API]. This app fits in my head, and my laptop.
  • 59. Infochimps.com 2012 [same diagram as slide 57]
  • 60. fin. http://infochimps.com http://github.com/infochimps-labs
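
The scraper pipeline on slides 28-32 can be sketched in a few lines of Python. This is only an illustration of the decoupling idea, not the Infochimps implementation: `fetch_url` is a stub that fabricates a payload instead of making an HTTP request, the tail'er, decorator and sink are plain functions standing in for the Flume stages, and the "database" is a dict. All function and variable names beyond `get_job`, `fetch_url` and `dump_to_disk` are assumptions made for the example, and the tail'er here does nothing to guarantee delivery.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


def fetch_url(url):
    # Stand-in for a real HTTP fetch (assumption): returns a fake raw payload.
    return json.dumps({"url": url, "body": f"contents of {url}"})


def scraper(jobs, spool):
    # The slide's loop: get_job, fetch_url, dump_to_disk, and nothing else.
    # The slide loops forever; this sketch stops when the job queue is empty.
    while jobs:
        url = jobs.popleft()          # get_job
        raw = fetch_url(url)          # fetch_url
        with spool.open("a") as f:    # dump_to_disk
            f.write(raw + "\n")


def tailer(spool):
    # Reads raw records off disk, one line at a time.
    with spool.open() as f:
        for line in f:
            yield line.rstrip("\n")


def decorator(raw_lines):
    # parse raw => objects
    for raw in raw_lines:
        yield json.loads(raw)


def sink(objects, database):
    # store object => database
    for obj in objects:
        database[obj["url"]] = obj


def run_pipeline(urls):
    database = {}
    with tempfile.TemporaryDirectory() as tmp:
        spool = Path(tmp) / "scraped.log"
        spool.touch()
        scraper(deque(urls), spool)
        sink(decorator(tailer(spool)), database)
    return database
```

Each stage knows only about its input and output, so the scraper can keep dumping to disk while anything downstream is stopped, replaced or restarted.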

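The arithmetic behind the n^2 law of coupling on slides 52-53 can be checked with a short Python sketch. It assumes, as slide 53's numbers suggest, that a group of n tightly coupled things has n^2 ways to go wrong; the group sizes 50, 30, 20 and a tax of 20 are read off the squares 2500, 900, 400 and 400, and the function name is made up for the example.

```python
def things_to_go_wrong(group_sizes):
    # Assumed model: each tightly coupled group of n things
    # contributes n^2 possible interactions that can fail.
    return sum(n * n for n in group_sizes)


# One system of 100 coupled things.
monolith = things_to_go_wrong([100])

# The same 100 things split into groups of 50, 30 and 20,
# plus 20 extra things as the "tax" paid for decoupling.
decoupled = things_to_go_wrong([50, 30, 20, 20])

print(monolith, decoupled)
```

Even after paying the tax, the decoupled layout has fewer than half the failure interactions of the monolith under this model.
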
Editor's notes

  8. This is at a 15-person organization. Federated, meaning the data is semantically disparate.
  11. People are walking around as if we used to have one kind of database and now we have two. The important fact isn’t that one of them is sharded. The important fact is that they’re proliferating, and that’s a good thing.
  12. Google, Facebook, Amazon had to solve the scalability problem.
  38-39. Now I know this sounds like the lunacy of a ritalin-addled architecture astronaut spending too much time on StackOverflow.
  45-46. Monthly cost of holding 10 TB of data:
      • Summary: $200k on 146 Amazon EC2 m2.4xlarge; $20k on 57 Amazon EC2 m2.xlarge (10% in RAM); $3k on 6 Amazon EC2 c1.xlarge (disk); $1.2k on s3
      • 10 TB in RAM, on 146 Amazon EC2 m2.4xlarge: 10_000 * 2.00 * 24 * 30.25 / 68.4 = $212,280 / month
      • 10 TB data, 10% in RAM, on 57 Amazon EC2 m2.xlarge: 0.1 * 10_000 * 0.50 * 24 * 30.25 / 17.5 = $20,743 / month
      • 10 TB data on disk, on 6 Amazon EC2 c1.xlarge: machines, price, disk, ram = [6, 0.68, 1_690, 7] ; [(tot_disk = disk * machines), (machine_dollars_mo = (machines * price * 24 * 30.25).round)] gives $2,962 / month
      • 10 TB data on S3: $1,250 / month
      • 1 intern, $10/hr, 25 hrs/wk, not incl. overhead: $1,100 / month
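
The cost arithmetic in notes 45 and 46 can be re-run as a small Python sketch. The hourly prices ($2.00, $0.50, $0.68), RAM sizes (68.4 GB, 17.5 GB), 30.25-day month and the S3 and intern figures are copied from the notes; the function and variable names are made up for the example. The intern figure is the note's $1,100 (slide 46 shows $1,500).

```python
HOURS_PER_MONTH = 24 * 30.25   # month length used in the notes
DATA_GB = 10_000               # 10 TB, as in the notes


def ram_tier_cost(fraction_in_ram, price_per_hour, ram_gb_per_machine):
    """Monthly dollars to hold the given fraction of the data in RAM."""
    machines = fraction_in_ram * DATA_GB / ram_gb_per_machine
    return machines * price_per_hour * HOURS_PER_MONTH


def fleet_cost(machines, price_per_hour):
    """Monthly dollars for a fixed number of machines."""
    return machines * price_per_hour * HOURS_PER_MONTH


costs = {
    "100% in RAM": ram_tier_cost(1.0, 2.00, 68.4),   # m2.4xlarge
    "10% in RAM": ram_tier_cost(0.1, 0.50, 17.5),    # m2.xlarge
    "on disk": fleet_cost(6, 0.68),                  # 6 x c1.xlarge
    "on S3": 1_250,                                  # quoted directly in the notes
    "intern": 1_100,                                 # $10/hr, 25 hrs/wk, per the notes
}

for label, dollars in costs.items():
    print(f"{label:>12}: ${dollars:,.0f} / month")
```

The point of the comparison survives any reasonable change in prices: the cheap storage tiers cost about as much per month as one part-time intern.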