Google Cloud and Data Pipeline Patterns

Deck from a talk for YOW Nights Australia on GCP (Google Cloud Platform) and data pipeline patterns.

Published in: Technology

  1. Google Cloud & Data Pipeline Patterns @LynnLangit
  2. Google Cloud in Australia: data center here in 2017
  3. GCP and Patterns • Developer-first • 1. Modern Cloud by Example: fast, flexible and cheap; Virtual Machines / GCE; Storage / GCS; Servers ➡ Containers ➡ Functions • 2. GCP Data Pipeline Patterns: Data Warehouse; Internet of Things (IoT); Bioinformatics • **And also, something new…
  4. Demo: Storage / GCS (Google Cloud Platform, Confidential & Proprietary)
  5. (no text extracted)
  6. Demo: Virtual Machines / GCE
  7. Virtual Machines / GCE • Fast: spin up in seconds; tools (SSH, gcloud console) • Flexible: custom sizing (slider); OS variety (Linux or Windows) • Cheap and simple: automatic discount for use; preemptible. Storage / GCS • Fast: very fast within region; tools included • Flexible: 4 storage options; simple to use and understand • Cheap: pricing by type
  8. (no text extracted)
  9. Pipeline Architectures
  10. Data Warehousing
  11. Big Data > Data Warehouse • Reference table • Query / Compute: BigQuery • Customer Lists / Reference Data • Export Ad Data: Cloud Storage • Id matching: Cloud Dataflow • Marketing List • DoubleClick Campaign Manager • Google Analytics • Relevant Users: Cloud Storage • Analysts • DataStudio 360 Dashboards
  12. Demo: BigQuery
  13. Big Data > Log Processing • Batch • Streaming • Log Storage: Cloud Storage • Log Streaming: Cloud Pub/Sub • Log Analytics: BigQuery • Log Processing: Cloud Dataflow
  14. Cloud Dataflow / Apache Beam
  15. Big Data > Time Series Analysis • Batch • Storage: BigQuery • Storage: Cloud Storage • Time Series Processing: Cloud Dataflow • Analysis: Cloud Datalab • Storage: Cloud Bigtable* • Processing: Cloud Dataproc • Time Series Files: Cloud Storage • ML: Cloud ML • Streaming • Time Series Streaming: Cloud Pub/Sub • *Note: Use Bigtable with NoSQL workloads of 1 TB or more
  16. Big Data > Complex Event Processing • Streaming • Batch • Cloud Apps: Compute Engine • Push to Devices: App Engine • Rules Engine: Cloud Dataflow • Data Analysis: Cloud Datalab • Mobile Devices • Push Notifications • Report & Share Business Analysis • Cloud Apps: Compute Engine • On-Premises Databases • On-Premises Applications • Processed Events: Cloud Bigtable (Events Time Series) • Data Warehouse: BigQuery (Execution Results) • Streaming: Cloud Pub/Sub (Transactions) • Processing: Cloud Dataflow (Transaction Streams) • Messaging: Cloud Pub/Sub (Rules Actions) • ETL: Cloud Dataflow (Transform Data) • Cloud Data: Cloud Storage • Rules Engine: Cloud Dataproc
  17. Core Products for Data Warehousing • Files: Cloud Storage • Compute: BigQuery, Cloud Dataflow • Other: 3rd party ETL, 3rd party dashboards. More on BigQuery… • Interactive or batch query • ANSI SQL compliant • Cost control: purchase ‘slots’ • NoOps data warehouse
  18. Big Relational
  19. What is Spanner?
  20. Demo: Cloud Spanner
  21. Internet of Things
  22. Internet of Things > MQTT • IoT Warehouse: BigQuery • IoT Application: App Engine • Stream Analytics: Cloud Dataflow • IoT Topic: Cloud Pub/Sub • MQTT Devices • Auto-scaled Broker Tier • Custom MQTT broker • MQTT Broker: Compute Engine (RabbitMQ) • Cloud Load Balancing
  23. Internet of Things > Sensor stream ingest and processing • Stages: Ingest, Pipelines, Storage, Analytics, Application & Presentation • Standard Devices (HTTPS) • Constrained Devices (Non-TCP, e.g. BLE) • Gateway • App Engine • Container Engine • Cloud Storage • Cloud Pub/Sub • Cloud Dataflow • Monitoring • Logging • Cloud Dataflow • Cloud Datastore • Cloud Bigtable • BigQuery • Cloud Dataproc • Cloud Datalab • Compute Engine
  24. Retail > Beacons and Targeted Marketing • Events: Cloud Bigtable (Proximity Events) • Analytics: BigQuery (Data Warehouse) • Messaging: Cloud Pub/Sub (Proximity Streams) • Processing: Cloud Dataflow (Stream Processing) • Notifications: App Engine (Push to Devices) • Mobile-Push Notifications • Office Business Systems • Beacons Proximity Notifications • Messaging: Cloud Pub/Sub (Queued Notifications)
  25. Core Products for IoT • Files & Storage: Cloud Storage, Cloud Bigtable • Compute & Ingest: Cloud Pub/Sub, BigQuery, Cloud Dataflow
  26. Demo: Machine Learning
  27. Bioinformatics
  28. Life Sciences > Patient Monitoring • Patient Analytics • Analytics Process Data: Prediction API • Ingest: Cloud Pub/Sub • Storage: Cloud Bigtable • Alerts Notifications: Cloud Pub/Sub • Health Care Professional • Patient Monitors (pulse, blood sugar, exercise)
  29. Life Sciences > Variant Analysis • Private Datasets • Public Datasets • MSSNG Autism: Cloud Storage • Scientist • High Throughput Genome Sequencers • 1000 Genomes: Cloud Storage • Patient Data: Cloud Storage • Illumina Platform: Cloud Storage • Ref Genomes: Cloud Storage • TCGA: Cloud Storage • Analytics • Online Analytics: BigQuery • Batch Analytics: Cloud Dataflow • Lab Notebooks: Cloud Datalab • Data Ingest: Genomics • BAM • FASTQ
  30. Life Sciences > Genomics, Secondary Analysis • Stages: Ingest, Elastic Cluster, Storage, Analytics • Carrier Interconnect • High Throughput Genome Sequencers • Scientist • Raw Datafiles: Cloud Storage • Processed Data: Cloud Storage • Metadata: Cloud SQL • Lab notebooks: Cloud Datalab • HPC Cluster: Compute Engine (10 Nodes) • Ingest Server: Compute Engine • Online Analytics: BigQuery • Cloud Load Balancing • Cloud Network
  31. Core Products for Bioinformatics • Cloud Storage • BigQuery • Compute Engine • Cloud Dataflow • Public datasets on GCP
  32. “The Future is Functional” @LynnLangit
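
The log-processing pattern on slide 13 (a batch path and a streaming path feeding one processing step, then analytics) can be sketched in plain Python. This is only an illustration, not code from the talk: every function below is a hypothetical in-memory stand-in, no GCP client library is called, the log line format is invented, and the mapping of "batch" to Cloud Storage and "streaming" to Cloud Pub/Sub is an assumed reading of the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional


def batch_source(stored_lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for log files read from Cloud Storage (batch path)."""
    yield from stored_lines


def streaming_source(messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for messages pulled from Cloud Pub/Sub (streaming path)."""
    for message in messages:
        yield message


def parse(line: str) -> Optional[Dict[str, str]]:
    """Split an assumed 'LEVEL service message' line; return None if malformed."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    level, service, message = parts
    return {"level": level, "service": service, "message": message}


def process(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Stand-in for the Cloud Dataflow step: parse lines and drop malformed ones."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record


def analyze(records: Iterable[Dict[str, str]]) -> Counter:
    """Stand-in for a BigQuery aggregate: count records per log level."""
    return Counter(record["level"] for record in records)


# Invented sample data; the same process/analyze steps serve both paths.
stored = ["INFO web started", "ERROR db connection lost"]
live = ["INFO web request served", "garbage", "ERROR web timeout reached"]

batch_counts = analyze(process(batch_source(stored)))
stream_counts = analyze(process(streaming_source(live)))
print(batch_counts, stream_counts)
```

The point of the sketch is that the processing and analytics steps are shared between the two sources, which is the property the Cloud Dataflow / Apache Beam slide hints at; a real pipeline would replace each stand-in with the corresponding service.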