Serverless Data Architecture at
scale on Google Cloud Platform
Lorenzo Ridi
MILAN 25-26 NOVEMBER 2016
What’s the date today?
Black Friday (ˈblæk fraɪdɪ)
noun
The day following Thanksgiving Day in the
United States. Since 1932, it has been
regarded as the beginning of the Christmas
shopping season.
Black Friday in the US
2012 - 2016
source: Google Trends, November 23rd 2016
Black Friday in Italy
2012 - 2016
source: Google Trends, November 23rd 2016
What are we doing
[Diagram: Tweets about Black Friday → Processing + analytics → insights]
How we’re gonna do it
[Diagram: Pub/Sub, Container Engine (Kubernetes)]
What is Google Cloud Pub/Sub?
● Google Cloud Pub/Sub is a
fully-managed real-time
messaging service.
○ Guaranteed delivery
■ “At least once” semantics
○ Reliable at scale
■ Messages are replicated in
different zones
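Since "at least once" delivery means a consumer may see the same message more than once, consumers are usually written to be idempotent. The following is a minimal sketch in plain Python, not the Pub/Sub client library: the `(message_id, payload)` tuple shape and the `consume_idempotently` helper are assumptions made for illustration, and the in-memory set would need to be replaced by a bounded or persistent store in a real consumer.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical message shape for this sketch: (message_id, payload).
Message = Tuple[str, str]


def consume_idempotently(
    deliveries: Iterable[Message],
    handle: Callable[[str], None],
) -> int:
    """Call `handle` once per distinct message ID.

    Returns the number of redeliveries that were skipped.
    """
    seen: Set[str] = set()
    duplicates = 0
    for message_id, payload in deliveries:
        if message_id in seen:
            # Redelivery of a message we already handled: skip it.
            duplicates += 1
            continue
        seen.add(message_id)
        handle(payload)
    return duplicates


processed = []
# "m1" arrives twice, which at-least-once delivery permits.
deliveries = [("m1", "tweet A"), ("m2", "tweet B"), ("m1", "tweet A")]
skipped = consume_idempotently(deliveries, processed.append)
```

Deduplicating on a message ID keeps the handler itself simple; the trade-off is that the set of seen IDs has to live somewhere.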
From Twitter to Pub/Sub
$ gcloud beta pubsub topics create blackfridaytweets
Created topic [blackfridaytweets].
SHELL
From Twitter to Pub/Sub
[Diagram: ? → Pub/Sub Topic → Subscriptions A, B, C → Consumers A, B, C]
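The topic/subscription fan-out in the diagram can be sketched with a toy in-memory model. This is not the Pub/Sub API: the `Topic` class and its methods are hypothetical and only illustrate that each subscription receives its own copy of every published message.

```python
from collections import deque
from typing import Deque, Dict


class Topic:
    """Toy in-memory topic: each subscription has its own queue."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Deque[str]] = {}

    def subscribe(self, name: str) -> None:
        # Create an empty queue for the subscription if it is new.
        self._subscriptions.setdefault(name, deque())

    def publish(self, message: str) -> None:
        # Fan out: append a copy of the message to every subscription.
        for queue in self._subscriptions.values():
            queue.append(message)

    def pull(self, name: str) -> str:
        # Each consumer reads only from its own subscription.
        return self._subscriptions[name].popleft()


topic = Topic()
for name in ("A", "B", "C"):
    topic.subscribe(name)
topic.publish("tweet 1")
received = [topic.pull(name) for name in ("A", "B", "C")]
```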
From Twitter to Pub/Sub
● Simple Python application using the TweePy library
# somewhere in the code, track a given set of keywords
stream = Stream(auth, listener)
stream.filter(track=['blackfriday', [...]])
[...]
# somewhere else, write messages to Pub/Sub
messages = []
for line in data_lines:
    # the Pub/Sub API expects each payload to be base64-encoded
    pub = base64.urlsafe_b64encode(line)
    messages.append({'data': pub})
body = {'messages': messages}
resp = client.projects().topics().publish(
    topic='blackfridaytweets', body=body).execute(num_retries=NUM_RETRIES)
PYTHON
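The slide code passes `line` straight to `urlsafe_b64encode`, which suggests Python 2 strings. As a hedged sketch of the same body-building step under Python 3, where the encoder needs bytes and the JSON body needs text, the hypothetical helper `build_publish_body` below shows the round trip. It only builds the request body; it does not call the Pub/Sub API.

```python
import base64
import json
from typing import Dict, Iterable, List


def build_publish_body(lines: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    """Build a {'messages': [...]} body with base64-encoded payloads."""
    messages = []
    for line in lines:
        # str -> bytes -> base64 bytes -> ASCII str, so the body stays JSON-serialisable.
        encoded = base64.urlsafe_b64encode(line.encode("utf-8")).decode("ascii")
        messages.append({"data": encoded})
    return {"messages": messages}


tweet = '{"text": "Black Friday deals!"}'
body = build_publish_body([tweet])
payload = json.dumps(body)
# A consumer reverses the encoding to recover the original tweet.
decoded = base64.urlsafe_b64decode(body["messages"][0]["data"]).decode("utf-8")
```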
From Twitter to Pub/Sub
[Diagram: App + Libs running in a VM, then packaged in a Container]
FROM google/python
RUN pip install --upgrade pip
RUN pip install pyopenssl ndg-httpsclient pyasn1
RUN pip install tweepy
RUN pip install --upgrade google-api-python-client
RUN pip install python-dateutil
ADD twitter-to-pubsub.py /twitter-to-pubsub.py
ADD utils.py /utils.py
CMD python twitter-to-pubsub.py
DOCKERFILE
From Twitter to Pub/Sub
[Diagram: App + Libs in a Container, wrapped in a Pod]
What is Kubernetes (K8S)?
● An orchestration tool for managing a
cluster of containers across multiple
hosts
○ Scaling, rolling upgrades, A/B testing, etc.
● Declarative – not procedural
○ Auto-scales and self-heals to desired
state
● Supports multiple container runtimes,
currently Docker and CoreOS rkt
● Open-source: github.com/kubernetes
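"Declarative, not procedural" means you state the desired number of replicas and a control loop works out the steps. The following is a rough sketch of that idea in plain Python. The `reconcile` function and its action strings are hypothetical and are not part of any Kubernetes API.

```python
from typing import List


def reconcile(desired_replicas: int, running_pods: List[str]) -> List[str]:
    """Return the actions that move the observed state to the desired state."""
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        # Too few pods (e.g. one crashed): create replacements.
        return ["create pod"] * missing
    # Too many pods (or exactly enough): delete the surplus, if any.
    return [f"delete {name}" for name in running_pods[desired_replicas:]]


self_heal = reconcile(1, [])            # the only pod died
scale_down = reconcile(1, ["a", "b"])   # one pod too many
steady = reconcile(2, ["a", "b"])       # nothing to do
```

The caller never says "start a pod"; it only changes the desired count, and repeated calls to the loop converge on it.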
From Twitter to Pub/Sub
apiVersion: v1
kind: ReplicationController
metadata:
  [...]
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: twitter-stream
    spec:
      containers:
      - name: twitter-to-pubsub
        image: gcr.io/codemotion-2016-demo/pubsub_pipeline
        env:
        - name: PUBSUB_TOPIC
          value: ...
YAML
From Twitter to Pub/Sub
[Diagram: App + Libs, Container, Pod, Node; Pod A and Pod B; Node 1 and Node 2]
From Twitter to Pub/Sub
$ gcloud container clusters create codemotion-2016-demo-cluster
Creating cluster cluster-1...done.
Created [...projects/codemotion-2016-demo/.../clusters/codemotion-2016-demo-cluster].
$ gcloud container clusters get-credentials codemotion-2016-demo-cluster
Fetching cluster endpoint and auth data.
kubeconfig entry generated for cluster-1.
$ kubectl create -f ~/git/kube-pubsub-bq/pubsub/twitter-stream.yaml
replicationcontroller "twitter-stream" created.
SHELL
How we’re gonna do it
[Diagram: Pub/Sub, Kubernetes, Dataflow, BigQuery]
What is Google Cloud Dataflow?
● Cloud Dataflow is a collection
of open source SDKs to
implement parallel processing
pipelines.
○ same programming model for
streaming and batch pipelines
● Cloud Dataflow is a managed
service to run parallel
processing pipelines on
Google Cloud Platform
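The "same programming model for streaming and batch" point can be illustrated, very loosely, in plain Python: one transform definition applied to a bounded list and to a generator standing in for an unbounded source. This is not the Dataflow SDK, it ignores windowing and triggers entirely, and `normalize` and `tweet_stream` are hypothetical names.

```python
from typing import Iterable, Iterator


def normalize(tweets: Iterable[str]) -> Iterator[str]:
    """One transform definition, usable on bounded and unbounded inputs."""
    for tweet in tweets:
        yield tweet.strip().lower()


def tweet_stream() -> Iterator[str]:
    """Stand-in for an unbounded source; here it simply yields two items."""
    yield "  BLACK Friday "
    yield "Cyber Monday"


# Batch: the input is a finite list.
batch_result = list(normalize(["  BLACK Friday ", "Cyber Monday"]))
# Streaming: the same transform consumes items as they arrive.
stream_result = list(normalize(tweet_stream()))
```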
What is Google BigQuery?
● Google BigQuery is a fully-managed
Analytic Data Warehouse solution
allowing real-time analysis of
Petabyte-scale datasets.
● Enterprise-grade features
○ Batch and streaming (100K
rows/sec) data ingestion
○ JDBC/ODBC connectors
○ Rich SQL-2011-compliant query
language (new!)
○ Supports updates and deletes (new!)
From Pub/Sub to BigQuery
[Diagram: Pub/Sub Topic → Subscription → Dataflow Pipeline (Read tweets from Pub/Sub → Format tweets for BigQuery → Write tweets on BigQuery) → BigQuery Table]
From Pub/Sub to BigQuery
● A Dataflow pipeline is a Java program.
// TwitterProcessor.java
public static void main(String[] args) {
  Pipeline p = Pipeline.create();
  // Read raw tweets from the Pub/Sub topic
  PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets"));
  // Convert each tweet into a BigQuery TableRow
  PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat()));
  // Write the rows to the BigQuery table
  formattedTweets.apply(BigQueryIO.Write.to(tableReference));
  p.run();
}
JAVA
From Pub/Sub to BigQuery
● A Dataflow pipeline is a Java program.
// TwitterProcessor.java
// Do Function (to be used within a ParDo)
private static final class DoFormat extends DoFn<String, TableRow> {
  private static final long serialVersionUID = 1L;

  @Override
  public void processElement(DoFn<String, TableRow>.ProcessContext c) throws IOException {
    // createTableRow can throw IOException, so it must be declared here
    c.output(createTableRow(c.element()));
  }
}

// Helper method: parse the tweet's JSON string into a TableRow
private static TableRow createTableRow(String tweet) throws IOException {
  return JacksonFactory.getDefaultInstance().fromString(tweet, TableRow.class);
}
JAVA
From Pub/Sub to BigQuery
● Use Maven to build, deploy or update the Pipeline.
$ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor \
    -Dexec.args="--streaming"
[...]
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.131s
[INFO] Finished at: Sun Nov 20 00:49:54 CET 2016
[INFO] Final Memory: 28M/362M
[INFO] ------------------------------------------------------------------------
SHELL
From Pub/Sub to BigQuery
● You can monitor your pipelines from Cloud Console.
From Pub/Sub to BigQuery
● Data start flowing into BigQuery tables. You can run queries
from the CLI or the Web Interface.
How we’re gonna do it
[Diagram: Pub/Sub, Kubernetes, Dataflow, BigQuery, Data Studio, Natural Language API]
Sentiment Analysis with Natural Language API
Text → Polarity: [-1,1], Magnitude: [0,+inf)
sentiment = polarity x magnitude
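The formula above is small enough to write out directly. This sketch assumes the two values have already been obtained from the API; `sentiment_score` is a hypothetical helper name, the range checks mirror the intervals on the slide, and the sample inputs are the polarity/magnitude pairs shown in the live demo.

```python
def sentiment_score(polarity: float, magnitude: float) -> float:
    """Combine polarity in [-1, 1] and magnitude in [0, +inf) into one score."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1, 1]")
    if magnitude < 0.0:
        raise ValueError("magnitude must be non-negative")
    return polarity * magnitude


# The two (polarity, magnitude) pairs from the live demo slide.
scores = [sentiment_score(-1.0, 1.5), sentiment_score(-1.0, 2.1)]
```

Multiplying keeps the sign from the polarity and scales it by how strongly the text expresses it, so a long, emphatic negative tweet scores lower than a mildly negative one.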
Sentiment Analysis with Natural Language API
[Diagram: Dataflow Pipeline. Pub/Sub Topic → Read tweets from Pub/Sub, then two branches: (1) Format tweets for BigQuery → Write tweets on BigQuery; (2) Filter and Evaluate sentiment → Format tweets for BigQuery → Write tweets on BigQuery. Both branches write to BigQuery Tables.]
From Pub/Sub to BigQuery
● We just add the additional necessary steps.
// TwitterProcessor.java
public static void main(String[] args) {
  Pipeline p = Pipeline.create();
  PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets"));
  // New branch: filter tweets, evaluate their sentiment, write to a second table
  PCollection<String> sentTweets = tweets.apply(ParDo.of(new DoFilterAndProcess()));
  PCollection<TableRow> formSentTweets = sentTweets.apply(ParDo.of(new DoFormat()));
  formSentTweets.apply(BigQueryIO.Write.to(sentTableReference));
  // Original branch: write every tweet as-is
  PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat()));
  formattedTweets.apply(BigQueryIO.Write.to(tableReference));
  p.run();
}
JAVA
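The two-branch shape of this pipeline can be sketched in plain Python. The slides do not show `DoFilterAndProcess`, so the filter criterion (`keep`) and the scorer (`score`) below are assumptions passed in by the caller, and `run_pipeline` is a hypothetical name; the real step calls the Natural Language API rather than a stub.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def run_pipeline(
    tweets: Iterable[str],
    keep: Callable[[Row], bool],
    score: Callable[[str], float],
) -> Tuple[List[Row], List[Row]]:
    """Fan one input out to two outputs: all tweets, and scored tweets."""
    all_rows: List[Row] = []
    sentiment_rows: List[Row] = []
    for raw in tweets:
        row = json.loads(raw)
        # Branch 1: every tweet is formatted and kept.
        all_rows.append(row)
        # Branch 2: only tweets passing the filter are scored and kept.
        if keep(row):
            sentiment_rows.append({**row, "sentiment": score(str(row["text"]))})
    return all_rows, sentiment_rows


tweets = [
    '{"text": "great deals", "lang": "en"}',
    '{"text": "ottime offerte", "lang": "it"}',
]
all_rows, sentiment_rows = run_pipeline(
    tweets,
    keep=lambda row: row.get("lang") == "en",  # assumed filter: English only
    score=lambda text: 0.0,                    # stub scorer, not a real API call
)
```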
From Pub/Sub to BigQuery
● The update process preserves all in-flight data.
$ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor \
    -Dexec.args="--streaming --update --jobName=twitterprocessor-lorenzo-1107222550"
[...]
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.131s
[INFO] Finished at: Sun Nov 20 00:49:54 CET 2016
[INFO] Final Memory: 28M/362M
[INFO] ------------------------------------------------------------------------
SHELL
From Pub/Sub to BigQuery
We did it!
[Diagram: Pub/Sub, Kubernetes, Dataflow, BigQuery, Data Studio, Natural Language API]
Live demo
Polarity: -1.0
Magnitude: 1.5
Polarity: -1.0
Magnitude: 2.1
Thank you!

Más contenido relacionado

La actualidad más candente

Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaAltinity Ltd
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
 
How @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudHow @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudlohitvijayarenu
 
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...InfluxData
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloudlohitvijayarenu
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormMike Lohmann
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...DataWorks Summit
 
Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud Vrushali Channapattan
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesDatabricks
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...Flink Forward
 
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...InfluxData
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Databricks
 
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...InfluxData
 

La actualidad más candente (20)

Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with Kubernetes
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
How @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudHow @twitterhadoop chose google cloud
How @twitterhadoop chose google cloud
 
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...
Ronald McCollam [Grafana] | Flux Queries in Grafana 7 | InfluxDays Virtual Ex...
 
Managing 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in CloudManaging 100s of PetaBytes of data in Cloud
Managing 100s of PetaBytes of data in Cloud
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
Anais Dotis-Georgiou & Steven Soroka [InfluxData] | Machine Learning with Tel...
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
 
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...
How an Open Marine Standard, InfluxDB and Grafana Are Used to Improve Boating...
 

Destacado

A Crush on Design Thinking
A Crush on Design ThinkingA Crush on Design Thinking
A Crush on Design ThinkingMatteo Burgassi
 
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...Codemotion
 
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...Codemotion
 
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...Codemotion
 
Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016Codemotion
 
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...Codemotion
 
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...Codemotion
 
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...Codemotion
 
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016Codemotion
 
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...Codemotion
 
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016Codemotion
 
Outthink: machines coping with humans. A journey into the cognitive world - E...
Outthink: machines coping with humans. A journey into the cognitive world - E...Outthink: machines coping with humans. A journey into the cognitive world - E...
Outthink: machines coping with humans. A journey into the cognitive world - E...Codemotion
 
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016Codemotion
 
Bias Driven Development - Mario Fusco - Codemotion Milan 2016
Bias Driven Development - Mario Fusco - Codemotion Milan 2016Bias Driven Development - Mario Fusco - Codemotion Milan 2016
Bias Driven Development - Mario Fusco - Codemotion Milan 2016Codemotion
 
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...Codemotion
 
Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Codemotion
 
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...Codemotion
 
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...Codemotion
 
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...Codemotion
 
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016Codemotion
 

Destacado (20)

A Crush on Design Thinking
A Crush on Design ThinkingA Crush on Design Thinking
A Crush on Design Thinking
 
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
Un anno di Front End Meetup! Gioie, dolori e festeggiamenti! - Giacomo Zinett...
 
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
Graph databases and the Panama Papers - Stefan Armbruster - Codemotion Milan ...
 
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
We started with RoR, C++, C#, nodeJS and... at the end we chose GO - Maurizio...
 
Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016Coding Culture - Sven Peters - Codemotion Milan 2016
Coding Culture - Sven Peters - Codemotion Milan 2016
 
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
Getting developers hooked on your API - Nicolas Garnier - Codemotion Amsterda...
 
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
Reactive Thinking in iOS Development - Pedro Piñera Buendía - Codemotion Amst...
 
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...
Impostor syndrome and individual competence - Jessica Rose - Codemotion Amste...
 
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016
UGIdotNET Meetup - Andrea Saltarello - Codemotion Milan 2016
 
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
 
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016
Can Super Coders be a reality? - Atreyam Sharma - Codemotion Milan 2016
 
Outthink: machines coping with humans. A journey into the cognitive world - E...
Outthink: machines coping with humans. A journey into the cognitive world - E...Outthink: machines coping with humans. A journey into the cognitive world - E...
Outthink: machines coping with humans. A journey into the cognitive world - E...
 
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016
Build Apps for Apple Watch - Francesco Novelli - Codemotion Milan 2016
 
Bias Driven Development - Mario Fusco - Codemotion Milan 2016
Bias Driven Development - Mario Fusco - Codemotion Milan 2016Bias Driven Development - Mario Fusco - Codemotion Milan 2016
Bias Driven Development - Mario Fusco - Codemotion Milan 2016
 
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...
Angular Rebooted: Components Everywhere - Carlo Bonamico, Sonia Pini - Codemo...
 
Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...Higher order infrastructure: from Docker basics to cluster management - Nicol...
Higher order infrastructure: from Docker basics to cluster management - Nicol...
 
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...
SASI, Cassandra on the full text search ride - DuyHai Doan - Codemotion Milan...
 
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...
Sviluppare applicazioni cross-platform con Xamarin Forms e il framework Prism...
 
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...
Cross-platform Apps using Xamarin and MvvmCross - Martijn van Dijk - Codemoti...
 
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016
Il Bot di Codemotion - Emanuele Capparelli - Codemotion Milan 2016
 

Serverless Data Architecture at scale on Google Cloud Platform - Lorenzo Ridi - Codemotion Milan 2016

  • 1. Serverless Data Architecture at scale on Google Cloud Platform Lorenzo Ridi MILAN 25-26 NOVEMBER 2016
  • 4. Black Friday (ˈblæk fraɪdɪ) noun The day following Thanksgiving Day in the United States. Since 1932, it has been regarded as the beginning of the Christmas shopping season.
  • 5. Black Friday in the US 2012 - 2016 source: Google Trends, November 23rd 2016
  • 6. Black Friday in Italy 2012 - 2016 source: Google Trends, November 23rd 2016
  • 7. What are we doing Processing + analytics Tweets about black friday insights
  • 11. What is Google Cloud Pub/Sub? ● Google Cloud Pub/Sub is a fully-managed real-time messaging service. ○ Guaranteed delivery ■ “At least once” semantics ○ Reliable at scale ■ Messages are replicated in different zones
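"At least once" semantics mean a subscriber may occasionally receive the same message more than once, so consumers are usually written to tolerate redelivery. The following is a minimal in-memory sketch of that idea, not code from the talk: `DedupingConsumer` and its methods are hypothetical names, and no Pub/Sub client is involved.

```python
class DedupingConsumer:
    """Processes each message ID once, even if it is delivered repeatedly."""

    def __init__(self):
        # Assumption: IDs fit in memory. A real consumer would need
        # durable, bounded storage for the IDs it has already handled.
        self.seen_ids = set()
        self.processed = []

    def handle(self, message_id, data):
        """Return True if the message was processed, False if it was a redelivery."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.processed.append(data)
        return True


consumer = DedupingConsumer()
# "m1" arrives twice, as at-least-once delivery allows.
deliveries = [("m1", "tweet A"), ("m2", "tweet B"), ("m1", "tweet A")]
results = [consumer.handle(mid, data) for mid, data in deliveries]
```

The duplicate delivery is acknowledged but has no second effect, which is the property a downstream consumer generally wants under this delivery guarantee.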
  • 12. From Twitter to Pub/Sub
    $ gcloud beta pubsub topics create blackfridaytweets
    Created topic [blackfridaytweets].
    SHELL
  • 13. From Twitter to Pub/Sub ? Pub/Sub Topic Subscription A Subscription B Subscription C Consumer A Consumer B Consumer C
  • 14. From Twitter to Pub/Sub ● Simple Python application using the TweePy library
    # somewhere in the code, track a given set of keywords
    stream = Stream(auth, listener)
    stream.filter(track=['blackfriday', [...]])
    [...]
    # somewhere else, write messages to Pub/Sub
    for line in data_lines:
        pub = base64.urlsafe_b64encode(line)
        messages.append({'data': pub})
    body = {'messages': messages}
    resp = client.projects().topics().publish(
        topic='blackfridaytweets', body=body).execute(num_retries=NUM_RETRIES)
    PYTHON
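The publishing snippet on this slide passes a string straight to `base64.urlsafe_b64encode`, which suggests Python 2. Under Python 3 that function takes and returns bytes, so the payload has to be encoded first and the result decoded before it goes into a JSON body. A minimal sketch of just the body-building step, with `build_publish_body` as a hypothetical helper name and no API call made:

```python
import base64
import json


def build_publish_body(lines):
    """Build a {'messages': [...]} body with base64-encoded payloads."""
    messages = []
    for line in lines:
        # b64encode needs bytes in Python 3 and returns bytes;
        # decode to str so the body stays JSON-serializable.
        encoded = base64.urlsafe_b64encode(line.encode("utf-8"))
        messages.append({"data": encoded.decode("ascii")})
    return {"messages": messages}


# Illustrative payload, not real tweet data.
body = build_publish_body(['{"text": "Black Friday deals!"}'])
serialized = json.dumps(body)

# Round-trip check: decoding the payload gives back the original line.
decoded = base64.urlsafe_b64decode(body["messages"][0]["data"]).decode("utf-8")
```

The resulting `body` has the same shape as the one handed to `publish(...)` in the slide's code.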
  • 15. From Twitter to Pub/Sub App + Libs
  • 16. VM From Twitter to Pub/Sub App + Libs
  • 17. VM From Twitter to Pub/Sub App + Libs
  • 18. From Twitter to Pub/Sub App + Libs Container
  • 19. From Twitter to Pub/Sub App + Libs Container
    FROM google/python
    RUN pip install --upgrade pip
    RUN pip install pyopenssl ndg-httpsclient pyasn1
    RUN pip install tweepy
    RUN pip install --upgrade google-api-python-client
    RUN pip install python-dateutil
    ADD twitter-to-pubsub.py /twitter-to-pubsub.py
    ADD utils.py /utils.py
    CMD python twitter-to-pubsub.py
    DOCKERFILE
  • 20. From Twitter to Pub/Sub App + Libs Container
  • 21. From Twitter to Pub/Sub App + Libs Container Pod
  • 22. What is Kubernetes (K8S)? ● An orchestration tool for managing a cluster of containers across multiple hosts ○ Scaling, rolling upgrades, A/B testing, etc. ● Declarative – not procedural ○ Auto-scales and self-heals to desired state ● Supports multiple container runtimes, currently Docker and CoreOS Rkt ● Open-source: github.com/kubernetes
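"Declarative, not procedural" can be pictured as a control loop: you state the desired number of replicas, and the system repeatedly compares that with what is running and corrects the difference. The sketch below is a toy model of that idea under simplifying assumptions (pods are plain strings, `reconcile` is a hypothetical function); it is not how Kubernetes is implemented.

```python
def reconcile(desired_replicas, running_pods, next_id):
    """Run one reconciliation pass and return (pods, next_id).

    Starts pods when too few are running and stops pods when too many are.
    """
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next_id}")  # "start" a replacement pod
        next_id += 1
    while len(pods) > desired_replicas:
        pods.pop()  # "stop" a surplus pod
    return pods, next_id


# Desired state: 3 replicas, starting from nothing.
pods, next_id = reconcile(3, [], 0)

# Simulate a crash, then let the loop self-heal back to 3 replicas.
pods.remove("pod-1")
healed, next_id = reconcile(3, pods, next_id)

# Changing the desired state to 1 scales the set down.
scaled_down, next_id = reconcile(1, healed, next_id)
```

The caller never says "start a pod" or "kill a pod"; it only changes the desired count, and the same function handles scaling up, scaling down and recovery.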
  • 23. From Twitter to Pub/Sub App + Libs Container Pod
    apiVersion: v1
    kind: ReplicationController
    metadata:
      [...]
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            name: twitter-stream
        spec:
          containers:
          - name: twitter-to-pubsub
            image: gcr.io/codemotion-2016-demo/pubsub_pipeline
            env:
            - name: PUBSUB_TOPIC
              value: ...
    YAML
  • 24. From Twitter to Pub/Sub App + Libs Container Pod
  • 25. From Twitter to Pub/Sub App + Libs Container Pod Node
  • 26. Node From Twitter to Pub/Sub Pod A Pod B
  • 27. From Twitter to Pub/Sub Node 1 Node 2
  • 28. From Twitter to Pub/Sub $ gcloud container clusters create codemotion-2016-demo-cluster Creating cluster cluster-1...done. Created [...projects/codemotion-2016-demo/.../clusters/codemotion-2016-demo-cluster]. $ gcloud container clusters get-credentials codemotion-2016-demo-cluster Fetching cluster endpoint and auth data. kubeconfig entry generated for cluster-1. $ kubectl create -f ~/git/kube-pubsub-bq/pubsub/twitter-stream.yaml replicationcontroller "twitter-stream" created. SHELL
  • 33. What is Google Cloud Dataflow? ● Cloud Dataflow is a collection of open source SDKs to implement parallel processing pipelines. ○ same programming model for streaming and batch pipelines ● Cloud Dataflow is a managed service to run parallel processing pipelines on Google Cloud Platform
  • 34. What is Google BigQuery? ● Google BigQuery is a fully-managed Analytic Data Warehouse solution allowing real-time analysis of Petabyte-scale datasets. ● Enterprise-grade features ○ Batch and streaming (100K rows/sec) data ingestion ○ JDBC/ODBC connectors ○ Rich SQL-2011-compliant query language (new!) ○ Supports updates and deletes (new!)
  • 35. From Pub/Sub to BigQuery Pub/Sub Topic Subscription Read tweets from Pub/Sub Format tweets for BigQuery Write tweets on BigQuery BigQuery Table Dataflow Pipeline
  • 36. From Pub/Sub to BigQuery ● A Dataflow pipeline is a Java program. // TwitterProcessor.java public static void main(String[] args) { Pipeline p = Pipeline.create(); PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets")); PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat())); formattedTweets.apply(BigQueryIO.Write.to(tableReference)); p.run(); } JAVA
  • 37. From Pub/Sub to BigQuery ● A Dataflow pipeline is a Java program. // TwitterProcessor.java // Do Function (to be used within a ParDo) private static final class DoFormat extends DoFn<String, TableRow> { private static final long serialVersionUID = 1L; @Override public void processElement(DoFn<String, TableRow>.ProcessContext c) throws IOException { c.output(createTableRow(c.element())); } } // Helper method private static TableRow createTableRow(String tweet) throws IOException { return JacksonFactory.getDefaultInstance().fromString(tweet, TableRow.class); } JAVA
  • 38. From Pub/Sub to BigQuery ● Use Maven to build, deploy or update the Pipeline. $ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor -Dexec.args="--streaming" [...] INFO: To cancel the job using the 'gcloud' tool, run: > gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717 [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 18.131s [INFO] Finished at: Sun Nov 20 00:49:54 CET 2016 [INFO] Final Memory: 28M/362M [INFO] ------------------------------------------------------------------------ SHELL
  • 39. From Pub/Sub to BigQuery ● You can monitor your pipelines from Cloud Console.
  • 40. From Pub/Sub to BigQuery ● Data start flowing into BigQuery tables. You can run queries from the CLI or the Web Interface.
  • 45. Pub/Sub Kubernetes Dataflow BigQuery How we’re gonna do it Natural Language API Data Studio
  • 46. Sentiment Analysis with Natural Language API Polarity: [-1,1] Magnitude: [0,+inf) Text
  • 47. Sentiment Analysis with Natural Language API Polarity: [-1,1] Magnitude: [0,+inf) Text sentiment = polarity x magnitude
  • 48. Sentiment Analysis with Natural Language API Pub/Sub Topic Read tweets from Pub/Sub Write tweets on BigQuery BigQuery Tables Dataflow Pipeline Filter and Evaluate sentiment Format tweets for BigQuery Write tweets on BigQuery Format tweets for BigQuery
  • 49. From Pub/Sub to BigQuery ● We just add the additional necessary steps. // TwitterProcessor.java public static void main(String[] args) { Pipeline p = Pipeline.create(); PCollection<String> tweets = p.apply(PubsubIO.Read.topic("...blackfridaytweets")); PCollection<String> sentTweets = tweets.apply(ParDo.of(new DoFilterAndProcess())); PCollection<TableRow> formSentTweets = sentTweets.apply(ParDo.of(new DoFormat())); formSentTweets.apply(BigQueryIO.Write.to(sentTableReference)); PCollection<TableRow> formattedTweets = tweets.apply(ParDo.of(new DoFormat())); formattedTweets.apply(BigQueryIO.Write.to(tableReference)); p.run(); } JAVA
  • 50. From Pub/Sub to BigQuery ● The update process preserves all in-flight data. $ mvn compile exec:java -Dexec.mainClass=it.noovle.dataflow.TwitterProcessor -Dexec.args="--streaming --update --jobName=twitterprocessor-lorenzo-1107222550" [...] INFO: To cancel the job using the 'gcloud' tool, run: > gcloud alpha dataflow jobs --project=codemotion-2016-demo cancel 2016-11-19_15_49_53-5264074060979116717 [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 18.131s [INFO] Finished at: Sun Nov 20 00:49:54 CET 2016 [INFO] Final Memory: 28M/362M [INFO] ------------------------------------------------------------------------ SHELL
  • 51. From Pub/Sub to BigQuery

Editor's notes

  1. Yes, today is BLACK FRIDAY!
  2. Black Friday is the biggest sales event in the US, and since 1932 it has marked the beginning of the Christmas shopping season.
  3. According to Google Trends, interest in Black Friday in the US has remained unchanged over the last few years.
  4. However, if we perform the same analysis for Italy, we can see that interest in Black Friday has grown exponentially. That’s why no company, even worldwide, can ignore this day. Companies can take advantage of Black Friday to advertise themselves and sell more. We are going to step into the shoes of a company that wants to offer some deals specific to Black Friday, so the problem is: how, and on which channels, should we advertise the deals to maximize revenue?
  5. Social networks like Twitter can help a lot in analyzing people’s trends and opinions, and in supporting us in making the right decision. So today we are focusing on Twitter. <selected hashtags>
  6. This is how we want to do this. The story is more or less always the same: we get some data, we process it (removing unnecessary things, transforming others), and we store the data in a format that is good for analysis. Complexities: we do not have much time, and we have to make it work even if we don’t know the traffic we will have to handle (how high is the peak we saw before?).
  7. Our solution is to adopt a serverless architecture. We want to use services that allow us to concentrate on our solution, rather than on config files and boilerplate code, and we do not want to configure or manage the infrastructure. We choose Google Cloud Platform because its Data Analytics offering is based exactly on these foundations. Today we are going to explore almost all the GCP tools for Data Analytics. So, let’s start this whirlwind tour!
  8. Let’s start from the beginning. For the ingestion part we are going to use two technologies: Google Container Engine, the technology that powers Kubernetes-as-a-service on GCP (who knows Kubernetes? Containers/Docker?), and Google Cloud Pub/Sub, a middleware solution in the Cloud.
  9. Pub/Sub is a fully managed real-time messaging service. I create a topic and send messages to it; if I’m interested in a topic, I subscribe to it and start receiving messages. Nothing new, other technologies do this. However, Pub/Sub has a few strong points: it is a service, so I do not have to configure a cluster; it is reliable by design; and it keeps being reliable at scale.
  10. How do I create a Pub/Sub topic? Without going much into detail, it is a one-liner. gcloud is the command line tool that manages all Google Cloud Platform resources.
  11. This is how we are going to use Pub/Sub: we implement something that converts tweets into messages, and by means of Pub/Sub we can distribute these tweets to several subscribers with ease. Pub/Sub decouples producers and consumers: they do not have to know each other. It also improves the reliability of the overall system, acting as a shock absorber even if some parts of the downstream infrastructure have problems. We have a missing part here: how do we capture tweets and transform them into messages?
  12. We write a simple Python app that uses the TweePy library to interact with the Twitter Streaming API. Somewhere we use the stream.filter method to track a list of keywords; somewhere else (in the listener for TweePy events) we collect tweets, package them and send them out as Pub/Sub messages (note the Pub/Sub topic name).
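The fan-out and "at least once" behaviour described in this note can be pictured with a small in-memory model. This is only a sketch: `ToyTopic` and its methods are hypothetical names invented for illustration and have nothing to do with the real Pub/Sub client API.

```python
from collections import defaultdict, deque


class ToyTopic:
    """In-memory stand-in for a topic with named subscriptions (not the Pub/Sub API)."""

    def __init__(self):
        # One independent queue per subscription.
        self._queues = defaultdict(deque)

    def subscribe(self, name):
        # Touching the key creates an empty queue for this subscription.
        self._queues[name]

    def publish(self, message):
        # Fan-out: every subscription receives its own copy of the message.
        for queue in self._queues.values():
            queue.append(message)

    def pull(self, name):
        # Return the oldest unacknowledged message without removing it, or None.
        queue = self._queues[name]
        return queue[0] if queue else None

    def ack(self, name):
        # Acknowledge the oldest message so it is not delivered again.
        self._queues[name].popleft()


topic = ToyTopic()
topic.subscribe("subscription-a")
topic.subscribe("subscription-b")
topic.publish("tweet 1")

first = topic.pull("subscription-a")
redelivered = topic.pull("subscription-a")  # not acked yet, so delivered again
topic.ack("subscription-a")
after_ack = topic.pull("subscription-a")
other = topic.pull("subscription-b")  # unaffected by what subscription-a did
```

Because an unacknowledged message is handed out again, a consumer may see the same tweet more than once, so downstream processing should tolerate duplicates.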
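As a rough illustration of the "packaging" step, the sketch below builds the same `{'messages': [{'data': ...}]}` body shown on the slide, using only the standard library. It assumes Python 3 (where base64 works on bytes); the helper names `build_publish_body` and `decode_message` are made up for this example, and the actual call to the publish endpoint is left out.

```python
import base64
import json


def build_publish_body(data_lines):
    """Wrap raw tweet lines in the {'messages': [{'data': ...}]} shape used on the slide."""
    messages = []
    for line in data_lines:
        # b64encode works on bytes; decode the result so the body is JSON-serializable.
        encoded = base64.urlsafe_b64encode(line.encode("utf-8")).decode("ascii")
        messages.append({"data": encoded})
    return {"messages": messages}


def decode_message(message):
    """Reverse the encoding, as a subscriber would do."""
    return base64.urlsafe_b64decode(message["data"]).decode("utf-8")


body = build_publish_body(['{"text": "Great #blackfriday deals"}'])
payload = json.dumps(body)  # serialized form of the request body
round_trip = decode_message(body["messages"][0])
```

Encoding the payload keeps arbitrary tweet content (quotes, emoji, newlines) safe inside a JSON request body.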
  13. We wrote the app and we tested it. Now we have to deploy it (and its libraries) somewhere. Our first temptation would be...
  14. ...to start a Virtual Machine, install Python on it and run the app there. However...
  15. This is not the solution we want: it doesn’t scale; it is hard to make fault-tolerant (if the VM crashes it doesn’t restart); and it is difficult to deploy and to update (no rolling update).
  16. A much better solution is to use containers. Containers provide a higher level of abstraction (OS-level rather than HW-level), which allows us to create portable and isolated deployments that can be installed easily in on-prem or Cloud environments.
  17. We create a Docker image using a Dockerfile, which is a sequence of instructions that, starting from a base image, add the pieces needed to build our own solution. In this case we: install the necessary libraries; add our Python files; invoke our Python executable file (the container will run as long as this command does).
  18. We build an image based on the Dockerfile and we are done. However, a container solves the problem of deployment and portability, but not that of scaling and management.
  19. We need a further layer of abstraction, and this level of abstraction is provided by Kubernetes.
  20. Kubernetes is an open source orchestration tool for managing clusters of containers. It introduces all those features that are missing from “standard” container deployments. A cool thing about Kubernetes is that it is completely declarative - you do not specify that you want one more node or one less pod, but you define a desired state and the Kubernetes Master works to reach and maintain that state.
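The declarative idea can be sketched as a tiny reconciliation function: given a desired replica count and the Pods currently observed, it works out what has to change. This is a simplified illustration, not Kubernetes code; `reconcile` and the pod names are hypothetical.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move the observed state to the desired one."""
    actions = []
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few replicas: start the missing ones.
        actions.extend(("start", None) for _ in range(diff))
    elif diff < 0:
        # Too many replicas: stop the surplus ones at the end of the list.
        actions.extend(("stop", pod) for pod in running_pods[diff:])
    return actions


# The single replica crashed: one Pod has to be started again.
crashed = reconcile(desired_replicas=1, running_pods=[])
# Observed state already matches the desired state: nothing to do.
steady = reconcile(desired_replicas=1, running_pods=["twitter-stream-abc"])
# More Pods than requested: the extra ones are stopped.
scaled_down = reconcile(desired_replicas=1, running_pods=["pod-1", "pod-2", "pod-3"])
```

Running such a function in a loop is what gives the "self-healing" effect: the operator never issues start or stop commands, only the target number of replicas.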
  21. This is what we deploy on Kubernetes: a ReplicationController (or a ReplicaSet/Deployment in recent versions), which defines a group of container replicas that you want running concurrently. For the sake of our example we need only one replica, but even in this case a ReplicationController is useful, as it ensures that this single replica is always up and running.
  22. So we wrap our container into a Pod. The Pod is the replica unit of Kubernetes.
  23. Each Pod runs on a cluster node, but...
  24. ...more than one Pod can run on a single node. The allocation of Pods to nodes is managed by the Kubernetes Master, which is a special cluster node. In Container Engine the K8S Master is completely managed (and free!).
  25. Since version 1.3 Kubernetes also supports autoscaling of nodes. If there aren’t sufficient resources available to keep up with Pod scaling, the node pool is enlarged.
  26. Creating a Kubernetes cluster is easy: 1) we create the cluster; 2) we acquire Kubernetes credentials using gcloud; 3) we use kubectl (the open source CLI) to submit commands to the Kubernetes Master.
  27. Once the cluster has been created, we can monitor all worker nodes from the Cloud Console. Here we have one node, which contains one Pod, which contains one Container, which contains our application, which is transforming tweets into Pub/Sub messages.
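To make the interplay between Pod placement and node autoscaling concrete, here is a toy first-fit scheduler that adds a node whenever no existing node has room. The real Kubernetes scheduler and cluster autoscaler are far more sophisticated; the function name, the CPU units and the first-fit policy are all assumptions made for this sketch.

```python
def schedule(pod_requests, node_capacity, initial_nodes=1):
    """Place Pods first-fit; return (node index per Pod, final node count)."""
    free = [node_capacity] * initial_nodes  # free CPU units on each node
    placement = []
    for request in pod_requests:
        if request > node_capacity:
            raise ValueError("pod does not fit even on an empty node")
        for index, available in enumerate(free):
            if request <= available:
                break
        else:
            # No existing node has room: grow the node pool by one node.
            free.append(node_capacity)
            index = len(free) - 1
        free[index] -= request
        placement.append(index)
    return placement, len(free)


# Four Pods requesting 2, 1, 2 and 1 CPU units on nodes with 4 units each.
placement, node_count = schedule([2, 1, 2, 1], node_capacity=4)
```

In this run the third Pod does not fit on the first node, so a second node is added; the fourth Pod then fills the remaining space on the first one.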
  28. Cool! We have implemented the first piece of our processing chain. What’s next?
  29. For the processing we want something equally scalable, so we are going to use a technology named Google Cloud Dataflow and...
  30. ...for the storage we are going to use Google BigQuery.
  31. Google Cloud Dataflow is two things. First, it is a collection of open source SDKs to implement parallel processing pipelines. The cool thing about being open source is that runners for Dataflow pipelines have already been implemented for other open source processing technologies, like Apache Spark or Apache Flink (all the code I’ve written for this demo could run in an open source environment). The project itself is now an Apache Incubator project called Apache Beam. Second, Cloud Dataflow is also a managed service on Google Cloud Platform that runs Apache Beam pipelines.
  32. Google BigQuery is an analytic data warehouse with impressive (almost magical) performance. It comes with a series of features that make it a valid choice as an enterprise-grade DWH: the ability to ingest streaming and batch data; JDBC and ODBC connectors to guarantee interoperability; a rich query language, which has now been renewed to support standard ANSI SQL-2011; and a new Data Manipulation Language that supports updates and deletes.
  33. How are we going to make use of these tools? We will build a simple Dataflow pipeline composed of three steps: read tweets from Pub/Sub; transform tweets so that they conform to the BigQuery API; write tweets to BigQuery. By “tweet” I do not mean only the text, but all the information returned by the Twitter APIs (info about the user, etc.).
  34. The implementation is very easy: this is one of the best parts of Cloud Dataflow compared with existing processing technologies like MapReduce. First, we create a Pipeline object. The first operation is performed by invoking an apply method on the Pipeline object, using a Source to create collections of data called PCollections. In this case, we use a Pub/Sub Source to create a so-called unbounded PCollection (that is, a PCollection without a limited number of elements). All subsequent operations are performed by invoking apply methods on PCollections, which in turn generate other PCollections. The simplest operation you can apply to a PCollection is a ParDo (ParallelDo), which processes every element of the PCollection independently of the others. We write data by applying a transform as well. At the end, we tell the system to run the pipeline. The source (PubsubIO) determines whether the pipeline is a streaming or a batch one. All the other components (like BigQueryIO) adapt accordingly, e.g. BigQueryIO uses Streaming APIs in streaming mode and Load Jobs in batch mode.
  35. The simplest operation you can apply to a PCollection is a ParDo (ParallelDo), which processes every element of the PCollection independently of the others. The argument of a ParDo is a DoFn object; we need to override the processElement method to instruct the system to do the right thing.
  36. The easiest way to deploy a Dataflow pipeline is using Maven. (Some complexity is hidden here, like the choice of the runner and the staging location.)
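The "apply a transform, get a new collection" model can be mimicked in a few lines of plain Python. The sketch below is a toy for bounded, in-memory data only; `ToyCollection` and `par_do` are hypothetical names, not the Dataflow or Beam SDK, and nothing here runs in parallel.

```python
class ToyCollection:
    """Immutable bag of elements; a stand-in for a PCollection, not the real SDK."""

    def __init__(self, elements):
        self._elements = tuple(elements)

    def apply(self, transform):
        # A transform takes the elements and yields those of a new collection.
        return ToyCollection(transform(self._elements))

    def elements(self):
        return list(self._elements)


def par_do(fn):
    """Build a transform that calls fn on each element independently.

    fn returns an iterable, so one input may produce zero, one or many outputs.
    """
    def transform(elements):
        for element in elements:
            yield from fn(element)
    return transform


tweets = ToyCollection(["Black Friday deals", "", "buy now"])
words = tweets.apply(par_do(lambda text: text.split()))
upper = words.apply(par_do(lambda word: [word.upper()]))
```

Since each call to `fn` sees a single element and no shared state, a real runner is free to spread those calls across many workers, which is what makes the model scale.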
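A per-element function equivalent in spirit to DoFormat could look like this in Python: each JSON-encoded tweet is parsed into a dictionary that plays the role of a table row. Setting malformed input aside is an addition made for this sketch and is not part of the code on the slide; `format_tweet` and `process_all` are hypothetical names.

```python
import json


def format_tweet(raw_tweet):
    """Parse one JSON-encoded tweet into a dict, mirroring createTableRow on the slide."""
    return json.loads(raw_tweet)


def process_all(raw_tweets):
    """Apply format_tweet to each element independently."""
    rows, rejected = [], []
    for raw in raw_tweets:
        try:
            rows.append(format_tweet(raw))
        except json.JSONDecodeError:
            # Assumption of this sketch: keep malformed input aside
            # instead of failing the whole batch.
            rejected.append(raw)
    return rows, rejected


rows, rejected = process_all(['{"id": 1, "text": "hi"}', "not json"])
```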
  37. Once your pipeline is deployed, you can monitor its execution from the Cloud Console.
  38. You can check if data are actually being processed by querying the destination BigQuery table. It works! We built a very simple processing pipeline that streams data in real-time to our DWH and allows us to query results right as they are coming in. What now?
  39. Now we have to find some interesting analyses to run on our data, and represent them in a readable and shareable manner.
  40. Google Data Studio is a BI solution that allows the creation of dashboards and graphs from several sources, including BigQuery.
  41. Here you see an example showing the number of tweets per state in the US. Not very fancy. In fact, we soon realize that the information we have from raw data doesn’t give us very “smart” insights.
  42. We need to enrich our data model in some way. The good news is that Google released a series of APIs exposing ready-to-use Machine Learning algorithms and models. The one that seems to fit our case is...
  43. ...Natural Language APIs. These APIs can perform several different tasks on text strings: extract the syntactic structure of sentences, extract entities that are mentioned within a text, and even perform sentiment analysis.
  44. The Sentiment analysis API takes a text as input and returns two float values. Polarity (a float ranging from -1 to 1) expresses the mood of the text: positive values denote positive moods. Magnitude (a float ranging from 0 to +inf) expresses the intensity of the feeling: higher values denote stronger feelings.
  45. Our personal simplistic definition of “sentiment” will be “polarity times magnitude”.
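Written out, this definition is a one-line function. The range checks simply restate the intervals given for polarity and magnitude; the function name is an assumption made for this sketch.

```python
def sentiment_score(polarity, magnitude):
    """Combine the two values into a single score: polarity times magnitude."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in [-1, 1]")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    return polarity * magnitude


mildly_positive_but_intense = sentiment_score(0.5, 2.0)
strongly_negative = sentiment_score(-1.0, 3.0)
neutral = sentiment_score(0.0, 10.0)
```

The sign of the score follows the polarity, while the magnitude stretches it, so a neutral polarity always gives a score of zero however intense the text is.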
  46. Let’s modify our pipeline. For illustration purposes we will keep the old flow and add another one to implement the sentiment analysis. The evaluation of the sentiment will happen only for a subset of tweets (those that explicitly contain the word “blackfriday”).
  47. How does this reflect on the Pipeline code? We only have to add three lines of code (I’m lying!). Note how we start from the “tweets” PCollection both for the processing and for the write of raw data. Note also how we can reuse the DoFormat function for both flows.
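The two-branch structure can be sketched in plain Python as follows. Both branches start from the same input, and only tweets containing the keyword are scored. `fake_analyze` is a hypothetical stand-in for the Natural Language API call and returns made-up values; the other names are also invented for this example.

```python
import json


def contains_keyword(raw_tweet, keyword="blackfriday"):
    """True if the tweet text mentions the keyword (case-insensitive)."""
    text = json.loads(raw_tweet).get("text", "")
    return keyword in text.lower()


def run_flows(raw_tweets, analyze):
    """Return (rows for the raw table, rows for the sentiment table)."""
    # Branch 1: every tweet is formatted and kept as-is.
    raw_rows = [json.loads(raw) for raw in raw_tweets]
    # Branch 2: filter, evaluate the sentiment, then format.
    sentiment_rows = []
    for raw in raw_tweets:
        if contains_keyword(raw):
            row = json.loads(raw)
            polarity, magnitude = analyze(row["text"])
            row["sentiment"] = polarity * magnitude
            sentiment_rows.append(row)
    return raw_rows, sentiment_rows


def fake_analyze(text):
    # Hypothetical stand-in for the Natural Language API: fixed, made-up values.
    return (0.5, 2.0) if "love" in text else (-0.5, 2.0)


raw_rows, sentiment_rows = run_flows(
    ['{"text": "I love #BlackFriday"}', '{"text": "Just a normal day"}'],
    fake_analyze,
)
```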
  48. Updating a pipeline is easy if the update doesn’t modify the existing structure (we are only adding new pieces). We only have to provide the name of the job we want to update. Dataflow will take care of draining the existing pipeline before shutting it down.
  49. The Cloud Console shows the updated pipeline, and new “enriched” data is immediately available in a BigQuery table.
  50. We did it! We built a serverless, scalable data solution based on Google Cloud Platform. One interesting aspect of this architecture is that it is completely no-ops, and...
  51. ...it has integrated logging, monitoring and alerting thanks to Google Stackdriver. And we didn’t have to do anything!
  52. Let me show you the final solution. We will see how easy it is to query data and monitor the infrastructure, and we will take a look at some dashboards.
  53. When you detect an anomaly in one of the trends, you can drill down in BigQuery to explore the reasons. Walmart’s popularity is not so high, mainly due to their decision to start Black Friday sales at 6 PM on Thanksgiving Day. Amazon’s popularity dropped right after they announced their first “Black Friday Week” deals, which apparently did not meet customers’ expectations (they are recovering, though :)