BUSINESS DASHBOARDS
using Bonobo, Airflow and Grafana
makersquad.fr
makersquad.fr
Romain Dorgueil

romain@makersquad.fr


Maker ~0 years
Containers ~5 years
Python ~10 years
Web ~15 years
Linux ~20 years
Code ~25 years
rdorgueil
Intro. Product
1. Plan.
2. Implement.
3. Visualize.
4. Monitor.
5. Iterate.
Outro. References, Pointers, Sidekick
Content
DISCLAIMERS
If you build a product, do your own research.

Take time to learn and understand tools you consider.





I don’t know nothing, and I recommend nothing.

I assume things and change my mind when I hit a wall.

I’m the creator and main developer of Bonobo ETL.

I’ll try to be objective, but there is a bias here.
PRODUCT
GET https://aprc.it/api/800x600/ep2018.europython.eu/
January 2009
timedelta(years=9)
February 2018
March 2018
April 2018
June 2018
July 2018
not expected
UNDER THE HOOD
Load Balancer (TCP / L4) → Reverse Proxy (HTTP2 / L7) → Website (django) and API Server (tornado, backed by a Local Cache; “MISS”)
Janitor (asyncio) and Spiders (asyncio), wired through the “Events” and “Orders” Message Queues (“CRAWL”, “CREATED”)
Database (postgres), Object Storage, Redis
Transports: HTTP/HTTPS, AMQP/RabbitMQ, SQL/Storage
Prometheus
AlertManager
Grafana
Weblate
MANAGEMENT SERVICES
Google Analytics
EXTERNAL SERVICES
Stripe
Slack
Sentry
Mailgun
Drift
MixMax
…
Prometheus
Kubernetes
RabbitMQ
Redis
PostgreSQL
NGINX + VTS
Apercite
…
PROMETHEUS EXPORTERS
Database (postgres)
PLAN
« If you can’t measure it, you can’t improve it. »
- Peter Drucker (?)

« What gets measured gets improved. »
- Peter Drucker (?)
- Take your time to choose metrics wisely.
- Cause vs Effect.
- Less is More.
- One at a time. Or one by team.
- There is not one answer to this question.
Planning
Vanity metrics will waste your time
- What business are you in?
- What stage are you at?
- Start with a framework.
- You may build your own, later.
Planning
Pirate Metrics (based on Ash Maurya’s version)
Lean Analytics (book by Alistair Croll & Benjamin Yoskovitz)
Plan A
- What business? Software as a Service
- What stage? Empathy / Stickiness
- What metric matters?
- Rate from acquisition to activation.
- QoS (both to display and to measure improvements).
IMPLEMENT
Idea

DataSources (anything) → Database (metric → value), aggregated as (dims, metrics)
Model

Metric
(id) → name

HourlyValue
(metric, date, hour) → value

DailyValue
(metric, date) → value

One Metric has many HourlyValues and many DailyValues (1 → n).
Quick to write.
Not the best.
Keywords to read more: Star and Snowflake Schemas
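The two-table shape above is quick to sketch even without SQLAlchemy. Here is a minimal, hypothetical version using stdlib sqlite3; table and column names are assumptions mirroring the slide, and the real project targets PostgreSQL:

```python
import sqlite3

# Minimal sketch of the metrics model above, using stdlib sqlite3.
# Table/column names are assumptions; the real project uses PostgreSQL.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE metric (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE hourly_value (
        metric_id INTEGER NOT NULL REFERENCES metric(id),
        date TEXT NOT NULL,
        hour INTEGER NOT NULL,
        value REAL NOT NULL,
        UNIQUE (metric_id, date, hour)  -- one row per (metric, date, hour)
    );
""")

def upsert_hourly(name, date, hour, value):
    # Create the metric row on first sight, then insert-or-replace the value.
    db.execute("INSERT OR IGNORE INTO metric (name) VALUES (?)", (name,))
    metric_id = db.execute(
        "SELECT id FROM metric WHERE name = ?", (name,)
    ).fetchone()[0]
    db.execute(
        "INSERT OR REPLACE INTO hourly_value (metric_id, date, hour, value) "
        "VALUES (?, ?, ?, ?)",
        (metric_id, date, hour, value),
    )

upsert_hourly("objects.users.count", "2018-07-25", 14, 42.0)
upsert_hourly("objects.users.count", "2018-07-25", 14, 43.0)  # same key: replaced
```

The UNIQUE constraint plays the role of the writer’s discriminant: re-running a job overwrites values instead of duplicating rows.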
Bonobo
Extract Transform Load
Select('''
    SELECT *
    FROM …
    WHERE …
''')

def qualify(row):
    yield (
        row,
        'active' if …
        else 'inactive'
    )

Join('''
    SELECT count(*)
    FROM …
    WHERE uid = %(id)s
''')

def report(row):
    send_email(
        render(
            'email.html', row
        )
    )
Bonobo
- Independent threads.
- Data is passed first in, first out.
- Supports any kind of directed acyclic graphs.
- Standard Python callable and iterators.
- Getting started still fits in here (ok, barely)
$ pip install bonobo
$ bonobo init somejob.py
$ python somejob.py
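Conceptually, a Bonobo chain is just callables and generators wired first in, first out; the same flow can be sketched in plain Python (node names and sample rows are invented for illustration):

```python
# What a Bonobo chain boils down to: callables/generators wired FIFO.
# Node names and sample rows are invented for illustration.
def extract():
    yield {"user": "alice", "visits": 12}
    yield {"user": "bob", "visits": 0}

def qualify(row):
    # Same idea as the qualify() node: tag each row on the way through.
    return {**row, "status": "active" if row["visits"] else "inactive"}

def load(rows):
    return [f"{row['user']}: {row['status']}" for row in rows]

result = load(qualify(row) for row in extract())
```

Bonobo adds the threading, the graph wiring and the console output on top of this shape.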
Bonobo
Let’s write our jobs.
Extract
import datetime

from bonobo.config import use_context, Service
from bonobo_sqlalchemy.readers import Select

@use_context
class ObjectCountsReader(Select):
    engine = Service('website.engine')
    query = '''
        SELECT count(%(0)s.id) AS cnt
        FROM %(0)s
    '''
    output_fields = ['dims', 'metrics']

    def formatter(self, input_row, row):
        now = datetime.datetime.now()
        return ({
            'date': now.date(),
            'hour': now.hour,
        }, {
            'objects.{}.count'.format(input_row[1]): row['cnt']
        })
… counts from website’s database
Extract
TABLES_METRICS = {
    AsIs('apercite_account_user'): 'users',
    AsIs('apercite_account_userprofile'): 'profiles',
    AsIs('apercite_account_apikey'): 'apikeys',
}

def get_readers():
    return [
        TABLES_METRICS.items(),
        ObjectCountsReader(),
    ]
… counts from website’s database
Normalize
bonobo.SetFields(['dims', 'metrics'])
All data should look the same
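Whatever the reader, each record ends up as a (dims, metrics) pair. The helper below is an illustrative stand-in for what SetFields enforces, not Bonobo’s implementation:

```python
# Every reader converges on one shape: a (dims, metrics) pair of dicts.
# Illustrative stand-in for bonobo.SetFields(['dims', 'metrics']).
def normalize(row):
    dims, metrics = row  # fails loudly if a reader emits another shape
    if not (isinstance(dims, dict) and isinstance(metrics, dict)):
        raise TypeError("expected a (dims, metrics) pair of dicts")
    return {"dims": dims, "metrics": metrics}

record = normalize((
    {"date": "2018-07-25", "hour": 14},
    {"objects.users.count": 42},
))
```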
Load
class AnalyticsWriter(InsertOrUpdate):
    dims = Option(required=True)
    filter = Option(required=True)

    @property
    def discriminant(self):
        return ('metric_id', *self.dims)

    def get_or_create_metrics(self, context, connection, metrics):
        …

    def __call__(self, connection, table, buffer, context, row, engine):
        dims, metrics = row

        if not self.filter(dims, metrics):
            return

        # Get database rows for metric objects.
        db_metrics_ids = self.get_or_create_metrics(context, connection, metrics)

        # Insert or update values.
        for metric, value in metrics.items():
            yield from self._put(table, connection, buffer, {
                'metric_id': db_metrics_ids[metric],
                **{dim: dims[dim] for dim in self.dims},
                'value': value,
            })
Compose
def get_graph():
    normalize = bonobo.SetFields(['dims', 'metrics'])
    graph = bonobo.Graph(*get_readers(), normalize)
    graph.add_chain(
        AnalyticsWriter(
            table_name=HourlyValue.__tablename__,
            dims=('date', 'hour',),
            filter=lambda dims, metrics: 'hour' in dims,
            name='Hourly',
        ),
        _input=normalize,
    )
    graph.add_chain(
        AnalyticsWriter(
            table_name=DailyValue.__tablename__,
            dims=('date',),
            filter=lambda dims, metrics: 'hour' not in dims,
            name='Daily',
        ),
        _input=normalize,
    )
    return graph
Configure
def get_services():
    return {
        'sqlalchemy.engine': EventsDatabase().create_engine(),
        'website.engine': WebsiteDatabase().create_engine(),
    }
bonobo inspect --graph job.py | dot -o graph.png -T png
Inspect
Run
$ python -m apercite.analytics read objects --write
- dict_items in=1 out=3 [done]
- ObjectCountsReader in=3 out=3 [done]
- SetFields(['dims', 'metrics']) in=3 out=3 [done]
- HourlyAnalyticsWriter in=3 out=3 [done]
- DailyAnalyticsWriter in=3 [done]
Got it.
Let’s add readers.
We’ll run through, you’ll have the code.
Google Analytics
@use('google_analytics')
def read_analytics(google_analytics):
    reports = google_analytics.reports().batchGet(
        body={…}
    ).execute().get('reports', [])
    for report in reports:
        dimensions = report['columnHeader']['dimensions']
        metrics = report['columnHeader']['metricHeader']['metricHeaderEntries']
        rows = report['data']['rows']
        for row in rows:
            dim_values = zip(dimensions, row['dimensions'])
            yield (
                {
                    GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [dim])[0]:
                        GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [None, IDENTITY])[1](val)
                    for dim, val in dim_values
                },
                {
                    GOOGLE_ANALYTICS_METRICS.get(metric['name'], metric['name']):
                        GOOGLE_ANALYTICS_TYPES[metric['type']](value)
                    for metric, value in zip(metrics, row['metrics'][0]['values'])
                },
            )
Prometheus
class PrometheusReader(Configurable):
    http = Service('http')
    endpoint = 'http://{}:{}/api/v1'.format(PROMETHEUS_HOST, PROMETHEUS_PORT)
    queries = […]

    def __call__(self, *, http):
        start_at, end_at = self.get_timerange()
        for query in self.queries:
            for result in http.get(…).json().get('data', {}).get('result', []):
                metric = result.get('metric', {})
                for ts, val in result.get('values', []):
                    name = query.target.format(**metric)
                    _date, _hour = …
                    yield {
                        'date': _date,
                        'hour': _hour,
                    }, {
                        name: float(val)
                    }
Spider counts
class SpidersReader(Select):
    kwargs = Option()
    output_fields = ['row']

    @property
    def query(self):
        return '''
            SELECT spider.value AS name,
                   spider.created_at AS created_at,
                   spider_status.attributes AS attributes,
                   spider_status.created_at AS updated_at
            FROM spider
            JOIN …
            WHERE spider_status.created_at > %(now)s
            ORDER BY spider_status.created_at DESC
        '''

    def formatter(self, input_row, row):
        return (row, )
Spider counts
def spider_reducer(left, right):
    result = dict(left)
    result['spider.total'] += len(right.attributes)
    for worker in right.attributes:
        if 'stage' in worker:
            result['spider.active'] += 1
        else:
            result['spider.idle'] += 1
    return result
Spider counts
now = datetime.datetime.utcnow() - datetime.timedelta(minutes=30)

def get_readers():
    return (
        SpidersReader(kwargs={'now': now}),
        Reduce(spider_reducer, initializer={
            'spider.idle': 0,
            'spider.active': 0,
            'spider.total': 0,
        }),
        (lambda x: ({'date': now.date(), 'hour': now.hour}, x)),
    )
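The reducer folds spider rows into three counters; here is how that fold behaves on invented sample rows (the reducer is redefined locally so the snippet is self-contained, and the attributes payloads are assumptions):

```python
from functools import reduce
from types import SimpleNamespace

# Self-contained demo of the spider fold; sample rows/attributes are invented.
def spider_reducer(left, right):
    result = dict(left)
    result["spider.total"] += len(right.attributes)
    for worker in right.attributes:
        if "stage" in worker:
            result["spider.active"] += 1  # worker currently crawling
        else:
            result["spider.idle"] += 1
    return result

rows = [
    SimpleNamespace(attributes=[{"stage": "crawl"}, {}]),
    SimpleNamespace(attributes=[{}]),
]
counts = reduce(spider_reducer, rows, {
    "spider.idle": 0, "spider.active": 0, "spider.total": 0,
})
```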
You got the idea.
Inspect
We can generate ETL graphs with all readers or only a few.
Run
$ python -m apercite.analytics read all --write
- read_analytics in=1 out=91 [done]
- EventsReader in=1 out=27 [done]
- EventsTimingsReader in=1 out=2039 [done]
- group_timings in=2039 out=24 [done]
- format_timings_for_metrics in=24 out=24 [done]
- SpidersReader in=1 out=1 [done]
- Reduce in=1 out=1 [done]
- <lambda> in=1 out=1 [done]
- PrometheusReader in=1 out=3274 [done]
- dict_items in=1 out=3 [done]
- ObjectCountsReader in=3 out=3 [done]
- SetFields(['dims', 'metrics']) in=3420 out=3420 [done]
- HourlyAnalyticsWriter in=3420 out=3562 [done]
- DailyAnalyticsWriter in=3420 out=182 [done]
Easy to build.
Easy to add or replace parts.
Easy to run.
Told ya, slight bias.
VISUALIZE
Grafana
Analytics & Monitoring
Dashboards
Quality of Service
Public Dashboards
Acquisition Rate = User Counts + New Sessions
We’re just getting started.
MONITOR
Iteration 0

- Cron job runs everything every 30 minutes.
- No way to know if something fails.
- Expensive tasks.
- Hard to run manually.
I’ll handle that.
Monitoring?
Airflow




« Airflow is a platform to programmatically author, schedule and monitor workflows. »
- Official docs
Airflow
- Created by Airbnb, joined the Apache incubator.
- Schedules & monitors jobs.
- Distributes workloads through Celery, Dask, K8S…
- Can run anything, not just Python.
Airflow
Webserver, Scheduler, Metadata, Workers

Simplified to show the high-level concept.

Depends on executor (celery, dask, k8s, local, sequential …)
DAGs

import shlex

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

def _get_bash_command(*args, module='apercite.analytics'):
    return '(cd /usr/local/apercite; /usr/local/env/bin/python -m {} {})'.format(
        module, ' '.join(map(shlex.quote, args)),
    )

def build_dag(name, *args, schedule_interval='@hourly'):
    dag = DAG(
        name,
        schedule_interval=schedule_interval,
        default_args=default_args,
        catchup=False,
    )
    BashOperator(
        dag=dag,
        task_id=args[0],
        bash_command=_get_bash_command(*args),
        env=env,
    )
    return dag
DAGs
# Build datasource-to-metrics-db related dags.
for source in ('google-analytics', 'events', 'events-timings', 'spiders',
               'prometheus', 'objects'):
    name = 'apercite.analytics.' + source.replace('-', '_')
    globals()[name] = build_dag(name, 'read', source, '--write')

# Cleanup dag.
name = 'apercite.analytics.cleanup'
globals()[name] = build_dag(name, 'clean', 'all', schedule_interval='@daily')
Data Sources
from airflow.models import Connection
from airflow.settings import Session

session = Session()
website = session.query(Connection).filter_by(conn_id='apercite_website').first()
events = session.query(Connection).filter_by(conn_id='apercite_events').first()
session.close()

env = {}

if website:
    env['DATABASE_HOST'] = str(website.host)
    env['DATABASE_PORT'] = str(website.port)
    env['DATABASE_USER'] = str(website.login)
    env['DATABASE_NAME'] = str(website.schema)
    env['DATABASE_PASSWORD'] = str(website.password)

if events:
    env['EVENT_DATABASE_USER'] = str(events.login)
    env['EVENT_DATABASE_NAME'] = str(events.schema)
    env['EVENT_DATABASE_PASSWORD'] = str(events.password)
Warning: sub-optimal
Airflow

- Where to store the DAGs?
- Build: separate virtualenv
- Everything we do runs locally
- Deployment
Learnings
- Multiple services, not trivial
- Helm charts :-(
- Astronomer Distro :-)
- Read the Source, Luke
ITERATE
Plan N+1
- Create a framework for experiments.
- Timebox constraint
- Objectives & Key Results
- Decide upon results.
Plan N+1 (read: Scaling Lean, by Ash Maurya)
Tech Side
- Month on Month
- Year on Year
- % Growth
Ideas …
- Revenue (stripe, billing …)
- Traffic & SEO (analytics, console …)
- Conversion (AARRR)
- Quality of Service
- Processing (AMQP, …)
- Service Level (HTTP statuses, Time of Requests …)
- Vanity metrics
- Business metrics
Pick one.

Rinse.

Repeat.
OUTRO
Airflow helps you manage the whole factory.

Does not care about the jobs’ content.

Bonobo helps you build assembly lines.

Does not care about the surrounding factory.
References
Feedback

+

Bonobo ETL Sprint
Slides, resources, feedback …
apercite.fr/europython
romain@makersquad.fr rdorgueil

Diamond Application Development Crafting Solutions with Precision
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 

Business Dashboards using Bonobo ETL, Grafana and Apache Airflow

  • 1. BUSINESS DASHBOARDS using Bonobo, Airflow and Grafana makersquad.fr
  • 2. makersquad.fr Romain Dorgueil
 romain@makersquad.fr 
 Maker ~0 years Containers ~5 years Python ~10 years Web ~15 years Linux ~20 years Code ~25 years rdorgueil
  • 3. Intro. Product 1. Plan. 2. Implement. 3. Visualize. 4. Monitor. 5. Iterate. Outro. References, Pointers, Sidekick Content
  • 4. DISCLAIMERS If you build a product, do your own research.
 Take time to learn and understand tools you consider.
 
 
 I don’t claim to know everything, and I recommend nothing.
 I assume things and change my mind when I hit a wall.
 I’m the creator and main developer of Bonobo ETL.
 I’ll try to be objective, but there is a bias here.
  • 16. [Architecture diagram] Load Balancer (TCP / L4) → Reverse Proxy (HTTP2 / L7) → Website (django) and API Servers (tornado); Janitor (asyncio) and Spiders (asyncio) consume the “Events” and “Orders” message queues; Database (postgres), Object Storage, Redis, Local Cache. Messages: “CRAWL”, “CREATED”, “MISS”. Link types: HTTP/HTTPS, AMQP/RabbitMQ, SQL/Storage.
  • 17. [Architecture diagram, continued] Same entry path (Load Balancer TCP / L4 → Reverse Proxy HTTP2 / L7), plus management services (Prometheus, AlertManager, Grafana, Weblate), external services (Google Analytics, Stripe, Slack, Sentry, Mailgun, Drift, MixMax, …), and Prometheus exporters (Prometheus, Kubernetes, RabbitMQ, Redis, PostgreSQL, NGINX + VTS, Apercite, …), all backed by the database (postgres).
  • 18.
  • 19.
  • 20.
  • 21. PLAN
  • 22. - Peter Drucker (?) « If you can’t measure it, you can’t improve it. »
  • 23. « What gets measured gets improved. » - Peter Drucker (?)
  • 24. Planning
 - Take your time to choose metrics wisely.
 - Cause vs Effect.
 - Less is More.
 - One at a time. Or one by team.
 - There is not one answer to this question.
  • 26. Planning
 - What business are you in?
 - What stage are you at?
 - Start with a framework.
 - You may build your own, later.
  • 28. Pirate Metrics (based on the Ash Maurya version)
  • 29. Lean Analytics (book by Alistair Croll & Benjamin Yoskovitz)
  • 31. Plan A
 - What business? Software as a Service.
 - What stage? Empathy / Stickiness.
 - What metric matters?
   - Rate from acquisition to activation.
   - QOS (both for display and to measure improvements).
  • 34. Model
 Metric (id) → name
 HourlyValue (metric, date, hour) → value
 DailyValue (metric, date) → value
 Relations: Metric 1:n HourlyValue, Metric 1:n DailyValue.
  • 35. Quick to write. Not the best. Keywords to read more: Star and Snowflake Schemas
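To make the model on the previous slide concrete, here is a minimal sketch using stdlib sqlite3. Table and column names follow the slide; the exact DDL (types, constraints) is an assumption, not the project's actual schema.

```python
import sqlite3

# Minimal sketch of the Metric / HourlyValue / DailyValue model.
# Column names follow the slide; the DDL details are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE metric (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE hourly_value (
        metric_id INTEGER REFERENCES metric(id),
        date TEXT NOT NULL,
        hour INTEGER NOT NULL,
        value REAL NOT NULL,
        UNIQUE (metric_id, date, hour)
    );
    CREATE TABLE daily_value (
        metric_id INTEGER REFERENCES metric(id),
        date TEXT NOT NULL,
        value REAL NOT NULL,
        UNIQUE (metric_id, date)
    );
""")
con.execute("INSERT INTO metric (name) VALUES ('users.count')")
con.execute("INSERT INTO hourly_value VALUES (1, '2018-07-01', 12, 42.0)")
row = con.execute(
    "SELECT m.name, h.value FROM hourly_value h "
    "JOIN metric m ON m.id = h.metric_id"
).fetchone()
```

The UNIQUE constraints on (metric_id, date, hour) and (metric_id, date) are what make an insert-or-update writer possible later on.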
  • 37. Bonobo

 Select('''
     SELECT * FROM … WHERE …
 ''')

 def qualify(row):
     yield (row, 'active' if … else 'inactive')

 Join('''
     SELECT count(*) FROM … WHERE uid = %(id)s
 ''')

 def report(row):
     send_email(render('email.html', row))
  • 38. Bonobo
 - Independent threads.
 - Data is passed first in, first out.
 - Supports any kind of directed acyclic graph.
 - Standard Python callables and iterators.
 - Getting started still fits in here (ok, barely):
 $ pip install bonobo
 $ bonobo init somejob.py
 $ python somejob.py
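The "callables and iterators, first in, first out" model can be approximated in plain Python with chained generators, no Bonobo required. The node names (extract, qualify, load) and sample rows below are illustrative, not Bonobo API:

```python
# Plain-generator sketch of Bonobo's chain model: each node consumes
# rows from the previous one, first in, first out.

def extract():
    # Stand-in for a Select('SELECT …') reader node.
    yield {"id": 1, "logins": 12}
    yield {"id": 2, "logins": 0}

def qualify(rows):
    # Mirrors the qualify() node on the previous slide: tag each row.
    for row in rows:
        yield (row, "active" if row["logins"] > 0 else "inactive")

def load(rows):
    # Stand-in for a writer node; collects instead of sending email.
    return list(rows)

result = load(qualify(extract()))
```

Bonobo runs each node in its own thread and handles the queues between them; the composition idea is the same.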
  • 40. Extract … counts from website’s database

 import datetime

 from bonobo.config import use_context, Service
 from bonobo_sqlalchemy.readers import Select

 @use_context
 class ObjectCountsReader(Select):
     engine = Service('website.engine')
     query = '''
         SELECT count(%(0)s.id) AS cnt FROM %(0)s
     '''
     output_fields = ['dims', 'metrics']

     def formatter(self, input_row, row):
         now = datetime.datetime.now()
         return ({
             'date': now.date(),
             'hour': now.hour,
         }, {
             'objects.{}.count'.format(input_row[1]): row['cnt'],
         })
  • 41. Extract … counts from website’s database

 TABLES_METRICS = {
     AsIs('apercite_account_user'): 'users',
     AsIs('apercite_account_userprofile'): 'profiles',
     AsIs('apercite_account_apikey'): 'apikeys',
 }

 def get_readers():
     return [
         TABLES_METRICS.items(),
         ObjectCountsReader(),
     ]
  • 43. Load

 class AnalyticsWriter(InsertOrUpdate):
     dims = Option(required=True)
     filter = Option(required=True)

     @property
     def discriminant(self):
         return ('metric_id', *self.dims)

     def get_or_create_metrics(self, context, connection, metrics):
         …

     def __call__(self, connection, table, buffer, context, row, engine):
         dims, metrics = row
         if not self.filter(dims, metrics):
             return

         # Get database rows for metric objects.
         db_metrics_ids = self.get_or_create_metrics(context, connection, metrics)

         # Insert or update values.
         for metric, value in metrics.items():
             yield from self._put(table, connection, buffer, {
                 'metric_id': db_metrics_ids[metric],
                 **{dim: dims[dim] for dim in self.dims},
                 'value': value,
             })
  • 44. Compose

 def get_graph():
     normalize = bonobo.SetFields(['dims', 'metrics'])
     graph = bonobo.Graph(*get_readers(), normalize)
     graph.add_chain(
         AnalyticsWriter(
             table_name=HourlyValue.__tablename__,
             dims=('date', 'hour',),
             filter=lambda dims, metrics: 'hour' in dims,
             name='Hourly',
         ),
         _input=normalize,
     )
     graph.add_chain(
         AnalyticsWriter(
             table_name=DailyValue.__tablename__,
             dims=('date',),
             filter=lambda dims, metrics: 'hour' not in dims,
             name='Daily',
         ),
         _input=normalize,
     )
     return graph
  • 45. Configure

 def get_services():
     return {
         'sqlalchemy.engine': EventsDatabase().create_engine(),
         'website.engine': WebsiteDatabase().create_engine(),
     }
  • 46. Inspect
 $ bonobo inspect --graph job.py | dot -o graph.png -T png
  • 47. Run
 $ python -m apercite.analytics read objects --write
  - dict_items in=1 out=3 [done]
  - ObjectCountsReader in=3 out=3 [done]
  - SetFields(['dims', 'metrics']) in=3 out=3 [done]
  - HourlyAnalyticsWriter in=3 out=3 [done]
  - DailyAnalyticsWriter in=3 [done]
  • 48. Got it. Let’s add readers. We’ll run through, you’ll have the code.
  • 49. Google Analytics

 @use('google_analytics')
 def read_analytics(google_analytics):
     reports = google_analytics.reports().batchGet(
         body={…}
     ).execute().get('reports', [])
     for report in reports:
         dimensions = report['columnHeader']['dimensions']
         metrics = report['columnHeader']['metricHeader']['metricHeaderEntries']
         rows = report['data']['rows']
         for row in rows:
             dim_values = zip(dimensions, row['dimensions'])
             yield (
                 {
                     GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [dim])[0]:
                         GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [None, IDENTITY])[1](val)
                     for dim, val in dim_values
                 },
                 {
                     GOOGLE_ANALYTICS_METRICS.get(metric['name'], metric['name']):
                         GOOGLE_ANALYTICS_TYPES[metric['type']](value)
                     for metric, value in zip(metrics, row['metrics'][0]['values'])
                 },
             )
  • 50. Prometheus

 class PrometheusReader(Configurable):
     http = Service('http')
     endpoint = 'http://{}:{}/api/v1'.format(PROMETHEUS_HOST, PROMETHEUS_PORT)
     queries = […]

     def __call__(self, *, http):
         start_at, end_at = self.get_timerange()
         for query in self.queries:
             for result in http.get(…).json().get('data', {}).get('result', []):
                 metric = result.get('metric', {})
                 for ts, val in result.get('values', []):
                     name = query.target.format(**metric)
                     _date, _hour = …
                     yield {
                         'date': _date,
                         'hour': _hour,
                     }, {
                         name: float(val),
                     }
  • 51. Spider counts

 class SpidersReader(Select):
     kwargs = Option()
     output_fields = ['row']

     @property
     def query(self):
         return '''
             SELECT spider.value AS name,
                    spider.created_at AS created_at,
                    spider_status.attributes AS attributes,
                    spider_status.created_at AS updated_at
             FROM spider
             JOIN …
             WHERE spider_status.created_at > %(now)s
             ORDER BY spider_status.created_at DESC
         '''

     def formatter(self, input_row, row):
         return (row,)
  • 52. Spider counts

 def spider_reducer(self, left, right):
     result = dict(left)
     result['spider.total'] += len(right.attributes)
     for worker in right.attributes:
         if 'stage' in worker:
             result['spider.active'] += 1
         else:
             result['spider.idle'] += 1
     return result
  • 53. Spider counts

 now = datetime.datetime.utcnow() - datetime.timedelta(minutes=30)

 def get_readers():
     return (
         SpidersReader(kwargs={'now': now}),
         Reduce(spider_reducer, initializer={
             'spider.idle': 0,
             'spider.active': 0,
             'spider.total': 0,
         }),
         (lambda x: ({'date': now.date(), 'hour': now.hour}, x)),
     )
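The Reduce node folds all incoming status rows into one counters dict, starting from the initializer. The same semantics can be approximated with stdlib functools.reduce; the SimpleNamespace rows and their attribute values below are made-up stand-ins for the real spider_status records:

```python
from functools import reduce
from types import SimpleNamespace

# Plain-Python approximation of the Reduce node above. Only the
# 'attributes' field of each row matters: one entry per worker,
# with a 'stage' key when the worker is busy.

def spider_reducer(left, right):
    result = dict(left)
    result['spider.total'] += len(right.attributes)
    for worker in right.attributes:
        if 'stage' in worker:
            result['spider.active'] += 1
        else:
            result['spider.idle'] += 1
    return result

rows = [
    SimpleNamespace(attributes=[{'stage': 'crawl'}, {}]),  # one busy, one idle
    SimpleNamespace(attributes=[{'stage': 'render'}]),     # one busy
]
totals = reduce(spider_reducer, rows, {
    'spider.idle': 0, 'spider.active': 0, 'spider.total': 0,
})
```

The lambda at the end of get_readers() then wraps these totals in the (dims, metrics) shape the writers expect.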
  • 54. You got the idea.
  • 55. Inspect
 We can generate ETL graphs with all readers or only a few.
  • 56. Run
 $ python -m apercite.analytics read all --write
  - read_analytics in=1 out=91 [done]
  - EventsReader in=1 out=27 [done]
  - EventsTimingsReader in=1 out=2039 [done]
  - group_timings in=2039 out=24 [done]
  - format_timings_for_metrics in=24 out=24 [done]
  - SpidersReader in=1 out=1 [done]
  - Reduce in=1 out=1 [done]
  - <lambda> in=1 out=1 [done]
  - PrometheusReader in=1 out=3274 [done]
  - dict_items in=1 out=3 [done]
  - ObjectCountsReader in=3 out=3 [done]
  - SetFields(['dims', 'metrics']) in=3420 out=3420 [done]
  - HourlyAnalyticsWriter in=3420 out=3562 [done]
  - DailyAnalyticsWriter in=3420 out=182 [done]
  • 57. Easy to build. Easy to add or replace parts. Easy to run. Told ya, slight bias.
  • 69. Iteration 0
 - Cron job runs everything every 30 minutes.
 - No way to know if something fails.
 - Expensive tasks.
 - Hard to run manually.
  • 71. Airflow 
 
 «Airflow is a platform to programmatically author, schedule and monitor workflows.» - Official docs
  • 72. Airflow
 - Created by Airbnb, joined Apache incubation.
 - Schedules & monitors jobs.
 - Distributes workloads through Celery, Dask, K8S…
 - Can run anything, not just Python.
  • 73. [Architecture diagram] Airflow: Webserver, Scheduler, Metadata database, and Workers.
 Simplified to show the high-level concept.
 The worker layer depends on the executor (celery, dask, k8s, local, sequential, …).
  • 74.
  • 75. DAGs

 import shlex

 from airflow import DAG
 from airflow.operators.bash_operator import BashOperator

 def _get_bash_command(*args, module='apercite.analytics'):
     return '(cd /usr/local/apercite; /usr/local/env/bin/python -m {} {})'.format(
         module,
         ' '.join(map(shlex.quote, args)),
     )

 def build_dag(name, *args, schedule_interval='@hourly'):
     dag = DAG(
         name,
         schedule_interval=schedule_interval,
         default_args=default_args,
         catchup=False,
     )
     dag >> BashOperator(
         dag=dag,
         task_id=args[0],
         bash_command=_get_bash_command(*args),
         env=env,
     )
     return dag
  • 76. DAGs

 # Build datasource-to-metrics-db related dags.
 for source in ('google-analytics', 'events', 'events-timings',
                'spiders', 'prometheus', 'objects'):
     name = 'apercite.analytics.' + source.replace('-', '_')
     globals()[name] = build_dag(name, 'read', source, '--write')

 # Cleanup dag.
 name = 'apercite.analytics.cleanup'
 globals()[name] = build_dag(name, 'clean', 'all', schedule_interval='@daily')
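Airflow discovers DAGs by scanning each module's global namespace for DAG instances, which is why the generated DAGs are assigned into globals(). The pattern can be shown without Airflow by stubbing build_dag with a plain dict (the stub and its fields are assumptions for illustration only):

```python
# Stub sketch of the dynamic-DAG pattern: generated objects must land
# in module globals() for Airflow's scheduler to pick them up.

def build_dag(name, *args, schedule_interval='@hourly'):
    # Stand-in for the real build_dag, which returns an airflow DAG.
    return {'name': name, 'args': args, 'schedule': schedule_interval}

for source in ('google-analytics', 'spiders'):
    name = 'apercite.analytics.' + source.replace('-', '_')
    globals()[name] = build_dag(name, 'read', source, '--write')

registered = sorted(k for k in globals() if k.startswith('apercite.'))
```

A plain list of DAGs would not work: Airflow only registers DAG objects it finds as top-level module attributes.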
  • 77.
  • 78. Data Sources

 from airflow.models import Connection
 from airflow.settings import Session

 session = Session()
 website = session.query(Connection).filter_by(conn_id='apercite_website').first()
 events = session.query(Connection).filter_by(conn_id='apercite_events').first()
 session.close()

 env = {}

 if website:
     env['DATABASE_HOST'] = str(website.host)
     env['DATABASE_PORT'] = str(website.port)
     env['DATABASE_USER'] = str(website.login)
     env['DATABASE_NAME'] = str(website.schema)
     env['DATABASE_PASSWORD'] = str(website.password)

 if events:
     env['EVENT_DATABASE_USER'] = str(events.login)
     env['EVENT_DATABASE_NAME'] = str(events.schema)
     env['EVENT_DATABASE_PASSWORD'] = str(events.password)

 Warning: sub-optimal
  • 79. Airflow
 - Where to store the DAGs?
 - Build: separate virtualenv.
 - Everything we do runs locally.
 - Deployment.
  • 80. Learnings
 - Multiple services, not trivial.
 - Helm charts :-(
 - Astronomer Distro :-)
 - Read the Source, Luke.
  • 82. Plan N+1
 - Create a framework for experiments.
 - Timebox constraint.
 - Objective & Key Results.
 - Decide upon results.
  • 83. Plan N+1 (read: Scaling Lean, by Ash Maurya)
  • 84. Tech Side
 - Month on Month
 - Year on Year
 - % Growth
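Month-on-month and year-on-year figures are the same percentage-growth computation over different lags. A minimal sketch (the sample figures are made up for illustration):

```python
# Percentage growth between two periods: (current - previous) / previous.
# Works for month-on-month (lag 1) and year-on-year (lag 12) alike.

def pct_growth(previous, current):
    return (current - previous) / previous * 100.0

monthly_users = [100, 120, 150]  # three consecutive months (made-up data)
mom = pct_growth(monthly_users[-2], monthly_users[-1])  # month on month
```

In a dashboard, the same function applied to DailyValue rows at different offsets gives both curves from one query.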
  • 85. Ideas…
 - Revenue (Stripe, billing, …)
 - Traffic & SEO (analytics, console, …)
 - Conversion (AARRR)
 - Quality of Service
   - Processing (AMQP, …)
   - Service Level (HTTP statuses, request times, …)
 - Vanity metrics
 - Business metrics
  • 87. OUTRO
  • 88. Airflow helps you manage the whole factory.
 Does not care about the jobs’ content.
 Bonobo helps you build assembly lines.
 Does not care about the surrounding factory.
  • 91. Slides, resources, feedback … apercite.fr/europython romain@makersquad.fr rdorgueil