Resilient Data Pipelines with Docker and Docker-Compose
Heeren Sharma
Software Engineer @Cliqz
heeren@cliqz.com
@heerensharma
80+ - Team size
500,000 - DAU (daily active users)
3 million+ - Downloads (Germany only)
1 billion+ - Indexed pages (We do not believe in indexing the web.)
5 TB - Indexed in memory (based on open-source and in-house-built NoSQL stores)
Storyfile
FROM Introduction
RUN Data-pipelines
RUN whats-docker
ADD use-case ./wrap-up
CMD ./demo
“Data and data everywhere”
“Design and development of Data Pipelines”
“Cloud deployment has its own needs”
One system's processed data is another system's input data
What's that whale named Docker?
"Docker allows you to package an application with all of its dependencies into a
standardised unit for software development."
A little bit more Docker
• Dockerfile
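# Base image
FROM ubuntu:14.04
# The base image does not include pip, so install Python and pip first
RUN apt-get update && apt-get install -y python python-pip
RUN pip install requests
# Copy the application code into the image and run it from there
ADD . /awesome-code
WORKDIR /awesome-code
CMD python revolutionary_app.py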
$ docker build -t your-awesome-image:v1 .
$ docker run -d -p 80:5000 --name container-name your-awesome-image:v1
$ docker tag your-awesome-image:v1 myregistry.com:8080/your-awesome-image:v1
$ docker push myregistry.com:8080/your-awesome-image:v1
$ docker pull myregistry.com:8080/your-awesome-image:v1
$ docker ps
$ docker inspect <container-name/ID>
$ docker logs <container-name/ID>
$ docker stop <container-name/ID>
$ docker exec -it <container-name/ID> your_command
Simplified Use Case
• Streaming data from different sources: Twitter, FB, feeds and a customised scraping engine
• Different processing engines
• Fast iterations over new requirements
• Resilient system with a focus on easy deployment
Use case - News Articles
• Trending news articles from different domains
• News categorisation
• Relevant news for a search query
• Traffic of news content fluctuates; it is pretty dynamic
• There is no universal right answer
System Design (1)
[Diagram: Data Stream, Processing Engine, Processed Data (Redis)]
$ docker build -t redismaster:v1 .
$ docker run -d --name processed_data redismaster:v1 redis-server
$ docker build -t datastream:twitter .
$ docker run -d --name twitter_queue datastream:twitter python /code/format_data.py
$ docker build -t processing_engine .
$ docker run -d --name magic_powerhouse --link processed_data:db processing_engine python /code/magic_script.py
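The processing engine above is started with a link to the Redis container under the alias db. As a rough sketch of how a script such as magic_script.py might find Redis, the snippet below reads the environment variables that legacy Docker links inject (of the form ALIAS_PORT_6379_TCP_ADDR) and falls back to the alias hostname. The function name linked_redis_address and its defaults are assumptions for illustration, not code from the talk.

```python
import os


def linked_redis_address(environ=None, alias="db", port=6379):
    """Return (host, port) for a container linked under `alias`.

    Legacy `--link name:alias` exposes variables such as
    DB_PORT_6379_TCP_ADDR and DB_PORT_6379_TCP_PORT, and also adds
    the alias to /etc/hosts, so the alias itself is a usable fallback.
    """
    environ = os.environ if environ is None else environ
    prefix = "{}_PORT_{}_TCP".format(alias.upper(), port)
    host = environ.get(prefix + "_ADDR", alias)
    resolved_port = int(environ.get(prefix + "_PORT", port))
    return host, resolved_port


if __name__ == "__main__":
    # Inside the linked container this prints the injected address;
    # elsewhere it prints the fallback ("db", 6379).
    print(linked_redis_address())
```

Passing the environment as a plain dictionary keeps the lookup easy to test outside a container.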
New State of design
System Design (2)
[Diagram: Data Stream, Processing Engine, Processed Data (Redis), News Fetcher, Requests]
System Design (3)
[Diagram: Data Stream, Processing Engine, Processed Data (Redis), FB Engine, News Fetcher, Requests]
System Design (4)
[Diagram: Data Stream, Processing Engine, Processed Data (Redis), FB Engine, News Fetcher, Requests, NewsLetters Engine, 3rd Party]
docker-compose to the rescue
queue:
  build: .
  command: python /news-swimlane/twitter-queue/read_queue.py
  volumes:
    - /ebs/data:/data
  environment:
    DATA_DIR: /data
    PYTHONPATH: /news-swimlane
  restart: always
update:
  build: .
  command: python /news-swimlane/server-redis/build_redis_exc.py
  volumes:
    - /ebs/data:/data
  links:
    - redismaster
    - fetcher
  environment:
    REDIS_HOST: redismaster_1
    DATA_DIR: /data
    PYTHONPATH: /news-swimlane
  restart: always
redismaster:
  build: redis3-container
  ports:
    - "6379:6379"
  volumes:
    - /ebs/backup:/data
  restart: always
  command: /usr/local/bin/redis-server /master.conf
fetcher:
  build: .
  command: python /news-swimlane/fetcher/server_fetcher.py
  ports:
    - "80:5000"
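The compose file passes configuration to each service through environment variables (REDIS_HOST, DATA_DIR). A minimal sketch of how a service script could read them is shown below. The Settings tuple, the load_settings helper, the REDIS_PORT variable and the fallback values are assumptions made for illustration and do not appear in the original deck.

```python
import os
from collections import namedtuple

# Hypothetical container for the values a service needs at start-up.
Settings = namedtuple("Settings", ["redis_host", "redis_port", "data_dir"])


def load_settings(environ=None):
    """Build Settings from environment variables set by docker-compose.

    REDIS_HOST and DATA_DIR come from the compose file; REDIS_PORT is an
    assumed extra variable that defaults to the standard Redis port.
    """
    environ = os.environ if environ is None else environ
    return Settings(
        redis_host=environ.get("REDIS_HOST", "localhost"),
        redis_port=int(environ.get("REDIS_PORT", "6379")),
        data_dir=environ.get("DATA_DIR", "/data"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Keeping every setting in the environment means the same image can run unchanged on a laptop and on a cloud instance; only the compose file differs.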
Key points: Design & Development
• Microservices-oriented design and, following from it, development
• Old and new components can be realised as Docker containers
• Different containers can readily interact with each other
• Easy to test (local environment), and no worries if it blows up in production
Deployment
• Just install Docker on the remote host (to be done with care)
• Docker images can be pushed to a remote repository (better to say registry)
• Make your instances autoscale in any cloud IaaS
• If an instance goes down, a new instance just pulls the Docker images and starts the Docker containers
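The recovery step in the last bullet can be sketched as a small boot script for a fresh instance: pull each image, then start its container with a restart policy. This is only an illustration under assumptions. The helper names bootstrap_commands and bootstrap, the container name fetcher and the choice of --restart=always are hypothetical, and the image reference is the placeholder used earlier in the deck.

```python
import subprocess


def bootstrap_commands(images):
    """Return the docker commands a fresh instance would run.

    `images` is a list of (container_name, image_reference) tuples.
    """
    commands = []
    for name, image in images:
        # Fetch the image from the registry, then start it detached
        # and ask Docker to restart it whenever it exits.
        commands.append(["docker", "pull", image])
        commands.append(
            ["docker", "run", "-d", "--restart=always", "--name", name, image]
        )
    return commands


def bootstrap(images, run=subprocess.check_call):
    """Execute the commands in order; `run` can be swapped out for a dry run."""
    for command in bootstrap_commands(images):
        run(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    bootstrap(
        [("fetcher", "myregistry.com:8080/your-awesome-image:v1")],
        run=lambda command: print(" ".join(command)),
    )
```

Separating command construction from execution lets the script be checked on a machine without Docker before it is baked into an instance's start-up configuration.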
Resources
• Docker - https://www.docker.com/whatisdocker
• Boot2docker - https://docs.docker.com/installation/mac/
• Docker-compose - https://docs.docker.com/compose/
• Very short video about Docker (American style) - https://www.youtube.com/watch?v=aLipr7tTuA4
• Good resource for a somewhat deeper insight into containers - https://www.youtube.com/watch?v=FdkNAjjO5yQ
http://www.cliqz.com/en
THANKQZ

PyconUK-2015
