Using Hadoop stack to build a cloud VAT declarations revising service
Alex Chistyakov
Git in Sky
Grodno, LVEE 2016
Who I am
● Hello, my name is Alex
● Principal Engineer @ Git in Sky
● Hadoop operations engineer
● Former Java developer (not only Java, and not so “former” in fact)
Who are you?
● Linux and OSS enthusiasts?
● Software developers?
● DevOps engineers?
● Big data guys?
Well, what is this all about?
● Configuring a Hadoop/HBase cluster is easy
● 1) Buy a lot of hardware
● 2) Configure the bloody cluster!
● 3) ???
● 4) PROFIT!!!
Big Data is hard!
● A customer wants a number of environments for different purposes (dev, testing, staging & production)
● DevOps culture requires repeatability!
● (Observe a beautiful snowflake to the right)
● Business wants to reduce costs
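
One way to picture repeatability across dev, testing, staging and production: keep a single base set of Hadoop settings and apply a small, explicit set of overrides per environment, so every environment is built from the same description. The sketch below is only an illustration and is not taken from the talk. The hostname, the replication values and the names `BASE_SETTINGS`, `ENVIRONMENT_OVERRIDES` and `settings_for` are all assumptions; `fs.defaultFS` and `dfs.replication` are real Hadoop property names.

```python
"""Sketch: one base configuration plus small per-environment overrides."""

# Hypothetical values for illustration; the hostname and replication
# factors below do not come from the talk.
BASE_SETTINGS = {
    "fs.defaultFS": "hdfs://namenode.example.internal:8020",
    "dfs.replication": "3",
}

# Each environment lists only what differs from the base.
ENVIRONMENT_OVERRIDES = {
    "dev": {"dfs.replication": "1"},
    "testing": {"dfs.replication": "2"},
    "staging": {},
    "production": {},
}


def settings_for(environment):
    """Return the merged settings for one environment as a new dict."""
    if environment not in ENVIRONMENT_OVERRIDES:
        raise ValueError(f"unknown environment: {environment!r}")
    merged = dict(BASE_SETTINGS)  # copy, so the base is never mutated
    merged.update(ENVIRONMENT_OVERRIDES[environment])
    return merged


if __name__ == "__main__":
    for name in ENVIRONMENT_OVERRIDES:
        print(name, settings_for(name))
```

Because the differences between environments are written down in one place, a new environment is one more entry in the overrides rather than another hand-built snowflake.
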
So, we need a detailed plan
● 1) Buy an enterprise subscription from Oracle
● ^ FAIL!
So, we need a detailed plan
● 1) Read the manual on the product site
● 2) Configure everything manually
● ^ FAIL!
So, we need a detailed plan
● 1) Take the Cloudera distribution of Hadoop
● 2) Configure everything from a web interface
● 3) Don’t forget to buy an enterprise subscription
● 4) ^ MULTIPLE FAILS!!!
A word on proprietary software
● Proprietary software is full of nasty bugs, period
A word on open source software
● Open source software is awesome
Software market in 2016
● It’s not “proprietary vs open source”
● It’s “open source vs open source”
Open source vs open source
● Cloudera CDH vs vanilla Apache
So, we need a detailed plan
● 1) Hire a DevOps engineer
● 2) Use Chef or something
● 3) Automate all the things
● 4) ???
● 5) PROFIT!!!
100 reasons not to use Cloudera CDH
● Cloudera CDH obscures configuration
● Cloudera CDH generates textual configs from the DB
● Cloudera CDH is web-interface centric
● Cloudera CDH is a monolith with vendor lock-in
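
For contrast with configs generated from a database behind a web interface, here is a minimal sketch of the opposite arrangement: the settings live as plain data that can be kept in version control, and the Hadoop XML file is rendered from them deterministically. This is not the code from the Git in Sky roles; the function name `render_hadoop_xml` and the sample values are assumptions made for illustration. The `<configuration>`/`<property>`/`<name>`/`<value>` layout is the standard format of Hadoop's `*-site.xml` files.

```python
"""Sketch: render a Hadoop *-site.xml document from plain, reviewable data."""
import xml.etree.ElementTree as ET


def render_hadoop_xml(properties):
    """Return a <configuration> document as a string.

    Keys are emitted in sorted order so the same input always
    produces the same text, which keeps diffs meaningful.
    """
    root = ET.Element("configuration")
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(properties[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the talk.
    sample = {
        "dfs.replication": 3,
        "fs.defaultFS": "hdfs://namenode.example.internal:8020",
    }
    print(render_hadoop_xml(sample))
```

With this arrangement the source of truth is a text file that can be reviewed, diffed and rolled back like any other code, which is the property the bullets above find missing.
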
Our own little open source product
● Based on Ansible (Ansible is like Chef but awesome)
● https://github.com/gitinsky/ansible-hadoop-stack-howto
● https://github.com/gitinsky/ansible-role-*
Problems
● Lack of documentation
● Lack of manpower
● Nobody uses our product (except us)
What about the VAT service thing?
● Forget it, it’s not that relevant
Conclusions
● Open source software is awesome
● But Cloudera CDH is not
● We can make open source software better
So long, and thanks for all the fish!
● Ask your questions, please
● Alex Chistyakov, Principal Engineer @ Git in Sky
● http://gitinsky.com
● alex@gitinsky.com
● http://meetup.com/DevOps-40
