Hadoop 101 ETL + Automation Smackdown
Learning Big Data: Which approach makes me the most valuable as a developer?
Bio - Pete Carapetyan
• Java dev last 15 years, dev 20 years
• Grew up automating in a different industry
• Apparent obsession with systems & automation
• Since 2000 as dataFundamentals, now 2 man shop
Special Skills - Special Snowflakes
• Let me show you these Hadoop & Avro skills.
• Then, we code for the special snowflakes. (data)
• Thus we are more valuable, and can up our bill rates!
• This is Approach #1: Manual or Special Snowflake
My 2013 Manual Hadoop Story
• 15 ETL jobs [Partial scope]

• Brilliant, ninja level team

• 1 year of competitive NIH* copy-paste spaghetti coding, AKA the special snowflake approach

• Not a fun year
*NIH: Not Invented Here
[Demo Basics of ETL Job]
Special Snowflake Approach: Human drama!
What limitations of this manual "special skills, special snowflakes" approach do we observe?
How To Un-Pack Either Approach?
What if we remove the human drama?
Now, what happens if we automate?
Automated Approach
Carrie
Our own internal project for automating big data.
Name inspired by the horror film…
Also inspired by The Phoenix Project
• Results, not drama

• Focus only on bottleneck

• Brent as bottleneck
On Brent
• Brent is a team’s best asset! Brent is a ninja.

• Brent is my dark side only when treating every situation like a special snowflake.

• Brent enjoys the attention.

• Brent is not the drama queen; others bring the drama to him.
Brent?
Automation Basics
1. Brent spends time on clean design, not NIH*

• [Camel] - Integration Server
2. Brent automates the rule, codes the exception

• Apply metadata to templates
• Automated VM dev infrastructure
* NIH: Not Invented Here
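To make "apply metadata to templates" concrete, here is a minimal sketch in Python. It is not Carrie's actual generator (which is written in Ruby); the property names and the template text are hypothetical, chosen only to show the rule being generated from metadata so that hand coding is reserved for the exception.

```python
from string import Template

# Hypothetical per-job metadata; these keys are illustrative,
# not the property names the real generator uses.
job_metadata = {
    "job_name": "customer_feed",
    "input_dir": "/data/etl/in/customer_feed",
    "delimiter": "|",
}

# Hypothetical template for one generated artifact (a tiny job config).
JOB_TEMPLATE = Template(
    "job.name=$job_name\n"
    "job.input=$input_dir\n"
    "job.delimiter=$delimiter\n"
)


def render_job(metadata):
    """Fill the template; substitute() raises KeyError if a value is missing."""
    return JOB_TEMPLATE.substitute(metadata)


rendered = render_job(job_metadata)
print(rendered)
```

Every job that fits the rule gets the same generated shape; a job that does not fit is edited by hand after generation.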
Demo Clean
• Clean project folder

• Clean hadoop file system

• Clean hadoop DDL
https://www.youtube.com/watch?v=qR7XTzv5P_M&index=2&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
Later Demo
Integration Server
• Raw linux OS (Centos)

• Java

• Maven

• Ruby

• networking

• maven repo - binaries

• [created with vagrant]
https://www.youtube.com/watch?v=xgheERvulqw&index=3&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
Demo Metadata Collection
• Simple properties

• Collected using a cheesy UI

• UI written in Ruby
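As a rough illustration of what "simple properties" could look like once collected, the sketch below parses a small key=value description of one ETL job. The keys (table, delimiter, fields) and the file layout are assumptions made for this example, not the format the Ruby UI actually writes.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical job description; field names and types are made up.
sample = """
# hypothetical ETL job description
table=customer_feed
delimiter=,
fields=id:long,name:string,signup_date:string
"""

props = parse_properties(sample)
print(props)
```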
Demo Generated Code
• Camel ETL binary

• OSGi, versioned, modular jar

• Only 3 primary outputs!

• simple

• clean

• well designed (?)

• JUnit/integration tested

• Supporting scripting

• messy
Demo Server Deploy
• One line deploy/run command

• Compiles on server with Maven

• Also runnable as jar
Does it work?
• Make custom file

• Drop into ETL folder

• Inspect
Demo - Review
• Schema created

• DDL run

• Avro binary (JSON) transform

• Data Migration

• FTP to server

• Into HDFS partition

• Alter Table: Date Partition
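Two of the steps reviewed above, schema creation and the date-partition ALTER TABLE, can be sketched as follows. This is an illustration under assumptions: the table name, field list, partition column `dt`, and HDFS path are hypothetical, and the DDL is written in Hive's dialect, which the slides do not name explicitly.

```python
import json


def avro_schema(record_name, fields):
    """Build an Avro record schema (Avro schemas are JSON documents)."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": typ} for name, typ in fields],
    }


def add_partition_ddl(table, date, location):
    """Hive-style statement registering one date partition (assumed dialect)."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{date}') LOCATION '{location}'"
    )


# Hypothetical field list for a delimited customer file.
fields = [("id", "long"), ("name", "string")]
schema_json = json.dumps(avro_schema("CustomerFeed", fields), indent=2)
ddl = add_partition_ddl(
    "customer_feed", "2014-06-01", "/data/customer_feed/dt=2014-06-01"
)
print(schema_json)
print(ddl)
```

Because both outputs are derived from the same field metadata, they are natural candidates for generation rather than hand coding.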
Transform to Avro
• Not detailed in this talk

• Demo’d here as a binary

• Code listed at end of talk
Modular Binaries
• Each ETL

• Own binary, OSGi

• Own codebase

• Fully versioned

• Fully customizable after generation

• Runs alone or as part of Camel container(s)

• Tests on build

• Contains own supporting scripts
Takeaways
• Brent codes the exception manually and the rule by template.
• Brent has time to focus on design.
• Brent may lose some amount of desired attention :(
• Resulting code is
• clean
• consistent, easy to maintain
• But is there a Home Run?
• defined as not possible via special snowflake approach
Home Run 1: Infrastructure As Code Demo
• [Jeff]
Home Run 2: Big Data, Beyond Hadoop!
1. Pick your provider
• Hadoop
• Cassandra
• Couchbase
• etc
2. Adopt your templates, VMs, etc
Home Run 3: Idempotent Effort
• Idempotent effort? Each subsequent run has no bad effect.
• Walkup - The 10 minute test
• Walkaway - Requirements
• Features
• Testing, technical debt, already in place for code
• VMs and recipes for dev, test, prod
• OSGi etc modularity for binaries
• Does what we see here pass this test?
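A small sketch of what "idempotent" means in practice: running the same setup step twice leaves the system in the same state as running it once. The local temp directory here stands in for a real target such as an HDFS partition folder, and the `dt=` naming is an assumption for illustration.

```python
import os
import tempfile


def ensure_partition_dir(root, date):
    """Create the partition directory if needed; safe to call repeatedly."""
    path = os.path.join(root, f"dt={date}")
    # exist_ok=True makes a second run a no-op instead of an error.
    os.makedirs(path, exist_ok=True)
    return path


root = tempfile.mkdtemp()
first = ensure_partition_dir(root, "2014-06-01")
second = ensure_partition_dir(root, "2014-06-01")  # rerun, no bad effect
print(first == second, os.listdir(root))
```

The same idea applies to VM recipes and deploy scripts: if a step can be rerun safely, a walkup or walkaway does not depend on remembering what has already been done.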
What to leave with
• De-mystify: how to Avro/Hadoop a delimited file
• Review motives for automating this process
• Code automation basics
• Infrastructure automation basics
• Code for above
Further Hadoop Tutorial Resources
• Hortonworks
• best free stuff? Except networking vas
• Cloudera
• Lots but appear to prefer to get paid
• Apache Hadoop
• haven’t tried but it is Apache
Wish To See More?
• In office demos
• Your data
Code, Content, Contacts
• This Slide Deck: http://www.slideshare.net/datafundamentals/hadoop-big-data-35762308
• or just remember slideshare.net/datafundamentals it may be the only one there
• Youtube - 11 minute version of code demo - https://www.youtube.com/playlist?list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
• Dev Code
• Carrie (ruby UI and generator) https://github.com/datafundamentals/df_ui_carrie
• Avro from delimited https://bitbucket.org/datafundamentals/avro_from_delimited
• Camel-Avro https://bitbucket.org/datafundamentals/camel-avro-etl
• Ops Code - cookbook recipes
• https://github.com/datafundamentals
• Contact
• pete@datafundamentals.com, jeff@datafundamentals.com
Be careful out there!
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown
Hadoop 101 ETL Automation Smackdown

Más contenido relacionado

La actualidad más candente

Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"Fwdays
 
Write Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js MunichWrite Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js MunichMike North
 
Integration Testing with Selenium
Integration Testing with SeleniumIntegration Testing with Selenium
Integration Testing with SeleniumAll Things Open
 
Webcomponents are your frameworks best friend
Webcomponents are your frameworks best friendWebcomponents are your frameworks best friend
Webcomponents are your frameworks best friendFilip Bruun Bech-Larsen
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2Wen-Tien Chang
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuningJohn McCaffrey
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationMaarten Dirkse
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on RailsAvi Kedar
 
Python to go
Python to goPython to go
Python to goWeng Wei
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on RailsJohn McCaffrey
 
User-percieved performance
User-percieved performanceUser-percieved performance
User-percieved performanceMike North
 
Capybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automationCapybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automationCOMAQA.BY
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Derek Jacoby
 

La actualidad más candente (20)

Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
Michael North "Ember.js 2 - Future-friendly ambitious apps, that scale!"
 
Write Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js MunichWrite Once, Run Everywhere - Ember.js Munich
Write Once, Run Everywhere - Ember.js Munich
 
Cloud tools
Cloud toolsCloud tools
Cloud tools
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
Integration Testing with Selenium
Integration Testing with SeleniumIntegration Testing with Selenium
Integration Testing with Selenium
 
Webcomponents are your frameworks best friend
Webcomponents are your frameworks best friendWebcomponents are your frameworks best friend
Webcomponents are your frameworks best friend
 
Frameworks and webcomponents
Frameworks and webcomponentsFrameworks and webcomponents
Frameworks and webcomponents
 
淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2淺談 Startup 公司的軟體開發流程 v2
淺談 Startup 公司的軟體開發流程 v2
 
CI/CD at bol.com
CI/CD at bol.comCI/CD at bol.com
CI/CD at bol.com
 
DrupalCon 2011 Highlight
DrupalCon 2011 HighlightDrupalCon 2011 Highlight
DrupalCon 2011 Highlight
 
Cvcc performance tuning
Cvcc performance tuningCvcc performance tuning
Cvcc performance tuning
 
Coscup
CoscupCoscup
Coscup
 
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
 
bol.com Dutch Container Day presentation
bol.com Dutch Container Day presentationbol.com Dutch Container Day presentation
bol.com Dutch Container Day presentation
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
 
Python to go
Python to goPython to go
Python to go
 
Freelancing and side-projects on Rails
Freelancing and side-projects on RailsFreelancing and side-projects on Rails
Freelancing and side-projects on Rails
 
User-percieved performance
User-percieved performanceUser-percieved performance
User-percieved performance
 
Capybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automationCapybara + RSpec - ruby dsl-based web ui qa automation
Capybara + RSpec - ruby dsl-based web ui qa automation
 
Untangling - fall2017 - week 8
Untangling - fall2017 - week 8Untangling - fall2017 - week 8
Untangling - fall2017 - week 8
 

Destacado

ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1pageใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1pagePrachoom Rangkasikorn
 
2 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 12 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 1Heimer Perez
 
гимназисты 5 б
гимназисты 5 бгимназисты 5 б
гимназисты 5 бOlga Gorbenko
 
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...Innovation Enterprise
 
Year 9
Year 9Year 9
Year 9hodder
 
Web design winter start
Web design  winter startWeb design  winter start
Web design winter startKonrad Roeder
 
Affordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydneyAffordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydneysmtwastebrokers
 
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)Dr Dev Kambhampati
 

Destacado (15)

Easter.b
Easter.bEaster.b
Easter.b
 
Pril 1
Pril 1Pril 1
Pril 1
 
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1pageใบความรู้  กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
ใบความรู้ กลุ่มทางเศรษฐกิจ+497+dltvsocp6+54soc p06 f26-1page
 
2 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 12 diarecreacionalcomfandi10 1
2 diarecreacionalcomfandi10 1
 
Institute of Clinical Research India
Institute of Clinical Research IndiaInstitute of Clinical Research India
Institute of Clinical Research India
 
гимназисты 5 б
гимназисты 5 бгимназисты 5 б
гимназисты 5 б
 
Rassegna Stampa2_AZ Holding
Rassegna Stampa2_AZ HoldingRassegna Stampa2_AZ Holding
Rassegna Stampa2_AZ Holding
 
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
Strategic Case Study: Investment Optimisation for Executives using Big Data, ...
 
Resume
ResumeResume
Resume
 
Pedestriantv media-kit-2013
Pedestriantv media-kit-2013Pedestriantv media-kit-2013
Pedestriantv media-kit-2013
 
Year 9
Year 9Year 9
Year 9
 
Web design winter start
Web design  winter startWeb design  winter start
Web design winter start
 
Affordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydneyAffordable e waste recycling for the marketing agencies of sydney
Affordable e waste recycling for the marketing agencies of sydney
 
Ziegler Portfolio 2014
Ziegler Portfolio 2014Ziegler Portfolio 2014
Ziegler Portfolio 2014
 
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
Dr Dev Kambhampati | Cosmetics & Toiletries Market Size (by Country)
 

Similar a Hadoop 101 ETL Automation Smackdown

Dev ops lessons learned - Michael Collins
Dev ops lessons learned  - Michael CollinsDev ops lessons learned  - Michael Collins
Dev ops lessons learned - Michael CollinsDevopsdays
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Hannes Lowette
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchExcella
 
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's WorkbenchAugust Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's WorkbenchHoward Greenberg
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsDaniel Stange
 
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirstBuilding CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirstJun-ichi Sakamoto
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflowTomas Doran
 
Open stack jobs avoiding the axe
Open stack jobs   avoiding the axeOpen stack jobs   avoiding the axe
Open stack jobs avoiding the axeJim Leitch
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesJonathan Klein
 
Simplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolSimplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolDataWorks Summit
 
Test Automation with Twist and Sahi
Test Automation with Twist and SahiTest Automation with Twist and Sahi
Test Automation with Twist and Sahiericjamesblackburn
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneDashlane
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Junichi Ishida
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...ConSol Consulting & Solutions Software GmbH
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...ConSol Consulting & Solutions Software GmbH
 
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps JourneyGartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps JourneyKelly Looney
 

Similar a Hadoop 101 ETL Automation Smackdown (20)

Dev ops lessons learned - Michael Collins
Dev ops lessons learned  - Michael CollinsDev ops lessons learned  - Michael Collins
Dev ops lessons learned - Michael Collins
 
Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®Build software like a bag of marbles, not a castle of LEGO®
Build software like a bag of marbles, not a castle of LEGO®
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from Scratch
 
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's WorkbenchAugust Webinar - Water Cooler Talks: A Look into a Developer's Workbench
August Webinar - Water Cooler Talks: A Look into a Developer's Workbench
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce Apps
 
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirstBuilding CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
Building CLR/H Registration Site with ASP.NET MVC4 and EF4CodeFirst
 
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow Puppet Camp New York 2014: Streamlining Puppet Development Workflow
Puppet Camp New York 2014: Streamlining Puppet Development Workflow
 
Steamlining your puppet development workflow
Steamlining your puppet development workflowSteamlining your puppet development workflow
Steamlining your puppet development workflow
 
Open stack jobs avoiding the axe
Open stack jobs   avoiding the axeOpen stack jobs   avoiding the axe
Open stack jobs avoiding the axe
 
BTV PHP - Building Fast Websites
BTV PHP - Building Fast WebsitesBTV PHP - Building Fast Websites
BTV PHP - Building Fast Websites
 
Simplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query ToolSimplifying Use of Hive with the Hive Query Tool
Simplifying Use of Hive with the Hive Query Tool
 
Stackato
StackatoStackato
Stackato
 
From Heroku to Amazon AWS
From Heroku to Amazon AWSFrom Heroku to Amazon AWS
From Heroku to Amazon AWS
 
Test Automation with Twist and Sahi
Test Automation with Twist and SahiTest Automation with Twist and Sahi
Test Automation with Twist and Sahi
 
DevOps Days Ohio
DevOps Days OhioDevOps Days Ohio
DevOps Days Ohio
 
Continuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at DashlaneContinuous Delivery: releasing Better and Faster at Dashlane
Continuous Delivery: releasing Better and Faster at Dashlane
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
 
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen..."Using Automation Tools To Deploy And Operate Applications In Real World Scen...
"Using Automation Tools To Deploy And Operate Applications In Real World Scen...
 
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps JourneyGartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
Gartner Infrastructure and Operations Summit Berlin 2015 - DevOps Journey
 

Último

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 

Último (20)

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 

Hadoop 101 ETL Automation Smackdown

  • 1. Hadoop 101 ETL + Automation Smackdown Learning Big Data: Which approach makes me the most valuable as developer?
  • 2. Bio - Pete Carapetyan • Java dev last 15 years, dev 20 years • Grew up automating in a different industry • Apparent obsession with systems & automation • Since 2000 as dataFundamentals, now 2 man shop
  • 3. Special Skills - Special Snowflakes • Let me show you these Hadoop & Avro skills. • Then, we code for the special snowflakes. (data) • Thus we are more valuable, and can up our bill rates! • This is Approach #1: Manual or Special Snowflake
  • 4. My 2013 Manual Hadoop Story • 15 ETL jobs [Partial scope] • Brilliant, ninja-level team • 1 year of competitive NIH* copy-paste spaghetti coding - AKA special snowflake approach • Not a fun year *NIH: Not Invented Here
  • 5. [Demo Basics of ETL Job]
  • 6. Special Snowflake Approach: Human drama! What limitations of this manual special skills special snowflakes approach do we observe?
  • 7. How To Un-Pack Either Approach? What if we remove the human drama?
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. Now, what happens if we automate? Automated Approach
  • 15. Carrie • Our own internal project for automating big data. • Name inspired by the horror film…
  • 16.
  • 17. Also inspired by The Phoenix Project • Results, not drama • Focus only on bottleneck • Brent as bottleneck
  • 18. On Brent • Brent is a team’s best asset! Brent is a ninja. • Brent is my dark side only when treating every situation like a special snowflake. • Brent enjoys the attention. • Brent is not the drama queen, others bring the drama to him. Brent?
  • 19. Automation Basics 1. Brent spends time on clean design, not NIH* • [Camel] - Integration Server 2. Brent automates the rule, codes the exception • Apply metadata to templates • Automated VM dev infrastructure * NIH: Not Invented Here
  • 20. Demo Clean • Clean project folder • Clean hadoop file system • Clean hadoop DDL https://www.youtube.com/watch?v=qR7XTzv5P_M&index=2&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
  • 21. Later Demo Integration Server • Raw linux OS (Centos) • Java • Maven • Ruby • networking • maven repo - binaries • [created with vagrant] https://www.youtube.com/watch?v=xgheERvulqw&index=3&list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe
  • 22. Demo Metadata Collection • Simple properties • Collected using a cheesy UI • UI written in Ruby
  • 23. Demo Generated Code • Camel ETL binary • OSGi, versioned, modular jar • Only 3 primary outputs! • simple • clean • well designed (?) • JUnit/integration tested • Supporting scripting • messy
  • 24. Demo Server Deploy • One line deploy/run command • Compiles on server with Maven • Also runnable as jar
  • 25. Does it work? • Make custom file • Drop into ETL folder • Inspect
  • 26. Demo - Review • Schema created • DDL run • Avro binary (JSON) transform • Data Migration • FTP to server • Into HDFS partition • Alter Table: Date Partition
  • 27. Transform to Avro • Not detailed in this talk • Demo’d here as a binary • Code listed at end of talk
  • 28. Modular Binaries • Each ETL • Own binary, OSGi • Own codebase • Fully versioned • Fully customizable after generation • Runs alone or as part of Camel container(s) • Tests on build • Contains own supporting scripts
  • 29. Takeaways • Brent coding the exception manually, rule by template. • Brent has time to focus on design. • Brent may lose some amount of desired attention :( • Resulting code is • clean • consistent, easy to maintain • But is there a Home Run? • defined as not possible via special snowflake approach
  • 30. Home Run 1: Infrastructure As Code Demo • [Jeff]
  • 31. Home Run 2: Big Data, Beyond Hadoop! 1. Pick your provider • Hadoop • Cassandra • Couchbase • etc 2. Adopt your templates, VMs, etc
  • 32. Home Run 3: Idempotent Effort • Idempotent effort? Each subsequent run doesn’t have a bad effect. • Walkup - The 10-minute test • Walkaway - Requirements • Features • Testing, technical debt, already in place for code • VMs and recipes for dev, test, prod • OSGi etc modularity for binaries • Does what we see here pass this test?
  • 33. What to leave with • De-mystify: how to Avro/Hadoop a delimited file • Review motives for automating this process • Code automation basics • Infrastructure automation basics • Code for above
  • 34. Further Hadoop Tutorial Resources • Hortonworks • best free stuff? Except networking vas • Cloudera • Lots but appear to prefer to get paid • Apache Hadoop • haven’t tried but it is Apache
  • 35. Wish To See More? • In office demos • Your data
  • 36. Code, Content, Contacts • This Slide Deck: http://www.slideshare.net/datafundamentals/hadoop-big-data-35762308 • or just remember slideshare.net/datafundamentals it may be the only one there • Youtube - 11 minute version of code demo - https://www.youtube.com/playlist?list=PLO_T9AjxEaYeByfqBqHVCmg4GbLFkYCJe • Dev Code • Carrie (ruby UI and generator) https://github.com/datafundamentals/df_ui_carrie • Avro from delimited https://bitbucket.org/datafundamentals/avro_from_delimited • Camel-Avro https://bitbucket.org/datafundamentals/camel-avro-etl • Ops Code - cookbook recipes • https://github.com/datafundamentals • Contact • pete@datafundamentals.com, jeff@datafundamentals.com Be careful out there!
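
The "apply metadata to templates" idea from slides 19 and 22, combined with the DDL and date-partition steps from slide 26, can be illustrated with a small sketch. This is not the Carrie generator (which the deck describes as a Ruby UI and generator); it is a minimal Python stand-in, and the table name, HDFS path, column list, and function names are all hypothetical. The `IF NOT EXISTS` clauses are there to show the idempotent-run property from slide 32: rerunning the generated statements should not fail or duplicate anything. `STORED AS AVRO` assumes Hive 0.14 or later.

```python
from string import Template

# Hypothetical metadata, standing in for the "simple properties" of slide 22.
metadata = {
    "table": "web_orders",
    "hdfs_dir": "/data/etl/web_orders",
    "columns": [("order_id", "BIGINT"), ("customer", "STRING"), ("total", "DOUBLE")],
}

# Template for the table DDL. IF NOT EXISTS makes a second run a no-op.
# STORED AS AVRO is an assumption that requires Hive 0.14 or later.
CREATE_TABLE = Template(
    "CREATE EXTERNAL TABLE IF NOT EXISTS $table (\n"
    "$columns\n"
    ")\n"
    "PARTITIONED BY (dt STRING)\n"
    "STORED AS AVRO\n"
    "LOCATION '$hdfs_dir';"
)

# Template for the per-load date partition, also safe to rerun.
ADD_PARTITION = Template(
    "ALTER TABLE $table ADD IF NOT EXISTS PARTITION (dt='$dt') "
    "LOCATION '$hdfs_dir/dt=$dt';"
)


def render_create_table(meta):
    """Fill the table template from one job's metadata."""
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in meta["columns"])
    return CREATE_TABLE.substitute(
        table=meta["table"], columns=cols, hdfs_dir=meta["hdfs_dir"]
    )


def render_add_partition(meta, dt):
    """Fill the partition template for one load date (YYYY-MM-DD)."""
    return ADD_PARTITION.substitute(
        table=meta["table"], hdfs_dir=meta["hdfs_dir"], dt=dt
    )


if __name__ == "__main__":
    print(render_create_table(metadata))
    print(render_add_partition(metadata, "2014-06-01"))
```

The rule lives in the two templates, and each new ETL job only supplies a different `metadata` dictionary. Anything that does not fit the templates is the exception that still gets coded by hand.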