Enviar búsqueda
Cargar
Nix for etl using scripting to automate data cleaning & transformation
•
Descargar como PPTX, PDF
•
1 recomendación
•
2,913 vistas
Lynchpin Analytics Consultancy
Seguir
How to use * nix Script to clean and transform large data sets. Making life easier!
Leer menos
Leer más
Datos y análisis
Denunciar
Compartir
Denunciar
Compartir
1 de 19
Descargar ahora
Recomendados
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
Theodoros Vasiloudis
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
Flink Forward
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
Vasia Kalavri
Single-Pass Graph Stream Analytics with Apache Flink
Single-Pass Graph Stream Analytics with Apache Flink
Paris Carbone
SFScon 2020 - Peter Hopfgartner - Open Data de luxe
SFScon 2020 - Peter Hopfgartner - Open Data de luxe
South Tyrol Free Software Conference
Storm at Spotify
Storm at Spotify
Neville Li
Luigi future
Luigi future
Erik Bernhardsson
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
Andra Lungu
Recomendados
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
Theodoros Vasiloudis
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
Flink Forward
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
Vasia Kalavri
Single-Pass Graph Stream Analytics with Apache Flink
Single-Pass Graph Stream Analytics with Apache Flink
Paris Carbone
SFScon 2020 - Peter Hopfgartner - Open Data de luxe
SFScon 2020 - Peter Hopfgartner - Open Data de luxe
South Tyrol Free Software Conference
Storm at Spotify
Storm at Spotify
Neville Li
Luigi future
Luigi future
Erik Bernhardsson
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
Andra Lungu
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
Jean-Paul Calbimonte
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
Cédrick Lunven
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Uwe Korn
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward
Apache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
Data Analysis With Pandas
Data Analysis With Pandas
Stephan Solomonidis
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
First Flink Bay Area meetup
First Flink Bay Area meetup
Kostas Tzoumas
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
Uwe Korn
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
LINE Corporation
Introduction to Apache Flink
Introduction to Apache Flink
mxmxm
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
Vasia Kalavri
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Ververica
LINEデリマでのElasticsearchの運用と監視の話
LINEデリマでのElasticsearchの運用と監視の話
LINE Corporation
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
Stephan Ewen
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
Kostas Tzoumas
Flink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Sean Burton
Measure camp pres 5 cro myths
Measure camp pres 5 cro myths
Amrdeep Athwal (L.I.O.N)
Más contenido relacionado
La actualidad más candente
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
Jean-Paul Calbimonte
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
Cédrick Lunven
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Uwe Korn
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward
Apache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
Data Analysis With Pandas
Data Analysis With Pandas
Stephan Solomonidis
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
First Flink Bay Area meetup
First Flink Bay Area meetup
Kostas Tzoumas
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
Uwe Korn
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
LINE Corporation
Introduction to Apache Flink
Introduction to Apache Flink
mxmxm
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
Vasia Kalavri
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Ververica
LINEデリマでのElasticsearchの運用と監視の話
LINEデリマでのElasticsearchの運用と監視の話
LINE Corporation
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
Stephan Ewen
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
Kostas Tzoumas
Flink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
La actualidad más candente
(20)
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Apache Flink Training: System Overview
Apache Flink Training: System Overview
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Data Analysis With Pandas
Data Analysis With Pandas
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
First Flink Bay Area meetup
First Flink Bay Area meetup
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
Introduction to Apache Flink
Introduction to Apache Flink
Predictive Datacenter Analytics with Strymon
Predictive Datacenter Analytics with Strymon
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
LINEデリマでのElasticsearchの運用と監視の話
LINEデリマでのElasticsearchの運用と監視の話
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
Flink Streaming @BudapestData
Flink Streaming @BudapestData
Destacado
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Sean Burton
Measure camp pres 5 cro myths
Measure camp pres 5 cro myths
Amrdeep Athwal (L.I.O.N)
Breaking down the barriers to the use of digital analytics
Breaking down the barriers to the use of digital analytics
Peter O'Neill
User-Centric Analytics (MeasureCamp Talk)
User-Centric Analytics (MeasureCamp Talk)
Taste Medio
Superweek 2015 traffic attribution
Superweek 2015 traffic attribution
Jacob Kildebogaard
A/B Testing Pitfalls - MeasureCamp London 2015
A/B Testing Pitfalls - MeasureCamp London 2015
Michal Parizek
Customer Lifetime Value
Customer Lifetime Value
pavel jašek
31 Ways To Destroy Your Google Analytics Implementation
31 Ways To Destroy Your Google Analytics Implementation
Charles Meaden
Measurecamp - Improving e commerce tracking with universal analytics
Measurecamp - Improving e commerce tracking with universal analytics
Matt Clarke
Getting to the People Behind The Keywords
Getting to the People Behind The Keywords
Carmen Mardiros
Don't you have a Funnel view? - Measure Camp VI
Don't you have a Funnel view? - Measure Camp VI
Xavier Colomes
BrainSINS and AWS meetup Keynote
BrainSINS and AWS meetup Keynote
Andrés Collado
The Future of Customer Engagement
The Future of Customer Engagement
Alterian
Customer lifetime value
Customer lifetime value
yalisassoon
Predictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and advice
The Marketing Distillery
The Who, What, Why, How of Lead Scoring
The Who, What, Why, How of Lead Scoring
Marketo
Customer Engagement Model (CEM)
Customer Engagement Model (CEM)
Mani Dev Gyawali
Customer Insight Analysis
Customer Insight Analysis
VC4
Customer engagement
Customer engagement
Tom Tsongas, PMP, CSM
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
Kimberley Mitchell
Destacado
(20)
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Measurecamp 6 session: Effective Customer Feedback & Measurement Frameworks
Measure camp pres 5 cro myths
Measure camp pres 5 cro myths
Breaking down the barriers to the use of digital analytics
Breaking down the barriers to the use of digital analytics
User-Centric Analytics (MeasureCamp Talk)
User-Centric Analytics (MeasureCamp Talk)
Superweek 2015 traffic attribution
Superweek 2015 traffic attribution
A/B Testing Pitfalls - MeasureCamp London 2015
A/B Testing Pitfalls - MeasureCamp London 2015
Customer Lifetime Value
Customer Lifetime Value
31 Ways To Destroy Your Google Analytics Implementation
31 Ways To Destroy Your Google Analytics Implementation
Measurecamp - Improving e commerce tracking with universal analytics
Measurecamp - Improving e commerce tracking with universal analytics
Getting to the People Behind The Keywords
Getting to the People Behind The Keywords
Don't you have a Funnel view? - Measure Camp VI
Don't you have a Funnel view? - Measure Camp VI
BrainSINS and AWS meetup Keynote
BrainSINS and AWS meetup Keynote
The Future of Customer Engagement
The Future of Customer Engagement
Customer lifetime value
Customer lifetime value
Predictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and advice
The Who, What, Why, How of Lead Scoring
The Who, What, Why, How of Lead Scoring
Customer Engagement Model (CEM)
Customer Engagement Model (CEM)
Customer Insight Analysis
Customer Insight Analysis
Customer engagement
Customer engagement
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
Similar a Nix for etl using scripting to automate data cleaning & transformation
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
StampedeCon
Linux Systems Performance 2016
Linux Systems Performance 2016
Brendan Gregg
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
DataKitchen
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Guido Schmutz
Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014
chrissanders88
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedIn
rajappaiyer
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
Gyula Fóra
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
Toshiyuki Shimono
Lipstick On Pig
Lipstick On Pig
bigdatagurus_meetup
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson
Hakka Labs
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
Jeff Magnusson
Luigi presentation NYC Data Science
Luigi presentation NYC Data Science
Erik Bernhardsson
Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019
Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019
Codemotion
Go with the Flow-v2
Go with the Flow-v2
Zobair Khan
Go with the Flow
Go with the Flow
Bangladesh Network Operators Group
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Codemotion
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Demi Ben-Ari
Flink Streaming
Flink Streaming
Gyula Fóra
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
tsliwowicz
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
Alex Henthorn-Iwane
Similar a Nix for etl using scripting to automate data cleaning & transformation
(20)
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Linux Systems Performance 2016
Linux Systems Performance 2016
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Applied Detection and Analysis Using Flow Data - MIRCon 2014
Applied Detection and Analysis Using Flow Data - MIRCon 2014
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedIn
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
Lipstick On Pig
Lipstick On Pig
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
Luigi presentation NYC Data Science
Luigi presentation NYC Data Science
Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019
Pablo Musa - Managing your Black Friday Logs - Codemotion Amsterdam 2019
Go with the Flow-v2
Go with the Flow-v2
Go with the Flow
Go with the Flow
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Flink Streaming
Flink Streaming
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
Más de Lynchpin Analytics Consultancy
Attribution Modelling: Real Business Applications that work.
Attribution Modelling: Real Business Applications that work.
Lynchpin Analytics Consultancy
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
Lynchpin Analytics Consultancy
Lynchpin Measurement and Analytics Survey 2015
Lynchpin Measurement and Analytics Survey 2015
Lynchpin Analytics Consultancy
Econsultancy measurement-and-analytics-2015
Econsultancy measurement-and-analytics-2015
Lynchpin Analytics Consultancy
Brighton seo april 2015 lynchpin. segementation clustering presentation. crea...
Brighton seo april 2015 lynchpin. segementation clustering presentation. crea...
Lynchpin Analytics Consultancy
Data Analytics Time to Grow Up
Data Analytics Time to Grow Up
Lynchpin Analytics Consultancy
Más de Lynchpin Analytics Consultancy
(6)
Attribution Modelling: Real Business Applications that work.
Attribution Modelling: Real Business Applications that work.
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
Lynchpin Measurement and Analytics Survey 2015
Lynchpin Measurement and Analytics Survey 2015
Econsultancy measurement-and-analytics-2015
Econsultancy measurement-and-analytics-2015
Brighton seo april 2015 lynchpin. segementation clustering presentation. crea...
Brighton seo april 2015 lynchpin. segementation clustering presentation. crea...
Data Analytics Time to Grow Up
Data Analytics Time to Grow Up
Último
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
olyaivanovalion
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Suhani Kapoor
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Timothy Spann
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
ffjhghh
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
Suhani Kapoor
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
olyaivanovalion
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
atducpo
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Delhi Call girls
Halmar dropshipping via API with DroFx
Halmar dropshipping via API with DroFx
olyaivanovalion
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
ranjana rawat
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Pooja Nehwal
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
olyaivanovalion
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
Suhani Kapoor
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
olyaivanovalion
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
Lars Albertsson
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Suhani Kapoor
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
olyaivanovalion
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
fulawalesam
Último
(20)
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Halmar dropshipping via API with DroFx
Halmar dropshipping via API with DroFx
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
Nix for etl using scripting to automate data cleaning & transformation
1.
MeasureCamp VI *NIX for
ETL (Using Scripting to Automate Data Cleaning & Transformation) Andrew Hood @lynchpin
2.
Is this a
1970s tribute presentation? UNIX BSD Mac OS X Linux Android 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 2
3.
Why is *NIX
relevant to analytics? • Most input data is crap. And increasingly large. • Requires data processing: – Sequences of transformations – Reproducibility – Flexibility 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 3 Download file(s) Remove dodgy characters Fix date format Load into database
4.
Do I have
it/how do I get *NIX? • You might have it already: – Max OS X, Android, … • Linux – Stick on an old PC • E.g. http://www.ubuntu.com/ – 12 months free on AWS • http://aws.amazon.com/free/ • Windows – MinGW/MSYS ports of all the common command line tools • http://www.mingw.org/ 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 4
5.
3 Key UNIX
Principles 1. Small is Beautiful 2. Everything is a Filter 3. Silence is Golden 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 5
6.
Principle 1: Small
is Beautiful • A UNIX command does one simple thing efficiently – ls: lists files in the current directory – cat: concatenates files together – sort: er, sorts stuff – uniq: de-duplicates stuff – grep: search/filter on steroids – sed: search and replace on steroids – awk: calculator on steroids 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 6
7.
In contrast to… •
One tool that does lots of things badly. 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 7
8.
Learning Curve Disclaimer •
Commands are named to minimise typing (not deliberately maximise obfuscation!) – ls = list – mv = move – cp = copy – pwd = present working directory • man [command] is your friend – the manual pages – E.g. man ls 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 8
9.
Principle 2: Everything
is a Filter • Pretty much every command can take an input and generate an output based on that input. – This is rather good for data processing! • Key punctuation: – | chains output of one command to input of the next – > redirects output to a file (or device) • Example: – cat myfile.txt | sort | uniq > sorted-and-deduplicated.txt 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 9
10.
Terry Pratchett Tribute
Slide • How to remember what pipes and redirection do • Source: alt.fan.pratchett (yes, USENET) • A better LOL: – C|N>K • Coffee (filtered) through nose (redirected) to keyboard 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 10
11.
Key Principle 3:
Silence is Golden • Commands have 3 default streams: – STDIN: text streaming into the command – STDOUT: text streaming out of the command – STDERR: any errors or warnings from running the command • If a command completes successfully you get no messages – No “Done.”, “15 files copied” or “Are you sure?” prompts! 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 11
12.
Example 1: Quick
Line/Field Count • How big are these massive compressed Webtrends Logs for December 2014? % zcat *2014-12*.log.gz | wc -l 1514139 • And how many fields are there in the file? % zcat *.log.gz | head -n5 | awk '{ print NF+1 }' 20 19 19 19 19 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 12
13.
Example 1: Quick
Line/Field Count • Why 20 fields in the first line and 19 in the rest? % zcat *2014-12*.log.gz | head -n1 Fields: date time c-ip cs-username cs-host cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-version cs(User-Agent) cs(Cookie) cs(Referer) dcs-geo dcs-dns origin-id dcs-id • Let’s get rid of that header line and pop into one file: % zcat *2014-12*.log.gz | grep –v '^Fields' > 2014-12.log 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 13
14.
Example 2: Quick
& Dirty Sums • How many unique IP addresses in that logfile? Fields: date time c-ip cs-username cs-host cs-method cs-uri-stem cs-uri-query sc-status sc- bytes cs-version cs(User-Agent) cs(Cookie) cs(Referer) dcs-geo dcs-dns origin-id dcs-id % cat 2014-12.log | cut -d" " -f3 | sort | uniq | wc -l 8695 • What are the top 5 IP addresses by hits? % cat 2014-12.log | cut -d" " -f3 | sort | uniq -c | sort -nr | head -n5 1110 91.214.5.128 840 193.108.78.10 730 193.108.73.47 184 203.112.87.138 173 203.112.87.220 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 14
15.
(A Note on
Regular Expressions) • Best learn to love these – . is any character. – escapes a character that has a meaning i.e. . is a literal full stop – * Matches 0 or more characters – + Matches 1 or more characters – ? Matches 0 or 1 characters – ^ Starts With – $ Ends With – [ ] matches a list or group of charcaters (e.g. [A-Za-z] any letter) – ( ) groups and captures characters 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 15
16.
Example 3: Fixing
The Quotes/Date % cat test.csv Joe Bloggs,56.78,05/12/2014 Bill Bloggs,123.99,12/06/2014 % sed 's/[[:alnum:] /.]*/"&"/g' test.csv "Joe Bloggs","56.78","05/12/2014" "Bill Bloggs","123.99","12/06/2014“ % sed 's/[[:alnum:] /.]*/"&"/g' test.csv | sed 's|(..)/(..)/(....)|3-2-1|' "Joe Bloggs","56.78","2014-12-05" "Bill Bloggs","123.99","2014-06-12" 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 16
17.
Example 4: Clean
& Database Load cat input.csv | grep '^[0-9]' | sed 's/,,/,NULL,/g' | sed 's/[^[:print:]]//g' | psql -c "copy mytable from stdin with delimiter ','" 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 17
18.
Example 5: Basic
Scripting #!/bin/sh DATE=`date -v -1d +%Y-%m-%d` SOURCE="input-$DATE.zip" LOGFILE="$DATE.log" sftp -b /dev/stdin username@host 2>&1 >>$LOGFILE <<EOF cd incoming get $SOURCE EOF if [ ! -f "$SOURCE" ]; then echo "Source file $SOURCE not found" >> $LOGFILE exit 99 fi unzip $SOURCE echo "Source file $SOURCE successfully downloaded and decompressed" >> $LOGFILE 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 18
19.
Resources • UNIX Fundamentals: –
http://freeengineer.org/learnUNIXin10minutes.html – http://www.ee.surrey.ac.uk/Teaching/Unix/ • Cheat Sheet: – http://www.pixelbeat.org/cmdline.html • Basic Analytics Applications – http://www.gregreda.com/2013/07/15/unix-commands-for-data-science/ • Regular Expressions: – http://www.lunametrics.com/regex-book/Regular-Expressions-Google-Analytics.pdf – http://www.regexr.com/ 16 March 2015 © Lynchpin Analytics Limited, All Rights Reserved 19
Descargar ahora