SlideShare a Scribd company logo
1 of 9
Download to read offline
Command-line Data Tools
Peter Wang
@pwang
Why?
• Some times you just want to sling data
• Text is still king; Lowest common denominator
• Machines are pretty honking big now
This Presentation
• List of some good collections of cmd-line tools
• Call out and describe a few in particular
• The PyDataTool of my desire
Sources
• From author of “Data Science at the Command
Line”: http://jeroenjanssens.com/2013/09/19/seven-
command-line-tools-for-data-science.html (larger
list at http://datascienceatthecommandline.com/)
• HN discussion: https://news.ycombinator.com/
item?id=6412190
• https://github.com/bitly/data_hacks
Tools
• JSON:
• jq: https://stedolan.github.io/jq/
• RecordStream: https://github.com/benbernard/
RecordStream
• csvkit: https://csvkit.readthedocs.io/en/1.0.2/
• dt: https://github.com/clarkgrubb/data-tools
• XMLStarlet: http://xmlstar.sourceforge.net/overview.php
Honorable Mentions
• Pythonic awk: https://github.com/alecthomas/
pawk
• Google Crush Tools: https://github.com/google/
crush-tools
• Xonsh: http://xon.sh/tutorial.html
The PyDataTool of My Desire
• Support for csv, json, sql, xls, hdf5; image formats; network
formats (pcap etc.)
• Capability of:
• csvkit, jq, dt, “cols” tool
• unix tools: sed, sort, shuf, split, tr, tee, uniq, wc, head,
tail, bc
• netpbm, imagemagick for images
• Work in streaming mode (netcat, wget, curl)
• First-class support for dask, spark
• Basic plotting via gnuplot, mpl, bokeh
• Built-in SQLite to do in-memory support for queries
Continuum Is Hiring!
• Creators of Anaconda, conda, bokeh, blaze, dask,
holoviews, numba, phosphorJS
• Maintainers/contributors to Jupyter, JupyterLab,
Spyder, pandas, conda-forge, …
• 150+ ppl, 80 in Austin
• Venture backed
• Enterprise product, OSS community innovation,
consulting, training
Continuum Is Hiring
• Enterprise Product Team:
• Dev Manager (reports to CTO, runs product engineering)
• QA Lead Engineer - creates test plans, coordinates with
product mgmt, dev, and testing team
• Senior Python Developer - enterprise product development;
backend, web tech; full stack preferred
• DevOps and Operations - enterprise product, anaconda.org,
Anaconda build system
• Email careers@continuum.io

More Related Content

Similar to Command line Data Tools

Best practices-wordpress-enterprise
Best practices-wordpress-enterpriseBest practices-wordpress-enterprise
Best practices-wordpress-enterprise
Taylor Lovett
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
Christopher Whitaker
 

Similar to Command line Data Tools (20)

Client Side Performance for Back End Developers - Camb Expert Talks, Nov 2016
Client Side Performance for Back End Developers - Camb Expert Talks, Nov 2016Client Side Performance for Back End Developers - Camb Expert Talks, Nov 2016
Client Side Performance for Back End Developers - Camb Expert Talks, Nov 2016
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
 
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
 
Best practices-wordpress-enterprise
Best practices-wordpress-enterpriseBest practices-wordpress-enterprise
Best practices-wordpress-enterprise
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Old code doesn't stink - Detroit
Old code doesn't stink - DetroitOld code doesn't stink - Detroit
Old code doesn't stink - Detroit
 
Best Practices for WordPress in Enterprise
Best Practices for WordPress in EnterpriseBest Practices for WordPress in Enterprise
Best Practices for WordPress in Enterprise
 
Best Practices for WordPress
Best Practices for WordPressBest Practices for WordPress
Best Practices for WordPress
 
Web Scrapping Using Python
Web Scrapping Using PythonWeb Scrapping Using Python
Web Scrapping Using Python
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)
 
Week 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and ProducingWeek 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and Producing
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with Couchbase
 
Hitchhiker's Guide to Web Standards
Hitchhiker's Guide to Web StandardsHitchhiker's Guide to Web Standards
Hitchhiker's Guide to Web Standards
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLP
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
 
Top ten-list
Top ten-listTop ten-list
Top ten-list
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 

More from Peter Wang

More from Peter Wang (10)

Rethinking Decentralization / Whither Privacy?
Rethinking Decentralization / Whither Privacy?Rethinking Decentralization / Whither Privacy?
Rethinking Decentralization / Whither Privacy?
 
Rethinking OSS In An Era of Cloud and ML
Rethinking OSS In An Era of Cloud and MLRethinking OSS In An Era of Cloud and ML
Rethinking OSS In An Era of Cloud and ML
 
Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
 
Stories, Myth, and the Humane Network
Stories, Myth, and the Humane NetworkStories, Myth, and the Humane Network
Stories, Myth, and the Humane Network
 
Thoughts on Business & Startups
Thoughts on Business & StartupsThoughts on Business & Startups
Thoughts on Business & Startups
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Bokeh Tutorial - PyData @ Strata San Jose 2015
Bokeh Tutorial - PyData @ Strata San Jose 2015Bokeh Tutorial - PyData @ Strata San Jose 2015
Bokeh Tutorial - PyData @ Strata San Jose 2015
 
Interactive Visualization With Bokeh (SF Python Meetup)
Interactive Visualization With Bokeh (SF Python Meetup)Interactive Visualization With Bokeh (SF Python Meetup)
Interactive Visualization With Bokeh (SF Python Meetup)
 
PyData: Past, Present Future (PyData SV 2014 Keynote)
PyData: Past, Present Future (PyData SV 2014 Keynote)PyData: Past, Present Future (PyData SV 2014 Keynote)
PyData: Past, Present Future (PyData SV 2014 Keynote)
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 

Recently uploaded

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 

Recently uploaded (20)

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 

Command line Data Tools

  • 2. Why? • Some times you just want to sling data • Text is still king; Lowest common denominator • Machines are pretty honking big now
  • 3. This Presentation • List of some good collections of cmd-line tools • Call out and describe a few in particular • The PyDataTool of my desire
  • 4. Sources • From author of “Data Science at the Command Line”: http://jeroenjanssens.com/2013/09/19/seven- command-line-tools-for-data-science.html (larger list at http://datascienceatthecommandline.com/) • HN discussion: https://news.ycombinator.com/ item?id=6412190 • https://github.com/bitly/data_hacks
  • 5. Tools • JSON: • jq: https://stedolan.github.io/jq/ • RecordStream: https://github.com/benbernard/ RecordStream • csvkit: https://csvkit.readthedocs.io/en/1.0.2/ • dt: https://github.com/clarkgrubb/data-tools • XMLStarlet: http://xmlstar.sourceforge.net/overview.php
  • 6. Honorable Mentions • Pythonic awk: https://github.com/alecthomas/ pawk • Google Crush Tools: https://github.com/google/ crush-tools • Xonsh: http://xon.sh/tutorial.html
  • 7. The PyDataTool of My Desire • Support for csv, json, sql, xls, hdf5; image formats; network formats (pcap etc.) • Capability of: • csvkit, jq, dt, “cols” tool • unix tools: sed, sort, shuf, split, tr, tee, uniq, wc, head, tail, bc • netpbm, imagemagick for images • Work in streaming mode (netcat, wget, curl) • First-class support for dask, spark • Basic plotting via gnuplot, mpl, bokeh • Built-in SQLite to do in-memory support for queries
  • 8. Continuum Is Hiring! • Creators of Anaconda, conda, bokeh, blaze, dask, holoviews, numba, phosphorJS • Maintainers/contributors to Jupyter, JupyterLab, Spyder, pandas, conda-forge, … • 150+ ppl, 80 in Austin • Venture backed • Enterprise product, OSS community innovation, consulting, training
  • 9. Continuum Is Hiring • Enterprise Product Team: • Dev Manager (reports to CTO, runs product engineering) • QA Lead Engineer - creates test plans, coordinates with product mgmt, dev, and testing team • Senior Python Developer - enterprise product development; backend, web tech; full stack preferred • DevOps and Operations - enterprise product, anaconda.org, Anaconda build system • Email careers@continuum.io