SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
Data Science in
Ruby? Is it possible?
Is it Fast? Should we
use it?
• Rodrigo Urubatan
• rodrigo@urubatan.dev
• http://urubatan.dev
• http://twitter.com/urubatan
Anyone here work
with Data Science?
• DataScientist?
• DataEngineer?
• Developers of application that uses Data?
• Statisticians?
What exactly
is Data
Science?
The process of extractingmeaning from and interpret
data
The usage of statisticsand machine learning to clean
and manipulate data
The usage of computer software to collect, clean,
manipulate and interpret data
A cool name for the combination of Data Mining and
Business Intelligence (other buzz words thatwere used
for a long time for exactly what we call Data Science
today, but with more expensive tool sets)
Can Ruby do Data
Science?
Can Ruby do
Data Science?
(Long Answer)
• Standing on the shoulders of giants
• pycall — Bridgeinto the Python world.
• rserve-client— Ruby connector
for Rserve, R's binary server.
• Data Manipulation
• kiba — lightweight Ruby ETL (Extract-
Transform-Load) framework.
• jongleur — Workflowmanager using
DAG definitions to execute ETL tasks.
• Distributed Computing
• ruby-spark — Ruby Interface to Apache
Spark 1.x.x.
• Data Structures
• daru — Data Frame and Vector
structures with comprehensive
manipulatingand visualization
methods.
• numo-narray — n-dimensional
Numerical Array for Ruby.
• nmatrix — dense and sparselinear
algebra library for Ruby via SciRuby.
• Datasets
• rdatasets — Data sets available in R
via Rdatasets.
• red-datasets — Growing collectionof publicly
available data sets suchas CIFAR-10,Iris,MNIST
etc
• Statistics
• rb-gsl — Ruby interfacetotheGNU Scientific
Library. [dep: GLS]
• simple_stats — Enumerablepatches for
descriptive statistics.
• enumerable-statistics — fastimplementation of
descriptive statistics for theEnumerablemodule.
• Visualization
• matplotlib — Rubybased wrapper
around matplotlib. [dep: matplotlib]
• mathematical — PNGand MathML renderings for
your equations.
• daru-view — daru-view is interactive plotting
gem for web application (any Ruby web
applicationframework like
Rails/Sinatra/Nanoc/Hanami) &IRubynotebook.
It is a plugin gemfor daru.
• daru-plotly — Plotly basedvisualization for Daru.
Ruby
X
Python
Ruby
Daru
NMatrix/NArray
Python
Pandas
Numpy
The 3 Major
Ruby Data
Science
Projects
SciRuby project
Nmatrix Centric gems
Nmatrix
Daru
GnuplotRB
Stas_sample
Ruby Numo project
Numo:: NArray centric Gems
Numo:: NArray
Numo:: FFTE
Numo:: FFTW
Numo::Gnuplot
RedDataToolsproject
Apache Arrow centric gems
RedArrow
RedChainer
RedArrowGSL
RedArrowNMatrix
RedArrowNumoNArray
Other libraries
• ruby-spark — RubyInterface to Apache Spark 1.x.x.
• The project is almost dead,not commits in ages
• kiba — lightweight RubyETL (Extract-Transform-Load)framework.
• Great frameworkto load and transformdata,great performance
• enumerable-statistics — fast implementation ofdescriptivestatistics for the
Enumerable module.
• Very handyforsmall statisticalcalculations in yourapplication
• iruby — Rubykernel for Jupyter.
• The easiest wayto use Ruby in your Jupyter Notebook
• decisiontree - Decision Tree ID3 Algorithmin pure Ruby
• Easydecision tree implementation,and a very fast to train
Doing data science in
Ruby is Hard!
Ruby and Ruby on Rails are
way better to write business
web applications!
We can even do
really good Machine
Learning with Ruby
(but that is subject
for another
presentation)
And my objective is to
help ruby developers to use
the best tools for each job so
they can solve hard
problems, with less bugs and
have more free time.
pycall to the
rescue
pycall lets you use Python libraries from
your ruby code very naturally, as if you
were calling a Ruby library
pycall consists of one ruby binding
library for libpython.so and an Object-
oriented protocol for communication
between Ruby and Python
Simple pycall
code
What about some
light machine
learning? Do I
need python too?
Ok, so what
are the best
work
patterns?
Python is way better than Ruby for
Data Science
Ruby is better for web business
applications
Best patterns for integration are
(IMHO)
• Pointing both applications to the same
database
• Exchanging data through JSON or some similar
serialization
• Calling Python directly through pycall
References
• Ruby Conf 2017 – Using Ruby in DataScience by Kenta Murata (@mrkn)
• Big Data analysis in Ruby
• Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst)
• Progress of Ruby/Numo: Numerical Computing for Ruby
• SciRuby
• Ruby::Numo
• Ruby Machine Learning resources
• Ruby Data Science Resources
• PyCall
Any questions? Talk to
me!
• @urubatan
• https://urubatan.dev
• rodrigo@urubatan.dev

Más contenido relacionado

La actualidad más candente

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleMichael Goodness
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasWes McKinney
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platformLINE Corporation
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at ScaleNikhil Ketkar
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in PythonJagriti Goswami
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Ziemowit Jankowski
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraphJason Plurad
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summitAdam Gibson
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkMárton Balassi
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJason Plurad
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphJason Plurad
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseJason Plurad
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013show you
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 

La actualidad más candente (20)

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at Scale
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at Scale
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraph
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraph
 
F# for Data*
F# for Data*F# for Data*
F# for Data*
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use Case
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 

Similar a Data science in ruby, is it possible? is it fast? should we use it?

Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015Wes McKinney
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyRakuten Group, Inc.
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudRightScale
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-PatternsDouglas Moore
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"Portland R User Group
 

Similar a Data science in ruby, is it possible? is it fast? should we use it? (20)

Session 2
Session 2Session 2
Session 2
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Python ml
Python mlPython ml
Python ml
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in Ruby
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 

Más de Rodrigo Urubatan

2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...Rodrigo Urubatan
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d frameworkRodrigo Urubatan
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuídaRodrigo Urubatan
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRodrigo Urubatan
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddRodrigo Urubatan
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotoRodrigo Urubatan
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problemsRodrigo Urubatan
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JSRodrigo Urubatan
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Rodrigo Urubatan
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JSRodrigo Urubatan
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextRodrigo Urubatan
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores javaRodrigo Urubatan
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPRodrigo Urubatan
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu clienteRodrigo Urubatan
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Rodrigo Urubatan
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time agoRodrigo Urubatan
 

Más de Rodrigo Urubatan (20)

Ruby code smells
Ruby code smellsRuby code smells
Ruby code smells
 
2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d framework
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuída
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDD
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bdd
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remoto
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problems
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores java
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HP
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
 
Mini curso rails 3
Mini curso rails 3Mini curso rails 3
Mini curso rails 3
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time ago
 

Último

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Último (20)

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Data science in ruby, is it possible? is it fast? should we use it?

  • 1. Data Science in Ruby? Is it possible? Is it Fast? Should we use it? • Rodrigo Urubatan • rodrigo@urubatan.dev • http://urubatan.dev • http://twitter.com/urubatan
  • 2. Anyone here work with Data Science? • DataScientist? • DataEngineer? • Developers of application that uses Data? • Statisticians?
  • 3. What exactly is Data Science? The process of extractingmeaning from and interpret data The usage of statisticsand machine learning to clean and manipulate data The usage of computer software to collect, clean, manipulate and interpret data A cool name for the combination of Data Mining and Business Intelligence (other buzz words thatwere used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
  • 4. Can Ruby do Data Science?
  • 5. Can Ruby do Data Science? (Long Answer) • Standing on the shoulders of giants • pycall — Bridgeinto the Python world. • rserve-client— Ruby connector for Rserve, R's binary server. • Data Manipulation • kiba — lightweight Ruby ETL (Extract- Transform-Load) framework. • jongleur — Workflowmanager using DAG definitions to execute ETL tasks. • Distributed Computing • ruby-spark — Ruby Interface to Apache Spark 1.x.x. • Data Structures • daru — Data Frame and Vector structures with comprehensive manipulatingand visualization methods. • numo-narray — n-dimensional Numerical Array for Ruby. • nmatrix — dense and sparselinear algebra library for Ruby via SciRuby. • Datasets • rdatasets — Data sets available in R via Rdatasets. • red-datasets — Growing collectionof publicly available data sets suchas CIFAR-10,Iris,MNIST etc • Statistics • rb-gsl — Ruby interfacetotheGNU Scientific Library. [dep: GLS] • simple_stats — Enumerablepatches for descriptive statistics. • enumerable-statistics — fastimplementation of descriptive statistics for theEnumerablemodule. • Visualization • matplotlib — Rubybased wrapper around matplotlib. [dep: matplotlib] • mathematical — PNGand MathML renderings for your equations. • daru-view — daru-view is interactive plotting gem for web application (any Ruby web applicationframework like Rails/Sinatra/Nanoc/Hanami) &IRubynotebook. It is a plugin gemfor daru. • daru-plotly — Plotly basedvisualization for Daru.
  • 7. The 3 Major Ruby Data Science Projects SciRuby project Nmatrix Centric gems Nmatrix Daru GnuplotRB Stas_sample Ruby Numo project Numo:: NArray centric Gems Numo:: NArray Numo:: FFTE Numo:: FFTW Numo::Gnuplot RedDataToolsproject Apache Arrow centric gems RedArrow RedChainer RedArrowGSL RedArrowNMatrix RedArrowNumoNArray
  • 8. Other libraries • ruby-spark — RubyInterface to Apache Spark 1.x.x. • The project is almost dead,not commits in ages • kiba — lightweight RubyETL (Extract-Transform-Load)framework. • Great frameworkto load and transformdata,great performance • enumerable-statistics — fast implementation ofdescriptivestatistics for the Enumerable module. • Very handyforsmall statisticalcalculations in yourapplication • iruby — Rubykernel for Jupyter. • The easiest wayto use Ruby in your Jupyter Notebook • decisiontree - Decision Tree ID3 Algorithmin pure Ruby • Easydecision tree implementation,and a very fast to train
  • 9. Doing data science in Ruby is Hard!
  • 10. Ruby and Ruby on Rails are way better to write business web applications!
  • 11. We can even do really good Machine Learning with Ruby (but that is subject for another presentation)
  • 12. And my objective is to help ruby developers to use the best tools for each job so they can solve hard problems, with less bugs and have more free time.
  • 13. pycall to the rescue pycall lets you use Python libraries from your ruby code very naturally, as if you were calling a Ruby library pycall consists of one ruby binding library for libpython.so and an Object- oriented protocol for communication between Ruby and Python
  • 15. What about some light machine learning? Do I need python too?
  • 16. Ok, so what are the best work patterns? Python is way better than Ruby for Data Science Ruby is better for web business applications Best patterns for integration are (IMHO) • Pointing both applications to the same database • Exchanging data through JSON or some similar serialization • Calling Python directly through pycall
  • 17. References • Ruby Conf 2017 – Using Ruby in DataScience by Kenta Murata (@mrkn) • Big Data analysis in Ruby • Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst) • Progress of Ruby/Numo: Numerical Computing for Ruby • SciRuby • Ruby::Numo • Ruby Machine Learning resources • Ruby Data Science Resources • PyCall
  • 18. Any questions? Talk to me! • @urubatan • https://urubatan.dev • rodrigo@urubatan.dev