SlideShare una empresa de Scribd logo
1 de 46
Descargar para leer sin conexión
Data Science With R
~ for ~
Java Developers
@Sander_Mak
Agenda
Data Science
The R language
Gimme some Java!
1
1
1
1
11
1
1
0
0
0
0
0
0
90% of the world’s data was
produced in the last 2 years
- SINTEF/ScienceDaily June 2013!!!!!!!!
We need more than
just CRUD
Stand back.
I know Data Science!
Software
Engineering
Domain
Expertise
Math &
Statistics
Data
Science
Machine
Learning
Operations
Research
Danger!
Perl ahead!
Software
Engineering
Domain
Expertise
Math &
Statistics
Data
Science
Machine
Learning
Operations
Research
Danger!
Perl ahead!
Data Science:
Achievement Unlocked
R, R-Studio
Today
Data Science:
Achievement Unlocked
Agenda
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Language
Designers
Statisticians
Language
Designers?
Statisticians?
The best thing about R is that it was developed by statisticians. The
worst thing about R is that... it was developed by statisticians.
- Bo Cowgill, Google
Why R, then?
Open Source
De-facto standard (in statistical research)
“It’s a DSL posing as general purpose language”
Interactive data exploration
Why not R, then?
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
Why not R, then?
‘If you are using R and you think
you’re in hell, this is a map for you.’
- The R Inferno
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
Apparently, statisticians aren’t designers, either...
VS
Dynamic (eval)
Interpreted
Static types
Compiled
Functional/OO/Procedural OO
Factor Enum
numeric
character String
Integer/Double/...
Factor Enum
numeric
character String
vector
list
dataframe
Integer/Double/...
1-based 0-based1
2
3
4
0
1
2
3
1-based 0-based1
2
3
4
0
1
2
3
for-loops
higher-order functions
sapply(vec, function(elm) {
elm + 1;
})
eager evalutationlazy evaluation
eager evalutationlazy evaluation
pass-by-value
(copy-on-write)
pass-by-reference
Function F
Value A Value A
Value A’
call F(A) modify
Studio
Central
Comprehensive
R
Archive
Network
Studio
Coding time!
Titanic Competition:
Machine Learning from Disaster
Titanic Competition:
Machine Learning from Disaster
Titanic Competition:
Machine Learning from Disaster
Sex == Female
Decision Tree
Age > 50Age > 16
Fare > 100
T FT T F
Titanic Competition:
Machine Learning from Disaster
Sex == Female
Decision Tree
Age > 50Age > 16
Random Forest
Fare > 100
T FT T F
T
FT T FT
FT T F
T
FT T FT
FT T F
Demo time!
...
...
Agenda
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Bridging R and Java
Integrate
Assimilate
Replace
rJava & Java/R interface
Integrate
Two way native interface
- JNI: libjri
- or TCP to RServe
Rengine re = new Rengine(new String[] {}, false, null);
// wait until engine is ready
if (!re.waitForR()) {
throw new IllegalStateException(“Can’t load R engine”);
}
re.eval("data(cars)", false);
REXP cars = re.eval("cars");
RVector carsVector = cars.asVector();
// dissect carsVector...
Assimilate
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
Assimilate
// create a script engine manager
ScriptEngineManager factory =
new ScriptEngineManager();
// create an R engine
ScriptEngine engine =
factory.getEngineByName("Renjin");
// load package from classpath
engine.eval(“library(survey)");
// evaluate R code from String
engine.eval("print('Hello from R')");
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
Reimplementation of R on JVM
Share data:
Integer[] data = {1, 2, 3};
engine.put("data", data);
engine.eval("print(sum(data))");
Assimilate
Reimplementation of R on JVM
Share data:
import(com.foo.User)
# instantiate Java beans
tim <- User$new(name='Tim', age=23)
tom <- User$new(name='Tom', age=45)
# invoke setter
tim$name <- "Timmy"
Use Java from Renjin:
Integer[] data = {1, 2, 3};
engine.put("data", data);
engine.eval("print(sum(data))");
Assimilate
Big Data?
Replace
JVM Libraries/platforms
Replace
Scalable R distributions
(non-JVM)
Revolution Analytics
Oracle Enterprise R
Wrap-up
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Sanitize
Explore
Model Predict
Scale
Next steps
Computing for Data Analysis
starts Sept. 23rd
Install R Read
Questions?
Data Science
The R language
Gimme some Java!
1
1
1 1
1
1
110
0
0
0
0
0
@Sander_Mak
branchandbound.net

Más contenido relacionado

Más de Sander Mak (@Sander_Mak)

TypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painTypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painSander Mak (@Sander_Mak)
 
The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)Sander Mak (@Sander_Mak)
 
Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?Sander Mak (@Sander_Mak)
 
Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)Sander Mak (@Sander_Mak)
 
Java 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the futureJava 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the futureSander Mak (@Sander_Mak)
 

Más de Sander Mak (@Sander_Mak) (20)

Modules or microservices?
Modules or microservices?Modules or microservices?
Modules or microservices?
 
Migrating to Java 9 Modules
Migrating to Java 9 ModulesMigrating to Java 9 Modules
Migrating to Java 9 Modules
 
Java 9 Modularity in Action
Java 9 Modularity in ActionJava 9 Modularity in Action
Java 9 Modularity in Action
 
Java modularity: life after Java 9
Java modularity: life after Java 9Java modularity: life after Java 9
Java modularity: life after Java 9
 
Provisioning the IoT
Provisioning the IoTProvisioning the IoT
Provisioning the IoT
 
Event-sourced architectures with Akka
Event-sourced architectures with AkkaEvent-sourced architectures with Akka
Event-sourced architectures with Akka
 
TypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painTypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the pain
 
The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)
 
Modular JavaScript
Modular JavaScriptModular JavaScript
Modular JavaScript
 
Modularity in the Cloud
Modularity in the CloudModularity in the Cloud
Modularity in the Cloud
 
Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?
 
Scala & Lift (JEEConf 2012)
Scala & Lift (JEEConf 2012)Scala & Lift (JEEConf 2012)
Scala & Lift (JEEConf 2012)
 
Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)
 
Akka (BeJUG)
Akka (BeJUG)Akka (BeJUG)
Akka (BeJUG)
 
Fork Join (BeJUG 2012)
Fork Join (BeJUG 2012)Fork Join (BeJUG 2012)
Fork Join (BeJUG 2012)
 
Fork/Join for Fun and Profit!
Fork/Join for Fun and Profit!Fork/Join for Fun and Profit!
Fork/Join for Fun and Profit!
 
Kscope11 recap
Kscope11 recapKscope11 recap
Kscope11 recap
 
Java 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the futureJava 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the future
 
Scala and Lift
Scala and LiftScala and Lift
Scala and Lift
 
Elevate your webapps with Scala and Lift
Elevate your webapps with Scala and LiftElevate your webapps with Scala and Lift
Elevate your webapps with Scala and Lift
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Data Science with R for Java developers