Scientific Computing on JRuby
github.com/prasunanand
Objective
● A scientific library is memory-intensive, and speed counts. How can JRuby be
used effectively to create a great tool/gem?
● A general-purpose GPU library for Ruby that can be used by industry in
production and by academia for research.
● Ruby Science Foundation
● SciRuby has been trying to push Ruby for scientific computing.
● Popular Rubygems:
1. NMatrix
2. Daru
3. Mixed_models
NMatrix
● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as
well as two types of sparse (linked-list-based and Yale/CSR).
● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for
several of its linear algebra operations.
Daru
Mixed_models
Nyaplot
SciRuby vs SciPy
● We love Ruby.
● We love Rails.
● Expressiveness of Ruby.
● JRuby is known for performance: it is about 10 times faster than CRuby.
● With Truffle it is around 40 times faster than CRuby. Truffle is supported by
Oracle.
Say Hello!
NMatrix for JRuby
● Parallelism => no Global Interpreter Lock, unlike MRI
● Easy deployment (Warbler gem)
● Automatic garbage collection
● Speed
● NMatrix for JRuby relies on Apache Commons Math
MDArray
● Not a unified interface for SciRuby gems => Why not build a wrapper around
MDArray?
● MDArray is a great gem for linear algebra.
● MDArray used Parallel Colt, which was deprecated.
● However, every gem that used NMatrix as a dependency would need to be
reimplemented with MDArray.
How does NMatrix work?
● N-Dimensional
● 2-Dimensional NMatrix
N-dimensional matrices are stored as a one-dimensional Array!
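A minimal sketch of how an N-dimensional coordinate can map onto a flat array, assuming row-major order. The helper names `strides_for` and `flat_index` are hypothetical and are not part of the NMatrix API.

```ruby
# Hypothetical helpers, not part of NMatrix: row-major strides for a shape.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |i|
    strides[i] = strides[i + 1] * shape[i + 1]
  end
  strides
end

# Position of an N-dimensional coordinate inside the flat array.
def flat_index(coords, strides)
  coords.zip(strides).sum { |c, s| c * s }
end

shape   = [2, 3, 4]                    # a 2 x 3 x 4 matrix
storage = (0...shape.reduce(:*)).to_a  # 24 elements held in one flat Array
strides = strides_for(shape)           # [12, 4, 1]

puts storage[flat_index([1, 2, 3], strides)]  # prints 23
```

The last dimension varies fastest here, so neighbouring elements along it sit next to each other in memory.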
NMatrix Architecture
MRI JRuby
N - dimensional Matrix
Elementwise Operation
● [:add, :subtract, :sin, :gamma]
● Iterate through the elements.
● Access the element; do the operation, return it
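The three steps above can be sketched in plain Ruby over flat storage. This is an illustration only: `elementwise` is a hypothetical helper, not the NMatrix implementation, which works on Java-backed storage.

```ruby
# Hypothetical sketch: apply a block to each element (or pair of elements)
# of flat storage and collect the results in a new Array.
def elementwise(a, b = nil)
  if b
    raise ArgumentError, "size mismatch" unless a.size == b.size
    Array.new(a.size) { |i| yield(a[i], b[i]) }
  else
    Array.new(a.size) { |i| yield(a[i]) }
  end
end

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 1.0, 1.0, 1.0]

sums   = elementwise(x, y) { |p, q| p + q }    # :add
diffs  = elementwise(x, y) { |p, q| p - q }    # :subtract
sines  = elementwise(x) { |p| Math.sin(p) }    # :sin
gammas = elementwise(y) { |p| Math.gamma(p) }  # :gamma
```

Because the storage is flat, the same loop serves a matrix of any dimensionality; the shape only matters when coordinates are reported back.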
Challenges
● Autoboxing and multiple data types
● Minimise copying of data
Errors that can’t be reproduced :p
[ 0.11, 0.05, 0.34, 0.14 ]
+ [ 0.21, 0.05, 0.14, 0.14 ]
= [ 0, 0, 0, 0 ]
([ 0.11, 0.05, 0.34, 0.14 ] + 5)
+ ([ 0.21, 0.05, 0.14, 0.14 ] + 5)
- 10
= [ 0.32, 0.1, 0.48, 0.28 ]
Autoboxing
● :float64 => double only
● Strict dtypes => create the data type in Java. Can’t rely on reflection.
● @s = Array.new()
● @s = Java::double[rows*cols].new()
Autoboxing and Enumerators
def each_with_indices
  nmatrix = create_dummy_nmatrix
  stride = get_stride(self)
  offset = 0
  coords = Array.new(dim) { 0 }
  shape_copy = Array.new(dim)
  (0...size).each do |k|
    # Translate the flat position k into N-dimensional coordinates.
    dense_storage_coords(nmatrix, k, coords, stride, offset)
    slice_index = dense_storage_pos(coords, stride)
    ary = Array.new
    if (@dtype == :object)
      ary << self.s[slice_index]
    else
      # Converts the Java storage to a Ruby Array on every iteration.
      ary << self.s.toArray.to_a[slice_index]
    end
    (0...dim).each do |p|
      ary << coords[p]
    end
    yield(ary)
  end if block_given?
  return nmatrix
end
Minimise copying of data
● Make sure you don’t make copies of data.
● Pass-by-Reference in action:
○ Use static methods as helpers.
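A small sketch of the difference between working on the caller's storage and copying it. The module `StorageHelper` and its methods are hypothetical names chosen for illustration; they are not taken from NMatrix.

```ruby
# Hypothetical helper module: "static" methods that operate on storage
# passed in by the caller.
module StorageHelper
  # Scales every element in place; no new Array is allocated.
  def self.scale!(storage, factor)
    storage.map! { |v| v * factor }
  end

  # Returns a scaled copy; the original storage is left untouched.
  def self.scale(storage, factor)
    storage.map { |v| v * factor }
  end
end

data   = [1.0, 2.0, 3.0]
same   = StorageHelper.scale!(data, 2.0)
copied = StorageHelper.scale(data, 2.0)

puts same.equal?(data)    # true: the caller's object was modified
puts copied.equal?(data)  # false: a second Array now exists
```

For large matrices the copying variant doubles the memory held during the call, which is why the in-place form is worth preferring where the semantics allow it.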
2 - dimensional Matrix
2 - dimensional Matrix Operations
● [:dot, :det, :factorize_lu]
● In NMatrix-MRI, BLAS-III and LAPACK routines are implemented using their
respective libraries.
● NMatrix-JRuby depends on Java functions.
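For reference, this is what `:dot` computes when both operands are held as flat row-major arrays. It is a plain-Ruby sketch under that storage assumption; the method name and signature are hypothetical and this is not how either NMatrix backend implements it.

```ruby
# Hypothetical reference implementation of :dot on flat row-major storage.
# a has shape [n, k], b has shape [k, m]; the result has shape [n, m].
def dot(a, a_shape, b, b_shape)
  n, k = a_shape
  k2, m = b_shape
  raise ArgumentError, "inner dimensions differ" unless k == k2
  result = Array.new(n * m, 0.0)
  n.times do |i|
    m.times do |j|
      sum = 0.0
      k.times { |p| sum += a[i * k + p] * b[p * m + j] }
      result[i * m + j] = sum
    end
  end
  [result, [n, m]]
end

a = [1.0, 2.0, 3.0, 4.0]  # [[1, 2], [3, 4]]
b = [5.0, 6.0, 7.0, 8.0]  # [[5, 6], [7, 8]]
product, shape = dot(a, [2, 2], b, [2, 2])
# product is [19.0, 22.0, 43.0, 50.0] and shape is [2, 2]
```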
Challenges
● Converting a 1-D array to 2-D array
● Array Size and Accessing elements
● Speed and Memory Required
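The first two challenges can be sketched in plain Ruby, assuming row-major flat storage. The helper names `to_2d` and `to_1d` are hypothetical.

```ruby
# Hypothetical helpers: convert between flat row-major storage and an
# Array of rows.
def to_2d(flat, rows, cols)
  raise ArgumentError, "shape does not match size" unless flat.size == rows * cols
  flat.each_slice(cols).to_a
end

def to_1d(matrix)
  matrix.flatten
end

flat   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = to_2d(flat, 2, 3)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Element (i, j) of the 2-D view is element i * cols + j of the flat array.
puts matrix[1][2] == flat[1 * 3 + 2]  # true
```

Both conversions allocate new arrays, so doing them on every operation works against the goal of minimising copies; that cost is part of the third challenge.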
Ruby Code
require 'benchmark'
require 'java'

b = Java::double[15_000, 15_000].new
c = Java::double[15_000, 15_000].new

# Fill b element by element from Ruby.
index = 0
puts Benchmark.measure {
  (0...15000).each do |i|
    (0...15000).each do |j|
      b[i][j] = index
      index += 1
    end
  end
}
#43.260000 3.250000 46.510000 ( 39.606356)

# Copy b into c element by element from Ruby.
index = 0
puts Benchmark.measure {
  (0...15000).each do |i|
    (0...15000).each do |j|
      c[i][j] = b[i][j]
      index += 1
    end
  end
}
#67.790000 0.070000 67.860000 ( 65.126546)
#RAM consumed => 5.4GB
Java Code
public class MatrixGenerator {
  // Assumed declarations: the slide does not show where row, col, b and c
  // are defined.
  static int row = 15000;
  static int col = 15000;
  static double[][] b;
  static double[][] c;

  // Allocate both matrices and fill b element by element.
  public static void test1() {
    b = new double[row][col];
    c = new double[row][col];
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        b[i][j] = index;
        index++;
      }
    }
  }

  // Copy b into c element by element.
  public static void test2() {
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        c[i][j] = b[i][j];
        index++;
      }
    }
  }
}
# Called from JRuby:
puts Benchmark.measure{MatrixGenerator.test1}
#0.032000 0.001000 00.032000 ( 00.03100)
puts Benchmark.measure{MatrixGenerator.test2}
#0.034000 0.001000 00.034000 ( 00.03300)
#RAM consumed => 300MB
Results
Moving the loops from Ruby into Java improves:
● Speed: about 1000 times faster
● Memory: about 10 times less
Mixed models
● After NMatrix for doubles was ready, I tested it with mixed_models.
Benchmarking NMatrix functionalities
System Specifications
● CPU: AMD FX8350 octa-core 4.2GHz
● RAM: 16GB
Addition
Subtraction
Gamma
Matrix Multiplication
Determinant
Factorization
Benchmark conclusion
● NMatrix-JRuby is much faster than NMatrix-MRI for elementwise operations on
N-dimensional matrices.
● NMatrix-MRI is faster for 2-dimensional matrices when computing matrix
multiplication, determinants and factorization.
Improvements
● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and
LAPACK routines.
● How?
● Why not JBlas?
MRI
JRuby
Future Work
● Add support for complex dtype.
● Convert NMatrix-JRuby Enumerators to Java code.
● Add sparse support.
Am I done?
Nope!
Enter GPU
A General-Purpose GPU library
● Combine the beauty of Ruby with transparent GPU processing.
● This will work both on client computers and on servers that use NVIDIA Tesla
and Intel Xeon Phi solutions.
● Developer activity and support for the existing projects are mixed at best,
and they are tough to use because they involve writing kernels and demand a lot
of effort on buffer/RAM optimisation.
ArrayFire-rb
● Wraps ArrayFire library
ArrayFire
● ArrayFire is an open-source GPGPU library written in C++ that uses JIT
compilation.
● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a CPU
backend.
● It abstracts away the difficult tasks of writing kernels for multiple
architectures, handling memory management, and performing tuning and
optimisation.
Using ArrayFire
MRI
● C extension
● Architecture is inspired by NMatrix and NArray
● The C++ function is placed in a namespace (e.g., namespace arf { }) or is
declared static if possible. The C function receives the prefix arf_, e.g.,
arf_multiply() (this function also happens to be static).
● C macros are capitalized and generally have the prefix ARF_, as with
ARF_DTYPE().
● C functions (and macros, for consistency) are placed within extern "C" { }
blocks to turn off C++ name mangling.
● C macros (in extern blocks) may represent C++ constants (which are always
// C side (arrayfire.c)
#include <ruby.h>

typedef struct AF_STRUCT
{
  size_t ndims;       // number of dimensions
  size_t count;       // total number of elements
  size_t* dimension;  // extent of each dimension
  double* array;      // element data on the host
} afstruct;

VALUE ArrayFire = Qnil;
VALUE Blas = Qnil;

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val);
static void arf_free(afstruct* af);  // defined elsewhere (not shown on the slide)

void Init_arrayfire() {
  ArrayFire = rb_define_module("ArrayFire");
  Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
  rb_define_singleton_method(Blas, "matmul", RUBY_METHOD_FUNC(arf_matmul), 2);
}

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val) {
  afstruct* left;
  afstruct* right;
  afstruct* result = ALLOC(afstruct);
  Data_Get_Struct(left_val, afstruct, left);
  Data_Get_Struct(right_val, afstruct, right);
  result->ndims = left->ndims;
  // Allocate the shape on the heap so it outlives this function call.
  result->dimension = ALLOC_N(size_t, 2);
  result->dimension[0] = left->dimension[0];
  result->dimension[1] = right->dimension[1];
  result->count = result->dimension[0] * result->dimension[1];
  arf::matmul(result, left, right);
  return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}

// C++ side; afstruct must be declared before this point, e.g. in a shared
// header (not shown on the slide).
#include <arrayfire.h>

namespace arf {
  using namespace af;

  static void matmul(afstruct* result, afstruct* left, afstruct* right)
  {
    array l = array(left->dimension[0], left->dimension[1], left->array);
    array r = array(right->dimension[0], right->dimension[1], right->array);
    array res = af::matmul(l, r);
    // Copy the result from the device back to host memory.
    result->array = res.host<double>();
  }
}

extern "C" {
  #include "arrayfire.c"
}
JRuby
● The approach is the same as for NMatrix-JRuby.
● Java Native Interface (JNI)
● Work on ArrayFire-Java.
● Place 'libaf.so' in the load path.
require 'ext/vendor/ArrayFire.jar'
class Af_Array
  attr_accessor :dims, :elements
  def matmul(other)
    Blas.matmul(self.arr, other)
  end
end
Benchmarking ArrayFire
System Specification
CPU: AMD FX Octacore 4.2GHz
RAM: 16GB
GPU: Nvidia GTX 750Ti
GPU RAM: 4GB GDDR5
Matrix Addition
Matrix Multiplication
Matrix Determinant
Factorization
Transparency
● Integrate with NArray
● Integrate with NMatrix
● Integrate with Rails
Applications
● Endless possibilities ;)
● Bioinformatics
● Integrate TensorFlow
● Image Processing
● Computational Fluid Dynamics
Conclusion
Useful Links
● https://github.com/sciruby/nmatrix
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/arrayfire-rb/tree/temp
Acknowledgements
1. Pjotr Prins
2. Charles Nutter
3. John Woods
4. Alexej Gossmann
5. Sameer Deshmukh
6. Pradeep Garigipati
Thank You
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com

FOSDEM 2017: Scientific Computing on JRuby

  • 1. Scientific Computing on JRuby github.com/prasunanand
  • 2. Objective ● A scientific library is memory-intensive, and speed counts. How can JRuby be used effectively to build a great tool/gem? ● A general-purpose GPU library for Ruby that can be used by industry in production and by academia for research.
  • 3. ● Ruby Science Foundation ● SciRuby has been trying to push Ruby for scientific computing. ● Popular Rubygems: 1. NMatrix 2. Daru 3. Mixed_models
  • 4. NMatrix ● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). ● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
  • 5.
  • 9. SciRuby vs SciPy ● We love Ruby. ● We love Rails. ● Expressiveness of Ruby.
  • 10. ● Known for performance: JRuby is 10 times faster than CRuby. ● With Truffle it’s around 40 times faster than CRuby. Truffle is supported by Oracle.
  • 12. NMatrix for JRuby ● Parallelism => no Global Interpreter Lock, unlike MRI ● Easy deployment (Warbler gem) ● Automatic garbage collection ● Speed ● NMatrix for JRuby relies on Apache Commons Math
  • 13. MDArray ● No unified interface for SciRuby gems => why not build a wrapper around MDArray? ● MDArray is a great gem for linear algebra. ● MDArray used Parallel Colt, which was deprecated. ● However, every gem that used NMatrix as a dependency would need to be reimplemented with MDArray.
  • 14. How does NMatrix work? ● N-Dimensional ● 2-Dimensional NMatrix
  • 15. N-dimensional matrices are stored as a one-dimensional Array!
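A minimal sketch of how a flat Array can back an N-dimensional matrix, assuming row-major order. The helper names `strides_for`, `flat_index` and `coords_for` are hypothetical and are not part of the NMatrix API.

```ruby
# Hypothetical helpers (not NMatrix API): map N-d coordinates to a
# position in a flat, row-major Array and back again.

# Row-major strides: the last dimension varies fastest.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |i|
    strides[i] = strides[i + 1] * shape[i + 1]
  end
  strides
end

# Flat position of the element at the given coordinates.
def flat_index(coords, strides)
  coords.zip(strides).sum { |c, s| c * s }
end

# Inverse mapping: recover the coordinates from a flat position.
def coords_for(index, strides)
  strides.map do |s|
    c, index = index.divmod(s)
    c
  end
end

shape   = [2, 3, 4]
storage = (0...shape.reduce(:*)).to_a  # 24 elements in one flat Array
strides = strides_for(shape)           # [12, 4, 1]

pos = flat_index([1, 2, 3], strides)   # 1*12 + 2*4 + 3*1
puts storage[pos]
p coords_for(pos, strides)
```

With this layout a slice or a reshape only needs a new shape and new strides; the flat storage itself can stay untouched.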
  • 17. N-dimensional Matrix
  • 18. Elementwise Operation ● [:add, :subtract, :sin, :gamma] ● Iterate through the elements. ● Access the element; do the operation, return it
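The three steps on the elementwise slide (iterate, access, operate and return) can be sketched over flat storage as below. `FlatMatrix` is a hypothetical stand-in written for this illustration, not the NMatrix class, and it uses a plain Ruby Array where NMatrix-JRuby would hold a Java array.

```ruby
# Hypothetical minimal dense container (not the NMatrix class).
class FlatMatrix
  attr_reader :shape, :storage

  def initialize(shape, storage)
    raise ArgumentError, "size mismatch" unless shape.reduce(:*) == storage.size
    @shape = shape
    @storage = storage
  end

  # Binary elementwise op: walk both flat storages in lockstep.
  def elementwise(other)
    raise ArgumentError, "shape mismatch" unless shape == other.shape
    result = Array.new(storage.size) { |i| yield(storage[i], other.storage[i]) }
    FlatMatrix.new(shape, result)
  end

  # Unary elementwise op: apply the block to every element.
  def map_elements
    FlatMatrix.new(shape, storage.map { |x| yield(x) })
  end

  def add(other)
    elementwise(other) { |a, b| a + b }
  end

  def subtract(other)
    elementwise(other) { |a, b| a - b }
  end

  def sin
    map_elements { |x| Math.sin(x) }
  end

  def gamma
    map_elements { |x| Math.gamma(x) }
  end
end

a = FlatMatrix.new([2, 2], [1.0, 2.0, 3.0, 4.0])
b = FlatMatrix.new([2, 2], [0.5, 0.5, 0.5, 0.5])
p a.add(b).storage
p a.gamma.storage
```

Because the operation never looks at the shape, the same loop serves matrices of any dimensionality.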
  • 19.
  • 20. Challenges ● Autoboxing and multiple data types ● Minimise copying of data
  • 21. Errors that can’t be reproduced :p
        [ 0.11, 0.05, 0.34, 0.14 ] + [ 0.21, 0.05, 0.14, 0.14 ] = [ 0, 0, 0, 0 ]
        ([ 0.11, 0.05, 0.34, 0.14 ] + 5) + ([ 0.21, 0.05, 0.14, 0.14 ] + 5) - 10 = [ 0.32, 0.1, 0.48, 0.28 ]
  • 22. Autoboxing ● :float64 => double only ● Strict dtypes => creating data type in Java. Can’t rely on reflection ● @s = Array.new() ● @s = Java::double[rows*cols].new()
  • 23. Autoboxing and Enumerators
        def each_with_indices
          nmatrix = create_dummy_nmatrix
          stride = get_stride(self)
          offset = 0
          coords = Array.new(dim){ 0 }
          shape_copy = Array.new(dim)
          (0...size).each do |k|
            dense_storage_coords(nmatrix, k, coords, stride, offset)
            slice_index = dense_storage_pos(coords,stride)
            ary = Array.new
            if (@dtype == :object)
              ary << self.s[slice_index]
            else
              ary << self.s.toArray.to_a[slice_index]
            end
            (0...dim).each do |p|
              ary << coords[p]
            end
            yield(ary)
          end if block_given?
          return nmatrix
        end
  • 24. Minimise copying of data ● Make sure you don’t make copies of data. ● Pass-by-Reference in action: ○ Use static methods as helpers.
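One possible reading of the "static methods as helpers" advice, sketched in plain Ruby instead of Java: a helper receives a reference to the storage and mutates it in place, so no second array is allocated. The module `StorageHelpers` and its method `scale!` are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper module; module_function makes scale! callable
# on the module itself, much like a Java static method.
module StorageHelpers
  module_function

  # Scales the storage in place and returns the same object (no copy).
  def scale!(storage, factor)
    storage.each_index { |i| storage[i] *= factor }
    storage
  end
end

data = [1.0, 2.0, 3.0]

# In-place: the helper works on the caller's array through its reference.
result = StorageHelpers.scale!(data, 2.0)
puts result.equal?(data)

# For contrast, map allocates a new array and leaves the original alone.
copy = data.map { |x| x * 2.0 }
puts copy.equal?(data)
```

For the matrix sizes benchmarked later in the talk, avoiding that second allocation matters far more than it does in this toy case.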
  • 25. 2-dimensional Matrix
  • 26. 2-dimensional Matrix Operations ● [:dot, :det, :factorize_lu] ● In NMatrix-MRI, BLAS level-3 and LAPACK routines are implemented using their respective libraries. ● NMatrix-JRuby depends on Java functions.
  • 27. Challenges ● Converting a 1-D array to 2-D array ● Array Size and Accessing elements ● Speed and Memory Required
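The first challenge, converting flat storage to a 2-D array and back, can be sketched as follows, assuming row-major storage. `to_2d` and `to_1d` are hypothetical helper names, not NMatrix functions.

```ruby
# Hypothetical helpers: reshape row-major flat storage into rows and back.
def to_2d(flat, rows, cols)
  unless flat.size == rows * cols
    raise ArgumentError, "expected #{rows * cols} elements, got #{flat.size}"
  end
  flat.each_slice(cols).to_a
end

def to_1d(matrix)
  matrix.flatten(1)
end

flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m = to_2d(flat, 2, 3)

p m
p m[1][2] == flat[1 * 3 + 2]  # element (i, j) lives at flat[i * cols + j]
p to_1d(m) == flat
```

Both directions build a new array, so every round trip costs time and memory proportional to the matrix size. That cost is what the remaining two challenges on the slide are about.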
  • 28.
  • 29. Ruby Code
        require 'java'        # JRuby only
        require 'benchmark'

        b = Java::double[15_000,15_000].new
        c = Java::double[15_000,15_000].new

        # Fill b element by element.
        index = 0
        puts Benchmark.measure{
          (0...15000).each do |i|
            (0...15000).each do |j|
              b[i][j] = index
              index += 1
            end
          end
        }
        # 43.260000 3.250000 46.510000 ( 39.606356)

        # Copy b into c element by element.
        index = 0
        puts Benchmark.measure{
          (0...15000).each do |i|
            (0...15000).each do |j|
              c[i][j] = b[i][j]
              index += 1
            end
          end
        }
        # 67.790000 0.070000 67.860000 ( 65.126546)
        # RAM consumed => 5.4GB
  • 30.
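  • 31. Java Code
        public class MatrixGenerator {
          // Assumed declarations: the slide does not show where row, col,
          // and the b and c used by test2 are defined.
          static final int row = 15000;
          static final int col = 15000;
          static double[][] b = new double[row][col];
          static double[][] c = new double[row][col];

          // Copy b into c element by element.
          public static void test2() {
            for (int index = 0, i = 0; i < row; i++) {
              for (int j = 0; j < col; j++) {
                c[i][j] = b[i][j];
                index++;
              }
            }
          }

          // Allocate two local matrices and fill b.
          public static void test1() {
            double[][] b = new double[15000][15000];
            double[][] c = new double[15000][15000];
            for (int index = 0, i = 0; i < row; i++) {
              for (int j = 0; j < col; j++) {
                b[i][j] = index;
                index++;
              }
            }
          }
        }

        Called from JRuby:
        puts Benchmark.measure{MatrixGenerator.test2}
        # 0.034000 0.001000 00.034000 ( 00.03300)
        # RAM consumed => 300MB
        puts Benchmark.measure{MatrixGenerator.test1}
        # 0.032000 0.001000 00.032000 ( 00.03100)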
  • 32. Results: improvements ● 1000 times the speed ● 10 times the memory
  • 33. Mixed models ● After NMatrix for doubles was ready, I tested it with mixed_models.
  • 35. System Specifications ● CPU: AMD FX8350 Octa-core 4.2GHz ● RAM: 16GB
  • 38. Gamma
  • 42. Benchmark conclusion ● NMatrix-JRuby is much faster for N-dimensional matrices where elementwise operations are concerned. ● NMatrix-MRI is faster for 2-dimensional matrices when computing matrix multiplication, determinants and factorization.
  • 43. Improvements ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines. ● How? ● Why not JBlas?
  • 45. Future Work ● Add support for complex dtype. ● Convert NMatrix-JRuby Enumerators to Java code. ● Add sparse support.
  • 47. Nope!
  • 49. A General-Purpose GPU library ● Combine the beauty of Ruby with transparent GPU processing ● This will work both on client computers and on servers that use Tesla and Intel Xeon Phi solutions. ● Developer activity and support for the current projects are mixed at best, and the projects are tough to use because they involve writing kernels and demand a lot of effort on buffer/RAM optimisation.
  • 51. ArrayFire ● ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation. ● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a C-programming backend. ● It abstracts away the difficult tasks of writing kernels for multiple architectures, handling memory management, and performing tuning and optimisation.
  • 53. MRI ● C extension ● Architecture is inspired by NMatrix and NArray ● The C++ function is placed in a namespace (e.g., namespace arf { }) or is declared static if possible. The C function receives the prefix arf_, e.g., arf_multiply() (this function also happens to be static). ● C macros are capitalized and generally have the prefix ARF_, as with ARF_DTYPE(). ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ mangling. ● C macros (in extern blocks) may represent C++ constants (which are always
  • 54.
        #include <ruby.h>

        typedef struct AF_STRUCT {
          size_t  ndims;
          size_t  count;
          size_t* dimension;
          double* array;
        } afstruct;

        VALUE ArrayFire = Qnil;
        VALUE Blas = Qnil;

        /* arf_free is defined elsewhere; it is not shown on the slide. */
        static void arf_free(afstruct* af);
        static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val);

        void Init_arrayfire() {
          ArrayFire = rb_define_module("ArrayFire");
          Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
          /* METHOD is a function-pointer cast macro, assumed to be defined elsewhere. */
          rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
        }

        static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val) {
          afstruct* left;
          afstruct* right;
          afstruct* result = ALLOC(afstruct);

          Data_Get_Struct(left_val, afstruct, left);
          Data_Get_Struct(right_val, afstruct, right);

          result->ndims = left->ndims;
          /* The shape lives on the heap so that it outlives this call. */
          result->dimension = ALLOC_N(size_t, 2);
          result->dimension[0] = left->dimension[0];
          result->dimension[1] = right->dimension[1];
          result->count = result->dimension[0] * result->dimension[1];

          arf::matmul(result, left, right);

          return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
        }
  • 55.
        #include <arrayfire.h>

        namespace arf {
          using namespace af;

          static void matmul(afstruct *result, afstruct *left, afstruct *right)
          {
            array l = array(left->dimension[0], left->dimension[1], left->array);
            array r = array(right->dimension[0], right->dimension[1], right->array);
            array res = matmul(l,r);
            result->array = res.host<double>();
          }
        }

        extern "C" {
          #include "arrayfire.c"
        }
  • 56. JRuby ● The approach is the same as for NMatrix-JRuby. ● Java Native Interface (JNI) ● Work on ArrayFire-Java.
  • 57. ● Place 'libaf.so' in the load path.
        require 'ext/vendor/ArrayFire.jar'

        class Af_Array
          attr_accessor :dims, :elements

          def matmul(other)
            Blas.matmul(self.arr, other)
          end
        end
  • 59. System Specification ● CPU: AMD FX Octa-core 4.2GHz ● RAM: 16GB ● GPU: Nvidia GTX 750Ti ● GPU RAM: 4GB DDR5
  • 64. Transparency ● Integrate with NArray ● Integrate with NMatrix ● Integrate with Rails
  • 65. Applications ● Endless possibilities ;) ● Bioinformatics ● Integrate TensorFlow ● Image Processing ● Computational Fluid Dynamics
  • 67. Useful Links ● https://github.com/sciruby/nmatrix ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/arrayfire-rb/tree/temp
  • 68. Acknowledgements 1. Pjotr Prins 2. Charles Nutter 3. John Woods 4. Alexej Gossmann 5. Sameer Deshmukh 6. Pradeep Garigipati
  • 69.
  • 70. Thank You ● GitHub: prasunanand ● Twitter: @prasun_anand ● Blog: prasunanand.com