Streaming Data And Concurrency In R



      Rory Winston

    rory@theresearchkitchen.com
About Me




   Independent Software Consultant
   M.Sc. Applied Computing, 2000
   M.Sc. Finance, 2008
   Apache Committer
   Interested in practical applications of functional languages and
   machine learning
   Really interested in seeing R usage grow in finance
1   A Short Rant


2   Why We Need Concurrency


3   Motivating Example


4   Conclusion


5   References and Further Reading
A Short Rant



   Parallelization vs. Concurrency in R


           Multithreading vs. parallelization
           i.e. pthread_create() vs. fork()
           R interpreter is single threaded
           Some historical context for this (e.g. non-threadsafe BLAS
           implementations)
           Multithreading can be complex and problematic
           Instead a focus on parallelization:
                Distributed computation: gridR, nws, snow
                Multicore/multi-cpu scaling: Rmpi, Romp, pnmath
                Interfaces to PBLAS/Hadoop/OpenMP/MPI/Globus/etc.
           Parallelization suits large CPU-bound processing applications
           So do we really need multithreading at all then?
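
To make the fork()-style model above concrete, here is a minimal sketch using the `parallel` package that ships with current R. It postdates the packages listed on this slide, so treat it as an illustration rather than what the talk used. The worker function `slow_square` is a made-up stand-in for a CPU-bound task.

```r
library(parallel)

# Hypothetical CPU-bound task, used only for illustration.
slow_square <- function(i) {
  Sys.sleep(0.1)  # stand-in for real computation
  i^2
}

# mclapply() forks worker processes. Forking is unavailable on Windows,
# where mc.cores must be 1 and the call runs serially.
cores <- if (.Platform$OS.type == "unix") 2L else 1L

squares <- unlist(mclapply(1:4, slow_square, mc.cores = cores))
print(squares)
```

Each worker is a separate process with its own copy of the interpreter, so no interpreter is ever shared between threads. That is why this style suits batch, CPU-bound work but does not by itself give a session that stays responsive while data arrives.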
Why We Need Concurrency



   Multithreading Is A Valuable Tool


          I say, "yes"
          For general real-time (streaming to be more precise) data
          analysis
          (Growing interest in using R for streaming data, not just
          offline analysis)
          GUI toolkit integration
          Fine-grained control over independent task execution
          Fine-grained control over CPU-bound and I/O-bound task
          management
          "I believe that explicit concurrency management tools (i.e. a
          threads toolkit) are what we really need in R at this point." -
          Luke Tierney, 2001



   Will There Be A Multithreaded R?



          Short answer is: Most likely not
          At least not in its current incarnation
          Internal workings of the interpreter not particularly amenable
          to concurrency:
                 Functions can manipulate caller state (<<- vs. <-)
                 Lazy evaluation machinery (promises)
                 Dynamic State, garbage collection, etc.
                 Scoping: global environments
                 Management of resources: streams, I/O, connections, sinks
          Implications for current code
          Possibly in the next language evolution (cf. Ihaka?)
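
Two of the points above, superassignment and lazy evaluation, can be seen in a few lines of base R. This is a small sketch with made-up function names (`make_counter`, `ignore_arg`, `noisy`, `use_second_first`). It shows why the order and timing of side effects are hard to reason about, which is part of what would make concurrent evaluation difficult.

```r
# `<<-` assigns in an enclosing environment instead of the local frame,
# so calling a function can change state that lives outside it.
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1
    n
  }
}
tick <- make_counter()
tick()
second <- tick()
print(second)

# Arguments are promises: they are evaluated only when first used.
ignore_arg <- function(x) "x was never evaluated"
msg <- ignore_arg(stop("this error is never raised"))
print(msg)

# Forcing a promise runs its side effects at the point of use,
# not at the call site, so evaluation order depends on the callee.
eval_order <- character(0)
noisy <- function(label) {
  eval_order <<- c(eval_order, label)
  label
}
use_second_first <- function(a, b) {
  b
  a
  invisible(NULL)
}
use_second_first(noisy("a"), noisy("b"))
print(eval_order)
```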
Motivating Example



   Motivating Example




           Based on work I did last year and presented at UseR! 2008
           Wrote a real-time and historical market data service from
           Reuters/R
           The real-time interface used the Reuters C++ API
           R extension that spawned listening thread and handled market
           updates
           New version also does publishing as well as subscribing



   Motivating Example




           The (real-world) example involves building a new
           high-frequency trading system
           Step 1 is handling market prices (in this case interbank
           currency prices)
           Need to ensure that the new system’s prices are:
                     Correct;
                     Fast
Motivating Example

   [Diagram, top to bottom: R Analytics; C++ RMDS API; RMDS Message Bus]



   Issues With This Approach




           As R interpreter is single threaded, cannot spawn thread for
           callbacks
           Thus, interpreter thread is locked for the duration of
           subscription
           Not a great user experience
           Need to find alternative mechanism



   Alternative Approach



           If we cannot run subscriber threads in-process, need to
           decouple
           Standard approach: add an extra layer and use some form of
           IPC
           For instance, we could:
                     Subscribe in a dedicated R process (A)
                     Push incoming data onto a socket
                     R process (B) reads from a listening socket
           Sockets could also be another IPC primitive, e.g. pipes, shared
           mem
           We will use the bigmemoRy package to leverage the latter
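
For comparison, the socket variant described above could look roughly like the following sketch. The function names (`publish_tick`, `consume_ticks`), the comma-separated wire format, and port 6011 are all assumptions made for illustration, not part of the original system. The code that runs here uses an in-memory text connection in place of the socket so that it works in a single process.

```r
# Process A side: serialise one price update as a CSV line and write it.
publish_tick <- function(con, symbol, bid, ask) {
  writeLines(sprintf("%s,%.5f,%.5f", symbol, bid, ask), con)
}

# Process B side: read lines from the connection and parse them.
# With a blocking socket, pass n = 1 to take one update at a time.
consume_ticks <- function(con, n = -1L) {
  lines <- readLines(con, n = n)
  parts <- strsplit(lines, ",", fixed = TRUE)
  data.frame(
    symbol = vapply(parts, `[`, character(1), 1),
    bid    = as.numeric(vapply(parts, `[`, character(1), 2)),
    ask    = as.numeric(vapply(parts, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# In two separate R sessions the connections would be opened as
# (the port number is arbitrary):
#   A: con <- socketConnection(port = 6011, server = TRUE,
#                              blocking = TRUE, open = "w")
#   B: con <- socketConnection("localhost", port = 6011,
#                              blocking = TRUE, open = "r")

# Single-process demonstration with an in-memory connection.
buf <- textConnection("wire", open = "w")
publish_tick(buf, "EURUSD", 1.30125, 1.30135)
publish_tick(buf, "USDJPY", 98.45, 98.47)
close(buf)

rd <- textConnection(wire)
ticks <- consume_ticks(rd)
close(rd)
print(ticks)
```

Every update here is formatted, copied and parsed again on the other side. Shared memory avoids that round trip, which is one reason to prefer it for high-frequency data.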



   The bigmemoRy package




           From the description: "Use C++ to create, store,
           access, and manipulate massive matrices"
           Allows creation of large (≥ RAM) matrices
           These matrices can be mapped to files/shared memory
           It is the shared memory functionality that we will use

    big.matrix(nrow, ncol, type = "integer", ...)
    shared.big.matrix(nrow, ncol, type = "integer", ...)
    filebacked.big.matrix(nrow, ncol, type = "integer", ...)
    read.big.matrix(file, sep=, ...)



   Sample Usage




    > library(bigmemory)
    > X <- shared.big.matrix(type="double", ncol=1000, nrow=1000)
    > X
    An object of class “big.matrix”
    Slot "address":
    <pointer: 0x7378a0>



   Create Shared Memory Descriptor

    > desc <- describe(X)
    > desc
    $sharedType
    [1] "SharedMemory"

    $sharedName
    [1] "53f14925-dca1-42a8-a547-e1bccae999ce"

    $nrow
    [1] 1000

    $ncol
    [1] 1000

    $rowNames
    NULL

    $colNames
    NULL

    $type
    [1] "double"



   Export the Descriptor




    In R session 1:

    > dput(desc, file="/tmp/matrix.desc")

    In R session 2:

    > library(bigmemory)
    > desc <- dget("/tmp/matrix.desc")
    > X <- attach.big.matrix(desc)

    Now R sessions 1 and 2 (processes A and B) share the same big.matrix instance



   Share Data Between Sessions




    R session 1:

    > X[1,1] <- 1.2345

    R session 2:

    > X[1,1]
    [1] 1.2345

    Thus, streaming data can be continuously fed into session 1 (process A)
    and concurrently processed in session 2 (process B)
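
The whole create / describe / attach / share sequence can be condensed into one runnable sketch. It assumes a recent bigmemory release, where (as far as I am aware) a shared matrix is requested with `big.matrix(..., shared = TRUE)` rather than the `shared.big.matrix()` call shown on the slides. Both "sessions" are simulated inside one process purely so the example is self-contained, and the sketch is skipped if the package is not installed.

```r
# Assumes the bigmemory package is installed; otherwise nothing is run.
shared_value <- NA_real_

if (requireNamespace("bigmemory", quietly = TRUE)) {
  library(bigmemory)

  # "Session 1": create a small shared matrix and export its descriptor.
  X <- big.matrix(nrow = 10, ncol = 10, type = "double",
                  init = 0, shared = TRUE)
  desc_file <- tempfile(fileext = ".desc")
  dput(describe(X), file = desc_file)

  # "Session 2": attach to the same memory through the descriptor.
  # In real use this would be a separate R process.
  Y <- attach.big.matrix(dget(desc_file))

  X[1, 1] <- 1.2345        # writer side
  shared_value <- Y[1, 1]  # reader side sees the update without a copy
  print(shared_value)
}
```

This sketch does no synchronisation between writer and reader. A real feed handler would need some convention, for example a sequence-number cell updated last, so the reader can tell when a row is complete.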
Motivating Example

   [Diagram, top to bottom: RMDS Message Bus; C++ RMDS API; R / bigmemoRy; R / bigmemoRy; C++ RMDS API; RMDS Message Bus]
Conclusion



    Summary




             Lack of threads not necessarily a barrier to concurrent analysis
             Packages like bigmemoRy, nws, etc. facilitate decoupling via
             IPC
             Could potentially take this further (using e.g. nws)
References and Further Reading



    References




            bigmemoRy:
            http://cran.r-project.org/web/packages/bigmemory/
            Luke Tierney’s original threading paper:
            http://www.cs.uiowa.edu/~luke/R/thrgui/
            HPC and R Survey:
            http://epub.ub.uni-muenchen.de/8991/
            Inside The Python GIL:
            http://www.dabeaz.com/python/GIL.pdf
