Getting started on Amazon cloud
                  Some concrete applications using Hadoop
                                          About RBelgium




                                       R on Amazon cloud

           Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)



                                                        2012




Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)   R on Amazon cloud


Outline



      1   Getting started on Amazon cloud


      2   Some concrete applications using Hadoop


      3   About RBelgium




Basics on AWS


             Register for an AWS account with EC2 and S3 enabled
             (http://aws.amazon.com/)
             Credentials you will need: Account Number, Access Key ID,
             Secret Access Key, X.509 Certificate
             Services used in this talk: S3, EC2, EMR, ...
             Lost, or need more information?
             http://aws.amazon.com/documentation/gettingstarted/
             http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html
             http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0
             ...



Why AWS?




             Simple to use: just start an instance from an AMI
             Elastic: auto-scaling groups (RAM, CPU) + load balancing
             (I/O) + Elastic IPs
             On demand: launch what you want, whenever you want (limited
             to 20 EC2 instances unless you request more); instances come
             as normal (on-demand), spot, reserved and EBS-optimized
             (see http://aws.amazon.com/ec2/)




Which AMI(s)? (1/2)

             Bioconductor on Amazon cloud:
             http://bioconductor.org/help/bioconductor-cloud-ami/
             MPI cluster on Amazon:
       Example
                     library(Rmpi)
                     # start R slaves on the available MPI slots
                     mpi.spawn.Rslaves()
                     # apply the function in parallel across the slaves
                     mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
                     # shut the slaves down, then leave MPI and R
                     mpi.close.Rslaves()
                     mpi.quit()

                                       Listing 1: 'Rmpi' on EC2



Which AMI(s)? (2/2)
             Parallel cluster on Amazon:
       Example
                      library(parallel)
                      # one worker per EC2 instance, addressed by private IP
                      cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
                      # run myfunc with the same arguments on every worker
                      clusterCall(cl, myfunc, arg1, arg2, ...)

                                      Listing 2: 'parallel' on EC2

             Hadoop cluster on Amazon with RHadoop:
             https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
             Storm cluster on Amazon:
             https://github.com/nathanmarz/storm-deploy
             SAP HANA (http://aws.amazon.com/sap/), Oracle R Enterprise
             (Hadoop for batch + NoSQL for real-time), etc.
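
The 'parallel' listing can be tried before any EC2 instance is running. A minimal sketch, assuming two worker processes on the local machine stand in for the three EC2 hosts; myfunc and its arguments are hypothetical placeholders, not taken from the talk:

```r
library(parallel)

# Two local worker processes stand in for the EC2 hosts (assumption: no
# remote machines are available, so the host list is replaced by a count).
cl <- makePSOCKcluster(2)

# Hypothetical function; in the slides this is 'myfunc'.
myfunc <- function(a, b) a + b

# clusterCall runs the same function with the same arguments on every worker.
res <- clusterCall(cl, myfunc, 1, 2)

# parLapply splits a list of inputs across the workers instead.
squares <- parLapply(cl, 1:4, function(x) x^2)

stopCluster(cl)
```

On EC2, passing the vector of private IP addresses from Listing 2 instead of the worker count should be the main change, provided R is installed on every instance and SSH access between them is set up.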


Using rmr2 in Hadoop framework (1/4)
       Toy case: least-squares solution of
       Xβ = y
       In plain R: solve(t(X) %*% X, t(X) %*% y)

       Example
                 library(rmr2)
                 # 200 x 10 matrix, written to the distributed file system
                 X = to.dfs(matrix(rnorm(2000), ncol = 10))
                 # response vector, kept in local memory
                 y = as.matrix(rnorm(200))

                                    Listing 6: initializing variables

Using rmr2 in Hadoop framework (2/4)


       Example
             tXX =
               values(
                 from.dfs(
                   mapreduce(
                     input = X,
                     # each map call receives a block of rows Xi
                     map = function(k, Xi) keyval(1, list(t(Xi) %*% Xi)),
                     # reduce = reducerFunction,
                     combine = TRUE)))[[1]]

                             Listing 7: 'rmr2' matrix multiplication




Using rmr2 in Hadoop framework (3/4)


       Example
              tXy =
                values(
                  from.dfs(
                    mapreduce(
                      input = X,
                      # y is used whole here, so Xi must hold all rows of X;
                      # with several blocks, subset y to the rows of Xi
                      map = function(k, Xi)
                        keyval(1, list(t(Xi) %*% y)),
                      combine = TRUE)))[[1]]
              # solve the normal equations for beta
              solve(tXX, tXy)

                                        Listing 8: 'rmr2' solving
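
Listings 7 and 8 rely on the fact that the cross-products can be computed block by block and then summed: t(X) %*% X is the sum over blocks of t(Xi) %*% Xi, and t(X) %*% y is the sum of t(Xi) %*% yi, where yi holds the rows of y matching Xi. A sketch in plain R with no Hadoop involved, assuming four blocks of 50 rows stand in for the blocks a map function would receive (the chunking and variable names are illustrative only):

```r
set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)   # 200 x 10, same shape as in Listing 6
y <- as.matrix(rnorm(200))

# Hypothetical chunking: four blocks of 50 row indices.
chunks <- split(seq_len(nrow(X)), rep(1:4, each = 50))

# "Map" step: each block contributes its own cross-products.
partial_XtX <- lapply(chunks, function(i) crossprod(X[i, , drop = FALSE]))
partial_Xty <- lapply(chunks, function(i) {
  crossprod(X[i, , drop = FALSE], y[i, , drop = FALSE])
})

# "Reduce" step: sum the partial results.
XtX <- Reduce(`+`, partial_XtX)
Xty <- Reduce(`+`, partial_Xty)

beta_chunked <- solve(XtX, Xty)
beta_direct  <- solve(t(X) %*% X, t(X) %*% y)
```

Both solutions agree up to floating-point error, which is what makes the computation suitable for a map step followed by a sum.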




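How to debug (4/4)


      Debugging
      rmr.str(varName): call it inside a map or reduce function to print
      the structure of a variable to standard error (visible in the logs)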




R on EMR with segue package


       Example
                  library(segue)
                  setCredentials("accessKey", "secretAccessKey")
                  myCluster <- createCluster(numInstances = 1,
                                             masterInstanceType = "m1.small",
                                             slaveInstanceType = "m1.small",
                                             location = "us-east-1a")
                  # apply myfunc to each element of dataList on the cluster
                  ResultList <- emrlapply(myCluster, dataList, myfunc)
                  # shut the cluster down when the work is done
                  stopCluster(myCluster)

                                Listing 9: R on EMR with 'segue'
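
Rather than typing the keys into a script, they can be read from the environment. A possible sketch, assuming the keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (a common convention, not something the segue package requires); aws_credentials is a hypothetical helper:

```r
# Hypothetical helper: read the two keys from the environment and fail early
# if either one is missing.
aws_credentials <- function() {
  key    <- Sys.getenv("AWS_ACCESS_KEY_ID", unset = NA)
  secret <- Sys.getenv("AWS_SECRET_ACCESS_KEY", unset = NA)
  if (is.na(key) || is.na(secret)) {
    stop("AWS credentials are not set in the environment")
  }
  list(accessKey = key, secretAccessKey = secret)
}

# Intended use with segue (not run here):
# creds <- aws_credentials()
# setCredentials(creds$accessKey, creds$secretAccessKey)
```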




R on EMR using the API command (1/3)

             Upload the numberList file (integers from 1 to 100, one
             integer per line) and the R scripts "mapper.r" and
             "reducer.r" to your S3 bucket
             Run the following command in a bash shell:
      Example
          ./elastic-mapreduce --create --stream \
              --input s3://yourbucket/numberList.txt \
              --mapper s3://yourbucket/mapper.r \
              --reducer s3://yourbucket/reducer.r \
              --output s3://emroutr1vv/myresults \
              --name EMRexampleR1 --num-instances 1

                                  Listing 10: Running R on EMR
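
The input file described above can be generated from R. A small sketch; writing to a temporary directory is an assumption made here to keep the example free of side effects:

```r
# Write the integers 1 to 100, one per line, to numberList.txt.
path <- file.path(tempdir(), "numberList.txt")
writeLines(as.character(1:100), path)

# Quick check of the content before uploading the file to S3.
numbers <- as.numeric(readLines(path))
```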



R on EMR using the API command (2/3)


       Example
                 #! /usr/bin/env Rscript
                 # strip leading and trailing spaces from a line
                 trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
                 con <- file("stdin", open = "r")
                 # read stdin one line at a time and emit "<number><TAB>"
                 while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
                     line <- trimWhiteSpace(line)
                     cat(as.numeric(line), "\t", "\n", sep = "")
                 }
                 close(con)

          Listing 11: Running simple R scripts on EMR - mapper script




R on EMR using the API command (2/3)


       Example
             #! /usr/bin/env Rscript
             trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
             con <- file("stdin", open = "r")
             # collect every number received from the mappers
             x <- c()
             while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
                 x <- c(x, as.numeric(trimWhiteSpace(line)))
             }
             close(con)
             # emit the mean of all values seen by this reducer
             cat(mean(x))

          Listing 12: Running simple R scripts on EMR - reducer script
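
The mapper and reducer logic can be checked in a single R session before touching EMR. A sketch under stated assumptions: mapper and reducer are hypothetical function versions of Listings 11 and 12 that work on character vectors instead of stdin, and trimws() (base R 3.2.0 or later) replaces trimWhiteSpace so that the tab emitted by the mapper is stripped as well:

```r
# Hypothetical in-memory stand-in for the streaming mapper:
# one output line per input line, made of the number followed by a tab.
mapper <- function(lines) {
  paste0(as.numeric(trimws(lines)), "\t")
}

# Hypothetical in-memory stand-in for the streaming reducer:
# the mean of every number it receives.
reducer <- function(lines) {
  mean(as.numeric(trimws(lines)))
}

input  <- as.character(1:100)    # same content as numberList.txt
mapped <- mapper(input)
result <- reducer(sort(mapped))  # Hadoop sorts by key between the two phases
```

With a single reducer the result is the mean of all inputs; with several reducers, each one would print the mean of its own share of the keys.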



How to debug (3/3)


       Debugging
       First debug your R code locally from the command line:
                  cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
                  cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1

                    Listing 13: Debugging R code before using EMR

Tips with EMR
             Do not mix s3 and s3n: use one or the other, but not both.
             For the differences between s3 and s3n, see
             http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop
             (accessed on Nov 6, 2012).
             The first line of each script must name the right
             interpreter (such as #! /usr/bin/env Rscript for R or
             #!/usr/bin/python for Python). A file that is only called by
             another one does not need it (e.g. when an R script calls an
             R function from another file, the function file does not need
             to start with #! /usr/bin/env Rscript).
             The output directory must NOT exist before you launch your
             EMR job, otherwise the job will always FAIL. Use
             s3://yourProjects/project1 instead of s3://project1.
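
The shebang tip can be checked automatically before uploading. A sketch; has_shebang is a hypothetical helper, and the script is written to a temporary file only for illustration:

```r
# Hypothetical helper: TRUE when the first line of a script starts with "#!"
# and mentions the expected interpreter.
has_shebang <- function(path, interpreter = "Rscript") {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 &&
    grepl("^#!", first) &&
    grepl(interpreter, first, fixed = TRUE)
}

# Illustrative script with a valid first line.
script <- tempfile(fileext = ".r")
writeLines(c("#! /usr/bin/env Rscript", "cat(1)"), script)
ok <- has_shebang(script)
```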


Projects in RBelgium

             Heritage Health Prize competition:
             http://www.heritagehealthprize.com/c/hhp
             Text mining on real-world text data extracted from the
             database systems of a project partner

RBelgium members (1/3)




RBelgium members (2/3)
       Example
                 mygroup <- "RBelgium"
                 # libraries for communicating with the Meetup API
                 library(RJSONIO)
                 library(RCurl)
                 # library for plotting
                 library(ggplot2)
                 # get member data from meetup.com
                 # (mykey must hold your own Meetup API key)
                 domain.url <- paste("https://api.meetup.com/2/members?key=", mykey,
                                     "&sign=true&group_urlname=", mygroup, sep = "")
                 domain.get <- getURL(domain.url)
                 domain.data <- fromJSON(domain.get)
                 # displaying names
                 print(unlist(lapply(domain.data$results, function(x) x$name)))



RBelgium members (3/3)
       Example
                 # plotting graph
                 joins <- unlist(lapply(domain.data$results,
                                        function(x) x$joined))
                 orderedJoins <- joins[order(joins)]
                 # 'joined' is in milliseconds since 1970-01-01
                 lab = as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
                 df <- data.frame(
                             x = lab,
                             y = 1:length(domain.data$results)
                             )
                 png("memberJoined.png")
                 # print() so the plot is drawn even when the script is sourced
                 print(ggplot(df) +
                         geom_point(aes(x = x, y = y)) +
                         xlab("Date") +
                         ylab("#members"))
                 dev.off()
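
The timestamp handling in the listing can be tried without the Meetup API. A sketch using three made-up 'joined' values (illustrative only, not real member data) in milliseconds since the Unix epoch; the time zone is fixed to UTC here as an assumption so the result does not depend on the local machine:

```r
# Made-up join times in milliseconds since 1970-01-01 (UTC), unsorted.
joins <- c(1325376000000, 1293840000000, 1309478400000)

orderedJoins <- sort(joins)

# Divide by 1000 to get seconds, then build the cumulative member count.
df <- data.frame(
  x = as.POSIXct(orderedJoins / 1000, origin = "1970-01-01", tz = "UTC"),
  y = seq_along(orderedJoins)
)
```

The resulting data frame has the same two columns as the one passed to ggplot() in the listing: the join date on x and the running number of members on y.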

RBelgium on the internet


             Meetup: http://www.meetup.com/RBelgium/ (68 members)
             Website: http://www.rbelgium.be
             Twitter: twitter.com/rbelgium (5 followers)
             LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g
             (7 members)
             Google group: http://groups.google.com/group/rbelgium,
             rbelgium@googlegroups.com



Questions?




Instruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
 

R belgium 20121116-awson-cloud-beamer

  • 1. Getting started on Amazon cloud Some concrete applications using Hadoop About RBelgium R on Amazon cloud Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer) 2012
  • 2. Outline: (1) Getting started on Amazon cloud; (2) Some concrete applications using Hadoop; (3) About RBelgium
  • 6. Basics on AWS. Register for an AWS account with EC2 and S3 (http://aws.amazon.com/). You get an Account Number, Access Key ID, Secret Access Key and X.509 Certificate. Services: S3, EC2, EMR, ... Could not follow, or want more info? See http://aws.amazon.com/documentation/gettingstarted/ and http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html and http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0 ...
  • 7. Why AWS? Simple to use: just start up an instance with an AMI. Elastic: auto-scaling groups (RAM, CPU) + load balancing (I/O) + Elastic IPs. On demand: anytime, what you want (limited to 20 EC2 instances unless you request more), with normal, spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/).
  • 8. Which AMI(s)? (1/2) Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/ MPI cluster on Amazon, example:

        library(Rmpi)
        mpi.spawn.Rslaves()
        mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
        mpi.close.Rslaves()
        mpi.quit()

    Listing 1: 'Rmpi' on EC2
  • 9. Which AMI(s)? (2/2) Parallel cluster on Amazon, example:

        library(parallel)
        # private IP addresses of the EC2 instances acting as workers
        cl <- makePSOCKcluster(c('10.68.155.30', '10.68.155.45', '10.68.155.65'))
        # myfunc, arg1, arg2 are placeholders; clusterCall takes the function, then its arguments
        clusterCall(cl, myfunc, arg1, arg2)

    Listing 2: 'parallel' on EC2

    Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy Also SAP Hana (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real-time), etc.
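The `parallel` calls on this slide can be tried without any EC2 nodes. The following is a minimal sketch, assuming two local worker processes stand in for the three instance addresses; `add` is a hypothetical placeholder for `myfunc`.

```r
library(parallel)

# Two local worker processes stand in for the EC2 hosts (assumption for local testing).
cl <- makePSOCKcluster(2)

# Hypothetical stand-in for myfunc.
add <- function(a, b) a + b

# clusterCall runs the same call once on every worker and returns one result per worker.
same_on_each <- clusterCall(cl, add, 1, 2)

# parLapply splits the input across the workers and returns a list in input order.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always release the worker processes when done.
stopCluster(cl)
```

Replacing the worker count with a vector of host names or IP addresses, as on the slide, should give the same behaviour across machines, provided R and passwordless SSH are available on each host.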
  • 13. Using rmr2 in Hadoop framework (1/4) Toy case: Xβ = y, solved with solve(t(X) %*% X, t(X) %*% y). Example:

        library(rmr2)
        X = to.dfs(matrix(rnorm(2000), ncol = 10))
        y = as.matrix(rnorm(200))

    Listing 6: initializing variables
  • 14. Using rmr2 in Hadoop framework (2/4) Example:

        tXX =
          values(
            from.dfs(
              mapreduce(
                input = X,
                map = function(k, Xi) keyval(1, list(t(Xi) %*% Xi)),
                # reduce = reducerFunction,   (left commented out, as on the slide)
                combine = TRUE)))[[1]]

    Listing 7: 'rmr2' matrix multiplication
  • 15. Using rmr2 in Hadoop framework (3/4) Example:

        tXy =
          values(
            from.dfs(
              mapreduce(
                input = X,
                # y comes from the enclosing environment; t(Xi) %*% y conforms
                # only when Xi holds all rows of X
                map = function(k, Xi) keyval(1, list(t(Xi) %*% y)),
                combine = TRUE)))[[1]]
        solve(tXX, tXy)

    Listing 8: 'rmr2' solving
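The decomposition behind the two map steps can be checked in plain R, without Hadoop. This is a sketch under stated assumptions: the four row chunks are a hypothetical stand-in for Hadoop input splits, and each chunk of `X` is paired with the matching rows of `y`. It relies on the fact that the cross-products `t(X) %*% X` and `t(X) %*% y` are sums of the per-chunk cross-products.

```r
set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)
y <- as.matrix(rnorm(200))

# Split the row indices into 4 chunks of 50 rows (stand-in for input splits).
chunks <- split(seq_len(nrow(X)), rep(1:4, each = 50))

# "Map": each chunk contributes its partial cross-products.
partial_XX <- lapply(chunks, function(i) t(X[i, , drop = FALSE]) %*% X[i, , drop = FALSE])
partial_Xy <- lapply(chunks, function(i) t(X[i, , drop = FALSE]) %*% y[i, , drop = FALSE])

# "Reduce": sum the partial results.
tXX <- Reduce(`+`, partial_XX)
tXy <- Reduce(`+`, partial_Xy)

# Solve the normal equations from the chunked sums and, for comparison, directly.
beta_chunked <- solve(tXX, tXy)
beta_direct <- solve(t(X) %*% X, t(X) %*% y)
```

Both solutions should agree up to floating-point error, which is why a summing reducer is enough once the data no longer fits in a single chunk.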
  • 16. How to debug (4/4) Debugging: rmr.str(varName)
  • 17. R on EMR with segue package. Example:

        library(segue)
        setCredentials("accessKey", "secretAccessKey")
        myCluster <- createCluster(numInstances = 1,
                                   masterInstanceType = "m1.small",
                                   slaveInstanceType = "m1.small",
                                   location = "us-east-1a")
        # dataList and myfunc are placeholders for your own list and function
        ResultList <- emrlapply(myCluster, dataList, myfunc)
        stopCluster(myCluster)

    Listing 9: R on EMR with 'segue'
  • 18. R on EMR using the API command (1/3) Upload the numberList file (integers from 1 to 100, one integer per line) and the R scripts "mapper.r" and "reducer.r" to your AWS S3 bucket. Then run this command line in bash:

        ./elastic-mapreduce --create --stream \
            --input s3://yourbucket/numberList.txt \
            --mapper s3://yourbucket/mapper.r \
            --reducer s3://yourbucket/reducer.r \
            --output s3://emroutr1vv/myresults \
            --name EMRexampleR1 --num-instances 1

    Listing 10: Running R on EMR
  • 19. R on EMR using the API command (2/3) Example:

        #! /usr/bin/env Rscript
        # strip leading and trailing spaces
        trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
        con <- file("stdin", open = "r")
        # read stdin line by line and emit each number followed by a tab
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          line <- trimWhiteSpace(line)
          cat(as.numeric(line), "\t", "\n", sep = "")
        }

    Listing 11: Running simple R scripts on EMR - mapper script
  • 20. R on EMR using the API command (3/3) Example:

        #! /usr/bin/env Rscript
        # strip leading and trailing spaces
        trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
        con <- file("stdin", open = "r")
        x <- c()
        # collect every number received from the mapper
        while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
          x <- c(x, as.numeric(trimWhiteSpace(line)))
        }
        cat(mean(x))

    Listing 12: Running simple R scripts on EMR - reducer script
  • 21. How to debug (4/4) Debugging: debug your R code locally first, from the command line:

        cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
        cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1

    Listing 13: Debugging R code before using EMR
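The same local check can also be done inside a single R session. The sketch below is an assumption-laden rewrite, not the scripts from the slides: `mapper` and `reducer` are hypothetical functions that take character vectors instead of reading stdin, and the input mimics numberList.txt (integers from 1 to 100).

```r
# Mapper logic: emit each number followed by a tab (key with an empty value).
mapper <- function(lines) {
  trimmed <- gsub("(^ +)|( +$)", "", lines)
  paste0(as.numeric(trimmed), "\t")
}

# Reducer logic: parse the mapper output and average it.
reducer <- function(lines) {
  # trimws removes surrounding whitespace, including the trailing tab.
  values <- as.numeric(trimws(lines))
  mean(values)
}

# Same content as numberList.txt: one integer per line, 1 to 100.
input <- as.character(1:100)
result <- reducer(mapper(input))
```

Testing the logic this way separates R errors from Hadoop or S3 configuration problems before any cluster is started.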
  • 22. Tips with EMR. (1) Be careful with s3 and s3n: use one or the other, but not both. For the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6 2012). (2) The first line of the script must correctly name the interpreter (such as #! /usr/bin/env Rscript for R or #!/usr/bin/python for Python). This is not necessary for a file that is only called by another one (e.g. when an R script calls an R function from another file, the function file does not need to start with #! /usr/bin/env Rscript). (3) The output directory must NOT exist before launching your EMR job, otherwise the job will always FAIL. (4) Use s3://yourProjects/project1 instead of s3://project1.
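The tip about the first line is easy to check before uploading a script. This is a small sketch, assuming a hypothetical helper named `has_rscript_shebang`; it only looks at the first line and knows nothing about EMR itself.

```r
# Returns TRUE when the first line of a script is an Rscript shebang.
has_rscript_shebang <- function(path) {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 &&
    grepl("^#! ?/usr/bin/env Rscript[[:space:]]*$", first)
}

# Usage on two temporary files (hypothetical script contents).
good <- tempfile(fileext = ".r")
writeLines(c("#! /usr/bin/env Rscript", "cat(1)"), good)

bad <- tempfile(fileext = ".r")
writeLines(c("cat(1)"), bad)

good_ok <- has_rscript_shebang(good)
bad_ok <- has_rscript_shebang(bad)
```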
  • 24. Projects in RBelgium: http://www.heritagehealthprize.com/c/hhp; text mining using real "text" data extracted from the database systems of a project partner
  • 25. RBelgium members (1/3)
  • 26. RBelgium members (2/3) Example:

        mygroup <- "RBelgium"
        # libraries for communicating with meetup API
        library(RJSONIO)
        library(RCurl)
        # library for plotting
        library(ggplot2)
        # get member data from meetup.com (mykey must hold your Meetup API key)
        domain.url <- paste("https://api.meetup.com/2/members?key=", mykey,
                            "&sign=true&group_urlname=", mygroup, sep = "")
        domain.get <- getURL(domain.url)
        domain.data <- fromJSON(domain.get)
        # displaying names
        print(unlist(lapply(domain.data$results, function(x) x$name)))
  • 27. RBelgium members (3/3) Example:

        # plotting graph
        joins <- unlist(lapply(domain.data$results, function(x) x$joined))
        orderedJoins <- joins[order(joins)]
        # "joined" is in milliseconds since 1970-01-01
        lab = as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
        df <- data.frame(x = lab,
                         y = 1:length(domain.data$results))
        png("memberJoined.png")
        # print() so the plot is also drawn when the script is source()d
        print(ggplot(df) +
                geom_point(aes(x = x, y = y)) +
                xlab("Date") +
                ylab("#members"))
        dev.off()
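The data preparation behind this plot can be tried without calling the Meetup API. The sketch below assumes three hypothetical `joined` values in milliseconds since 1970-01-01 UTC and builds the same kind of data frame: one row per member, with the running member count on the y axis.

```r
# Hypothetical "joined" values in milliseconds since 1970-01-01 UTC.
joins <- c(1352246400000, 1349049600000, 1350000000000)

# Sort so that the earliest member comes first.
ordered <- sort(joins)

# Convert milliseconds to seconds, then to date-times.
joined_at <- as.POSIXct(ordered / 1000, origin = "1970-01-01", tz = "UTC")

# Cumulative membership: the k-th earliest join brings the count to k.
df <- data.frame(date = joined_at, members = seq_along(ordered))
```

Setting `tz = "UTC"` explicitly is a design choice for reproducibility; without it the dates are shown in the local time zone of the machine.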
  • 28. RBelgium on internet. Website: http://www.meetup.com/RBelgium/ (68 members); Website: http://www.rbelgium.be; Twitter: twitter.com/rbelgium (5 followers); LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members); Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
  • 29. Questions?