RUBY AND R


Chang Sau Sheong
Director, Applied Research, HP Labs Singapore


1   © Copyright 2010 Hewlett-Packard Development Company, L.P.
About HP Labs



HP LABS
– Exploratory and advanced
  research group for Hewlett-Packard
– Global organization that tackles
  complex challenges facing our
  customers and society over the next
  decade
– Pushes the frontiers of fundamental
  science
– HQ Palo Alto



HP LABS AROUND THE WORLD

– Palo Alto
– Bristol
– St. Petersburg
– Haifa
– Beijing
– Bangalore
– Singapore




HP LABS SINGAPORE
– Set up in February 2010
– Focus on Cloud Computing
– Research
     • Exploratory research
     • Researchers
     • Change the state of the art
     • Working closely with the academic community
– Applied Research
     • Applied research
     • Innovators
     • Take the research to the next stage
     • Work closely with customers and business units



Ruby and R



Programming language and platform for statistical computing, licensed under GPL


Strengths in statistical processing and data visualization

Extensive library of statistical computing packages (CRAN) written by statisticians



Statistics is not just for statisticians


– Recommendation engine
– Speech recognition
– Fingerprint identification
– Spam detection
– Card fraud detection
– Financial forecasting
– Face recognition
– Data mining
– OCR
– Credit scoring

CRAN
– Almost 2000 packages, mostly created by statisticians
     • BiodiversityR – GUI for biodiversity and community ecology analysis
     • emu – analyze speech patterns
     • GenABEL – study the human genome
     • quantmod – quantitative financial modeling framework
     • fTrading – technical trading analysis
     • cyclones – cyclone identification
     • DOSim – disease analysis toolkit for gene sets
     • agricolae – statistical procedures for agricultural research


EXAMPLE R CODE
– EPL data from football-data.co.uk
– Show home/away goals distribution for 2011 season




Why Ruby and R?



Stand on shoulders of giants


–Ruby
     • Human-focused programming!
     • Better general-purpose programming capabilities
     • Great frameworks!
     • Great libraries (20,000+ gems in RubyGems)
–R
     • Focus on statistical computing/crunching
     • Lots of packages written by domain experts/statisticians
     • Great graphing libraries

Ruby and R integration


RINRUBY
– 100% Ruby
– Uses pipes to send commands and evals
– Uses TCP/IP sockets to send and retrieve data
– Pros:
     •   Doesn't require anything but R
     •   Works flawlessly on Windows
     •   Works with Ruby 1.8, 1.9 and JRuby 1.5
     •   Entire API is tested

– Cons:
     •   VERY SLOW when assigning values
     •   Very limited datatypes: only Vector and Matrix
     •   Not released since 2009
     •   Poor documentation


RSRUBY
– C extension for Ruby, linked to R's shared library
– Pros:
     •   Blazing speed! 5-10 times faster than Rserve and 100-1000 times faster than RinRuby
     •   Seamless integration with Ruby: every method and object is treated like a Ruby object

– Cons:
     •   Transformation between R and Ruby types isn't trivial
     •   Dependent on operating system, Ruby implementation and R version
     •   Not available for alternative implementations of Ruby (e.g. JRuby)
     •   Not released since 2009
     •   Poor documentation




RSERVE
– 100% Ruby
– Uses TCP/IP sockets to interchange data and commands
– Requires Rserve installed on the server machine
– Access from Ruby goes through the Rserve-Ruby-Client library
– Pros:
     •   Works with Ruby 1.8, 1.9 and JRuby 1.5
     •   Sessions allow data to be processed asynchronously
     •   Fast: 5-10 times faster than RinRuby
     •   Most recently updated (Jan 2011)

– Cons:
     •   Requires Rserve
     •   Limited features on Windows
     •   Poor documentation
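
As a rough illustration of the Rserve round trip, the sketch below builds an R expression in Ruby, sends it over a connection, and converts the result back. It assumes the Rserve-Ruby-Client API, where `Rserve::Connection#eval` takes a string and returns an object that responds to `to_ruby`; check the gem's README before relying on this. `r_mean` is a hypothetical helper, not code from the talk.

```ruby
# Evaluate mean(c(...)) in R through an Rserve connection.
# `con` is anything that responds to eval(String) and returns an object
# with to_ruby, as Rserve::Connection is assumed to do.
def r_mean(con, values)
  expr = "mean(c(#{values.join(',')}))"   # e.g. "mean(c(1,2,3,4))"
  con.eval(expr).to_ruby
end

# With a real server (assumes the gem is installed and Rserve runs locally):
#   require 'rserve'
#   con = Rserve::Connection.new
#   puts r_mean(con, [1, 2, 3, 4])
```

Passing the connection in as an argument keeps the helper independent of how the connection is created, so it can be exercised with a stand-in object when no R server is available.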



RAPACHE/RRACK
– Web service based
– Run R scripts as web services, consumed by Ruby front-end apps
– Pros:
     •   Modular and separate (no direct integration)
     •   Can be scalable, ‘cloud’-ready

– Cons:
     •   Requires rApache/rRack
     •   rRack is very new (not accepted by CRAN yet, as of today!) and requires R 2.13 (released just a few weeks ago)
     •   rApache is specific to the Apache web server
     •   Communications overhead for smaller integrations




Let’s look at some code! (I’m going to use Rserve)




Text classification



TEXT CLASSIFICATION
–Automatically sorting a set of documents into different categories from a predefined set
–Classic uses:
     • Spam filtering
     • Email prioritization
–Diagram: training data and test data go into the classifier, which outputs a category


TEXT CLASSIFIER CODE

 Prepare




Train classifier by counting frequency of each word in the document




Get word count




What you get
     {"check"=>1, "result"=>3, "marissa"=>1, "experi"=>1,
     "click"=>1, "engin"=>1, "simpli"=>1, "mistakenli"=>1,
     "pick"=>1, "prevent"=>1, "40"=>1, "regularli"=>1, "place"=>1,
     "user"=>5, "prefer"=>1, "malevol"=>1, "access"=>1,
     "robust"=>1, "servic"=>1, "fault"=>1, "malici"=>1, "list"=>2,
     "hand"=>1, "internet"=>1, "attribut"=>1, "instal"=>1,
     "file"=>1, "unabl"=>1, "vice"=>1, "stopbadwareorg"=>2,
     "merit"=>1, "decid"=>1, "flag"=>2, "saturdai"=>2, "hit"=>2,
     "offici"=>1, "error"=>3, "work"=>1, "site"=>5, "happen"=>2,
     "incid"=>1, "technic"=>1, "advis"=>1, "put"=>1, "human"=>3,
     "harm"=>2, "softwar"=>1, "ms"=>1, "affect"=>1, "carefulli"=>1,
     "product"=>1, "presid"=>1, "complaint"=>1, "potenti"=>2,
     "googl"=>6, "comput"=>2, "peopl"=>1, "investig"=>2,
     "consum"=>1, "danger"=>2, "period"=>1, "wrote"=>2,
     "search"=>7, "ascertain"=>1, "blog"=>1, "warn"=>2,
     "problem"=>1, "updat"=>2, "minut"=>1, "mayer"=>2}
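
The keys above are stemmed words (for example "googl" and "saturdai"), which suggests a stemmer ran before counting. A minimal Ruby sketch of the counting step alone, assuming a lowercase-and-split tokenizer and a tiny stop-word list; stemming is left out, and `word_count` and `STOP_WORDS` are illustrative names rather than the talk's code:

```ruby
# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = %w[a an and the of to in is on for].freeze

# Count how often each word occurs in a text.
# Tokenizing rule (an assumption): lowercase, keep runs of letters/digits.
def word_count(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z0-9]+/).each do |word|
    next if STOP_WORDS.include?(word)
    counts[word] += 1
  end
  counts
end

p word_count("Google search: the search engine flagged the site in error.")
```

The result is a plain hash of word to count, the same shape as the output shown on this slide.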




Generate training data for prediction




Training data



The top 25 most frequent words in the training dataset

category,googl,report,search,user,review,court,mckinnon,year,internet,microsoft,site,softwar,warn,browser,oper,expert,rise,lawyer,digit,extradit,sharpli,error,group,result,system,rebel,econom,presid,crisi,find,year,accus,global,obama,china,civilian,shrink,hous,wall,street,quarter,white,heavi,lehman,economi,session,ey,time,davo,human
not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
not_interesting,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,5,0,2,0,0,0,3,0,0,0,3,1,0,0,0,0,0,3,0,0,0,0,0,0,2
not_interesting,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,3,0,3,1,2,0,2,0,0,0,0,0,0,0,0,0,0,3,1,3,1,0,2,0
not_interesting,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,0,0,1,2,1,4,0,0,2,0,0,0,2,0,0,0,0,2,0,1,0
not_interesting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,3,3,0,0,0,0,0,0,0,2,0,0
not_interesting,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,2,0,0,2,0,0,2,1,0,0,2,1,0,0,2,0,0,1,0,0
interesting,6,0,7,5,0,0,0,0,1,0,5,1,2,0,0,0,0,0,0,0,0,3,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3
interesting,0,7,0,0,2,0,0,0,0,0,0,0,1,0,0,1,0,0,3,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
interesting,0,1,0,0,0,0,0,3,3,1,0,1,1,1,0,3,3,0,1,0,3,0,1,0,2,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,3,0
interesting,0,0,0,0,3,5,5,0,0,0,0,0,0,0,0,0,1,4,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
interesting,6,0,1,1,0,0,0,0,0,0,0,1,0,0,4,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
interesting,0,0,0,2,0,0,0,2,1,4,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0



Each line represents 1 document trained



Categories set when the classifier is created


Number indicates the number of times the word appears in that document


Test data



category,googl,report,search,user,review,court,mckinnon,year,internet,microsoft,site,softwar,warn,browser,oper,expert,rise,lawyer,digit,extradit,sharpli,error,group,result,system,rebel,econom,presid,crisi,find,year,accus,global,obama,china,civilian,shrink,hous,wall,street,quarter,white,heavi,lehman,economi,session,ey,time,davo,human
category,0,0,0,2,0,0,0,2,1,4,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0
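
A test row has to use the same columns, in the same order, as the training data, so words outside that vocabulary are simply dropped. A small Ruby sketch, assuming the header line is available as a string and the new document is already a word-count hash; `test_row` is a hypothetical helper, and its default label mirrors the placeholder "category" value in the row above.

```ruby
# Build a test-data line for one unlabeled document, aligned to the
# columns of the training header.
def test_row(header_line, counts, label = "category")
  vocabulary = header_line.split(",").drop(1)   # skip the "category" column
  ([label] + vocabulary.map { |w| counts.fetch(w, 0) }).join(",")
end

# Shortened header and made-up counts, for illustration only.
header = "category,googl,report,search,user"
puts test_row(header, { "user" => 2, "search" => 1, "unseen" => 9 })
```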

Using different classification models


NAÏVE BAYES
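
To make the idea concrete, here is a minimal plain-Ruby sketch of multinomial naive Bayes with add-one smoothing over word counts like those in the training data. It is an illustration only, not the R model used in the talk; the class, its methods and the sample counts are hypothetical.

```ruby
# Multinomial naive Bayes over word-count hashes, with add-one smoothing.
class NaiveBayes
  def initialize(vocabulary)
    @vocabulary = vocabulary
    @doc_counts = Hash.new(0)                              # documents per category
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }  # category => word => count
  end

  # counts: hash of word => occurrences for one training document
  def train(category, counts)
    @doc_counts[category] += 1
    @vocabulary.each { |w| @word_counts[category][w] += counts.fetch(w, 0) }
  end

  # Log-score per category:
  #   log P(category) + sum over words of count * log P(word | category)
  def scores(counts)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.each_with_object({}) do |(category, n), out|
      total_words = @word_counts[category].values.sum
      denom = (total_words + @vocabulary.size).to_f
      score = Math.log(n / total_docs)
      @vocabulary.each do |w|
        likelihood = (@word_counts[category][w] + 1) / denom
        score += counts.fetch(w, 0) * Math.log(likelihood)
      end
      out[category] = score
    end
  end

  # Return the category with the highest log-score.
  def classify(counts)
    scores(counts).max_by { |_, s| s }.first
  end
end

# Made-up training counts, for illustration only.
nb = NaiveBayes.new(%w[googl search econom crisi])
nb.train("interesting",     { "googl" => 6, "search" => 7 })
nb.train("not_interesting", { "econom" => 3, "crisi" => 3 })
puts nb.classify({ "search" => 2, "googl" => 1 })
```

Working in log space avoids multiplying many small probabilities together, and the add-one term keeps a word unseen in one category from forcing that category's probability to zero.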




SVM




RANDOM FOREST




NEURAL NETWORKS




Using the classifier



RESOURCES
– HP Labs Worldwide
http://www.hpl.hp.com/
– R Project
http://www.r-project.org/
– RsRuby
https://github.com/alexgutteridge/rsruby
– RinRuby
http://rinruby.ddahl.org/
– Rserve
http://www.rforge.net/Rserve/
– Rserve-Ruby-Client
https://github.com/clbustos/Rserve-Ruby-client
– rApache
http://rapache.net/index.html
– rRack
https://github.com/jeffreyhorner/rRack/


Thank you

 sausheong@hp.com
 http://twitter.com/sausheong
 http://blog.saush.com