Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×

Data mining with Rattle For R

Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Cargando en…3
×

Eche un vistazo a continuación

1 de 40 Anuncio
Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

A los espectadores también les gustó (20)

Anuncio

Similares a Data mining with Rattle For R (20)

Más reciente (20)

Anuncio

Data mining with Rattle For R

  1. 1. Data Mining with Rattle for R Akhil Anil Karun Full Stack Engineer (Java)
  2. 2. About Me ● Entrepreneur ● 4+ Years Exp in Java Development ● Passionate about programming ● Loves blogging
  3. 3. Courtsey Dr Graham Williams ● PhD, Data Scientist, Togaware and Australian Taxation Office; ● Adjunct Professor, Australian National University ● International Visiting Professor, Chinese Academy of Sciences. ● Dr. Williams is the author of the Rattle, the well--known mining and analytics tool built on top of R. ● Open Source Enthusiast
  4. 4. Scope of Discussion ● Introduction to Data Mining ● Introduction to R ● Introduction to R- Studio ● Introduction to Rattle ● Shiny - Build Web Applications Using R
  5. 5. Introduction to Data Mining A data driven analysis to uncover otherwise unknown but useful patterns in large datasets, to discover new knowledge and to develop predictive models, turning data and information into knowledge and (one day perhaps) wisdom, in a timely manner.
  6. 6. Data Mining A data driven analysis to uncover otherwise unknown but useful patterns in large datasets, to discover new knowledge and to develop predictive models, turning data and information into knowledge and (one day perhaps) wisdom, in a timely manner.
  7. 7. Data Mining Application of ● Machine Learning ● Statistics Software Engineering and Programming with Data ● Effective Communications and Intuition . . . to Datasets that vary by Volume, Velocity, Variety, Value, Veracity . . . to discover new knowledge . . . to improve business outcomes . . . to deliver better tailored services
  8. 8. Application of Data Mining ● Health Research: Adverse reactions using linked Pharmaceutical, General Practitioner, Hospital, Pathology datasets. ● Psychology: Investigation of age-of-onset for Alzheimer’s disease from 75 variables for 800 people. ● Social Sciences: Survey evaluation. Social network analysis - identifying key influencers.
  9. 9. Stats about Data Mining ● SAS has annual revenues of $3B (2013) ● IBM bought SPSS for $1.2B (2009) ● Analytics is >$100B business and >$320B by 2020 ● Amazon, eBay/PayPal, Google, Facebook, LinkedIn, . . . ● Shortage of 180,000 data scientists in US in 2018 (McKinsey)
  10. 10. Introduction to R ● R is a programming language and environment developed for statistical analysis by practising statisticians and researchers. ● Most widely used Data Mining and Machine Learning Package ○ Machine Learning ○ Statistics ○ Software Engineering and Programming with Data ○ But not the nicest of languages for a Computer Scientist!
  11. 11. Why R ? ● Free ○ . . . all modern statistical approaches ○ . . . many/most machine learning algorithms ○ . . . opportunity to readily add new algorithms ● That is important for us in the research community Get our algorithms out there and being used—impact!!!
  12. 12. Open Source (R) A Danger ? “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” Anne H. Milley, director of technology product marketing at SAS (New York Times, 7 January 2009). It’s interesting that SAS Institute feels that non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others should be trusted when building aircraft engines. (Frank Harrell)
  13. 13. Introduction to R
  14. 14. Popularity of R
  15. 15. Popularity of R
  16. 16. Popularity of R
  17. 17. Popularity of R
  18. 18. R — The Video A 90 Second Promo from Revolution Analytics http://www.revolutionanalytics.com/what-is-open-source-r/
  19. 19. Popularity of R
  20. 20. A Quick Tour - R ● Basic R libraries ● Exploring Facebook Data
  21. 21. Why a separate - Rattle GUI?
  22. 22. Why Rattle? ● Statistics can be complex and traps await ● So many tools in R to deliver insights ● Effective analyses should be scripted ● Scripting also required for repeatability ● R is a language for programming with data How to remember how to do all of this in R? How to skill up 150 data analysts with Data Mining?
  23. 23. Rattle - Installation ● Rattle is built using R ● Need to download and install R from cran.r-project.org ● Recommend also install RStudio from www.rstudio.org ● Then start up RStudio and install Rattle: ○ install.packages("rattle") ● Then we can start up Rattle: ○ rattle() ● Required packages are loaded as needed.
  24. 24. A Tour through Rattle : Step 1 - Explorations & Transformation ● Summarising Data - Skewness, Kurtosis, Missing values ● Visualising Distribution - Box Plot, Histogram ● Correlation Analysis - Text / Plot ● Rescaling data ● Imputation
  25. 25. A Tour through Rattle : Step 2 - Building Models ● Descriptive and Predictive Analytics . ● Cluster Analysis ● Association Analysis ● Decision Trees ● Random Forests ● Boosting
  26. 26. A Tour through Rattle : Step 3 - Model Evaluations ● Run against the test dataset ● False Positives and False Negatives
  27. 27. Rattle Interface Notes ● Work through the tabs from left to right ● After setting up a tab we need to Execute it ● Projects save the current Rattle state ● Projects can be restored at a later time
  28. 28. Moving back to R
  29. 29. Moving To R from Rattle - GUI To CLI ● Use the Log Tab - Tour
  30. 30. Step 1 : Load Data in R
  31. 31. Step 2: Observe The Data - Observations
  32. 32. Step 3: Observe The Data - Structure
  33. 33. Step 3: Observe The Data - Summary
  34. 34. Introduction to RStudio
  35. 35. Rattle : Scatter Plot
  36. 36. Rattle : Scatter Plot 2
  37. 37. Getting Help - Precede Command with ?
  38. 38. Shiny ● Build R based web applications using Shiny ● Shiny combines the computational power of R with the interactivity of the modern web. ● Just 2 files - ui.R and server.R ● Free , Paid and Self managed hosting available
  39. 39. Resources & References ● OnePageR: http://onepager.togaware.com ● Tutorial Notes Rattle: http://rattle.togaware.com ● Guides: http://datamining.togaware.com ● Practise: http://analystfirst.com ● Book: Data Mining using Rattle/R ● Chapter: Rattle and Other Tales

×