Reading and Manipulating Data in R
2013-02-15 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student

Reading data in R
• Usually the first task in real-life data analysis.

Supported formats
• .RData (native) files: load()
• .csv files: read.csv()
• .xls/.xlsx files: gdata::read.xls() or xlsx::read.xlsx()
• .sas7bdat files: sas7bdat::read.sas7bdat()
• .dta files: foreign::read.dta()
• and more: see the R Data Import/Export manual, http://cran.r-project.org/doc/manuals/R-data.html
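
The native .RData format can be tried without any external file. The following is a minimal sketch, assuming a made-up object called scores and a temporary file path; neither comes from the slides.

```r
# Save an object to a temporary .RData file (path and object are illustrative).
rdata_path <- tempfile(fileext = ".RData")
scores <- c(10, 20, 30)
save(scores, file = rdata_path)

# Remove the object, then restore it from disk.
rm(scores)
loaded_names <- load(rdata_path)  # load() returns the names of restored objects

loaded_names
scores
```

Unlike the read.*() functions, load() restores objects under their original names instead of returning a value to assign.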

foreign::read.dta()

• foreign: package name (packages add functions)
• read.dta: function name
• Functions are followed by (), in which you specify arguments.
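
The package::function() notation can be tried with packages that ship with R. This sketch uses utils and stats purely for illustration; they are not the packages named on the slide.

```r
# package::function() calls a function from a named package
# without attaching the package first.
first_three <- utils::head(letters, 3)  # head() from the utils package
middle <- stats::median(c(1, 5, 9))     # median() from the stats package

# After library(), the package prefix becomes optional.
library(stats)
same_middle <- median(c(1, 5, 9))
```

The prefixed form is handy when a function is needed only once, or when two attached packages define functions with the same name.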

Create a folder for this group
Open RStudio
Make sure your working directory is correct
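
The working directory can also be checked and changed from the console. A minimal sketch, assuming a hypothetical folder name r_group created under a temporary directory so that nothing on your disk is touched:

```r
# Remember the current working directory.
old_wd <- getwd()

# Create a folder (hypothetical name) and make it the working directory.
group_dir <- file.path(tempdir(), "r_group")
dir.create(group_dir, showWarnings = FALSE)
setwd(group_dir)

# Files in the working directory can now be referred to by bare file name.
in_group_dir <- basename(getwd())
list.files()

# Return to where we started.
setwd(old_wd)
```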

Download files
• Rosner (ASCII, comma-separated and Stata): http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
• Hernan (Excel and SAS): http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

.csv
http://www.wondergraphs.com/img/SFO_Landings.csv

For comma-, tab-, or space-separated text

new.dat <- read.csv("file.csv")

• new.dat: name of object to create
• <-: assignment operator
• read.csv: function to read .csv files
• "file.csv": file name here
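
To try read.csv() without downloading anything, a small file can be written to a temporary location first. The data and column names below are made up for illustration.

```r
# Write a small comma-separated file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,sex",
             "1,34,F",
             "2,51,M",
             "3,47,F"), csv_path)

# read.csv() assumes a header row and comma separators.
new.dat <- read.csv(csv_path)

str(new.dat)   # one row per data line, one column per field
head(new.dat)  # first rows of the data frame
```

Checking the result with str() right after reading is a cheap way to catch columns that were read with an unexpected type.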

Space-separated
read.table("file.dat")
or, when the first line holds the column names,
read.table("file.dat", header = TRUE)

http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
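
The effect of header = TRUE is easiest to see by reading the same file both ways. This sketch uses a made-up three-line file, not the tlc.dat file linked above.

```r
# A space-separated file whose first line holds column names (illustrative data).
dat_path <- tempfile(fileext = ".dat")
writeLines(c("id trt week0",
             "1 P 30.8",
             "2 A 26.5"), dat_path)

# Default: the first line is treated as data, columns are named V1, V2, ...
no_header <- read.table(dat_path)

# header = TRUE: the first line becomes the column names.
with_header <- read.table(dat_path, header = TRUE)

names(no_header)
names(with_header)
```

Without header = TRUE the name row is read as data, so every column here ends up as text and the data frame has one row too many.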

Tab-separated
read.delim("file.tsv")

http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
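
A short sketch of read.delim() on a made-up tab-separated file. The file contents are assumptions for illustration; the point is that spaces inside a field are preserved because only tabs separate fields.

```r
# Write a small tab-separated file ("\t" is the tab character).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\tdrug name\tdose",
             "1\tdrug A\t10",
             "2\tdrug B\t20"), tsv_path)

# read.delim() assumes a header row and tab separators.
tsvdat <- read.delim(tsv_path)

names(tsvdat)      # "drug name" is converted to the syntactic name drug.name
tsvdat$drug.name   # values keep their internal spaces
```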

Excel files
Install the xlsx package. In RStudio, just click the package's check box to load it.

To install/load a package
install.packages("package", dependencies = TRUE)
library(package)
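
Installing is needed once per machine, while loading is needed in every session. One common pattern is to install only when the package is missing; the helper name install_if_missing below is hypothetical, and stats is used only because it ships with R, so nothing is downloaded when the sketch runs.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # Report whether the package can be loaded now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is part of every R installation, so this does not install anything.
ok <- install_if_missing("stats")

# Attach the package so its functions can be called without a prefix.
library(stats)
```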

xlsdat <- read.xlsx("file.xls", 1)

• xlsdat: name of object to create
• <-: assignment operator
• read.xlsx: function to read .xls/.xlsx files
• "file.xls": file name here
• 1: sheet number
20130215 Reading data into R

SAS native files
library(sas7bdat)
sasdat <- read.sas7bdat("file.sas7bdat")

SAS xport files
library(foreign)
xptdat <- read.xport("file.xpt")

ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt

Stata files
library(foreign)
statadat <- read.dta("file.dta")

http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta

Fixed width
fwfdat <- read.fwf("file.txt", widths = c(3, 5, ...))

Use widths = list(c(3,5,..), c(5,7,..)) for multiple rows per subject.
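
A minimal sketch of a single-line fixed-width read, assuming a made-up file in which the first 3 characters hold an ID and the next 5 hold a measurement. The column names are illustrative.

```r
# Each line: characters 1-3 are the id, characters 4-8 are the value.
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("001 34.5",
             "002 51.0"), fwf_path)

# widths gives the number of characters in each field, in order.
fwfdat <- read.fwf(fwf_path, widths = c(3, 5),
                   col.names = c("id", "value"))

fwfdat
```

Because fields are cut by position, the widths must add up exactly as in the file's layout description; an off-by-one width shifts every later column.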

Manipulating data in R
• Objects
• Classes
• Various data objects

Objects
• Just about everything named in R is an object.
• An object is a container that
    • knows its class (e.g., "I have numbers inside!").
    • has contents (e.g., the actual numbers).

Examples of objects
• data, which you use for analysis (various classes)
• functions, which perform analysis (function class)
• results, which come out of analysis (various classes)

Classes of data values inside data objects
• Numeric: continuous variables
• Factor: categorical variables
• Logical: TRUE/FALSE binary variables
• etc.

Class?
• An object's class tells R how the object should be handled.
• For example, summarizing data should work differently for numbers and categories!
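
The point about class-dependent behavior can be seen by calling summary() on a numeric vector and on a factor. The variable names and values below are made up for illustration.

```r
age <- c(34, 51, 47, 29)               # numeric: continuous variable
sex <- factor(c("F", "M", "F", "F"))   # factor: categorical variable
smoker <- c(TRUE, FALSE, FALSE, TRUE)  # logical: binary variable

# class() reports what kind of values each object holds.
class(age)
class(sex)
class(smoker)

# The same function behaves differently depending on the class.
age_summary <- summary(age)  # minimum, quartiles, mean, maximum
sex_summary <- summary(sex)  # counts per category
```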

Data objects
• Vector (contains a single class of data values)
    • Array, including matrix
• List (contains multiple classes of data values)
    • Data frame

Vector
• Smallest building block of data objects
• Single dimension
• Combination of values of the same class
    vec1 <- c(2013, 2, 15, -10) # combine
    vec2 <- 1:16 # integers 1 to 16

Array
• Vector folded into a multidimensional structure
• A 2-dimensional array is a matrix
    vec3 <- 1:16
    dim(vec3) <- c(4, 4) # 4 x 4 structure
    dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structure
    arr1 <- array(1:60, dim = c(3,4,5))

List
• Combination of any values or objects
• Can contain objects of multiple classes
• e.g., a list of two vectors, a matrix, and three arrays
    list1 <- list(first = 1:17, second = matrix(letters, 13,2))
    list2 <- list(alpha = c(1,4,5,7), beta = c("h","s","p","h"))

Data frame
• Special case of a list
• List of same-length vectors, vertically aligned
    df1 <- data.frame(list2)
    list3 <- list(small = letters, large = LETTERS, number = 1:26)
    df2 <- data.frame(list3)

Access by indexes
    letters[3] # 1-dimensional object
    arr1[1,2,3] # 3-dimensional object
    arr1[1, ,3] # implies 1,(all),3
    df2[ ,3] # implies (all),3
    list1[[1]] # extracting one list element needs [[ ]]

Access named elements
    list3
    list3$small
    list3[["small"]]
    df2$large
    df2[, "large"]