Merge Multiple CSV in single data frame using R


  1. Merge multiple files into a single data frame using R. Yogesh Khandelwal
  2. Problem description
     • The zip file contains 332 comma-separated-value (CSV) files of pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file holds the data from a single monitor, and the monitor's ID number is contained in the file name. For example, data for monitor 200 is in the file "200.csv".
     • Data source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip
  3. Variable names
  4. Variables in each file
     • Date: the date of observation in YYYY-MM-DD format (year-month-day); data type: factor
     • sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter); data type: num
     • nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter); data type: num
     • ID: the location (monitor) ID; data type: int
  5. Before we start, we should know
     • Functions in R
     • How to merge data files
  6. Functions in R
  7. Functions in R
     Functions are created using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class “function”.
         f <- function(<arguments>) {
           ## Do something interesting
         }
     • Functions in R are “first class objects”, which means that they can be treated much like any other R object. Importantly:
       • Functions can be passed as arguments to other functions.
       • Functions can be nested, so that you can define a function inside another function.
     • The return value of a function is the last expression in the function body to be evaluated.
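     The three properties listed on this slide can be illustrated with a minimal sketch. The names square, apply_twice, make_adder and add_five are made up for this example and do not come from the slides.

     ```r
     # A function is an ordinary R object of class "function".
     square <- function(x) {
       x^2  # the last evaluated expression is the return value
     }

     # Functions can be passed as arguments to other functions.
     apply_twice <- function(f, x) {
       f(f(x))
     }

     # Functions can be nested: make_adder() defines and returns an inner function.
     make_adder <- function(n) {
       function(x) x + n
     }
     add_five <- make_adder(5)

     class(square)           # "function"
     apply_twice(square, 3)  # 81
     add_five(10)            # 15
     ```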
  8. Functions, continued
     • For example, a function has three parts: the function name, the function definition, and the function call.
  9. Our objective
     • How can we merge a number of files into a single data frame?
     • How can we apply the same function to different files in an efficient way?
  10. How to merge two different files?
  11. A number of options are available, such as:
      1. Use the merge() function
      2. Use rbind(), cbind(), etc.
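      To make the difference between these options concrete, here is a small sketch on made-up data frames. The values and the sites table are invented for illustration and are not part of the pollution dataset.

      ```r
      # Two small made-up data frames with the same columns.
      a <- data.frame(ID = c(1, 2), sulfate = c(3.1, 2.4))
      b <- data.frame(ID = c(3, 4), sulfate = c(1.8, 5.0))

      # rbind() stacks rows; the column names must match.
      stacked <- rbind(a, b)

      # merge() joins two data frames on a shared key column, like a database join.
      sites <- data.frame(ID = c(1, 2, 3), state = c("NY", "CA", "TX"))
      joined <- merge(stacked, sites, by = "ID")

      # cbind() places columns side by side; the row counts must match.
      flagged <- cbind(a, high = a$sulfate > 3)
      ```

      Because every monitor file has the same columns, stacking rows with rbind() is the natural fit for this dataset, while merge() is better suited to combining tables that describe different attributes of the same IDs.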
  12. How to merge a number of files into a single data frame
      • Approach 1:
          files <- list.files("specdata", full.names = TRUE)
          dat <- NULL
          for (i in seq_along(files)) {
            dat <- rbind(dat, read.csv(files[i]))
          }
      • We can then run various commands on the merged object as needed, for example:
        1. str(dat)
        2. head(dat)
        3. tail(dat)
      • Note: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE, the file names (rather than paths) are returned.
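      Approach 1 can be tried without downloading the dataset. The sketch below writes three tiny made-up monitor files to a temporary folder and then merges them with the same loop. The folder name, the values, and the zero-padded file names such as "001.csv" are assumptions made for this example.

      ```r
      # Create a temporary folder with three tiny made-up monitor files.
      demo_dir <- file.path(tempdir(), "specdata_demo")
      dir.create(demo_dir, showWarnings = FALSE)
      for (id in 1:3) {
        monitor <- data.frame(
          Date = c("2003-01-01", "2003-01-02"),
          sulfate = c(id, NA),
          nitrate = c(NA, id / 10),
          ID = id
        )
        write.csv(monitor, file.path(demo_dir, sprintf("%03d.csv", id)),
                  row.names = FALSE)
      }

      # full.names = FALSE (the default) returns bare file names;
      # full.names = TRUE prepends the directory so read.csv() can find them.
      list.files(demo_dir, pattern = "\\.csv$")
      files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

      # Approach 1: grow the data frame one file at a time.
      dat <- NULL
      for (i in seq_along(files)) {
        dat <- rbind(dat, read.csv(files[i]))
      }

      str(dat)   # structure of the merged data frame
      head(dat)  # first rows
      ```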
  13. How to handle missing values in R?
  14. Missing values, continued
      • In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense).
      • Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored; rather, they should be carefully studied to see if there is an underlying pattern or cause for their missingness.
      • For example:
          > x <- c(1, 2, NA, 4)
          > y <- c(NA, 2, 3, 1)
          > x + y
          [1] NA  4 NA  5
      • Multiple options are available in R to handle NA values, such as:
        • is.na()
        • Setting na.rm = TRUE as a function argument:
          > mean(x)
          [1] NA
          > mean(x, na.rm = TRUE)
          [1] 2.333333
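      As a short sketch of how is.na() is typically used alongside na.rm, the same two vectors can be inspected and filtered. The variable name both_present is made up for this example.

      ```r
      x <- c(1, 2, NA, 4)
      y <- c(NA, 2, 3, 1)

      is.na(x)        # FALSE FALSE  TRUE FALSE
      sum(is.na(x))   # 1 missing value in x
      x[!is.na(x)]    # 1 2 4

      # Keep only the positions where both vectors have a value.
      both_present <- !is.na(x) & !is.na(y)
      x[both_present] + y[both_present]   # 4 5

      mean(x, na.rm = TRUE)   # mean of the non-missing values
      ```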
  15. Apply what we learned to our dataset: function definition
  16. Function call
          > pollutantmean('specdata', 'nitrate', 1:10)
          [1] 0.7976266
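      The function definition from the previous slide is not preserved in this transcript. The following is a hypothetical sketch of what a pollutantmean() with this call signature might look like, built only from the pieces covered earlier (the rbind() loop and na.rm = TRUE). It assumes that files are named with zero-padded three-digit IDs such as "001.csv" and that id defaults to all 332 monitors. The demo folder and its values are made up.

      ```r
      # Hypothetical sketch, not the author's original definition.
      pollutantmean <- function(directory, pollutant, id = 1:332) {
        # Assumption: files are named with zero-padded IDs, e.g. "001.csv".
        files <- file.path(directory, sprintf("%03d.csv", id))
        dat <- NULL
        for (f in files) {
          dat <- rbind(dat, read.csv(f))
        }
        # Average the requested column, ignoring missing values.
        mean(dat[[pollutant]], na.rm = TRUE)
      }

      # Demo on two tiny made-up files in a temporary folder.
      demo_dir <- file.path(tempdir(), "pollutant_demo")
      dir.create(demo_dir, showWarnings = FALSE)
      dates <- c("2003-01-01", "2003-01-02")
      write.csv(data.frame(Date = dates, sulfate = c(2, NA), nitrate = c(1, NA), ID = 1),
                file.path(demo_dir, "001.csv"), row.names = FALSE)
      write.csv(data.frame(Date = dates, sulfate = c(NA, 4), nitrate = c(3, 5), ID = 2),
                file.path(demo_dir, "002.csv"), row.names = FALSE)

      pollutantmean(demo_dir, "nitrate", 1:2)   # mean of 1, 3 and 5, which is 3
      ```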
  17. Thank you!

Editor's notes

  • lapply() applies a given function to each element of a list, so there are several function calls.
    do.call() applies a given function to the list as a whole, so there is only one function call.
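
The difference described in this note can be seen on a small list of made-up data frames. This is a minimal sketch, and the names parts, rows_each and combined are invented for the example.

```r
# A list of two small made-up data frames.
parts <- list(
  data.frame(ID = 1, nitrate = c(0.5, NA)),
  data.frame(ID = 2, nitrate = c(1.5, 2.0))
)

# lapply() calls nrow() once per element and returns a list of results.
rows_each <- lapply(parts, nrow)

# do.call() makes a single call, equivalent to rbind(parts[[1]], parts[[2]]).
combined <- do.call(rbind, parts)
```

Combining the two, do.call(rbind, lapply(files, read.csv)) reads every file and binds the results in one rbind() call, which is a common alternative to growing a data frame inside a for loop.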
