"R", the free software environment for statistical computing and graphics, is developed and maintained by statistical programmers with the support of a growing community of users from many different backgrounds, which gives access to both well-established and experimental techniques. Hydrological modelling practitioners spend a large amount of time pre- and post-processing data and results with traditional tools. In this work, "R" and some of its packages are presented as powerful tools to explore and extract patterns from raw information, to pre-process the input data of hydrological models, and to post-process their results. In particular, examples are taken from the analysis of 30 years of daily data for a basin of 85000 km², where using R saved a large amount of time that could be better spent on analysis. Vector and raster GIS files were imported to carry out spatial and geostatistical analyses. Thousands of raw text files with time series of precipitation, temperature and streamflow were summarized and organized. The gauging stations to be used in the modelling process were selected according to the number of days with information, and missing time series data were filled in using spatial interpolation. Time series at the gauging stations were summarized through daily, monthly and annual plots. Input files in dBase format were created automatically in a batch process. Results of a hydrological model were compared with observed values through plots and numerical goodness-of-fit indices. Two packages specifically developed to assist hydrologists in these tasks are briefly presented.
In conclusion, we think the "R" environment would be a valuable tool to support undergraduate and graduate education in hydrology, because it helps to capture the main features of large amounts of data; it is a flexible and fully functional programming language that can be interfaced with existing Fortran and C code and is well suited to the ever-growing demands of hydrological analysis; and, finally, it is a promising environment for tackling most of the practical problems that reality poses to the hydrological modeller.
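The station-selection step mentioned in the abstract (keeping gauging stations according to the number of days with information) can be sketched in base R. This is only an illustration under stated assumptions: the station names, the synthetic data and the 90% coverage threshold are hypothetical and are not taken from the study.

```r
# Synthetic daily streamflow for three hypothetical stations, 1961-1962
# (illustrative data only, not the basin dataset used in the study)
set.seed(42)
dates <- seq(as.Date("1961-01-01"), as.Date("1962-12-31"), by = "day")
q <- data.frame(
  Q001 = runif(length(dates), 1, 50),
  Q002 = runif(length(dates), 1, 50),
  Q003 = runif(length(dates), 1, 50)
)

# Introduce gaps: Q002 has no data in 1962, Q003 misses its first 10 days
q$Q002[format(dates, "%Y") == "1962"] <- NA
q$Q003[1:10] <- NA

# Number of days with information per year (rows) and station (columns)
years <- format(dates, "%Y")
days_with_info <- sapply(q, function(x) tapply(!is.na(x), years, sum))

# Keep stations whose overall coverage reaches an assumed threshold
min_fraction <- 0.90  # hypothetical criterion
coverage <- colSums(!is.na(q)) / nrow(q)
selected <- names(coverage)[coverage >= min_fraction]

print(days_with_info)
print(selected)
```

The per-year matrix is the kind of table that a days-with-information plot would display, while the coverage vector drives the actual selection.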
R: A Statistical Environment for doing Hydrological Analysis and Modelling (EGU 2010)
1. R: A statistical environment for hydrological analysis
Zambrano-Bigiarini, Mauricio and Bellin, Alberto
Dep. of Civil and Environmental Engineering, Faculty of Engineering, Università degli Studi di Trento, Trento, Italy
Abstract: EGU2010-13008
Session: HS4.22/EOS13
e-mail: mauricio.zambrano@ing.unitn.it
May 5th, 2010
1) Introduction

The free software environment for statistical computing and graphics “R” is developed and maintained by statistical programmers, with the support of a growing community of users from many different backgrounds, which gives access to both well-established and experimental techniques.

Hydrological modelling practitioners spend a large amount of time pre- and post-processing data and results with traditional tools. In this work “R” and some of its packages are presented as powerful tools to explore and extract patterns from raw information, to pre-process the input data of hydrological models, and to post-process their results. In particular, examples are taken from the analysis of 30 years of daily data for a basin of 85000 km², saving a large amount of time that could be better spent on analysis.

2) Why should a hydrologist invest time in trying R?

● Many ready-to-use algorithms.
● Existing functions, graphics and packages can be easily adapted to particular needs.
● Write once, use many times.
● Large and active user community.
● Documentation is available in several languages.
● Multi-platform (GNU/Linux, MacOS, Windows).
● Open Source.
● Free :)

3) hydroTSM and hydroGOF

hydroTSM is an R package with S3 functions devoted to the management, analysis, interpolation and plotting of hydrological time series, mainly oriented to supporting hydrological modelling tasks.

hydroGOF is an R package with S3 functions providing both numerical and graphical goodness-of-fit measures between observed and simulated time series.

Both packages will soon be available on the R website. Contributions are particularly welcome.

library(hydroTSM) # Package providing the functions used in Figures 1 to 4
data(EbroQts) # Loading the streamflow dataset

# Figure 1: hydroTSM::matrixplot
# Matrix with the days with information per year in selected stations
info <- dwi(EbroQts[,61:110], out.unit="years", dates=EbroQts[,1])
# Plotting the previous matrix, with custom title
matrixplot(info, main="Nº of Days with Information (1961-1990) \n in the Selected Streamgauges")

# Figure 2: hydroTSM::hydroplot; hydroTSM::sname2plot
# Plotting the streamflows at station "Q093"
sname2plot(EbroQts, sname="Q093", dates=1, var.type="Flow")

# Figure 3: hydroTSM::hydropairs
hydropairs(EbroQts[,21:24], main="Correlations among Selected Daily Streamflow Stations")

# Figure 4: hydroTSM::hydrokrige and hydroTSM::mspplot
# Figure 5: hydroGOF::plotbands
# Figure 6: hydroGOF::ggof

4) Packages by thematic area:

● Hydrology: RHydro, HydroMe, hydrogeo, hydrosanity, topmodel, wasim, hydroTSM, hydroGOF
● Geostatistics: gstat, automap, geoR, fields, RandomFields
● GIS: spgrass6, RSAGA, rgdal, mapproj, sp, maptools, RpyGeo, RGoogleMaps, RArcInfo
● Flood frequency: POT, evd, nsRFA, extRemes, lmomco
● Programming language interfaces: C, Fortran, Python, Perl, Java (JRI, JGR,...)
● Wavelets: wavelets, wavethresh, wmtsa, Rwave
● High Performance Computing: snowfall, multicore, jit, nws, Rmpi, snow, taskPR
● Optimization: DEoptim, optim
● Spreadsheets & DB: RPostgreSQL, RMySQL, RSQLite, RNetCDF, RExcelInstaller, xlsReadWrite
● Other statistical software (e.g., S, SAS, SPSS, Stata, Systat, Minitab): foreign
● Bayesian statistics: BAS, BLR, ensembleBMA, evdbayes, LearnBayes, ramps, spBayes,...
● LaTeX: xtable, pgfSweave
● Data Mining: RWeka, rattle, party, randomForest, ...

5) At the end...:

“R” is a valuable environment to support undergraduate and graduate education in hydrology, since it helps to capture the main features of large amounts of data; it is a flexible and fully functional programming language that can be interfaced with existing Fortran and C code and is well suited to the ever-growing demands of hydrological analysis; and, finally, it is a promising environment for tackling most of the practical problems that reality poses to the hydrological modeller.

6) Where to Start?

● Hydrology in R (wiki)
● R reference card
● Graphics: http://addictedtor.free.fr/graphiques/
● Quick R: http://www.statmethods.net/index.html
● R-Spatial: http://r-spatial.sourceforge.net/gallery/
● CSRL: http://casoilresource.lawr.ucdavis.edu/drupal/node/100
● Manuals: http://cran.r-project.org/manuals.html
● Packages: http://cran.r-project.org/web/packages/
● Search engine: http://www.rseek.org/, R Site search

7) References:

● Jones, O., Maillardet, R., Robinson, A. (2009). Introduction to Scientific Programming and Simulation Using R. 472 pp. Chapman & Hall/CRC, Boca Raton, FL.
● Spector, P. (2008). Data Manipulation with R. 154 pp. Springer-Verlag, Carey, NC. ISBN 978-0-387-74730-9.
● Bivand, R. S., Pebesma, E. J., Gomez-Rubio, V. (2008). Applied Spatial Data Analysis with R. Series: Use R. ISBN 978-0-387-78170-9.
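As a companion to section 3, the numerical comparison between observed and simulated series can be illustrated with a few classical goodness-of-fit measures written in base R. This is a minimal sketch, not the hydroGOF implementation: the function name `gof_sketch` and the example values are hypothetical, and the three measures shown (RMSE, Nash-Sutcliffe efficiency and percent bias) are an assumed selection.

```r
# Hypothetical helper (not part of hydroGOF): basic goodness-of-fit
# measures between simulated and observed values
gof_sketch <- function(sim, obs) {
  # Use only the time steps where both series have a value
  ok <- !is.na(sim) & !is.na(obs)
  s <- sim[ok]
  o <- obs[ok]
  c(
    RMSE  = sqrt(mean((s - o)^2)),                        # root mean square error
    NSE   = 1 - sum((s - o)^2) / sum((o - mean(o))^2),    # Nash-Sutcliffe efficiency
    PBIAS = 100 * sum(s - o) / sum(o)                     # percent bias
  )
}

# Illustrative streamflow values (m3/s), invented for this example
obs <- c(10, 12, 15, 20, 18, 14, 11, NA)
sim <- c( 9, 13, 14, 22, 17, 15, 10, 12)

result <- gof_sketch(sim, obs)
print(round(result, 3))
```

Dropping incomplete pairs before computing the indices keeps the measures comparable when the observed record has gaps, which is common in long daily series.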