Exploratory Data Analysis 
Wesley GOI
In today’s session 
• Principles behind exploratory analyses 
• Plotting data on popular exploratory graphs
• Plotting Systems in R 
• Base (Week1) 
• Lattice (Week2) 
• ggplot2 (Week2)
• Choosing and using graphics devices, aka the output formats
Scripts can be downloaded at: 
https://www.dropbox.com/s/ii1yj8f650d4l1q/lesson1.r?dl=0 
https://www.dropbox.com/s/eme44h6lrhn775l/final.r?dl=0
Principles behind exploratory analyses 
• Show comparisons 
• Show causality, mechanism, explanation 
• Show multivariate data 
• Integrate multiple modes of evidence 
• Describe and document the evidence 
• Content is king 
• SPEED
Dimensionality 
• Five-number summary 
• Boxplots 
• Histograms 
• Density plot 
• Barplot 
Multiple overlaid 1D plots
Scatter plots
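A minimal sketch of the one-dimensional summaries listed above. It uses simulated values rather than the course dataset; the vector name `temps` and its distribution are assumptions made only for illustration.

```r
set.seed(1)
# Simulated daily temperatures; a stand-in for a real numeric column
temps <- rnorm(200, mean = 15, sd = 5)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(temps)
print(five)

# The same distribution as a boxplot, a histogram and a density plot
par(mfrow = c(1, 3))
boxplot(temps, main = "Boxplot")
hist(temps, main = "Histogram")
plot(density(temps), main = "Density plot")
par(mfrow = c(1, 1))
```

The three panels show the same 200 values, so differences between them come only from how each plot summarises the data.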
Downloading our dataset 
R code 
dir.create("exploring_data") 
setwd("exploring_data")
download.file("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/therbook.zip", destfile="data.zip")
unzip("data.zip")
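Boxplots
R code
# header=TRUE: the first line of the file holds the column names
weather = read.table("SilwoodWeather.txt", header=TRUE)
# Keep only the readings for January 2004
onemonth = subset(weather, month==1 & yr==2004)
boxplot(onemonth$rain)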
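Boxplots are most useful for showing comparisons when several groups share one axis. The sketch below assumes a simulated data frame (`rain_sim`, with made-up rainfall values for three months) instead of the Silwood file, and uses the formula interface of `boxplot()`.

```r
set.seed(11)
# Simulated rainfall for three months; a stand-in for the weather data
rain_sim <- data.frame(
  month = factor(rep(c("Jan", "Feb", "Mar"), each = 30),
                 levels = c("Jan", "Feb", "Mar")),
  rain  = rexp(90, rate = 0.3)
)

# The formula rain ~ month draws one box per month on a shared axis
b <- boxplot(rain ~ month, data = rain_sim,
             xlab = "Month", ylab = "Rain", col = "wheat")
```

`boxplot()` also returns the statistics it drew (invisibly), which is why the result is stored in `b` here.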
Histograms 
R code 
hist(weather$upper) 
rug(weather$upper) # adds a tick mark for each value
Barplot 
R code 
barplot(
  table(weather$month),
  col = "wheat",
  main = "Number of Observations in Months")
Graphics devices (grDevices)
              Raster   Vector   Vector
              PNG      PDF      SVG
File size     small    medium   medium
Scalable      No       Yes      Yes
Web friendly  Yes      No       Yes
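A short sketch of how a graphics device is used: open the device, draw, then close it so the file is written. The file names and the simulated `values` vector are illustrative assumptions; the PNG step is guarded because not every R build has PNG support.

```r
# Write the same histogram to two devices; file names here are illustrative
out_dir <- tempdir()
set.seed(5)
values <- rnorm(100)

# Vector output: size is given in inches
pdf_file <- file.path(out_dir, "hist.pdf")
pdf(pdf_file, width = 7, height = 5)
hist(values, main = "PDF device")
invisible(dev.off())   # closing the device writes the file

# Raster output: size is given in pixels; skipped if this R build lacks PNG support
if (capabilities("png")) {
  png_file <- file.path(out_dir, "hist.png")
  png(png_file, width = 800, height = 600)
  hist(values, main = "PNG device")
  invisible(dev.off())
}
```

Forgetting `dev.off()` is the usual reason an output file turns out empty or incomplete.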
Plotting Systems
                       Base                 Lattice                 Grid
Libraries              -                    lattice                 grid, gridExtra, ggplot2
Example functions      hist ✔, barplot ✔,   xyplot (scatterplots),  qplot, ggplot, geom
                       boxplot ✔, plot      bwplot (boxplots),
                                            levelplot
Facetted plots         Yes                  Yes                     Yes
Grammar of graphics    No                   No                      Yes
Interface with         Yes                  Partial                 Partial + workarounds
statistical functions
Note: base and grid-based graphics cannot be mixed.
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T)
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T) 
# Color: col sets the color of the points
with(data1, plot(xv, ys, col="red"))
# Regression line
with(data1, abline(lm(ys~xv)))
Base plots: Scatterplot 
Set symbol to represent data point
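The symbol is chosen with the `pch` argument. As a quick reference, this sketch draws the built-in symbols 0 to 25 with their codes; the grid layout and colors are arbitrary choices made for illustration.

```r
# Symbol codes 0 to 25 are the built-in plotting symbols
pch_values <- 0:25

# Lay the symbols out on a grid: 13 columns by 2 rows
x <- (pch_values %% 13) + 1
y <- 2 - (pch_values %/% 13)

# bg only affects symbols 21 to 25, which have a fill color
plot(x, y, pch = pch_values, cex = 2, col = "blue", bg = "wheat",
     xlim = c(0.5, 13.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch values 0 to 25")

# Label each symbol with its code
text(x, y - 0.3, labels = pch_values)
```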
Base plots: Scatterplot 
R code 
data1 = read.table("scatter1.txt", h=T) 
data2 = read.table("scatter2.txt", h=T) 
# Color
with(data1, plot(xv, ys, col="red"))
with(data1, abline(lm(ys~xv)))
# Symbol shape: pch sets the plotting symbol
with(data2, points(xv2, ys2, col="blue", pch=11))
Base plots: Using par for multiple plots 
R code 
par(mfrow=c(1,2)) 
with(data1, plot(xv, ys, col="red")) 
with(data1, abline(lm(ys~xv))) 
# Plot 2
with(data2, plot(xv2, ys2, col="blue", pch=11))
# outer=TRUE draws the title in the outer margin, so par(oma=...) must leave room for it
title("My Title", outer=TRUE)
Par: To set global settings 
R code 
par(
  mar=c(5.1,4.1,4.1,2.1),  # inner margins: bottom, left, top, right (in lines of text)
  oma=c(2,2,2,2)           # outer margins around the whole figure
)
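Because `par()` changes global settings, a common pattern is to keep the values it returns and restore them afterwards. The sketch below assumes simulated data frames (`d1`, `d2`) in place of scatter1.txt and scatter2.txt.

```r
set.seed(42)
# Simulated data standing in for scatter1.txt and scatter2.txt
d1 <- data.frame(xv = 1:20, ys = 1:20 + rnorm(20))
d2 <- data.frame(xv2 = 1:20, ys2 = 20:1 + rnorm(20))

# par() returns the previous values of the settings it changes
old <- par(mfrow = c(1, 2),              # 1 row, 2 columns of plots
           mar = c(5.1, 4.1, 4.1, 2.1),  # inner margins: bottom, left, top, right
           oma = c(2, 2, 2, 2))          # outer margins, room for a shared title

with(d1, plot(xv, ys, col = "red"))
with(d1, abline(lm(ys ~ xv)))
with(d2, plot(xv2, ys2, col = "blue", pch = 11))
title("My Title", outer = TRUE)

# Restore the earlier settings so later plots are unaffected
par(old)
```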
Lattice
R code
productivity = read.table("productivity.txt", header=TRUE)
# Number of species in forest against differing productivity
library(lattice)
# Plotting: the first argument is the formula (vertical ~ horizontal), the second is the data frame
xyplot(y~x, productivity,
  xlab=list(label="Productivity"),
  ylab=list(label="Mammal Species"))
Lattice
R code
productivity = read.table("productivity.txt", header=TRUE)
# Number of species in forest against differing productivity
library(lattice)
# Plotting: the first argument is the formula (vertical ~ horizontal), the second is the data frame
xyplot(y~x, productivity,
  xlab=list(label="Productivity"),
  ylab=list(label="Mammal Species"))
# "| f" reads "given f": one panel for each level of f
xyplot(y~x | f, productivity,
  xlab=list(label="Productivity"),
  ylab=list(label="Mammal Species"))
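To try the conditioning formula without the course file, the following sketch builds a small simulated data frame. The name `sim`, its columns and the group offsets are all hypothetical; only `xyplot()` and its formula syntax come from lattice.

```r
library(lattice)
set.seed(7)

# Simulated stand-in for productivity.txt: three groups with different baselines
sim <- data.frame(
  f = rep(c("A", "B", "C"), each = 15),
  x = runif(45, 0, 10)
)
offset <- ifelse(sim$f == "A", 0, ifelse(sim$f == "B", 10, 20))
sim$y <- 2 * sim$x + offset + rnorm(45)

# vertical ~ horizontal | conditioning factor: one panel per level of f
p <- xyplot(y ~ x | f, data = sim,
            xlab = "x", ylab = "y",
            layout = c(3, 1))
print(p)
```

Lattice functions return a plot object, so the explicit `print(p)` is what draws it when the code runs inside a script or function.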
ggplot2 
• Grammar of graphics (gg) 
• Based on the grid plotting system, so it cannot be mixed with base graphics
ggplot2.org
ggplot 
Components 
• Data & relationship 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting
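As a rough illustration of how the components listed above map onto code, the sketch below adds one layer per component. It assumes the built-in `mtcars` data rather than the course data, and the particular geoms, scale and limits are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +      # data & relationship
  geom_point() +                                 # geometric object
  stat_smooth(method = "lm", formula = y ~ x) +  # statistical transformation
  scale_y_log10() +                              # scale
  coord_cartesian(xlim = c(1, 6)) +              # coordinate system
  facet_wrap(~ cyl)                              # faceting
print(p)
```

Each `+` adds one component, so a plot can be built up and inspected one layer at a time.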
ggplot 
Data
ggplot 
Mapping
ggplot
• Geometric objects (aka geoms)
• Coordinate system, with respect to scales (log scale / sqrt / log ratio)
• Title, plot, theme, etc.
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting 
R code
library(ggplot2)
# Remember to change month into a factor
weather$month = factor(weather$month)
# First argument: the data.frame; aes(): the aesthetics function, which maps the relationships
ggplot(weather, aes(x=month, y=upper))+
  geom_boxplot()
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales 
• Coordinate system 
• Facetting 
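R code
library(dplyr)  # provides %>%, group_by() and summarise()
# Average upper value for each month
weather2 = weather %>%
  group_by(month) %>%
  summarise(average.upper = mean(upper))
# stat="identity" plots the precomputed averages as bar heights
ggplot(weather2, aes(month, average.upper))+
  geom_bar(stat="identity")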
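An alternative to summarising the data first is to let ggplot2 apply the statistical transformation itself. This sketch is not the approach used on the slide; it assumes a simulated data frame (`weather_sim`) and the `fun` argument of `stat_summary()`, which is available in ggplot2 3.3.0 or later.

```r
library(ggplot2)
set.seed(3)

# Simulated stand-in for the weather data: 10 readings for each of 12 months
weather_sim <- data.frame(
  month = factor(rep(1:12, each = 10)),
  upper = rnorm(120, mean = 15, sd = 4)
)

# stat_summary() applies mean() to the y values at each x before drawing bars
# (the fun argument assumes ggplot2 3.3.0 or later)
p <- ggplot(weather_sim, aes(month, upper)) +
  stat_summary(fun = mean, geom = "bar")
print(p)
```

Summarising beforehand, as on the slide, keeps the averaged table available for reuse; `stat_summary()` keeps the raw data and the plot in one step.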
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales✔ 
• Coordinate system 
• Facetting 
R code
plot2 = ggplot(weather2, aes(month, average.upper))+
  geom_bar(aes(fill=month), stat="identity")+
  scale_fill_brewer(palette="Set3")+
  xlab("Months")+
  ylab("Upper Quantile")+
  theme_bw()
qplot 
A separate function which wraps ggplot, for simpler syntax 
R code 
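# Recent ggplot2 versions deprecate stat= in qplot(); geom="col" (ggplot2 2.2.0 or later) draws the same identity bars
qplot(month, upper, fill=month, data=weather, facets = ~yr, geom="col")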
Ethos behind visualization 
http://keylines.com/network-visualization
Final Challenge
R code
library(ggplot2)
library(dplyr)  # provides %>%, group_by() and summarise()
# Read in the data
data = read.csv("final.csv")
# Prepare the rectangle background: one band per planning region
areas = unique(subset(data, select=c(Planning_Area, Planning_Region)))
areas = areas[order(areas$Planning_Region),]
areas$rectid = 1:nrow(areas)
rectdata = areas %>%
  group_by(Planning_Region) %>%
  summarise(xstart=min(rectid)-0.5, xend=max(rectid)+0.5)
# Order the factor levels so that areas in the same region sit next to each other
data$Planning_Area = factor(data$Planning_Area,
  levels=as.character(areas[order(areas$Planning_Region),]$Planning_Area))
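The arithmetic behind the background bands can be checked on a toy example. The area and region names below are hypothetical, and base `tapply()` is used in place of the dplyr pipeline so the sketch needs no extra packages.

```r
# Toy lookup table: five hypothetical areas in two hypothetical regions
areas <- data.frame(
  Planning_Area   = c("A1", "A2", "B1", "B2", "B3"),
  Planning_Region = c("East", "East", "West", "West", "West")
)

# Position of each area along the x axis once levels are ordered by region
areas <- areas[order(areas$Planning_Region), ]
areas$rectid <- seq_len(nrow(areas))

# Each band runs from half a step before its first area
# to half a step after its last area
xstart <- tapply(areas$rectid, areas$Planning_Region, min) - 0.5
xend   <- tapply(areas$rectid, areas$Planning_Region, max) + 0.5
rectdata <- data.frame(Planning_Region = names(xstart),
                       xstart = as.numeric(xstart),
                       xend   = as.numeric(xend))
print(rectdata)
```

Here East spans 0.5 to 2.5 and West spans 2.5 to 5.5, so neighbouring bands share an edge and tile the axis without gaps.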
Final challenge
R code
# Plot
p0 =
  ggplot(data, aes(Planning_Area, Unit_Price____psm_))+
  geom_boxplot(outlier.colour=NA)+
  geom_rect(data=rectdata,
    aes(xmin=xstart, xmax=xend, ymin=-Inf, ymax=Inf,
      fill=Planning_Region, group=Planning_Region),
    alpha=0.4, inherit.aes=FALSE)+
  geom_jitter(alpha=0.40, aes(color=as.factor(Year)))+
  scale_color_brewer("Year", palette='RdBu')+
  scale_fill_brewer(palette="Set1", name='Region')+
  theme_minimal()+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+
  xlab("Planning Area")+ylab("Unit Price (PSM)")
# Save plot
ggsave("areaboxplots.pdf", plot=p0, width=20, height=10, units="in", dpi=300)
“Above all else show the data.” 
― Edward R. Tufte, The Visual Display of Quantitative Information 
Thank you for your time
Exploratory Analysis Part1 Coursera DataScience Specialisation
