1. Air Quality in Taiwan 2013
By Tony Cheng
typhoon.tony2002@gmail.com
NYC Data Science Academy
Student Demo Day, 11-19-2014
R-002 Taiwan Open Data and Data Science
2. Explore
Using historical data to find patterns of air pollution in different
cities
Data sources
EPA Taiwan air quality historical data, 1987-2013
▪ Hourly data from 79 monitoring stations in Taiwan
▪ Variables include 11 types of air pollutants (e.g. SO2, NO2, Ozone, PM10…)
and weather monitoring data (e.g. temperature, wind, rainfall…)
Parameters
Data in 2013
Selected Stations
▪ Zhongshan (Taipei City), Xitun (Taichung City), Xiaogang (Kaohsiung City)
and Hualien (Hualien County)
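Restricting the data to 2013 and to the four selected stations is a simple row filter. The sketch below is an illustration only: the data frame `records`, its toy values, and the station "Banqiao" (used as an example of an excluded station) are hypothetical, and English station labels stand in for the Chinese names used in the EPA files.

```r
# Hypothetical toy records; the real EPA files label stations in Chinese.
records <- data.frame(
  STN  = c("Zhongshan", "Xitun", "Xiaogang", "Hualien", "Banqiao", "Zhongshan"),
  Date = as.Date(c("2013-01-01", "2013-01-01", "2013-06-01",
                   "2013-12-31", "2013-01-01", "2012-12-31")),
  NO2  = c(30, 25, 28, 8, 27, 35),
  stringsAsFactors = FALSE
)

# Stations chosen for the project.
selected_stations <- c("Zhongshan", "Xitun", "Xiaogang", "Hualien")

# Keep 2013 observations from the selected stations only.
in_2013  <- format(records$Date, "%Y") == "2013"
data_sel <- records[in_2013 & records$STN %in% selected_stations, ]
print(data_sel)
```

Using `%in%` keeps the station list in one vector, so adding or dropping a station is a one-line change.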
3. Process
Clean the data and build a data frame
Find the characteristics of each pollutant across cities and
time periods
Data visualization
Packages
reshape2
ggplot2
lattice
plyr
scales
shiny
4. Convert the original data from xls files to csv files manually
before using R.
Read multiple files automatically using “dir()”
and a “for” loop.
Figure: Before / After
library(reshape2)
## Get the list of csv files (only files ending in .csv).
filelist = dir(pattern = "\\.csv$")
data_l = data.frame()
## Read each csv file and change the data format.
for (i in seq_along(filelist)) {
  data = read.csv(filelist[i], header = TRUE, stringsAsFactors = FALSE)
  names(data) = c("Date", "STN", "Type", as.character(0:23))
  data$Date = as.Date(data$Date)  # Convert date strings to Date.
  # Reshape from wide (one column per hour) to long (one row per hour).
  tmp = melt(data = data, id.vars = c("STN", "Date", "Type"))
  # Combine the data from every station.
  if (i == 1) {
    data_l = tmp
  } else {
    data_l = rbind(data_l, tmp)
  }
  rm(tmp)
}
# Convert values to numeric; any non-numeric entries become NA.
data_l$value = suppressWarnings(as.numeric(as.character(data_l$value)))
names(data_l) = c("STN", "Date", "Type", "Time", "value")
# Back to wide: one column per pollutant type.
data_w = dcast(data_l, STN + Date + Time ~ Type,
               value.var = "value", fun.aggregate = mean)
# Get the year, month, day and weekday.
data_w$year = format(data_w$Date, "%Y")
data_w$month = format(data_w$Date, "%m")
data_w$days = format(data_w$Date, "%d")
data_w$weekdays = weekdays(data_w$Date)
* STN = Station name in Chinese
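Growing `data_l` with `rbind()` inside the loop copies the accumulated data on every iteration. A common alternative is to read every file into a list and bind once at the end. The sketch below shows that pattern on two throwaway csv files written to a temporary directory; the file contents and the hour columns `h0`/`h1` are hypothetical, not the EPA layout.

```r
# Create a unique temporary directory with two toy csv files (hypothetical data).
tmp_dir <- tempfile("epa_demo")
dir.create(tmp_dir)
for (stn in c("A", "B")) {
  toy <- data.frame(Date = "2013-01-01", STN = stn, Type = "NO2", h0 = 10, h1 = 20)
  write.csv(toy, file.path(tmp_dir, paste0(stn, ".csv")), row.names = FALSE)
}

# List only csv files, with full paths so the working directory does not matter.
filelist <- dir(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list, then bind all of them in a single call.
frames   <- lapply(filelist, read.csv, header = TRUE, stringsAsFactors = FALSE)
combined <- do.call(rbind, frames)
print(combined)
```

The melt step from the loop could be applied inside the `lapply()` call in the same way, so each file is reshaped before the single bind.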
6.
library(ggplot2)
library(reshape2)
# Set colour table (one colour per NO2 group)
colortable = c("#99FFFF", "#00FFFF", "#00FF00", "#CCFF33", "#FFFF00",
               "#FFCC00", "#FF6600", "#FF3333", "#FF33CC", "#660033")
# Cut NO2 at one station into ten groups with equal numbers of observations
drawSTN = "中山"
Time_of_Day = data_w$Time[data_w$STN == drawSTN]
mag = cut_number(data_w$NO2[data_w$STN == drawSTN], n = 10)
rosedata = data.frame(Time_of_Day = Time_of_Day, mag = mag)
# Plot rose chart: one bar per hour, stacked by NO2 group
p <- ggplot(rosedata, aes(x = Time_of_Day, fill = mag)) + geom_bar() + coord_polar() +
  ggtitle("Air Pollutant during a Day") + scale_fill_manual(values = colortable)
print(p)
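The ten colour bands come from `cut_number()`, which splits values into groups holding roughly equal numbers of observations. A base-R sketch of the same idea, using quantile breaks and `cut()`, is below; the helper name `cut_equal_count` and the toy vector `1:100` are hypothetical, and this is not ggplot2's own source.

```r
# Split a numeric vector into n bins with roughly equal counts,
# using quantiles as the break points.
cut_equal_count <- function(x, n = 10) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # unique() guards against repeated breaks when many values are tied.
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

no2  <- 1:100  # hypothetical NO2 readings
bins <- cut_equal_count(no2, n = 10)
print(table(bins))
```

Because the bins are quantile-based, each colour covers the same share of hours, not the same concentration range. The rose chart therefore shows relative ranking within the station.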
Nitrogen dioxide (NO2) is closely related to
daily human activity. Air quality is good
around midnight and poor during rush hours.
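The midnight-versus-rush-hour reading can also be checked numerically by averaging NO2 for each hour of the day. The sketch below does that with `tapply()` on synthetic data; the rush-hour windows in the toy pattern are an assumption built into the fake data, so the output illustrates the method, not the slide's result.

```r
set.seed(1)
hours <- rep(0:23, times = 30)  # 30 hypothetical days of hourly readings
# Hypothetical pattern: higher NO2 during 7-9 and 17-22, lower otherwise.
base <- ifelse(hours %in% 7:9 | hours %in% 17:22, 40, 15)
no2  <- base + rnorm(length(hours), sd = 3)

# Mean concentration for each hour of the day (names are "0" to "23").
hourly_mean <- tapply(no2, hours, mean)
peak_hour   <- as.integer(names(which.max(hourly_mean)))
quiet_hour  <- as.integer(names(which.min(hourly_mean)))
print(round(hourly_mean, 1))
```

With the real data, `no2` and `hours` would come from a station's `NO2` and `Time` columns in `data_w`.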
Figure: NO2 rose charts for Zhongshan (中山) and Xitun (西屯)
7.
# Flag holidays: every Saturday and Sunday.
# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, independent of locale.
data_w$holiday = as.POSIXlt(data_w$Date)$wday %in% c(0, 6)
# List of national holidays
Holidaylist = c("2013-01-01", "2013-02-11", "2013-02-12", "2013-02-13",
                "2013-02-14", "2013-02-15", "2013-02-28", "2013-04-04", "2013-04-05",
                "2013-06-12", "2013-09-19", "2013-09-20", "2013-10-10")
Holidaylist = as.Date(Holidaylist, "%Y-%m-%d")
# Also flag national holidays
data_w$holiday[data_w$Date %in% Holidaylist] = TRUE
National holidays in 2013
01/01, 02/11~15, 02/28, 04/04~05, 06/12,
09/19~20, 10/10
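A quick sanity check on the holiday flag is to count how many days of 2013 it marks. The sketch below applies the rule "every Saturday and Sunday plus the listed national holidays" to the full calendar year; it uses `as.POSIXlt()$wday` so the result does not depend on the system's language setting.

```r
# All dates in 2013.
days <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")

# Saturdays (wday 6) and Sundays (wday 0).
weekend <- as.POSIXlt(days)$wday %in% c(0, 6)

# National holidays listed on the slide.
national <- as.Date(c("2013-01-01", "2013-02-11", "2013-02-12", "2013-02-13",
                      "2013-02-14", "2013-02-15", "2013-02-28", "2013-04-04",
                      "2013-04-05", "2013-06-12", "2013-09-19", "2013-09-20",
                      "2013-10-10"))
holiday <- weekend | days %in% national

cat("weekend days:", sum(weekend), "\n")
cat("holiday days:", sum(holiday), "\n")
cat("working days:", sum(!holiday), "\n")
```

This reports 104 weekend days, 117 holiday days and 248 working days. All 13 listed national holidays fall on weekdays in 2013, so each one adds a day to the holiday group.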
The concentration of nitrogen
dioxide on weekdays is much higher
than on holidays, except around
midnight on holidays.
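The weekday-versus-holiday comparison can be expressed as a table of hourly means per day type, plus the gap between the two. The sketch below builds that 24 x 2 table with `tapply()` on synthetic data (10 hypothetical weekdays and 4 hypothetical holidays); with the real data, `hour`, `holiday` and `NO2` would be the corresponding columns of `data_w`.

```r
set.seed(42)
# Hypothetical hourly NO2 for 14 days: days 1-10 are weekdays, 11-14 holidays.
toy <- expand.grid(hour = 0:23, day = 1:14)
toy$holiday <- toy$day > 10
toy$NO2 <- ifelse(toy$holiday, 15, 30) + rnorm(nrow(toy), sd = 2)

# Mean NO2 per hour and day type: rows are hours,
# columns are "FALSE" (weekday) and "TRUE" (holiday).
profile <- tapply(toy$NO2, list(hour = toy$hour, holiday = toy$holiday), mean)

# Positive values mean weekdays are more polluted at that hour.
gap <- profile[, "FALSE"] - profile[, "TRUE"]
print(round(gap, 1))
```

Looking at `gap` hour by hour makes exceptions such as the midnight hours on holidays easy to spot.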
8.
Figure: NO2 heat maps for Zhongshan (中山), Hualien (花蓮), Xitun (西屯) and Xiaogang (小港)
Nitrogen dioxide (NO2) is worst at Zhongshan,
especially during weekday rush hours!
Avoid exercising outdoors between 7-9 a.m. and
5-10 p.m. in most cities.
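library(ggplot2)
library(scales)  # provides rescale()
# HPtmp is built in an earlier step not shown here; its columns are inferred
# from the names used below (STN, Type, variable = hour, weekdays, value).
# Rescale NO2 from all stations together to the range 0 to 1.
is_no2 = HPtmp$Type == "NO2"
HPtmp_r = HPtmp[is_no2, c("STN", "variable", "weekdays")]
HPtmp_r$rescale = rescale(HPtmp$value[is_no2], to = c(0, 1))
p = ggplot(HPtmp_r[HPtmp_r$STN == "中山", ], aes(variable, weekdays)) +
  geom_tile(aes(fill = rescale), colour = "white") +
  scale_fill_gradient(low = "cyan", high = "firebrick4", limits = c(0, 1)) +
  xlab("Time") + ylab("Weekday") +
  theme(axis.text = element_text(size = 16),
        axis.title = element_text(size = 20))
print(p)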
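With `to = c(0, 1)`, `rescale()` from the scales package maps values linearly onto [0, 1], i.e. min-max scaling: x' = (x - min x) / (max x - min x). The NO2 values are rescaled for all stations in one call, so the four heat maps share one colour scale. The sketch below uses a hypothetical base-R helper `rescale01` on toy values to contrast that joint scaling with scaling each station separately.

```r
# Min-max rescaling to [0, 1]; NAs are ignored when finding the range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical NO2 values for two stations.
toy <- data.frame(STN = c("Zhongshan", "Zhongshan", "Hualien", "Hualien"),
                  value = c(40, 20, 10, 5),
                  stringsAsFactors = FALSE)

# One range for all stations: colours stay comparable between panels.
toy$joint <- rescale01(toy$value)
# A separate range per station: each station's maximum becomes 1.
toy$per_station <- ave(toy$value, toy$STN, FUN = rescale01)
print(toy)
```

Under per-station scaling both stations reach 1, which would hide that the toy Hualien values are much lower. Joint scaling keeps Hualien's tiles near the low end of the colour ramp.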
9.
10. Use the other stations and data from previous years for a more
detailed analysis.
Relationship between air quality and health data, sales of
air purifiers, or BBQ during the Moon Festival?
Editor's Notes
Today I will talk about air quality. In recent years, air pollution has become increasingly severe, and there are more and more news reports about haze pollution in China.
So I wanted to know what Taiwan's air quality is like.
I used the historical air quality data from Taiwan's Environmental Protection Administration (EPA). It contains 11 types of pollutants and weather data from 79 monitoring stations.
I selected 4 stations for this project: 中山 is in downtown Taipei City, 西屯 is in Taichung City, 小港 is near the industrial zone of Kaohsiung City, and 花蓮 is a small city in eastern Taiwan.
Here is the process of my project and the packages we used.
First, we cleaned the data and built a data frame.
Then we used that data frame to find the characteristics of each air pollutant, for example across different cities and time periods.
After finding the patterns of each pollutant, we used ggplot2, lattice and shiny for data visualization.
The datasets on the EPA website are xls files, so we used other software to convert them to csv files before using R.
To read the files automatically and build the data frame, we used “dir” to get the file names and a “for” loop to read every csv file.
These 4 figures show the density distributions of 4 major air pollutants at the different stations.
We found that the concentrations of PM10, O3 and NO2 at 花蓮 are lower than at the other stations.
中山 has more NO2 than the others.
Air quality at 小港 is the worst; its SO2 level is much higher than at the other stations.