3. ottawa.theodi.org
Data Process
● Define problem or question
● Get the data
● Clean the data
● Explore the data
● Analyze the data
● Communicate results
5. Case Studies
● Seoul Subway Data
● 2015 Canadian Federal Election
● Weather Impact on Ottawa Cycling
● Ottawa 311 Data
● Ottawa Crime Statistics
Case study code on GitHub: https://github.com/robscottd/OpenDataInAction
6. Get the Data
The “Good” of getting open data:
Centralized government repositories
Multiple standard formats
The “Bad” of getting open data:
Data is too “clean”, value scrubbed out
Rate-limited/complex APIs
7. Get the Data Using R
R basics: read.table, read.csv, read.csv2
Packages: readr, rvest, RSelenium, readxl, rjson
Using APIs: twitteR, httr, jsonlite
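A minimal sketch of the base R readers named above, assuming a small made-up CSV (the column names and values are hypothetical, not taken from the case studies). Inline text stands in for a downloaded file so the example runs as-is:

```r
# Hypothetical CSV content standing in for a downloaded open data file.
csv_text <- "date,station,riders
2015-01-01,A,120
2015-01-01,B,95
2015-01-02,A,130"

# read.csv accepts inline text through the `text` argument; for a real
# file, pass a path or URL as the first argument instead.
rides <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Dates arrive as character strings, so convert them explicitly.
rides$date <- as.Date(rides$date)
str(rides)

# read.csv2 expects ';' as the field separator and ',' as the decimal mark,
# a convention common in many European exports.
euro <- read.csv2(text = "city;temp\nOttawa;1,5", stringsAsFactors = FALSE)
euro$temp
```

For larger files, readr's `read_csv` is a commonly used alternative to `read.csv`.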
8. Clean the Data
Tidy the data! Follow Hadley Wickham’s method
Address extreme outliers
Explore outliers graphically (Shiny Gadgets example)
Address missing values
Imputation with the mice, missForest, or Hmisc packages
Or remove the incomplete observations
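The cleaning steps above might look roughly like this in base R. The data are invented for illustration, the 1.5 * IQR rule is just one common way to flag outliers, and median filling is a crude stand-in for the multiple-imputation packages named on the slide, not an example of them:

```r
# Hypothetical daily counts with one missing value and one extreme value.
counts <- data.frame(
  day   = 1:8,
  trips = c(102, 98, NA, 110, 95, 105, 990, 101)
)

# Flag outliers with the 1.5 * IQR rule, ignoring missing values.
q     <- quantile(counts$trips, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[2] - q[1])
counts$outlier <- !is.na(counts$trips) &
  (counts$trips < q[1] - fence | counts$trips > q[2] + fence)

# Option 1: remove the incomplete observations.
complete <- counts[complete.cases(counts), ]

# Option 2: fill gaps with the median (simple single imputation,
# assumed here only to keep the sketch dependency-free).
imputed <- counts
imputed$trips[is.na(imputed$trips)] <- median(counts$trips, na.rm = TRUE)
```

Flagging rather than deleting outliers keeps the decision visible, so the flagged rows can still be inspected graphically before anything is dropped.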
9. Explore the Data: But why not just dive in?
No assumptions, “listen” to the data
Understand data properties
Find patterns in the data
Discover analysis strategies
Begin the visual narrative
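A first pass at exploration can be sketched with a few base R calls. The built-in `airquality` dataset is used here only as a stand-in, since it is not one of the case studies:

```r
# airquality ships with R's datasets package (daily New York readings).
data(airquality)

# Understand data properties: column types, ranges, and missing values.
str(airquality)
summary(airquality)
na_counts <- colSums(is.na(airquality))

# Find patterns: mean temperature by month.
monthly_temp <- aggregate(Temp ~ Month, data = airquality, FUN = mean)

# Begin the visual narrative: distribution of temperature per month.
boxplot(Temp ~ Month, data = airquality,
        xlab = "Month", ylab = "Temperature (F)")
```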
11. Communicate Results
Complete the narrative - what is the story?
Design with audience in mind
Share your process
Publish for easy access and feedback
If possible, provide link to data
14. Data Does Not Lie
It just might not tell you what you want to hear
● Survey design is biased
● Sensors are not identically calibrated
● Merged datasets are not temporally aligned
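As an illustration of the temporal alignment point, the sketch below brings two made-up datasets to the same daily resolution before merging them. All values and column names are hypothetical, and averaging the readings per day is only one possible alignment choice:

```r
# Hypothetical weather readings (several per day) and daily cycling counts.
weather <- data.frame(
  time   = as.POSIXct(c("2015-06-01 08:00", "2015-06-01 17:00",
                        "2015-06-02 08:00", "2015-06-02 17:00"), tz = "UTC"),
  temp_c = c(14, 22, 10, 16)
)
cycling <- data.frame(
  date  = as.Date(c("2015-06-01", "2015-06-02")),
  trips = c(1500, 900)
)

# Reduce the weather data to one row per day so both tables share a key.
weather$date  <- as.Date(weather$time, tz = "UTC")
daily_weather <- aggregate(temp_c ~ date, data = weather, FUN = mean)

# Merge only after both datasets are at the same temporal resolution.
merged <- merge(cycling, daily_weather, by = "date")
merged
```

Being explicit about the time zone matters here: converting timestamps to dates in the wrong zone can shift readings into the neighbouring day.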