2. Consulting Goals
Data analysis support and programming services
Research project planning and guidance on selecting
appropriate technology for research projects
Facilitating appropriate organization, storage and
sharing of data
Training on the use of both established software
packages and emerging tools
3. Scope
Free!
Support the entire social science
community
Consults measured in hours rather
than weeks or months
Currently doing outreach to
departments, student groups and
centers
Drop-ins on Fridays at 1pm in the
training lab, Appointments, Help
Tickets and casual chats in K306
5. Simo Goshev
BA – Sofia, Bulgaria
Applied Econometrics
MS – McMaster University
Statistics
PhD – McMaster University
Economics
Analysis:
Econometrics
Applied Microeconometrics
Panel Data
Applied statistics
Tools:
Mainly Stata
Some R
6. Help with econometrics
What model is most suitable for my data on
hospital IT innovation?
I am looking at HIV in children. Can you help me
design an overlapping generations model?
Why are the confidence intervals of my spline of
health care spending so wide/narrow?
Could the interaction between an exogenous
and endogenous variable be exogenous?
I am looking for a way to compare survival
between two cancer management programs.
Can you help me?
7. Help with computation/estimation
I am trying to estimate a model but for
some reason the routine fails. Could you
have a look at my script?
I am working with a large dataset and my
machine is giving up on me. Do you have
any suggestions?
Which routine is best for…?
8. Replication study in health
economics
• Graduate Student
• Make sense of a study and Stata code
[Figure: two panels, y-axis from .2 to 1, x-axis from 65 to 80]
9. Predictors of hospital IT adoption
• Graduate Student, School of Public Health
• Understand what factors facilitate/hinder adoption of IT in US hospitals
Data:
Sample of hospitals clustered within states
Count of ITs adopted by a hospital in 3 consecutive years
Modeling strategy:
Three-level mixed effects model
10. Alex Storer
BS, BA – UC Berkeley
Electrical Engineering & Computer
Science, Cognitive Science
PhD – Boston University
Cognitive & Neural Systems
Analysis:
Machine Learning
Signal Processing
Surface Based Techniques
Simulation
Optimization
Tools:
Matlab, R, Python
Emacs, LaTeX, Linux
11. Text Analysis
Topic Models
Large corpus
Prevalence of certain terms
Sentiment
12. Text Analysis
Twitter: #obamacare
Positive/Negative Opinions?
13. Text Analysis
Distinct Content Groupings
Congress Speeches
15. Text Analysis
Topic Models
What models are appropriate to perform our
analysis?
What software is appropriate?
Prevalence of certain terms
Sentiment
16. Text Analysis
Where do we obtain this corpus?
How do we pre-process it so we can analyze
it?
Large corpus
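One possible answer to the pre-processing question, as a minimal sketch only: lowercase the text, split it into tokens, drop very common words, and then count terms. The stopword list, the token pattern, the sample documents, and the function names `preprocess` and `term_prevalence` are all illustrative assumptions, not the workflow used in the consultations.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "of", "and", "to"}

# Tokens are runs of lowercase letters, '#' (hashtags) and apostrophes.
TOKEN = re.compile(r"[a-z#']+")


def preprocess(text):
    """Lowercase, tokenize, and remove stopwords."""
    return [tok for tok in TOKEN.findall(text.lower()) if tok not in STOPWORDS]


def term_prevalence(documents, term):
    """Fraction of documents that contain the term at least once."""
    hits = sum(1 for doc in documents if term in preprocess(doc))
    return hits / len(documents)


# Hypothetical three-document corpus, for illustration only.
docs = [
    "The law is working",
    "Repeal the law!",
    "Coverage of children expanded",
]

# Corpus-wide term counts after pre-processing.
counts = Counter(tok for doc in docs for tok in preprocess(doc))
print(counts.most_common(3))
print(term_prevalence(docs, "law"))
```

A real corpus would likely need more careful tokenization (URLs, mentions, stemming) than this sketch attempts.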
19. Federal Procurement Database
Download atom feeds
Parse XML Tree structure
Python!
Search for union of entries
Output as CSV
For 20 GB of data, there is no way to download by hand…
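The parse, union, and CSV steps listed on this slide could be sketched roughly as follows, using only the Python standard library. The two inline feeds, their field values, and the function names are hypothetical stand-ins; the real procurement feeds would be downloaded first and carry different fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace

# Hypothetical sample feeds standing in for downloaded files.
FEED_A = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>award-1</id><title>Contract one</title><updated>2012-01-05T00:00:00Z</updated></entry>
  <entry><id>award-2</id><title>Contract two</title><updated>2012-01-06T00:00:00Z</updated></entry>
</feed>"""
FEED_B = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>award-2</id><title>Contract two</title><updated>2012-01-06T00:00:00Z</updated></entry>
  <entry><id>award-3</id><title>Contract three</title><updated>2012-01-07T00:00:00Z</updated></entry>
</feed>"""

FIELDS = ["id", "title", "updated"]


def parse_entries(xml_text):
    """Walk the XML tree and yield one dict per Atom entry."""
    root = ET.fromstring(xml_text)
    for entry in root.iter(ATOM + "entry"):
        yield {name: entry.findtext(ATOM + name, default="") for name in FIELDS}


def union_of_entries(feeds):
    """Merge entries from several feeds, keeping each id only once."""
    merged = {}
    for xml_text in feeds:
        for row in parse_entries(xml_text):
            merged.setdefault(row["id"], row)
    return list(merged.values())


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = union_of_entries([FEED_A, FEED_B])
csv_text = to_csv(rows)
print(csv_text)
```

Keying the union on the entry `id` is one simple way to avoid duplicate rows when the same record appears in more than one feed.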
20. Steve Worthington
BA / MS – Durham, UK
Anthropology & Archeology
PhD – NYU
Biological Anthropology
Analysis: Tools:
Linear models (OLS, GLS, PLS, etc.) Mainly R
Resampling (permutation, bootstrap) Some SAS, SPSS
Ordination (PCA, LDA, CVA, etc.)
21. Cleaning / reshaping data
• Department of Economics
• Daily Lat/Long data on rainfall in India (1951 – 2007)
• 171 files, 3 types (2 ascii text, 1 binary)
• One file for each year (containing 365 daily matrices)
• Parse messy data into a long-format Stata data frame
[Figure: June 21st 2007]
22. Cleaning / reshaping data
• No common delimiter (spaces and tabs)
• Use regexp to parse each datum
• Use template to place each datum into correct row/column
Template
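The regular-expression and template idea can be illustrated with a small sketch. It assumes a hypothetical template of two latitudes and three longitudes and a made-up one-day matrix; the actual files, grid, and values were different, and the consultation itself targeted a Stata data frame rather than Python tuples.

```python
import re

# Hypothetical template: one latitude per row, one longitude per column.
LATITUDES = [8.5, 9.5]
LONGITUDES = [76.5, 77.5, 78.5]

# Made-up daily matrix with inconsistent separators (spaces and tabs).
RAW_DAY = "0.0\t 1.2   3.4\n  5.6 \t7.8\t\t2.0\n"

# Match each number directly instead of splitting on a delimiter.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def day_to_long(raw_text, day):
    """Return long-format records: (day, latitude, longitude, rainfall)."""
    records = []
    lines = [line for line in raw_text.splitlines() if line.strip()]
    if len(lines) != len(LATITUDES):
        raise ValueError("row count does not match the template")
    for lat, line in zip(LATITUDES, lines):
        values = [float(tok) for tok in NUMBER.findall(line)]
        if len(values) != len(LONGITUDES):
            raise ValueError("column count does not match the template")
        for lon, value in zip(LONGITUDES, values):
            records.append((day, lat, lon, value))
    return records


long_rows = day_to_long(RAW_DAY, "2007-06-21")
for row in long_rows:
    print(row)
```

Checking each row and column count against the template catches files where a value was dropped or merged, which is a common failure with messy delimiters.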
23. Cleaning / reshaping data
Long format data frame in Stata
Rainfall for each day and lat/long
26. Geospatial Analysis in R
Spatial prediction: interpolation of data points
Spatial autocorrelation analysis
Drug resistant TB
Moldova
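As a rough illustration of what interpolating data points means, here is a sketch of inverse distance weighting, one simple interpolation method among many. It is not necessarily the method used in this project, which was carried out in R; Python is used here only for illustration, and the coordinates, values, and the name `idw_predict` are hypothetical.

```python
import math


def idw_predict(points, target, power=2.0):
    """Inverse distance weighted prediction at target.

    points: list of (x, y, value) observations.
    target: (x, y) location to predict.
    """
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        dist = math.hypot(x - tx, y - ty)
        if dist == 0.0:
            # Target coincides with an observation: return it exactly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical observations at two locations.
observed = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]

# Halfway between the two points, both get equal weight.
midpoint = idw_predict(observed, (1.0, 0.0))
print(midpoint)
```

Nearer observations dominate the prediction, and a larger `power` makes that effect stronger.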
27. Ista Zahn
BS – University of Oregon
Psychology
PhD (ABD) – University of Rochester
Social Psychology
Analysis:
Regression
Mixed Models
Scale Development
Tools:
R, Stata, SAS, SPSS
Emacs, LaTeX, Linux