Research Technology Consulting
Simo Goshev
Alex Storer
Steve Worthington
Ista Zahn

support@help.hmdc.harvard.edu

http://rtc.iq.harvard.edu
Consulting Goals

• Data analysis support and programming services

• Research project planning and guidance selecting
  appropriate technology for research projects

• Facilitating appropriate organization, storage and
  sharing of data

• Training on the use of both established software
  packages and emerging tools
Scope

• Free!

• Support the entire social science
  community

• Consults measured in hours rather
  than weeks or months

• Currently doing outreach to
  departments, student groups and
  centers

• Drop-ins on Fridays at 1pm in the
  training lab, Appointments, Help
  Tickets and casual chats in K306
Who We Are
Simo Goshev
                  • BA – Sofia, Bulgaria
                     Applied Econometrics

                  • MS – McMaster University
                     Statistics

                  • PhD – McMaster University
                     Economics

Analysis:                       Tools:
   Econometrics                     Mainly Stata
   Applied Microeconometrics        Some R
   Panel Data
   Applied statistics
Help with econometrics

     • What model is most suitable for my data on
       hospital IT innovation?

     • I am looking at HIV in children. Can you help me
       design an overlapping generations model?

     • Why are the confidence intervals of my spline of
       health care spending so wide/narrow?

     • Could the interaction between an exogenous
       and endogenous variable be exogenous?

     • I am looking for a way to compare survival
       between two cancer management programs.
       Can you help me?
Help with computation/estimation

• I am trying to estimate a model but for
  some reason the routine fails. Could you
  have a look at my script?

• I am working with a large dataset and my
  machine is giving up on me. Do you have
  any suggestions?

• Which routine is best for…?
Replication study in health economics

•Graduate Student
•Make sense of a study and Stata code

[Figure: two plot panels; vertical axes run from .2 to 1, horizontal axes from 65 to 80]
Predictors of hospital IT adoption

•Graduate Student, School of Public Health
•Understand what factors facilitate/hinder adoption of IT in US hospitals

• Data:
     • Sample of hospitals clustered within states
     • Count of ITs adopted by a hospital in 3 consecutive years

• Modeling strategy:
     • Three-level mixed effects model
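One way to write down a three-level model for this kind of data is sketched below. It is only an illustration: the slides do not give the specification, so the Poisson form with a log link (chosen because the outcome is a count), the covariate vector x, and all symbols are assumptions. Here t indexes the year, i the hospital, and j the state.

```latex
\log \mathbb{E}\left[ y_{tij} \mid u_j, v_{ij} \right]
  = \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{tij} + u_j + v_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{ij} \sim \mathcal{N}(0, \sigma_v^2)
```

The state effect u_j and the hospital effect v_ij capture the clustering of hospitals within states and of yearly counts within hospitals.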
Alex Storer
                    • BS, BA – UC Berkeley
                         Electrical Engineering & Computer
                         Science, Cognitive Science

                    • PhD – Boston University
                         Cognitive & Neural Systems


Analysis:                     Tools:
   Machine Learning               Matlab, R, Python
   Signal Processing              Emacs, LaTeX, Linux
   Surface Based Techniques
   Simulation
   Optimization
Text Analysis

                 Topic Models

                 Large corpus

     Prevalence of certain terms          Sentiment
Text Analysis




                Twitter:
                #obamacare



                             Positive/Negative
                              Opinions?
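A very rough way to label messages as positive or negative is to count words from two word lists. The sketch below is illustrative only: the word lists and the example tweets are made up, and the slides do not say which sentiment method was actually used.

```python
import re

# Hypothetical mini-lexicons for illustration only; real work would use a
# validated lexicon or a trained classifier.
POSITIVE = {"good", "great", "love", "helps", "affordable"}
NEGATIVE = {"bad", "terrible", "hate", "fails", "expensive"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives


def classify(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up example messages.
tweets = [
    "Great news, #obamacare helps my family",
    "#obamacare is expensive and the website fails",
    "Reading about #obamacare today",
]
for tweet in tweets:
    print(classify(tweet), "|", tweet)
```

Word counting ignores negation and sarcasm, so it is at best a starting point before moving to a proper model.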
Text Analysis

                 Distinct
                 Content
                Groupings




                Congress
                Speeches
Text Analysis




                     NY
                   Times
                  Archive

       Term:
     "Medicare"
Text Analysis

                     Topic Models

   • What models are appropriate to perform our
     analysis?

   • What software is appropriate?

     Prevalence of certain terms          Sentiment
Text Analysis
   • Where do we obtain this corpus?

   • How do we pre-process it so we can analyze
     it?

                    Large corpus
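As a rough illustration of pre-processing, the sketch below lowercases each document, splits it into word tokens, drops a few stop words, and then measures how prevalent one term is across the corpus. The stop-word list and the example documents are made up for this sketch; they are not from the consultation described in the slides.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return its word tokens, minus stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_prevalence(documents, term):
    """Return the share of documents that mention `term` at least once."""
    if not documents:
        return 0.0
    term = term.lower()
    hits = sum(1 for doc in documents if term in tokenize(doc))
    return hits / len(documents)


# Made-up example documents, for illustration only.
corpus = [
    "Medicare spending is debated in Congress.",
    "The archive covers decades of reporting.",
    "Changes to Medicare and Medicaid were proposed.",
    "A new hospital opened downtown.",
]

print(term_prevalence(corpus, "Medicare"))
print(Counter(tokenize(" ".join(corpus))).most_common(3))
```

The same token lists could feed a topic model or a sentiment scorer; only the counting step would change.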
Federal Procurement Database
Federal Procurement Database

           Only first 500 hits, only a few columns




                   All of the data, but…
Federal Procurement Database

          Download atom feeds


         Parse XML Tree structure

                                                    Python!
        Search for union of entries


              Output as CSV



For 20 GB of data, there is no way to download by hand…
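The steps on this slide can be sketched with the Python standard library. This is a minimal sketch under assumptions: the two feeds are made-up strings standing in for downloaded Atom files, the field names kept (id, title, updated) are generic Atom elements rather than the database's actual columns, and "union of entries" is read here as merging several result feeds while keeping each entry once. The download step is left out because the slides do not give the feed URLs.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard Atom namespace, needed to find elements in the parsed tree.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feeds standing in for downloaded files.
FEED_A = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>award-1</id><title>Contract one</title><updated>2012-01-01T00:00:00Z</updated></entry>
  <entry><id>award-2</id><title>Contract two</title><updated>2012-01-02T00:00:00Z</updated></entry>
</feed>"""
FEED_B = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>award-2</id><title>Contract two</title><updated>2012-01-02T00:00:00Z</updated></entry>
  <entry><id>award-3</id><title>Contract three</title><updated>2012-01-03T00:00:00Z</updated></entry>
</feed>"""


def parse_entries(feed_xml):
    """Parse one Atom feed and return a dict per entry."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        rows.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return rows


def union_of_entries(feeds):
    """Merge entries from several feeds, keeping one row per entry id."""
    merged = {}
    for feed_xml in feeds:
        for row in parse_entries(feed_xml):
            merged.setdefault(row["id"], row)
    return list(merged.values())


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "updated"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = union_of_entries([FEED_A, FEED_B])
print(to_csv(rows))
```

For files in the gigabyte range, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit than loading each feed whole.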
Steve Worthington

                    • BA / MS – Durham, UK
                       Anthropology & Archeology

                    • PhD – NYU
                       Biological Anthropology



Analysis:                                Tools:
   Linear models (OLS, GLS, PLS, etc.)       Mainly R
   Resampling (permutation, bootstrap)       Some SAS, SPSS
   Ordination (PCA, LDA, CVA, etc.)
Cleaning / reshaping data

•Department of Economics
•Daily Lat/Long data on rainfall in India (1951 – 2007)
•171 files, 3 types (2 ASCII text, 1 binary)
•One file for each year (containing 365 daily matrices)
•Parse messy data into a long-format Stata data frame

                                                       June 21st 2007
Cleaning / reshaping data
• No common delimiter (spaces and tabs)
• Use regexp to parse each datum
• Use template to place each datum into correct row/column

                                                     Template
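The regexp-plus-template idea above can be sketched as follows. The grid size, the coordinates in the template, and the sample text are all hypothetical; the real files and their layout are not shown in the slides.

```python
import re

# Hypothetical template: the latitude of each row and the longitude of each column.
LATITUDES = [10.5, 9.5]
LONGITUDES = [70.5, 71.5, 72.5]

# Matches one number such as 3, -99.9 or 0.0 (assumes a leading digit).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_grid(raw_text):
    """Parse lines delimited by any mix of spaces and tabs into rows of floats."""
    grid = []
    for line in raw_text.splitlines():
        values = [float(token) for token in NUMBER.findall(line)]
        if values:
            grid.append(values)
    return grid


def apply_template(grid):
    """Place each datum into its (lat, lon) cell, checking the shape first."""
    if len(grid) != len(LATITUDES) or any(len(row) != len(LONGITUDES) for row in grid):
        raise ValueError("grid shape does not match the template")
    return {
        (lat, lon): grid[i][j]
        for i, lat in enumerate(LATITUDES)
        for j, lon in enumerate(LONGITUDES)
    }


# Made-up sample with inconsistent delimiters.
RAW = "0.0 \t 1.2   3.4\n5.6\t\t0.0 7.8\n"
cells = apply_template(parse_grid(RAW))
print(cells[(9.5, 72.5)])
```

Matching the numbers themselves, rather than splitting on a delimiter, sidesteps the question of whether a given gap is spaces, tabs, or both.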
Cleaning / reshaping data

• Long format data frame in Stata

• Rainfall for each day and lat/long
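A long-format layout has one row per day and grid cell. The sketch below, with made-up values and a hypothetical 2 x 2 grid, turns a year's stack of daily matrices into such rows and writes them as CSV, which Stata can read with `import delimited`. The column names are assumptions.

```python
import csv
import io
from datetime import date, timedelta


def to_long_rows(year, daily_grids, latitudes, longitudes):
    """Yield one (date, lat, lon, rainfall) record per grid cell per day."""
    start = date(year, 1, 1)
    for day_index, grid in enumerate(daily_grids):
        day = start + timedelta(days=day_index)
        for i, lat in enumerate(latitudes):
            for j, lon in enumerate(longitudes):
                yield (day.isoformat(), lat, lon, grid[i][j])


# Made-up data: two days of a 2 x 2 grid.
grids = [
    [[0.0, 1.2], [3.4, 0.0]],
    [[5.6, 0.0], [0.0, 7.8]],
]
long_rows = list(to_long_rows(2007, grids, [10.5, 9.5], [70.5, 71.5]))

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["date", "lat", "lon", "rainfall"])
writer.writerows(long_rows)
print(buffer.getvalue())
```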
Rainfall / CEO movie
Rainfall / CEO movie
Geospatial Analysis in R
• Spatial prediction: interpolation of data points

• Spatial autocorrelation analysis

                                                     Drug resistant TB

     Moldova
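To make "interpolation of data points" concrete, here is a minimal sketch of inverse distance weighting, one simple interpolation technique. It is an assumption chosen for illustration: the slides do not say which method was used in the analysis, and the work itself was done in R, whereas this sketch uses Python. The points and values are made up.

```python
import math


def idw_predict(known_points, target, power=2.0):
    """Predict the value at `target` (x, y) as an inverse-distance-weighted mean.

    `known_points` is a list of (x, y, value) tuples.
    """
    if not known_points:
        raise ValueError("need at least one known point")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with an observation: return it directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Made-up observations: (x, y, value).
observations = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
print(idw_predict(observations, (1.0, 0.0)))
```

Nearby observations dominate the prediction, and a larger `power` makes that local influence stronger.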
Ista Zahn
                  • BS – University of Oregon
                        Psychology

                  • PhD (ABD) – University of Rochester
                       Social Psychology




Analysis:                    Tools:
   Regression                    R, Stata, SAS, SPSS
   Mixed Models                  Emacs, LaTeX, Linux
   Scale Development
Workshops
(schedule at http://rtc.iq.harvard.edu)
IQSS Services




                   THE INSTITUTE FOR
                Quantitative Social Science
                       at Harvard
                       University
Contact Us!

support@help.hmdc.harvard.edu
http://rtc.iq.harvard.edu/
CGIS-Knafel, Room K306
Friday afternoons, K018

IQSS Presentation to Program in Health Policy
