Future Data Collection

CARMA Internet Research Module
         Jeff Stanton
Several Promising Environments and Techniques
•   Visual Surveys
•   Audio and video interviewing
•   Virtual Worlds
•   Web scraping
•   Network extraction and mapping
•   Polls Everywhere
•   Mobility
Visual Surveys
Visual DNA: http://www.visualdna.com/
Also try: http://www.youniverse.com/
Visual Surveys
Provides an engaging alternative to text-based
  surveys; more fun for respondents
Requires considerable set-up time; each screen
  is like an item; each picture is like an item
  response; every item and response must be
  keyed against one or more criteria
  Example: Previous page, “How do you approach
    stress,” could be keyed against other subjective or
    objective measures of stress, coping, general
    health, immune response, etc.
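A minimal sketch of what "keying" a visual item might look like in R, assuming each respondent picks one picture and has a score on a single criterion measure. The picture labels, the stress scores, and the helper name build_key() are hypothetical and exist only for illustration; a real key would be built from a validation sample.

```r
# Hypothetical pilot data: picture chosen on one screen, plus a criterion score
choice <- c("beach", "gym", "beach", "sofa", "gym", "sofa")
stress <- c(20, 35, 30, 60, 45, 70)   # made-up criterion values

# build_key() (hypothetical helper): mean criterion score for each picture
build_key <- function(choice, criterion) {
  tapply(criterion, choice, mean)
}

key <- build_key(choice, stress)

# Score a new respondent's picture choice with the empirical key
new_choice <- "gym"
predicted_stress <- key[[new_choice]]
print(key)
print(predicted_stress)
```

With several screens, the same step would be repeated per screen and the keyed values combined into a scale score.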
Audio and Video Interviewing
Methods: Structured, semi-structured, and
  unstructured interviewing; focus groups
Products: Skype, WebEx, Adobe
  Connect, Cisco Telepresence
Advantages: Reduced travel costs,
  speed
Disadvantages: High bandwidth,
  user technology requirements,
  unreliable connections
Virtual Worlds
VastPark: http://www.vastpark.com/
Virtual Worlds
Combine text and audio chat with social networking and
  3D model building
Methods: Structured, semi-structured, and unstructured
  interviewing; focus groups; unobtrusive observation;
  participant observation; possibly some experimental
  perceptual, cooperative, or navigational tasks
Products: VastPark, OpenSim, EduSim, Teleplace
Advantages: Speed, low cost
Disadvantages: Steep learning curve; high bandwidth;
  user technology requirements; unreliable connections
Web Scraping
Web Scraping
Retrieval and processing of text or images, e.g., from
  blogs; processing may include semantic analysis of
  people, events, emotions
Methods: Archival document analysis
Products: Hundreds of commercial products, mainly focused on
  brand, reputation, and marketing; open source product:
  WebHarvest
Advantages: Data are plentiful and cover a wide range of
  topics
Disadvantages: Technology hard to master; even after
  considerable automated processing, analysis has an
  intensive, qualitative flavor
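As a rough illustration of the "retrieval and processing" step, the sketch below strips markup from a page and leaves plain text for later analysis. The HTML string is a made-up stand-in for a downloaded blog post, and the helper name strip_tags() is hypothetical; removing tags with a regular expression is crude, and a real project would more likely use a proper HTML parser.

```r
# Made-up stand-in for a page retrieved from a blog
html <- "<html><body><h1>My blog</h1><p>Feeling <b>great</b> about solar!</p></body></html>"

# strip_tags() (hypothetical helper): drop tags, then tidy the whitespace
strip_tags <- function(x) {
  x <- gsub("<[^>]+>", " ", x)   # replace every tag with a space
  x <- trimws(x)                 # remove leading and trailing whitespace
  gsub("\\s+", " ", x)           # collapse runs of whitespace
}

plain <- strip_tags(html)
print(plain)
```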
Make a Wordcloud with Twitter and R
• Download R, the open source statistical platform;
  for more fun, also download R-Studio; both
  available for Windows, Mac, and Linux
• You will need four packages to make a word
  cloud: twitteR, stringr, tm, and wordcloud
  – Use install.packages() and library() commands to
    prepare packages for use in R
• Code appears on the following page; explanation
  is in my free eBook, Introduction to Data Science
  on the iTunes Bookstore
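One possible way to prepare the four packages, sketched under the assumption that they are still available from your package repository. The helper name missing_packages() is hypothetical; the install and load lines are left commented out so that nothing is downloaded until you choose to run them.

```r
# missing_packages() (hypothetical helper): which packages are not yet installed?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("twitteR", "stringr", "tm", "wordcloud")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing, then load all four packages:
# if (length(to_install) > 0) install.packages(to_install)
# for (p in needed) library(p, character.only = TRUE)
```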
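# Load the four packages (install them first with install.packages())
library(twitteR)
library(stringr)
library(tm)
library(wordcloud)

# TweetFrame() - Return a data frame based on a search of Twitter
# Note: searchTwitter() needs an authenticated session, set up
# beforehand (for example with twitteR's setup_twitter_oauth())
TweetFrame <- function(searchTerm, maxTweets)
{
  tweetList <- searchTwitter(searchTerm, n=maxTweets)
  # Convert each tweet to a one-row data frame, then stack the rows
  tweetDF <- do.call("rbind", lapply(tweetList, as.data.frame))

  # This last step sorts the tweets in arrival order
  return(tweetDF[order(as.integer(tweetDF$created)), ])
}

# CleanTweets() - Takes the junk out of a vector of tweet texts
CleanTweets <- function(tweets)
{
  # Collapse runs of spaces into a single space
  tweets <- str_replace_all(tweets, " {2,}", " ")
  # Remove shortened t.co links (http or https, any length)
  tweets <- str_replace_all(tweets, "https?://t\\.co/[A-Za-z0-9]+", "")
  # Remove the retweet header; there is at most one per tweet
  tweets <- str_replace(tweets, "RT @[A-Za-z]*: ", "")
  # Remove hashtags, then any remaining @mentions
  tweets <- str_replace_all(tweets, "#[A-Za-z]*", "")
  tweets <- str_replace_all(tweets, "@[A-Za-z]*", "")
  return(tweets)
}

# Command line code
tweetDF <- TweetFrame("#yourhashtag", 100)
cleanText <- CleanTweets(tweetDF$text)
tweetCorpus <- Corpus(VectorSource(cleanText))
tweetTDM <- TermDocumentMatrix(tweetCorpus)
tdMatrix <- as.matrix(tweetTDM)
# Total count of each term across all tweets, most frequent first
sortedMatrix <- sort(rowSums(tdMatrix), decreasing=TRUE)
cloudFrame <- data.frame(word=names(sortedMatrix), freq=sortedMatrix)
wordcloud(cloudFrame$word, cloudFrame$freq)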
Example Wordcloud: Hashtag “#solar”
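To see what the word cloud is built from without a Twitter connection, here is a base R sketch of the counting step alone. The three texts are invented examples, and this simple tokenizer is an assumption for illustration; it is not the same processing that the tm package applies, so counts can differ on real data.

```r
# Invented example texts standing in for cleaned tweets
texts <- c("solar power is growing",
           "solar panels and solar farms",
           "wind power")

# Split into lowercase words and drop empty strings
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[nchar(words) > 0]

# Count each word and sort so the most frequent comes first
freq <- sort(table(words), decreasing = TRUE)
cloudFrame <- data.frame(word = names(freq),
                         freq = as.integer(freq),
                         stringsAsFactors = FALSE)
print(head(cloudFrame))
```

A data frame of this shape, one row per word with its frequency, is what gets handed to the plotting step.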
Network Mapping
Mapping Social Networks
Nicholas Christakis, working with Framingham Heart Study data,
   has shown the power of social networks to influence a
   variety of health outcomes
Methods: Traditional self-report & objective measures;
   topological measures such as network centrality;
   “neighbor” measures
Products: Depends on data types; TouchGraph is a network
   web search engine; InFlow; UCInet; See:
   http://en.wikipedia.org/wiki/Social_network_analysis_software
Advantages: Meaningful improvement in predictive capability
Disadvantages: Intensive technique requires careful planning
   and setup; data collection difficult and time consuming
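A small base R sketch of two of the measures named above: degree centrality and a simple "neighbor" measure (the share of a person's contacts who have some attribute). The four people, their ties, and the attribute values are invented for illustration; dedicated network software offers many more measures.

```r
# Invented undirected ties among four people
edges <- data.frame(from = c("Ann", "Ann", "Bob", "Cat"),
                    to   = c("Bob", "Cat", "Cat", "Dan"),
                    stringsAsFactors = FALSE)
people <- sort(unique(c(edges$from, edges$to)))

# Build a symmetric adjacency matrix from the edge list
adj <- matrix(0, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1

# Degree centrality: number of direct contacts per person
degree <- rowSums(adj)

# Neighbor measure: share of each person's contacts with the attribute
attribute <- c(Ann = 1, Bob = 0, Cat = 1, Dan = 0)   # made-up 0/1 values
neighbor_share <- as.vector(adj %*% attribute[people]) / degree

print(degree)
print(round(neighbor_share, 2))
```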
Facebook Polls
Embedded Polls
Collection of short-format survey data from social
  networking and membership sites
Methods: Primarily standard, closed-ended
  self-report; single item scales
Products: Example: Vizu provides a “widget” that
  allows embedding of polls on Facebook pages
Advantages: Quick, cheap, possible to get a large
  sample in a short time
Disadvantages: Difficult to control access, short
  format limits use of multi-item scales
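Because an embedded poll usually yields one closed-ended item, the analysis is often just a tabulation. The sketch below uses invented responses and a textbook standard error for a proportion, which assumes a simple random sample; as noted above, access to such polls is hard to control, so that assumption may not hold.

```r
# Invented responses to a single closed-ended poll item
responses <- c("Yes", "No", "Yes", "Yes", "Undecided", "No", "Yes", "Yes")

counts <- table(responses)        # frequency of each answer
share  <- prop.table(counts)      # proportion of each answer
n      <- length(responses)

# Standard error of the "Yes" proportion under simple random sampling
p_yes  <- share[["Yes"]]
se_yes <- sqrt(p_yes * (1 - p_yes) / n)

print(counts)
print(round(share, 3))
print(se_yes)
```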
Mobility
http://www.surveyonthespot.com/
Data Collection from Mobile Devices
Using smartphones and other mobile devices as a basis
  for interacting with participants
Methods: Primarily self-report but can include location
  and movement data
Products: Example: Survey On The Spot allows location-aware
  surveys to be delivered to smartphones; TrailGuru
  collects route data from hikers and joggers
Advantages: Platform is becoming ubiquitous, location
  data provides new options for understanding behavior
Disadvantages: Privacy issues, small screen, complex
  programming interfaces
iPhone Fun
Reaching Mobile Participants
• Micropayment system built
  into the platform
• Feasible for short
  instruments
• Can be tied to particular
  experiences, e.g., museum
  visits
• Responses can be
  geotagged to support
  mapping
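One way geotagged responses might be tied to a particular experience is to compute each response's distance from a site of interest. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the coordinates, the site, and the 1 km cutoff are all hypothetical values chosen for illustration.

```r
# Great-circle distance in km between points given in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical geotagged survey responses
responses <- data.frame(id  = 1:3,
                        lat = c(43.0481, 43.0392, 43.1009),
                        lon = c(-76.1474, -76.1351, -76.2177))

# Hypothetical site of interest, such as a museum
site <- c(lat = 43.0400, lon = -76.1400)

responses$km_from_site <- haversine_km(responses$lat, responses$lon,
                                       site[["lat"]], site[["lon"]])
responses$on_site <- responses$km_from_site < 1   # assumed 1 km cutoff
print(responses)
```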
