2. Several Promising Environments and Techniques
• Visual Surveys
• Audio and video interviewing
• Virtual Worlds
• Web scraping
• Network extraction and mapping
• Polls Everywhere
• Mobility
4. Visual Surveys
Provides an engaging alternative to text-based
surveys; more fun for respondents
Requires considerable set-up time: each screen
is like an item, each picture is like an item
response, and every item and response must be
keyed against one or more criteria
Example: The "How do you approach stress?" screen
on the previous page could be keyed against other
subjective or objective measures of stress, coping,
general health, immune response, etc.
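Here is a minimal sketch of the keying step, written in R like the code later in this deck. Each picture response carries a score on one or more criteria, and a respondent's score on a criterion is the sum across screens. The screen names, picture labels, and criteria (avoidance, activeCoping, socialSupport) are hypothetical and chosen only for illustration.

```r
# Hypothetical answer key: every picture response on every screen is
# scored against one or more (illustrative) criteria
key <- data.frame(
  screen        = c("stress", "stress", "stress",  "weekend", "weekend"),
  response      = c("beach",  "gym",    "friends", "sofa",    "hike"),
  avoidance     = c(1, 0, 0, 1, 0),
  activeCoping  = c(0, 1, 0, 0, 1),
  socialSupport = c(0, 0, 1, 0, 0)
)

# Hypothetical answers: the picture each respondent chose on each screen
answers <- data.frame(
  respondent = c("r1", "r1", "r2", "r2"),
  screen     = c("stress", "weekend", "stress", "weekend"),
  response   = c("gym", "hike", "beach", "sofa")
)

# Attach criterion scores to each answer by matching screen + response
scored <- merge(answers, key, by = c("screen", "response"))

# Sum each criterion across screens to get one score per respondent
totals <- aggregate(cbind(avoidance, activeCoping, socialSupport) ~ respondent,
                    data = scored, FUN = sum)
print(totals)
```

These per-respondent totals are what would then be correlated with the other measures of stress, coping, or health.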
5. Audio and Video Interviewing
Methods: Structured, semi-structured,
and unstructured interviewing;
focus groups
Products: Skype, WebEx, Adobe Connect,
Cisco Telepresence
Advantages: Reduced travel costs,
speed
Disadvantages: High bandwidth and
user technology requirements;
unreliable connections
7. Virtual Worlds
Combines text and audio chat with social networking and
3D model building
Methods: Structured, semi-structured, and unstructured
interviewing; focus groups; unobtrusive observation;
participant observation; possibly some experimental
perceptual, cooperative, or navigational tasks
Products: VastPark, OpenSim, EduSim, Teleplace
Advantages: Speed, low cost
Disadvantages: Steep learning curve; high bandwidth and
user technology requirements; unreliable connections
9. Web Scraping
Retrieval and processing of text or images, e.g., from
blogs; processing may include semantic analysis of
people, events, emotions
Methods: Archival document analysis
Products: Hundreds of commercial products, mainly focused
on brand, reputation, and marketing; open source
product: WebHarvest
Advantages: Data are plentiful and cover a wide range of
topics
Disadvantages: Technology is hard to master; even after
considerable automated processing, analysis remains
labor-intensive and qualitative
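The processing step can be sketched in base R under simple assumptions. The HTML is a made-up blog fragment, tags are stripped with a regular expression rather than a real HTML parser, and the four-word emotion lexicon is hypothetical. Real semantic analysis relies on much larger lexicons or trained models.

```r
# Hypothetical blog fragment; a real scraper would first retrieve pages,
# e.g., with readLines(url), subject to the site's terms of use
html <- '<div class="post"><p>I felt so <b>happy</b> today, but later a bit sad.</p><p>Happy again!</p></div>'

# 1. Strip tags with a crude regular expression (not a full HTML parser)
text <- gsub("<[^>]+>", " ", html)

# 2. Tokenize: lower-case, split on anything that is not a letter
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[words != ""]

# 3. Tiny hypothetical emotion lexicon: word -> emotion category
lexicon <- c(happy = "joy", glad = "joy", sad = "sadness", angry = "anger")

# 4. Look up emotion-bearing words and count them by category
hits <- lexicon[words[words %in% names(lexicon)]]
emotionCounts <- table(hits)
print(emotionCounts)
```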
10. Make a Wordcloud with Twitter and R
• Download R, the open source statistical platform;
for more fun, also download RStudio; both are
available for Windows, Mac, and Linux
• You will need four packages to make a word
cloud: twitteR, stringr, tm, and wordcloud
– Use install.packages() once to install each package
and library() in each session to load it
• Code appears on the following page; the explanation
is in my free eBook, Introduction to Data Science,
on the iTunes Bookstore
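11. Word Cloud Code
# Load the four packages (install them first with install.packages())
library(twitteR)
library(stringr)
library(tm)
library(wordcloud)

# TweetFrame() - Return a dataframe based on a search of Twitter
TweetFrame <- function(searchTerm, maxTweets)
{
  tweetList <- searchTwitter(searchTerm, n = maxTweets)
  # Convert each status object to a one-row data frame and stack them
  tweetDF <- do.call("rbind", lapply(tweetList, as.data.frame))
  # This last step sorts the tweets in arrival order
  return(tweetDF[order(as.integer(tweetDF$created)), ])
}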
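# CleanTweets() - Takes the junk out of a vector of tweet texts
CleanTweets <- function(tweets)
{
  # Collapse runs of spaces into a single space
  tweets <- str_replace_all(tweets, " {2,}", " ")
  # Remove t.co short links (http or https, any length)
  tweets <- str_replace_all(tweets, "https?://t\\.co/[A-Za-z0-9]+", "")
  # Remove the retweet header, e.g., "RT @someone: "
  tweets <- str_replace(tweets, "RT @[A-Za-z0-9_]+: ", "")
  # Remove hashtags and @mentions (letters, digits, underscores)
  tweets <- str_replace_all(tweets, "#[A-Za-z0-9_]+", "")
  tweets <- str_replace_all(tweets, "@[A-Za-z0-9_]+", "")
  return(tweets)
}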
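# Command line code
# Authenticate first; the four strings are hypothetical placeholders
# for your own Twitter app credentials
setup_twitter_oauth("consumerKey", "consumerSecret", "accessToken", "accessSecret")
tweetDF <- TweetFrame("#yourhashtag", 100)
cleanText <- CleanTweets(tweetDF$text)
tweetCorpus <- Corpus(VectorSource(cleanText))
tweetTDM <- TermDocumentMatrix(tweetCorpus)
tdMatrix <- as.matrix(tweetTDM)
# Total each term's count across all tweets, most frequent first
sortedMatrix <- sort(rowSums(tdMatrix), decreasing = TRUE)
cloudFrame <- data.frame(word = names(sortedMatrix), freq = sortedMatrix)
wordcloud(cloudFrame$word, cloudFrame$freq)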
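You can try the cleaning and counting logic without Twitter credentials or the tm package by using a base-R approximation on a few made-up tweets. The gsub()/sub() calls mirror the CleanTweets() rules, and the table() count plays the role of rowSums() over the term-document matrix. Because tm's default tokenization differs in details, counts on real data may not match exactly.

```r
# Hypothetical sample tweets standing in for tweetDF$text
tweets <- c("RT @data_fan: Loving  #rstats word clouds https://t.co/AbC123xYz9",
            "@friend word clouds are fun",
            "Word clouds and more word clouds #dataviz")

# Cleaning rules mirroring CleanTweets(), in base R
clean <- gsub(" {2,}", " ", tweets)                        # collapse spaces
clean <- gsub("https?://t\\.co/[A-Za-z0-9]+", "", clean)   # drop t.co links
clean <- sub("RT @[A-Za-z0-9_]+: ", "", clean)             # drop RT header
clean <- gsub("#[A-Za-z0-9_]+", "", clean)                 # drop hashtags
clean <- gsub("@[A-Za-z0-9_]+", "", clean)                 # drop mentions

# Split into lower-case words and count them across all tweets
words <- unlist(strsplit(tolower(clean), "[^a-z]+"))
words <- words[nchar(words) > 0]
freq <- sort(table(words), decreasing = TRUE)

# Same shape as cloudFrame above, ready for wordcloud()
cloudFrame <- data.frame(word = names(freq), freq = as.vector(freq))
print(cloudFrame)
```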
14. Mapping Social Networks
Nicholas Christakis, using data from the Framingham Heart
Study, has shown the power of social networks to influence
a variety of health outcomes
Methods: Traditional self-report & objective measures;
topological measures such as network centrality;
"neighbor" measures
Products: Depends on data types; TouchGraph is a network
web search engine; InFlow; UCInet; see:
http://en.wikipedia.org/wiki/Social_network_analysis_software
Advantages: Meaningful improvement in predictive capability
Disadvantages: Intensive technique that requires careful planning
and setup; data collection is difficult and time-consuming
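Degree centrality and a simple "neighbor" measure can be computed from an edge list in base R. This is a minimal sketch: the five people, their ties, and their BMI values are hypothetical. Dedicated network analysis software offers many more measures.

```r
# Hypothetical undirected ties (edge list) among five people
edges <- data.frame(from = c("A", "A", "B", "C", "D"),
                    to   = c("B", "C", "C", "D", "E"))
# Hypothetical health measure (e.g., BMI) for each person
bmi <- c(A = 31, B = 29, C = 24, D = 22, E = 27)
people <- names(bmi)

# Build a symmetric 0/1 adjacency matrix from the edge list
adj <- matrix(0, length(people), length(people),
              dimnames = list(people, people))
fromIdx <- match(edges$from, people)
toIdx   <- match(edges$to, people)
adj[cbind(fromIdx, toIdx)] <- 1
adj[cbind(toIdx, fromIdx)] <- 1

# Degree centrality: number of direct ties per person
degree <- rowSums(adj)

# "Neighbor" measure: mean health value among each person's direct ties
neighborMean <- as.vector(adj %*% bmi[people]) / degree
names(neighborMean) <- people

print(data.frame(degree, neighborMean))
```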
16. Embedded Polls
Collection of short-format survey data from social
networking and membership sites
Methods: Primarily standard, closed-ended self-report;
single-item scales
Products: Example: Vizu provides a “widget” that
allows embedding of polls on Facebook pages
Advantages: Quick, cheap, possible to get a large
sample in a short time
Disadvantages: Difficult to control access; short
format limits use of multi-item scales
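Because access to embedded polls is hard to control, one partial safeguard is to keep only the first response from each visitor before tallying. The sketch assumes the poll exports some per-visitor identifier (userId here is hypothetical) and uses made-up responses.

```r
# Hypothetical raw responses to a single-item poll; visitor u2 voted twice
votes <- data.frame(
  userId = c("u1", "u2", "u3", "u2", "u4", "u5"),
  answer = c("Yes", "No", "Yes", "Yes", "Yes", "No")
)

# Keep each visitor's first response only (rows are in arrival order)
firstVotes <- votes[!duplicated(votes$userId), ]

# Tally the single-item responses as counts and proportions
counts <- table(firstVotes$answer)
props <- prop.table(counts)
print(counts)
print(props)
```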
18. Data Collection from Mobile Devices
Using smartphones and other mobile devices as a basis
for interacting with participants
Methods: Primarily self-report but can include location
and movement data
Products: Example: Survey On The Spot allows location-
aware surveys to be delivered to smartphones;
TrailGuru collects route data from hikers and joggers
Advantages: Platform is becoming ubiquitous; location
data provide new options for understanding behavior
Disadvantages: Privacy issues, small screen, complex
programming interfaces
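Assuming route data arrive as a series of latitude/longitude fixes, you can estimate a route's length by summing great-circle distances between consecutive fixes. This sketch uses the standard haversine formula with a mean Earth radius of 6371 km. The function name haversineKm and the coordinates are hypothetical, and nothing here reflects TrailGuru's own implementation.

```r
# Great-circle distance in km between points given in decimal degrees,
# via the haversine formula (mean Earth radius 6371 km); vectorized
haversineKm <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))  # pmin guards against rounding above 1
}

# Hypothetical GPS fixes logged along one participant's route
route <- data.frame(lat = c(43.0481, 43.0500, 43.0530),
                    lon = c(-76.1474, -76.1450, -76.1400))

# Distance of each leg between consecutive fixes, then the total
n <- nrow(route)
legs <- haversineKm(route$lat[-n], route$lon[-n], route$lat[-1], route$lon[-1])
totalKm <- sum(legs)
print(round(totalKm, 3))
```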
19. iPhone Fun
Reaching Mobile Participants
• Micropayment system built into the platform
• Feasible for short instruments
• Can be tied to particular experiences, e.g., museum visits
• Responses can be geotagged to support mapping