2. http://bit.ly/data-vending
DATA AND ART (PRIMER)
Providing value on the potential of bad news to serve
out a bag of salty potato chips
harnessing the power of open data and sentiment
5. Open Source Analysis Goals
• Provide value to the organization – turn data into intelligence using an “operational lens”
• Ensure cyclical feedback occurs during collection, processing, analysis, and consumption
• Validate that a particular network is the right source of data for the questions you need answered
6. Common Pitfalls
Analyzing What Instead of Why
The important thing is often not what
people are saying… but why they are
saying it.
7. Common Pitfalls
Using the Wrong Analysis Tools
Reporting tools rarely help dig into the why. Many common
tools, reports, and metrics are misleading:
– Word clouds atomize message context
– Sentiment metrics are often highly inaccurate
– Information in aggregate hides more than it reveals
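As a small illustration of the last point, the sketch below compares two made-up weekly sentiment series that share the same overall average. The series names and values are hypothetical and are not taken from any dataset discussed in these slides.

```python
from statistics import mean

# Hypothetical weekly sentiment scores for two senders (illustrative values only).
steady = [0.0, 0.0, 0.0, 0.0]
volatile = [0.5, 0.5, -0.5, -0.5]

def largest_drop(scores):
    """Largest week-over-week fall in score (0.0 if the series never falls)."""
    drops = [prev - cur for prev, cur in zip(scores, scores[1:])]
    return max(drops + [0.0])

# The aggregate view: both senders look identical.
steady_mean = mean(steady)
volatile_mean = mean(volatile)

# The per-week view: only one sender shows a sharp fall.
steady_drop = largest_drop(steady)
volatile_drop = largest_drop(volatile)
```

Both averages come out the same, while only the second series contains a sharp week-over-week fall, which is the kind of detail an aggregate number can hide.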
11. Enron Sentiment Analysis
Caveats
Sentiment was only attributed to the sender
Not a complete representation of an organization’s email
corpus
Counteraction of uneven coverage was estimated
Not a full analysis of the set of information (objective was
to use sentiment analysis as a reduction technique)
http://bit.ly/ikanow-and-r
12. Workflow
• Data Ingestion Process
– Extraction of entities, events, facts and some basic
statistics
• Aggregation and Reduction
– Aggregation of keywords with sentiment from each
email
– Average sentiment score
– Follow-on aggregation by the sender’s email address
over a given week (average sentiment score)
• Visualize and Analyze
– Imported into Infinit.e and R for visualization
http://bit.ly/ikanow-and-r
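The aggregation and reduction steps above can be sketched roughly as follows. This is a minimal sketch in plain Python, not the Infinit.e or R pipeline used in the study: the record layout, the example addresses, the scores, and the function name `weekly_average_sentiment` are all assumptions made for illustration, and weeks are bucketed by ISO week as one possible choice.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: (sender, date sent, sentiment scores of the keywords in that email).
emails = [
    ("alice@example.com", date(2001, 10, 1), [0.5, -0.5]),
    ("alice@example.com", date(2001, 10, 3), [-1.0, -0.5]),
    ("alice@example.com", date(2001, 10, 9), [0.25, 0.75]),
    ("bob@example.com", date(2001, 10, 2), [1.0]),
]

def weekly_average_sentiment(records):
    """Average keyword sentiment per email, then average emails per (sender, ISO year, ISO week)."""
    buckets = defaultdict(list)
    for sender, sent_on, keyword_scores in records:
        if not keyword_scores:
            continue  # no scored keywords, so nothing to average for this email
        email_score = mean(keyword_scores)
        iso = sent_on.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(sender, iso[0], iso[1])].append(email_score)
    return {key: mean(scores) for key, scores in buckets.items()}

weekly = weekly_average_sentiment(emails)
for (sender, year, week), score in sorted(weekly.items()):
    print(sender, year, week, score)
```

The result is one average score per sender per week, which is the shape of data the charts on the following slides plot.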
13. Workflow
• Horizontal Bar
– Positive sentiment = Green
– Negative sentiment = Red
• Chart on Left
– Positive sentiment = Green
– Negative sentiment = Red
• Chart on Right
– Heuristic – weeks with abrupt negative shifts indicated problems in organization
– Positive sentiment = Blue
– Negative sentiment = Red
One email sender’s Weekly Average Sentiment across time
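The heuristic on this slide (weeks with abrupt negative shifts) could be sketched as below. The slides do not say how "abrupt" was defined, so the `drop_threshold` parameter, its default value, and the function name `flag_negative_shifts` are assumptions for illustration only, as are the example scores.

```python
def flag_negative_shifts(weekly_scores, drop_threshold=0.5):
    """Return indices of weeks whose score fell by at least drop_threshold
    compared with the previous week (threshold is an assumed parameter)."""
    flagged = []
    for i in range(1, len(weekly_scores)):
        drop = weekly_scores[i - 1] - weekly_scores[i]
        if drop >= drop_threshold:
            flagged.append(i)
    return flagged

# Hypothetical weekly average sentiment for one sender.
scores = [0.25, 0.5, -0.25, -0.25, 0.5, -0.5]
shift_weeks = flag_negative_shifts(scores)
print(shift_weeks)
```

A week-over-week difference is only one way to express the idea; a comparison against a rolling average would be another reasonable choice.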
15. Workflow
Individual analysis based on the reduction of the information by the sentiment analysis process
16. Findings
• Indicators and Additional Analysis
– 801 weeks highlighted out of 11,500 weeks as
important for further investigation
– Keywords found could be used for further statistical
investigation of the 801 weeks highlighted for manual
review
– Individual evaluation of emails highlighted through a
reduction process (case construction)
– Pipeline created for further analysis
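To put the reduction in perspective, the two figures quoted above can be turned into a ratio. The sketch below uses only the numbers from this slide; the variable names are illustrative.

```python
total_weeks = 11_500    # weeks considered, as quoted on the slide
weeks_flagged = 801     # weeks highlighted for further investigation

share_flagged = weeks_flagged / total_weeks   # fraction left for manual review
share_removed = 1 - share_flagged             # fraction filtered out by the reduction

print(f"{share_flagged:.1%} flagged, {share_removed:.1%} filtered out")
```

Roughly 7% of the weeks remain for manual review, so the reduction step sets aside about 93% of the material.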
18. Lessons Learned
2. Multiple contexts for this type
of technique
Intelligence Analysis
E-Discovery
Brand management
Social Media Analysis
19. Lessons Learned
3. Only negative shifts were
investigated; analysis of positive
shifts could easily be applied to
other use cases and questions
20. Lessons Learned
4. R and Infinit.e provide an
interesting technology integration
for evaluating and reducing
unstructured data
No matter what methodology you use… intelligence analysis is an iterative process. You collect the data, store it, analyze it, and distribute the end results to your organization in some usable format.
Provide value to the organization – turn data into intelligence using an “operational lens” (in other words, answer the questions your organization is asking). Ensure cyclical feedback occurs during collection, processing, analysis, and consumption (learn from the process and adjust based on what you learn; intelligence gathering and analysis is not a static process). Validate that a particular network is the right source of data for the questions you need answered (e.g. is Twitter the right place to look for data related to weather?).
Why is someone tweeting or posting? If someone checks in from a store, is it really because the store is so incredible that they need to share that information, or is it because they are trying to shape an impression of their lifestyle (i.e. image shaping)? Why is much harder than What. What you learn from data can be affected as much by the tools you use to analyze it as by what the data contains. Picking the right tools for the job is critical.