SOCIAL TEXT ANALYTICS FOR ENTERPRISE AND CONSUMER APPLICATIONS

Presented by Vidar Brekke,
Social Intent LLC

The International Association of Software
Architects. October 23, 2012

@ividar #nlproc

What is Text Analytics?

Processes that uncover business value in
unstructured text via the application of
statistical, linguistic, machine learning,
and data analysis and visualization techniques


Text analytics helps answer
business questions faster and
cheaper than before, uncovering
new, hidden insights!


Text analytics is a Big Data problem

Volume, Velocity, Variety

• Social media, help inquiries, email, texts, surveys
• 10.2 million tweets sent during the first presidential debate
• Hundreds of languages
• Formal, informal, or ridiculously informal
• Cryptic (vertical industry or criminal activity)


I’m So Intextuated With You

Unstructured text represents the
biggest opportunity and problem
in Big Data

Text, as opposed to most other
enterprise data, is very dirty
data


Correlating consumer confidence with mentions of “jobs” on
Twitter


Yay! Steve Jobs launches a new iPhone!


You can trade on Twitter


Low Signal/Noise Ratio + Naïve Metrics Lead to Wrong Conclusions

• Lack of relevance: Many conversations you think
are about you, aren’t.

• Poor accuracy: Many automated sentiment
solutions are only as good as a coin flip.

• Generic: All analysis is applied the same way
across domains

• Language evolves: Slang and sarcasm are rampant in
social media. Dictionary-based approaches are
largely ineffective.


Relevancy: It’s not all about you.

Let me finish my drink before you drive me to the
Betty Ford clinic!

Call me a bigot, but white guys can’t sprint!
#london2012

My husband is such a baby. He won’t even taste raw
food.

Is Delta’s food prepared by Purina? So much for first
class.


Search and Destroy (the data you’re looking for)

Text analytics gained traction in the 1980s, but the use cases
were different from today's.

“Word spotting” – not different from a Google search.

Show me all documents containing:
Ford NOT Harrison

But it doesn’t scale


Booleans are like woodcarving with a chainsaw

Query: Ford NOT Harrison ….

…would miss this tweet

Carguy231: Me and a dozen others
have lined up outside the Harrison, NY
Ford dealership to test drive the new
Fusion!


Booleans are like woodcarving with a chainsaw

Query: Ford AND Fusion….

…would get this tweet

Roadrunner123: Stuck with my dad in
his ford listening to horrible jazz fusion
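
Both failures can be reproduced in a few lines. This is a minimal sketch of Boolean word spotting, not any particular search engine's query syntax; the helper names `tokens` and `matches` are made up for illustration.

```python
import re

def tokens(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text, required=(), excluded=()):
    """Boolean word spotting: all required words present, no excluded word present."""
    words = tokens(text)
    return all(w in words for w in required) and not any(w in words for w in excluded)

dealership_tweet = ("Me and a dozen others have lined up outside the Harrison, NY "
                    "Ford dealership to test drive the new Fusion!")
jazz_tweet = "Stuck with my dad in his ford listening to horrible jazz fusion"

# Ford NOT Harrison: a relevant tweet is dropped because of the town name
missed = not matches(dealership_tweet, required=["ford"], excluded=["harrison"])

# Ford AND Fusion: a tweet about jazz is kept
false_hit = matches(jazz_tweet, required=["ford", "fusion"])

print(missed, false_hit)
```

The query only sees isolated words, so it cannot tell Harrison the town from Harrison the actor, or Fusion the car from fusion the music genre.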


Sentiment Analysis

Early sentiment analysis tools also used word spotting.

“Awesome” = good

“Sucks” = bad

What about sarcasm, slang, new words?

Additionally, the analysis typically measures the overall polarity of the text,
rather than the sentiment toward a specific target.

“I love the new Camaro, it’s better than the Mustang”


You can’t use word spotting for sentiment detection

“It took all morning to sign the lease papers for my new Mustang!”

“I stood on line all morning to get the last Mustang on the lot!”

“The brakes on the Mustang are surprisingly unpredictable.”

“The TV ads for the Mustang are surprisingly unpredictable!”

“The Mustang has never been good”

“The Mustang has never been this good”
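
A rough sketch of why this fails, assuming a toy word list: the `LEXICON` below is hypothetical and far smaller than any real sentiment dictionary. Sentences with opposite or unrelated meanings end up with identical scores.

```python
import re

# Hypothetical toy lexicon, for illustration only
LEXICON = {"good": 1, "love": 1, "awesome": 1, "sucks": -1, "unpredictable": -1}

def word_spotting_score(text):
    """Sum the lexicon values of the words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

never_good = word_spotting_score("The Mustang has never been good")
never_this_good = word_spotting_score("The Mustang has never been this good")
brakes = word_spotting_score("The brakes on the Mustang are surprisingly unpredictable.")
ads = word_spotting_score("The TV ads for the Mustang are surprisingly unpredictable!")

print(never_good, never_this_good, brakes, ads)
```

The two "never been good" sentences score the same even though they mean the opposite, and unpredictable brakes score the same as unpredictable TV ads.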


Nu-School text analytics is based on Machine Learning

Training data help the system recognize patterns. The system
estimates the probability that a sentence is
positive, negative, etc.

What are training data?
These are samples of text annotated by humans in an effort to
show the machine what the right answer is

“I love my iPhone, but hate AT&T”

iPhone = Positive
AT&T = Negative

Supporting a new language is much easier and quicker than
with dictionary-based approaches
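
The idea of learning a probability from annotated samples can be sketched with a small Naive Bayes classifier. The slides do not name a specific algorithm, so Naive Bayes is an assumption chosen for brevity, and the training samples below are hypothetical, not data from the presentation.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(samples):
    """Count words per label from (text, label) pairs annotated by humans."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def predict_proba(text, word_counts, label_counts):
    """Return P(label | text) under Naive Bayes with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1
    peak = max(log_scores.values())
    exp_scores = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}

# Hypothetical, hand-made training samples (not from the presentation)
TRAINING = [
    ("I love my new phone", "positive"),
    ("what a great camera", "positive"),
    ("I hate the battery life", "negative"),
    ("terrible service and dropped calls", "negative"),
]

word_counts, label_counts = train(TRAINING)
probs = predict_proba("I love the camera", word_counts, label_counts)
print(probs)
```

Nothing in the code is specific to English. Adding a language means supplying annotated samples in that language (and a suitable tokenizer) rather than writing a new dictionary.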


Test: What’s the sentiment here?

“Reuters reports that
Assad continues the
massacre of his own
people amid sanctions
from the international
community.”


How to evaluate a text analytics platform

The accuracy of a sentiment analysis system is, in
principle, how well it agrees with human judgments.

“I can’t believe the bar has a hidden gambling room in
the back!”

An automated system can never be better than
humans. Or can it?


Using Human Parallel Coding to Establish Gold Standards

Confusion Matrix: Human as Gold Standard

            POSITIVE  NEGATIVE  NEUTRAL  TOTAL
POSITIVE         365        24      159    548
NEGATIVE          57        81       65    203
NEUTRAL          274        60      415    749
TOTAL            696       165      639   1500

Raw Accuracy: 61.5%

If a human agrees with a machine around 60% of the time, the
machine is performing about as well as a human being.


Using A Credit Matrix to Create Improved Measurement

Credit Matrix

            POSITIVE  NEGATIVE  NEUTRAL
POSITIVE        100%        0%      50%
NEGATIVE          0%      100%      50%
NEUTRAL          50%       50%     100%

Partial Credit Figure of Merit: 82.3%

Confusion Matrix: Human 1 as Gold Standard

            POSITIVE  NEGATIVE  NEUTRAL
POSITIVE         365        24      159
NEGATIVE          57        81       65
NEUTRAL          274        60      415
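
The arithmetic behind both scores can be sketched as follows, assuming that each cell's count is multiplied by its credit and the sum is divided by the total number of items. That formula is an assumption, since the slides do not spell it out. With the cell counts as they appear in this transcript, the calculation gives 57.4% raw accuracy and 76.0% partial credit, which differs from the 61.5% and 82.3% quoted on the slides; the quoted figures may come from a different version of the data.

```python
# Cell counts as they appear in the confusion matrix above
CONFUSION = [
    [365, 24, 159],   # POSITIVE row
    [57, 81, 65],     # NEGATIVE row
    [274, 60, 415],   # NEUTRAL row
]

# Credit awarded for each pair of labels, from the credit matrix above
CREDIT = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

def weighted_agreement(confusion, credit):
    """Credit-weighted share of all items: sum(count * credit) / sum(count)."""
    total = sum(sum(row) for row in confusion)
    earned = sum(
        count * credit[i][j]
        for i, row in enumerate(confusion)
        for j, count in enumerate(row)
    )
    return earned / total

# Raw accuracy is the special case where only exact agreement earns credit
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

raw_accuracy = weighted_agreement(CONFUSION, IDENTITY)
partial_credit = weighted_agreement(CONFUSION, CREDIT)
print(f"{raw_accuracy:.1%} {partial_credit:.1%}")
```

Partial credit rewards a near miss (positive vs. neutral) more than a reversal (positive vs. negative), which raw accuracy treats as equally wrong.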


Precision & Recall (sentiment as an example)

Precision is the fraction of retrieved instances
that are relevant
E.g., of the instances labeled as positive, how many
were actually positive?

Recall is the fraction of relevant instances that are
retrieved
E.g., of all the positive instances, how many did
the system detect?
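
A short sketch of both measures for a single sentiment label. The `gold` and `predicted` lists are hypothetical labels for six texts, not data from the presentation.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one label, given parallel lists of labels."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    labeled = sum(1 for p in predicted if p == label)   # what the system retrieved
    actual = sum(1 for g in gold if g == label)         # what it should have found
    precision = true_pos / labeled if labeled else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Hypothetical labels for six texts (human gold standard vs. system output)
gold      = ["pos", "pos", "pos", "neg", "neu", "neu"]
predicted = ["pos", "pos", "neu", "pos", "pos", "neg"]

precision, recall = precision_recall(gold, predicted, "pos")
print(precision, recall)
```

Here the system labeled four texts as positive and two of them really were (precision 2/4), while it found two of the three truly positive texts (recall 2/3).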


Top business applications of text/content analytics*

*Alta Plana, 2011

• Brand / product / reputation management
    • Market research and social media monitoring, i.e., what are people saying
      about my brand or products?

• Voice of the Customer / Customer Experience Management
    • Do I need to step in and offer customer service?
    • How many people recommend my brand vs. advocate against it?

• Search, Information Access, or Question Answering
    • Which bloggers are negative toward Obamacare?
    • Which of the hotels on Yelp.com get great reviews for the room service?
    • What are some articles similar to this one?

• Competitive intelligence
    • What competing products are people considering, and why?
    • Is a competitor's media spend generating purchase intent?


Growing areas where text analytics is being applied

Product development

Intelligence and counter-terrorism, law enforcement

Pharmaceutical drug discovery

Financial services and insurance

Media, publishing & advertising

Political research

CRM


Still awake?

There is money in text analytics.

Here’s a stock tip worth the price of admission
alone

(YMMV….)


Strange Bedfellows

Whenever Anne Hathaway's
name appeared with any
regularity in news
stories, Berkshire Hathaway A
shares rose in value.


Thx & txt u l8tr

Vidar Brekke
vidar@socialintent.com
@ividar

