In this presentation, Connotate will share expertise gained from years of experience extracting data from the Web and making it usable. Connotate’s experts will explain why certain Web data sources are easy to tap into, why others aren’t, and what to consider when scoping out a project.
Know Your Market - Know Your Customer: What Web Data Reveals if You Know Where and How to
1. Know Your Market – Know Your Customer:
What Web data reveals if you know where & how to look
Presenters: Christian Giaretta, VP of Sales Engineering, Connotate
Dennis Clark, Chief Strategy Officer, Luminoso
Moderator: Gina Cerami, VP of Marketing, Connotate
Date: November 1, 2012
3. Today’s Discussion
• What Web Data Reveals: The Fundamentals
• The business case
• Where to start? Best practices and the automation process
• Know Your Market
• Use cases: market transparency, digital strategy, PDF extraction
• Differences in data sources
• Know Your Customer: Part 1
• Use case: online advertising - aggregating customer response to ads
• Manual versus automated approaches
• Know Your Customer: Part 2
• Text analysis – overview of options
• Concept-based text analysis
• Use case: consumer packaged goods
• Other considerations
• Q&A
4. What Web Data Reveals:
The Fundamentals
5. The Business Case
news – data points – public notices
trillions of URLs
online conversations
6. IDC Research – October 2012
• CEOs are looking at Big Data on the Web to understand
their markets and customers
• The number of sites with valuable content continues to
expand at a tremendous rate
• Factors to consider when collecting Web data
• Timeliness
• Legitimacy
• Aggregation
7. Can I Trust Web Data for Market Research???
Good question! You may have to…
Factors to consider:
• It’s harder and harder to get people to answer surveys
• Focus groups take time, which you may not have
• Proprietary data sources may not answer all of your
important questions
• Organizations and government agencies are moving more
and more data, content and forms onto the Web
8. Can I Trust Web Data for Market Research???
Timely? YES!!
• Refresh primary research
• Expose new trends or questions rapidly
Aggregate? YES!!
• Volumes of data reveal insights
• The longer you retain it, the more valuable it gets
Legitimate? Uhh…
• Be vigilant about spam and bias in Web data
• Some sites are better than others
9. Polling Question: Web Data Collection
Are you currently collecting data from the Web?
Yes – we are doing this using an automated process
Yes – however, we are collecting Web data using a manual process
No – we are not collecting Web data
10. Where to Start? Follow Proven Best Practices
Work with experts with deep experience evaluating
Web sources for data extraction to help you…
• Clarify “What do you really want to do with this data?”
• Decide which sites to target
• Identify how easy or difficult it will be to extract data from target sites
• Outline the scope of the project
• Estimate long-term maintenance costs (and how to minimize them)
12. An Overview of the Automation Process
Collect Data
Internal Sources
• Database
• Market Basket
• Inventory, etc.
External Sources
• Social Media
• Surface Web
• Hidden Web
• Secured Sites
Transform
• Classify
• Structure
• Prep for Analysis
Deliver
• Reports
• Dashboards
• Workflow
• BI Plug-ins
14. Know Your Market: Use Cases
Market Transparency
• Job postings, etc. on company Web sites may offer indicators of performance before quarterly results are reported
Digital Strategy
• Paid ads, search term rankings on Google trended over time reveal insights about competitors’ digital strategies
Government Regulatory Site Updates (PDFs)
• Insurance coverage, building permits, etc. posted as PDFs can reveal insight into market trends and product sales
Automated, precise data collection is key to success
17. Building Permits Reveal Construction Activity
PDF → Excel
AP_Title Mr & Mrs
AP_Forename Samuel John
AP_Surname MacNaughton
AP_CompanyName
AP_Building Orana
AP_AddressLine1 Easter Kinkell
AP_AddressLine2 Dingwall
AP_Town Ross-Shire
AP_Postcode IV7 8HY
AG_RefNo
AG_Forename Sarah
AG_Surname Bryden
AG_CompanyName
AG_Building 12
AG_AddressLine1 Southside Road
AG_AddressLine2
AG_Town Inverness
AG_Postcode IV2 3AU
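As a rough illustration of the PDF-to-spreadsheet step on this slide, the sketch below groups the prefixed fields into two records and writes them out as a CSV row. It is not Connotate's method. It assumes the text has already been pulled out of the PDF as field/value pairs, and it assumes (the slide does not say so) that the AP_ prefix marks the applicant and AG_ the agent. The names `PAIRS`, `split_by_party` and `to_csv` are made up for this example, and only a subset of the slide's fields is shown.

```python
import csv
import io

# Field/value pairs copied from the slide (subset); empty fields are kept as "".
PAIRS = [
    ("AP_Title", "Mr & Mrs"),
    ("AP_Forename", "Samuel John"),
    ("AP_Surname", "MacNaughton"),
    ("AP_CompanyName", ""),
    ("AP_Town", "Ross-Shire"),
    ("AP_Postcode", "IV7 8HY"),
    ("AG_Forename", "Sarah"),
    ("AG_Surname", "Bryden"),
    ("AG_Town", "Inverness"),
    ("AG_Postcode", "IV2 3AU"),
]

# Assumption: AP_ = applicant, AG_ = agent. The slide does not define the prefixes.
PARTIES = {"AP": "applicant", "AG": "agent"}


def split_by_party(pairs):
    """Group prefixed fields into one dict per party, dropping the prefix."""
    record = {party: {} for party in PARTIES.values()}
    for name, value in pairs:
        prefix, _, field = name.partition("_")
        record[PARTIES[prefix]][field] = value
    return record


def to_csv(pairs):
    """Return CSV text with one header row and one data row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([name for name, _ in pairs])
    writer.writerow([value for _, value in pairs])
    return buffer.getvalue()


if __name__ == "__main__":
    print(split_by_party(PAIRS)["agent"])
    print(to_csv(PAIRS))
```

Keeping one column per field name means each additional permit becomes one more row, which is the shape a spreadsheet or database import usually expects.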
18. Insurance Coverage Predicts Drug Sales
PDF Document → Excel File
Drug Name Tier
A/b otic 2
Abilify 4
Accolate 4
Accupril 4
Accuretic 4
Accutane 4
Acebutolol HCL 2
Aceon 4 (1/2)
Acetaminophen w/ codeine 2
Acetasol HC 2
Acetazolamide 2
Aciphex X
Aclovate ointment 4
Acticin 2
Activella 4
Actonel 4
Actoplus met 3
Actos 3
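A minimal sketch of how lines like these could be split into columns once the text is out of the PDF. It assumes each line is a drug name followed by a one-character tier (a digit or "X") and an optional note in parentheses, which is inferred from the rows on this slide only. The slide does not explain what "X" or "(1/2)" mean, so both are passed through unchanged. `parse_tier_line` is a hypothetical helper, not part of any Connotate product.

```python
import re

# Drug name, then a one-character tier, then an optional "(note)".
# The name is matched lazily so multi-word names stay in one column.
LINE = re.compile(
    r"^(?P<name>.+?)\s+(?P<tier>[0-9X])(?:\s+\((?P<note>[^)]*)\))?$"
)


def parse_tier_line(line):
    """Return (name, tier, note) for a data line, or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("tier"), match.group("note")


if __name__ == "__main__":
    sample = [
        "Drug Name Tier",  # header row: no tier character, so it is skipped
        "Abilify 4",
        "Aceon 4 (1/2)",
        "Acetaminophen w/ codeine 2",
        "Aciphex X",
    ]
    for line in sample:
        print(parse_tier_line(line))
```

Lines that do not fit the pattern return None, so they can be logged and checked by hand instead of being written to the spreadsheet with guessed values.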
19. Benefits of Using Automation to Understand
Markets and Market-Moving Events
• Reduce costs associated with manual processes
• Speed up processes by doing this continually instead
of sporadically
• Improve accuracy
• Repurpose data for new uses by
converting PDFs and other
unstructured data into Excel,
XML or other usable formats
23. Altitude Digital – Buyer Behavior in Real Time
• Push the boundaries of “Big Data” in interactive advertising
• Use Connotate to collect real-time Web data
• Increase clients’ ad revenues by 30% - 300%
• Continually display aggregated dynamic ad exchange data
• Publishers view real-time, side-by-side comparisons of online ad traffic
• They can instantaneously optimize ad placement
Many of these sites are password-protected… not a problem!
25. Manual versus Automated Approaches
Your Data Needs / To Automate or Not?
• Complex product-matching tasks: ? May want to consider crowdsourcing
• Small amount of data, needed a few times per year: ? A manual approach may suffice
• Specific external data (under $5K/year): ? Purchase from 3rd party
• High volume data monitoring: Automate
• Variety of sources: Automate
• Frequent updates and/or monitoring: Automate
• Need for data post-processing: Automate
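The table above can be read as a simple lookup, sketched below. The short need labels and the function names are invented for this example. The rule in `should_automate`, that a single "automate" row is enough to justify automation, is an assumption. The slide does not say how to weigh mixed needs.

```python
AUTOMATE = "automate"

# One entry per row of the slide's table; keys are shortened labels (hypothetical).
RECOMMENDATIONS = {
    "complex product matching": "consider crowdsourcing",
    "small, infrequent data": "a manual approach may suffice",
    "specific external data under $5K/year": "purchase from a third party",
    "high-volume monitoring": AUTOMATE,
    "variety of sources": AUTOMATE,
    "frequent updates or monitoring": AUTOMATE,
    "data post-processing": AUTOMATE,
}


def recommend(needs):
    """Look up each need in the table; an unknown need raises KeyError."""
    return {need: RECOMMENDATIONS[need] for need in needs}


def should_automate(needs):
    """Assumption: any single 'automate' row is enough to justify automation."""
    return any(RECOMMENDATIONS[need] == AUTOMATE for need in needs)


if __name__ == "__main__":
    project = ["variety of sources", "complex product matching"]
    print(recommend(project))
    print(should_automate(project))
```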
26. A Closer Look at Different Approaches
Approach / Considerations
• Manual offshore: No economies of scale; human error compromises quality.
• Crowdsourcing: A viable approach for complex tasks like product matching of apparel for one-shot projects; may be less reliable for ongoing monitoring and long-term projects.
• In-house or low-cost Web scrapers: Not resilient; scrapers break when Web page HTML changes, creating a maintenance headache; scrapers may not monitor well or support scheduling.
• Robust automation installed on-premise: High degree of control; better resiliency to change but should consider project complexity and future need to add new Web sources on short notice.
• Robust solution hosted by vendor: Highest resiliency; no maintenance burden; 24/7 follow-the-sun support; infinitely scalable and no capital expenditures for hardware or IT resources.
27. Polling Question: Data Analysis
What type of data analysis tools do you use?
Only basic tools – Excel spreadsheets, etc.
Text analysis and basic tools
Applications built in-house and basic tools
None
29. Text Analysis Options
Main ‘Schools’ of Text Analytics
Machine Learners
Understanding through Data
• Learn meaning through correlations
Ontologists
Understanding through Instruction
• People tell computers what words mean
Luminoso Approach
Concept-based text analysis
• Know the “Common Sense” about the world
• Add new connections from datasets
30. Language is Creative
It was really stuffy.
It smelled terrible.
It was like it had been shut away for a long time.
Smells like an old house.
Smelled really musty.
Was like a wet dog.
Reminds me of a dusty closet.
Really stale.
31. Concept-based analytics has…
• Shown how reaction to product scent changes
with price point
• Determined the customer segments for a sports
Web site
• Discovered if customers notice unannounced
in-store policy changes
• Matched those who should connect at a large
enterprise software company’s user conference
32. Digital Intuition
We boil down the meaning of text into actionable,
mathematically justifiable insights.
34. Case Study: Swiffer SweeperVac
Consumer product design example: Swiffer SweeperVac
Idea: Use social data on Twitter to understand customer reactions to product design
Result: Failure. Twitter lacks depth.
Better Idea: Product Reviews
35. Obtaining Customer Sentiment from YouTube
Manually search YouTube for <“product name”> <“review”>
Use the Connotate automation package to follow links
to individual video reviews and more results
Use Connotate to extract comment text
Feed input into analytical engine to reveal sentiment
Graphical User Interface/Presentation of Insights
39. Another Look at the Automation Process
Connotate
Partners
Collect Data
Internal Sources
• Database
• Market Basket
• Inventory, etc.
External Sources
• Social Media
• Surface Web
• Hidden Web
• Secured Sites
Transform
• Classify
• Structure
• Prep for Analysis
Deliver
• Reports
• Dashboards
• Workflow
• BI Plug-ins
• Connotate provides precise quality data, structured
for delivery to your analysis and presentation tools.
• Connotate maximizes the value of your investment
in business intelligence, text analytics and semantic
analysis tools.
40. Web Data Can Reveal Insights of
Tremendous Value
• Valid insights require precise, quality data
• Automation reduces the cost of monitoring Web sites for updates
• Automation is the key to extracting precise, quality data
• Automation makes it easier to collect data for trending
41. Web Data Can Reveal Insights of
Tremendous Value
• Spot market trends faster
• Detect shifts in competitor’s digital strategy
• Detect changes to regulatory sites, download PDFs and extract data
• Obtain new insights into customer preferences
• Monitor buyer behavior online and in aggregate
42. Q & A
Connotate will email a link to this presentation as well as a
copy of the slides to you within 2 business days.
If you have an immediate need and would like us to contact
you about a forthcoming project, please check the appropriate
box in the last polling question or call (+1) 732-296-8844.
For more information, you may also visit www.connotate.com
or www.connotate.co.uk.
43. Thank You
If you have an immediate need and would like us to contact
you about a forthcoming project, please check the appropriate
box in the last polling question or call (+1) 732-296-8844.
For more information, visit
www.connotate.com or www.connotate.co.uk