Slides of the course on big data by C. Levallois from EMLYON Business School.
For business students. Check the online video connected with these slides.
-> Definition of text mining, the main categories of tools available (such as topic categorization or sentiment analysis) and their use for business.
A Primer on Text Mining for Business
1. MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures Pr. Clement Levallois
2. MK99 – Big Data 2
A primer on text mining for business
•
Text mining:
computational methods to find interesting information in texts
•
Quasi synonyms:
–
natural language processing (abbreviated as NLP)
–
computational linguistics (name of a scientific discipline)
3. MK99 – Big Data 3
Text… what kinds?
•
Books
•
Tweets
•
Product reviews on Amazon
•
LinkedIn profiles
•
The whole Wikipedia
•
Free text answers in the results of a survey
•
Tenders, contracts, laws, …
•
Print and online media
•
Archival material
•
…
4. MK99 – Big Data 4
What can be done?
•
Sentiment analysis
–
Is this piece of text of a positive or negative tone?
•
Topic modeling / topic detection
–
What is the main theme of this 20-page booklet?
•
Semantic disambiguation
–
“Paris” is mentioned in this text. Is this Paris Hilton or Paris, France?
•
Named Entity Recognition (NER)
–
Automatically find the individuals, organizations and events named in the text, and the relations between them.
•
Semantic enrichment
–
If you searched Google for “TV”, results for “television” will also show up
•
Language detection
–
“Ich spreche Deutsch” -> this sentence is written in German
•
Automatic Translation
–
See Google Translate
•
Summarizing
–
Shortening a text while keeping its core message intact
•
Spelling correction
–
Well, that’s easy
•
Topic Classification
–
Is this email spam or not?
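To make the first task on this slide concrete, here is a minimal sketch of sentiment analysis by word counting. The two word lists and the function names are illustrative assumptions, not a real sentiment lexicon or the method used by any particular tool; production systems typically rely on much larger lexicons or trained models.

```python
import re

# Tiny hand-made word lists: illustrative assumptions, not a real lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "disappointed"}


def sentiment_score(text):
    """Count positive words minus negative words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse label: positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I love this phone, the screen is great"))
print(sentiment_label("Terrible battery, I am disappointed"))
```

A word-count approach like this ignores negation ("not good") and word order, which is one reason real tools go well beyond it.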
5. MK99 – Big Data 5
Amaze me!
•
Demo on sentiment analysis
With a tool by Stanford: http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
•
Demo on semantic disambiguation
With a tool by a collaborative effort: http://dbpedia-spotlight.github.io/demo/
(click on “annotate”, and also change the text for one of your own)
6. MK99 – Big Data 6
What can’t be done yet (but is actively researched)
•
Detection of irony
•
Robust translation
•
Reasoning beyond Q&A
What makes things harder
•
Non-English texts
•
Slang and colloquial speech-forms
•
Real time processing
7. MK99 – Big Data 7
Examples of routine operations when working with text (or, how to follow the most basic conversation in computational linguistics)
•
Stemming
–
“liked” and “like” will both be reduced to a common stem (e.g. “lik”; the exact stem depends on the stemmer used) to facilitate further operations
•
Lemmatizing
–
Grouping “liked”, “like” and “likes” under their dictionary form (the lemma “like”) to count them as one basic semantic unit
•
Part-of-Speech tagging (aka POS tagging)
–
Automatically detecting the grammatical function of the terms used in a sentence, to facilitate translation or other tasks
•
“Starting the text analysis with a bag-of-words model”
–
An operation that consists of simply listing and counting all the different words in the text.
•
N-grams
–
The text “I am Dutch” is made of 3 words: I, am, Dutch. But it can also be interesting to look at bigrams in the text: “I am”, “am Dutch”. Or trigrams: “I am Dutch”.
–
When neighboring words are considered together, as we just did, they are called n-grams. This can reveal interesting things about frequent expressions used in the text.
–
A good example of how useful this can be: visit the Ngram Viewer by Google: https://books.google.com/ngrams
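The bag-of-words and n-gram operations described on this slide can be sketched in a few lines of Python using only the standard library. The function names and the simple lowercase tokenizer are assumptions made for illustration; real pipelines usually use a dedicated tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bag_of_words(text):
    """List and count all the different words in the text."""
    return Counter(tokenize(text))


def ngrams(tokens, n):
    """Return every run of n neighboring tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("I am Dutch")
print(bag_of_words("I am Dutch"))
print(ngrams(tokens, 2))  # ['i am', 'am dutch']
print(ngrams(tokens, 3))  # ['i am dutch']
```

Counting the n-grams of a large text with `Counter(ngrams(tokens, 2))` is one simple way to surface its most frequent expressions.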
8. MK99 – Big Data 8
Chief benefit: Getting to know individuals better
•
Without text mining, we have access to “external”, “cold” states of the individual
–
Behavior (e.g., clicks), external attributes (address, gender, encyclopedia entry), social networks (but relatively cold ones).
•
With text mining, we have access to “internal”, “hot” states:
- opinions
- intentions
- preferences
- degree of consensus
- social networks (who mentions whom: how, in which context)
- implicit attributes of the speaker
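As an illustration of the "who mentions whom" idea, the sketch below builds a weighted list of mentions from short posts. The sample posts, the `@name` mention convention and the function name are hypothetical; this is a toy example, not a description of any platform's API.

```python
import re
from collections import Counter

# Matches "@name" handles. Note: it would also match the part after "@"
# in an email address, so real data needs more careful cleaning.
MENTION = re.compile(r"@(\w+)")


def mention_edges(posts):
    """Count (author, mentioned user) pairs across (author, text) posts."""
    edges = Counter()
    for author, text in posts:
        for mentioned in MENTION.findall(text):
            edges[(author, mentioned.lower())] += 1
    return edges


# Hypothetical sample data
posts = [
    ("alice", "Great talk by @bob today!"),
    ("carol", "@bob and @alice, thanks for the help"),
    ("alice", "Agree with @Bob on this"),
]

for (source, target), weight in mention_edges(posts).items():
    print(source, "->", target, weight)
```

The resulting pairs form the edges of a network that can then be studied with network analysis tools.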
9. MK99 – Big Data 9
How easy is it?
•
Too easy… the limit is legal and ethical, not technical
“Predicting the Political Alignment of Twitter Users” by Conover et al. (2011).
http://cnets.indiana.edu/wp-content/uploads/conover_prediction_socialcom_pdfexpress_ok_version.pdf
“Political Tendency Identification in Twitter using Sentiment Analysis Techniques”
by Pla and Hurtado (2014). http://anthology.aclweb.org/C/C14/C14-1019.pdf
“Private traits and attributes are predictable from digital records of human behavior”
by Kosinski et al. (2013). http://www.pnas.org/content/110/15/5802.abstract
(and this gets even more powerful when mixing text mining, network analysis and machine learning)
10. MK99 – Big Data 10
What use for text mining in a business context?
1.
Client-facing
2.
Business management
3.
Business development
11. MK99 – Big Data 11
1. Market-facing activities
•
Refined scoring: propensity scores (including churn), scoring of prospects
•
Refined individualization of campaigns
–
ads, email campaigns, coupons, etc.
•
Better community management
–
Getting a clear and precise picture of how customers and prospects perceive, talk about, and engage with your brand / product / industry.
12. MK99 – Big Data 12
2. Business Management
•
Organizational mapping
–
Getting a view of the organization through text flows.
–
Example: getting a view on the activity of a business school through a map of its scientific publications.
•
HRM
–
Finding talents in niche industries, based on the mining of their profiles
•
Marketing research
–
refined segmentation + targeting + positioning, measuring customer satisfaction, perceptual mapping.
13. MK99 – Big Data 13
3. Business development
•
Developing adjunct services
–
product recommendation systems (e.g., Amazon’s)
–
detection and matching of needs (e.g., detection of complaints / mood changes)
–
product enhancements (e.g., content enrichment through localization/personalization)
•
Developing new products entirely, based on
–
different search engines
–
alert systems / automated systems based on monitoring textual input
–
knowledge databases
–
new forms of content curation / high value info creation + delivery
14. MK99 – Big Data 14
Interesting players
through their “Data Services” package
+ many APIs listed on www.programmableweb.com
15. MK99 – Big Data 15
This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)
Contact Clement Levallois (levallois [at] em-lyon.com) for more information.