The document discusses the past, present, and future of text analytics. It estimates that the global market for text analytics grew from $250 million in 2007 to $350 million in 2008. It describes common applications such as intelligence, life sciences, customer intelligence, and e-discovery. The document also outlines key trends such as semantic search, question answering, and sentiment analysis, along with the role of text analytics in enabling the Semantic Web. It predicts continued strong growth in areas such as life sciences, intelligence, and customer experience, driven by the analysis of unstructured text.
2. >> Past, Present & Future “He who controls the present, controls the past. He who controls the past, controls the future.” -- derived from George Orwell’s 1984
3. >> The Present: Today’s Market I estimate a $350 million global market in 2008, up 40% from $250 million in 2007. The estimate covers software licenses, vendor-provided support, and professional services. Hundreds of millions of dollars more in value is created by: Universities and research centers, especially in the life sciences. Government, particularly for intelligence & counter-terrorism. OEM licensees, for listening platforms, e-discovery, etc. Systems integrators and consultants.
4. >> Applications Today Broadly grouped -- Intelligence and counter-terrorism. Life sciences. Content management, publishing & search. Customer & market intelligence. E-discovery. Enterprise feedback. Law enforcement. Risk, fraud, compliance, and investigation.
5. >> On the Demand Side… How do current and prospective users see the market? I recently published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.” Drawing from the findings…
14. >> Finding Business Value Why? In customer-experience initiatives, for example, “more unsolicited, unstructured data [implies] increasing use of text analytics.” -- Bruce Temkin, Forrester Research
16. >> Text Analytics Satisfaction Please rate your overall experience -- your satisfaction. Fern Halper of Hurwitz & Associates found in her 2009 survey, “all of the companies that had deployed text analytics stated that the implementations either met or exceeded their expectations. And, close to 60% stated that text analytics had actually exceeded expectations.”
17. >> Today’s Text Analytics Players Data mining and analytics. Enterprise- and specialized-application focus. Search tools and services. Software-tool, OEM suppliers.* Text analytics pure-plays, diverse applications.* Web services. * TEMIS categories.
18. >> Today’s Text Analytics Contrast with the 1999 landscape – “The nascent field of text data mining (TDM) has the peculiar distinction of having a name and a fair amount of hype but as yet almost no practitioners.” -- Prof. Marti A. Hearst, “Untangling Text Data Mining,” 1999 (For our purposes, “text analytics” = “text mining” = “text data mining.”)
20. >> Understanding the Challenge Marti Hearst in 1999: “Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.” “[A] way to view text data mining is as a process of exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known.” Challenges: Access, decoding, discovery, application.
21. >> In Business Terms Business intelligence (BI) as defined in 1958: “In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera... The notion of intelligence is also defined here... as ‘the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.’” -- Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
22. Document input and processing Information extraction Knowledge management H.P. Luhn, “A Business Intelligence System,” IBM Journal, October 1958
23. >> Statistical Analysis of Content “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.” -- Hans Peter Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal, April 1958
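To make the word-frequency idea concrete, here is a minimal Python sketch that scores sentences by the frequency of the words they contain. It is a simplification for illustration, not Luhn's exact significance measure: the stop-word list, the function names, and the choice to average word frequencies over sentence length are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "that", "are"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count how often each non-stop word occurs in the whole text."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS)


def score_sentences(text):
    """Return (score, sentence) pairs, highest score first.

    A sentence's score is the summed document-wide frequency of its
    non-stop words, divided by the sentence length in words.
    """
    freqs = word_frequencies(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        words = tokenize(sentence)
        if not words:
            continue
        total = sum(freqs[w] for w in words if w not in STOP_WORDS)
        scored.append((total / len(words), sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Taking the top few sentences from `score_sentences` gives a crude extractive abstract. As the next slide notes, a purely statistical measure like this ignores grammar and semantics.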
24. >> Significance from Semantics “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.” -- Hans Peter Luhn, 1958
25. >> Methods Technologists developed approaches to taming text: Vector-space representations. Salton, Wong & Yang, 1975, “A Vector Space Model for Automatic Indexing.” Clustering & classification algorithms. Naive Bayes. Support Vector Machine. K-nearest neighbor. Linguistic methods. Machine learning.
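Two of the methods listed above, vector-space representation and k-nearest-neighbor classification, can be sketched together in a few lines of Python. This is a sketch under stated assumptions: documents are represented by raw term counts with no weighting scheme, similarity is cosine similarity, and the function names and labels are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a document as a sparse vector of raw term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, labeled_docs, k=3):
    """Label a text by majority vote among its k most similar documents.

    labeled_docs is a list of (document_text, label) pairs.
    """
    query = term_vector(text)
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine_similarity(query, term_vector(doc[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, given a handful of documents labeled by application area, `knn_classify("protein gene study", labeled_docs)` assigns the label that dominates among the nearest neighbors. Practical systems usually add term weighting and a much larger training set.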
29. >> Technology Initiatives 2 Now and near future. Listening platforms. Bruce Temkin, Forrester Research: “The future is clearly about analyzing feedback in any form that your customers give it. That’s a trend that won’t go away.” Text visualization. We’re still coming to terms with the idea of actually extracting and exploiting the information content of rich media. Web 3.0 & the Semantic Web. Ronen Feldman, Bar-Ilan University and Hebrew University: “Text analytics [is] driving the Semantic Web” (2006).
30. >> Search, from Keywords to Intelligence Text analytics enables smarter search that better responds to user goals.
35. >> Web 3.0 & the Semantic Web “We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.” -- “The Diverse and Exploding Digital Universe” (IDC, 2008) “The Semantic Web is a web of data, in some ways like a global database.” -- Tim Berners-Lee, 1998 Web 3.0 = Web 2.0 + the Semantic Web + semantic tools.
36. >> Web 3.0 & the Semantic Web Recurring themes: Semantically enriched -- context sensitive -- localized. Technical concepts: Linked Data -- Microformats, RDF, SPARQL -- OWL. Text analytics enables Web 3.0 and the Semantic Web. Automated content categorization and classification. Text augmentation: metadata generation, content tagging. Information extraction to databases. Exploratory analysis and visualization.
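As a rough illustration of content tagging and information extraction feeding a web of data, the Python sketch below looks up known names in a text and emits subject-predicate-object triples in an N-Triples-style layout. Everything specific here is an assumption made for the example: the gazetteer entries, the `http://example.org/` placeholder namespace, the `mentions` predicate, and the function names. Only the `rdf:type` IRI is the standard one.

```python
import re

# Hypothetical gazetteer mapping known names to entity types (illustration only).
GAZETTEER = {
    "ibm": "Organization",
    "forrester research": "Organization",
    "tim berners-lee": "Person",
}

BASE = "http://example.org/"  # placeholder namespace, an assumption
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tag_entities(text):
    """Return sorted (name, type) pairs for gazetteer names found in the text."""
    lowered = text.lower()
    found = []
    for name, entity_type in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.append((name, entity_type))
    return sorted(found)


def to_triples(doc_id, text):
    """Turn tagged entities into (subject, predicate, object) IRI triples."""
    triples = []
    for name, entity_type in tag_entities(text):
        entity = BASE + "entity/" + name.replace(" ", "_")
        triples.append((BASE + "doc/" + doc_id, BASE + "mentions", entity))
        triples.append((entity, RDF_TYPE, BASE + entity_type))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

Calling `to_ntriples(to_triples("slide35", text))` yields statements that a triple store could load and a SPARQL query could then retrieve. Real extraction pipelines rely on linguistic and statistical methods rather than a fixed name list.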