3. Agenda
Who is Chris
The problem
What is text analytics
Why use it
How text analytics evolved
Use cases
The FUTURE
Who is Pingar?
4. Who is Chris
Just learned what Cricket is!
VP Marketing @ Pingar
Author in the area of
Content Management
Twitter: @HoardingInfo
5. Unstructured
Data Problem
Unstructured content makes up 80% of all digital
content *
The value of unstructured content diminishes
exponentially after it is published
Metadata is key to making any use of a document
after it is published
* ref: AIIM.org 2012
6. Why use it?
Without metadata, the time spent producing
content is lost, and the content poses a risk to the
organization
Extracting metadata without text analytics is a
manual process, which is expensive and prone to
human error and inconsistency
7. What is Text Analytics
Technology that extracts value from unstructured
content
Turns documents into Keywords and Entities -
Metadata
Transforms unstructured content into transactional data
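As a rough illustration of turning a document into keywords and entities, here is a minimal sketch. It assumes simple word counting and a capitalization rule, which is far cruder than a real engine; the stopword list, the function names, and the sample text (including "Acme Oil" and "John Smith") are made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real engine's resource).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def extract_entities(text):
    """Naive entity guess: runs of two or more capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)

def to_metadata(text):
    """Bundle keywords and entities into a metadata record."""
    return {"keywords": extract_keywords(text), "entities": extract_entities(text)}

# Hypothetical sample document.
sample = ("The contract between Acme Oil and John Smith covers drilling. "
          "The contract expires in 2015.")
print(to_metadata(sample))
```

The resulting record is the kind of metadata that can then be stored next to the document and queried like any other structured field.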
8. Evolution of text analytics
Started appearing around 2003
Initial engines were statistical
Accurate but lots of work
Modern engines use machine learning
Power of disambiguation & Linked Data
Several general-purpose engines, but mostly vertical
solutions
10. Use Cases
Content Migration and Discovery
Content Classification and Organization
Internal Content Publishing
11. Content Migration &
Discovery - Problem
A large oil and gas company in the US was
recently sued and lost ($ millions). Due to poor
content control, documents that should not have
left the organization did.
So the company decided to implement an ECM
system. But 90% of the organization's content
is stored in a file share, the “Z” drive, and no
one knows what is there.
To move to ECM, they need to quickly
analyze the file share to isolate relevant
content, remove what is not relevant,
and prepare for migration.
12. Content Migration &
Discovery - Solution
Analyze the file share to produce a list of content by type and
relationships to other content.
Determine what content is relevant, what content should be
removed, and build an information architecture for a proper
ECM platform.
Visualize the content based on location, people, etc. to help gain
insight and decide how to deal with the content to avoid
future litigation.
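The first step, listing content by type, can be sketched roughly as follows. This is only an assumption about how such an inventory might start (grouping by file extension); the function names and the sample paths are hypothetical, and a real analysis would also look inside the files.

```python
from collections import Counter
from pathlib import Path, PurePath

def inventory_by_type(paths):
    """Count files grouped by lowercase extension; '(none)' if there is no extension."""
    counts = Counter()
    for p in paths:
        suffix = PurePath(p).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts

def scan_share(root):
    """Walk a directory tree (e.g. a mounted file share) and inventory its files."""
    return inventory_by_type(str(p) for p in Path(root).rglob("*") if p.is_file())

# Hypothetical paths standing in for the contents of the "Z" drive.
sample_paths = [
    "Z:/legal/contract.DOCX",
    "Z:/legal/contract_v2.docx",
    "Z:/wells/report.pdf",
    "Z:/hr/README",
]
print(inventory_by_type(sample_paths))
```

Calling `scan_share` on the root of the share would give the same kind of summary for real content, which is a starting point for deciding what to migrate and what to purge.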
13. Content Migration &
Discovery - Result
• New ECM system with relevant content only
• Purged non-relevant content
• Better control which means less legal risk
• Ability to make better business decisions
14. Content Classification &
Organization - Problem
One of the US’s largest commercial banks
produces regular collateral and promotional
materials. Because the resulting scripts and
media files are poorly organized, the bank is
duplicating effort on future campaigns and
losing valuable and expensive content.
They need to improve organization of these
assets, and cross-pollination of information.
15. Content Classification &
Organization - Solution
Build a hierarchy of content, a taxonomy to be used to file content. As
content is saved to the rich media content repository, have it
automatically filed according to the taxonomy.
Automatically generate search filters so navigation of the content is
more efficient, and fewer documents are missed by the team.
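A minimal sketch of automatic filing and filter generation, assuming a simple term-matching rule per taxonomy node. The taxonomy, its trigger terms, and the sample documents are hypothetical; a production engine would use richer classification than substring matching.

```python
from collections import Counter

# Hypothetical taxonomy: node path -> trigger terms.
TAXONOMY = {
    "Campaigns/Mortgages": {"mortgage", "home loan", "refinance"},
    "Campaigns/Credit Cards": {"credit card", "rewards", "cashback"},
}

def classify(text, taxonomy=TAXONOMY):
    """Return every taxonomy node whose trigger terms appear in the text."""
    lowered = text.lower()
    matches = [node for node, terms in taxonomy.items()
               if any(term in lowered for term in terms)]
    return matches or ["Unclassified"]

def build_filters(documents):
    """Count documents per taxonomy node, usable as search filter facets."""
    facets = Counter()
    for text in documents:
        facets.update(classify(text))
    return facets

# Hypothetical asset descriptions.
docs = [
    "Script for the spring refinance TV spot",
    "Cashback rewards banner, final artwork",
    "Holiday party photos",
]
print(build_filters(docs))
```

The facet counts are what a search screen would show as clickable filters, so the team can narrow by topic instead of scanning every asset.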
16. Content Classification &
Organization - Result
• Users spend 50% less time finding content
• Content is now organized by topic automatically
• Save $750,000 a year in duplicated effort
• Improve idea sharing
17. Internal Content
Publishing - Problem
One of the world's largest chemical
manufacturers has many R&D departments. As
new chemicals are invented, scientists publish
documents discussing the intellectual property
of these inventions. The articles are to be
published to other scientists so they can use the
knowledge to further their research and
development.
The system for publishing this content is manual
and costly. A highly paid chemical scientist has to
manually tag and summarize articles before
they are saved to a content management
system. Scientists have to “search” for content
they might find interesting, but they don’t
always know what to look for. This is costly and
prone to human error, and information is lost.
18. Internal Content
Publishing - Solution
Automatically tag, classify, and summarize content as
it’s being published by scientists.
Generate emails with summaries and links to articles.
Send the emails to scientists based on their profile,
showing only content that is relevant to them.
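The profile-based routing step can be sketched as below, assuming each article already carries tags and each scientist has a set of interests. The names, interests, tags, and the example.com link are all placeholders for illustration.

```python
def recipients_for(article_tags, profiles):
    """Return the names (sorted) whose interests overlap the article's tags."""
    tags = {t.lower() for t in article_tags}
    return sorted(
        name for name, interests in profiles.items()
        if tags & {i.lower() for i in interests}
    )

def digest(title, summary, link):
    """Build a plain-text email body with a summary and a link to the article."""
    return f"{title}\n{summary}\nRead more: {link}"

# Hypothetical interest profiles.
profiles = {
    "alice": {"polymers"},
    "bob": {"catalysts", "Solvents"},
}

to = recipients_for(["Solvents", "safety"], profiles)
body = digest("New solvent blend", "Two-line summary goes here.",
              "https://example.com/articles/123")  # placeholder URL
print(to)
print(body)
```

Matching on tags rather than on free-text search is what lets content be pushed to scientists who would not have known what to search for.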
19. Internal Content
Publishing - Result
• 70% cost reduction in publishing process
• Content is published 150x faster
• Scientists no longer have to search; content is pushed to
them
• The content auditors can focus on other responsibilities
20. Text Analytics is increasing the
value of unstructured content,
reducing risk, and making
organizations more efficient
21. The future
Text Analytics will be mandatory for all organizations doing
unified information access
Machine Learning Engines take over
BigData and BigContent join forces
The need for Language Scientists and Data Scientists increases
Buzz Words: Unified Information, Content Intelligence,
BigContent
22. Who Is Pingar
• The Text Analytics Subject
Matter Experts
• Helping you make money
with a Text Analytics practice