Text Analysis with SAP HANA
October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
1 Motivation
2 Text Analysis with SAP HANA
3 Enhancement Options - Dictionaries and Rules
Why do we need Text Analysis?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
(Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
• The data might originate from:
  - Social networks
  - "Letters" from customers
  - ...
• What is the problem with unstructured data?
• It is unstructured!
  - Not organized
  - No pre-defined data model
  - No metadata, or a mix of data and metadata
=> We have a lot of information that is relevant for the business, but we cannot access it.
How can we solve that issue?
• Text analysis: extracting high-quality information from texts
• Typical process of a text analysis:
  - Parsing the text
  - Adding features such as linguistic information
  - Entity recognition: is it an organization, a person, or a place? This includes domain facts such as requests.
  - Sentiment analysis: what attitudinal information is "hidden" in the text?
  - Inserting the information into the database in a structured manner
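The steps above can be sketched as a toy pipeline in plain Python. This only illustrates the process and is not how SAP HANA implements it; the tiny word lists and the `analyze` function are made up for this example.

```python
import re

# Hypothetical mini-dictionaries, just for illustration.
ENTITIES = {"sap": "ORGANIZATION", "munich": "LOCALITY"}
SENTIMENT = {"great": "POSITIVE", "terrible": "NEGATIVE"}


def analyze(doc_id, text):
    """Return structured rows (doc_id, position, token, type) for one text."""
    rows = []
    # 1. Parsing: split the text into word tokens.
    tokens = re.findall(r"\w+", text)
    for position, token in enumerate(tokens, start=1):
        # 2. Linguistic feature: a normalized (lowercase) form of the token.
        key = token.lower()
        # 3./4. Entity recognition and sentiment analysis via dictionary lookup.
        token_type = ENTITIES.get(key) or SENTIMENT.get(key)
        if token_type:
            # 5. Structured output, ready to be inserted into a table.
            rows.append((doc_id, position, token, token_type))
    return rows


for row in analyze(1, "The SAP event in Munich was great"):
    print(row)
```

Each returned tuple corresponds to one row of a structured result table, which is the shape of output the last step of the process asks for.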
What has this to do with SAP HANA?
Fulltext Index - Basics
• Starting point: a database table containing the text (column types such as TEXT, NVARCHAR, BLOB, …)
• Create a fulltext index including its options (see system view SYS.FULLTEXT_INDEXES)
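As a minimal sketch of what this looks like in SQL, the Python helpers below only assemble the statements as strings. The table `TWEETS`, the column `CONTENT`, the index name `TWEETS_TA` and the helper names are hypothetical, and the `TABLE_NAME` filter column on the system view is an assumption to check against your system.

```python
def create_fulltext_index_sql(index, table, column, configuration="EXTRACTION_CORE"):
    """Assemble a CREATE FULLTEXT INDEX statement with text analysis enabled."""
    return (
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}"("{column}") '
        f"CONFIGURATION '{configuration}' "
        "TEXT ANALYSIS ON"
    )


def list_indexes_sql(table):
    """Assemble a query on the system view that lists fulltext indexes."""
    # TABLE_NAME as filter column is an assumption; verify in your system.
    return f"SELECT * FROM SYS.FULLTEXT_INDEXES WHERE TABLE_NAME = '{table}'"


# Hypothetical names, for illustration only.
print(create_fulltext_index_sql("TWEETS_TA", "TWEETS", "CONTENT"))
print(list_indexes_sql("TWEETS"))
```

The strings are built by plain interpolation to keep the sketch short; real code should not pass untrusted input into SQL this way.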
Entity Extraction
• To get valuable information out of the data, SAP delivers several configurations
• These configurations focus on entity and fact extraction under specific aspects
• Types of extraction:
  - EXTRACTION_CORE
  - EXTRACTION_CORE_ENTERPRISE
  - EXTRACTION_CORE_PUBLIC_SECTOR
  - EXTRACTION_CORE_VOICEOFCUSTOMER
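Once an index with text analysis exists, the extracted entities end up in a result table named `$TA_<index name>`. The sketch below assembles a query that counts hits per entity type; the index name `TWEETS_TA` and the helper names are hypothetical, and the column `TA_TYPE` is taken from the result table layout as I understand it, so check it against the Text Analysis Developer Guide.

```python
# The four delivered configurations named on the slide.
CONFIGURATIONS = (
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_ENTERPRISE",
    "EXTRACTION_CORE_PUBLIC_SECTOR",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
)


def is_delivered_configuration(name):
    """Check whether a name is one of the delivered configurations listed above."""
    return name in CONFIGURATIONS


def entity_counts_sql(index):
    """Assemble a query that counts extracted entities per type for one index."""
    return (
        "SELECT TA_TYPE, COUNT(*) AS HITS "
        f'FROM "$TA_{index}" '
        "GROUP BY TA_TYPE ORDER BY HITS DESC"
    )


print(is_delivered_configuration("EXTRACTION_CORE_VOICEOFCUSTOMER"))
print(entity_counts_sql("TWEETS_TA"))
```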
Custom Dictionary
• In many use cases you need to extend the dictionary with terms from your business domain
• Structure of a dictionary
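The original slide showed the structure as a picture. As a rough sketch, a dictionary is an XML file with entity categories, each holding entity names (a standard form) and their variants. The Python snippet below generates such a file; the category `CROSSFIT_EXERCISE` and its entries are invented, and the namespace URI is quoted from memory of the Extraction Customization Guide, so verify it there.

```python
import xml.etree.ElementTree as ET

# Namespace as recalled from the customization guide (assumption; verify).
NS = "http://www.sap.com/ta/4.0"


def build_dictionary(category, entries):
    """Build dictionary XML: one entity category, standard forms with variants."""
    root = ET.Element("dictionary", {"xmlns": NS})
    cat = ET.SubElement(root, "entity_category", {"name": category})
    for standard_form, variants in entries.items():
        entity = ET.SubElement(cat, "entity_name", {"standard_form": standard_form})
        for variant in variants:
            ET.SubElement(entity, "variant", {"name": variant})
    return ET.tostring(root, encoding="unicode")


# Hypothetical CrossFit vocabulary, for illustration only.
xml_text = build_dictionary(
    "CROSSFIT_EXERCISE",
    {"Burpee": ["burpees"], "Workout of the Day": ["WOD"]},
)
print(xml_text)
```

Every variant found in a text is reported under its standard form and the category name, which is what makes slang terms such as "WOD" usable in later queries.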
Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that fits your use case best
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
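Step 5 can be sketched as a pair of SQL statements, assembled here as Python strings. The package `my.package`, the configuration name `MY_CONFIG` and the table, column and index names are hypothetical; the `package::name` form for referencing a custom configuration is how I understand repository objects are addressed, so treat it as an assumption.

```python
def recreate_index_sql(index, table, column, package, config_name):
    """Return the DROP and CREATE statements that switch an index to a custom configuration."""
    # Custom configurations are referenced as <package>::<name> (assumption).
    configuration = f"{package}::{config_name}"
    return [
        f'DROP FULLTEXT INDEX "{index}"',
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}"("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON",
    ]


# Hypothetical names, for illustration only.
for statement in recreate_index_sql("TWEETS_TA", "TWEETS", "CONTENT", "my.package", "MY_CONFIG"):
    print(statement)
```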
Text Analysis with HANA – What’s next?
• Assume that we are in an industry-specific context or mining for slang-like facts and entities
• Sports are a good example of this
• We use the example of CrossFit® … as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
  - Did somebody attend a CrossFit training?
  - Does somebody want to join a CrossFit box?
Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules): a pattern-based language for pattern matching that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
• Goal of the rule sets:
  - Extract complex facts based on relations between entities and predicates.
  - Identify entities in domain-specific language and capture facts expressed in new, popular "slang".
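To give a feeling for what such a rule expresses, here is an analogy in plain Python. It is not CGUL syntax, just token-level pattern matching with a very crude stemmer standing in for the linguistic attributes; the function names and the pattern ("attend", a few arbitrary tokens, "CrossFit", then "training" or "class") are invented for this example.

```python
import re


def crude_stem(token):
    """Very rough stemmer: lowercase and strip a few common English suffixes."""
    token = token.lower()
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def find_training_attendance(sentence, max_gap=3):
    """Return the matched tokens if the sentence states attendance at a CrossFit session."""
    tokens = re.findall(r"\w+", sentence)
    lower = [t.lower() for t in tokens]
    for i, token in enumerate(tokens):
        # Match any inflected form of the verb via its stem.
        if crude_stem(token) != "attend":
            continue
        # Allow up to max_gap arbitrary tokens between the verb and "CrossFit".
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens) - 1)):
            if lower[j] == "crossfit" and lower[j + 1] in ("training", "class"):
                return tokens[i : j + 2]
    return None


print(find_training_attendance("Yesterday I attended my first CrossFit training in Munich"))
print(find_training_attendance("I attended a meeting"))
```

A real extraction rule would rely on the stems and part-of-speech tags delivered by the linguistic analysis instead of a hand-written stemmer.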
[Figure: an extraction rule and its ingredients: regular expressions, tokens, dictionaries, luck]
Text Analysis with HANA – “Lessons Learned”
• Text analysis on SAP HANA is extremely powerful
• Besides the delivered content, you have a lot of options to adapt the text analysis to extract the entities and facts that you need
• This also means you have a lot of options that you can set the wrong way
• Since SP09, rules get compiled upon activation (no separate compilation necessary)
• The documentation is mostly OK but has room for improvement when it comes to extraction rules
• Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
  - No support in the IDE
  - You can usually activate all objects and create the index … but the index remains empty
Q&A
.consulting .solutions .partnership
Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
christian.lechner@msg-systems.com
http://scn.sap.com/people/christian.lechner
@lechnerc77
Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (fulltext index options): help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide
• YouTube playlist of SAP HANA Academy: Text Analysis and Search

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Text Analysis with SAP HANA

  • 2. Text Analysis with SAP HANA
      October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
      1. Motivation
      2. Text Analysis with SAP HANA
      3. Enhancement Options - Dictionaries and Rules
  • 4. Why do we need Text Analysis?
      • According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005).
      • The data might originate from:
          ▪ Social networks
          ▪ “Letters” from customers
          ▪ ...
      • What is the problem with unstructured data?
      • It is unstructured!
          ▪ Not organized
          ▪ No pre-defined data model
          ▪ No metadata, or a mix of data and metadata
      • We have a lot of information that is relevant for the business, but we cannot access it.
  • 5. How can we solve that issue?
      • Text Analysis: extracting high-quality information from texts
      • Typical process of a text analysis:
          ▪ Parsing of the text
          ▪ Adding features such as linguistic information
          ▪ Entity recognition: Is it an organization, a person, or a place? This includes domain facts like requests.
          ▪ Sentiment analysis: What attitudinal information is “hidden” in the text?
          ▪ Insertion of the information into the database in a structured manner
  • 7. What has this to do with SAP HANA? © SAP SE
  • 8. Fulltext Index - Basics
      • Starting point: a database table containing the text (types like TEXT, NVARCHAR, BLOB …)
      • Create a fulltext index incl. options (see system view SYS.FULLTEXT_INDEXES)
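The fulltext index step can be sketched as a small Python helper that only assembles the SQL text. This is a minimal sketch, not taken from the slides: the table, column and index names are hypothetical, and the option order shown (CONFIGURATION, TEXT ANALYSIS, then SYNC/ASYNC) is an assumption that should be checked against the SAP HANA SQL reference for your revision.

```python
def create_fulltext_index_sql(index, table, column,
                              configuration=None,
                              text_analysis=True,
                              asynchronous=True):
    """Assemble a CREATE FULLTEXT INDEX statement as a plain string.

    Nothing is executed here; the string would be sent to SAP HANA
    with whatever SQL client is in use.
    """
    parts = [f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}")']
    if configuration is not None:
        # Name of a delivered or custom text analysis configuration
        parts.append(f"CONFIGURATION '{configuration}'")
    parts.append("TEXT ANALYSIS ON" if text_analysis else "TEXT ANALYSIS OFF")
    parts.append("ASYNC" if asynchronous else "SYNC")
    return " ".join(parts)


# Hypothetical names, for illustration only
stmt = create_fulltext_index_sql(
    "IDX_FEEDBACK", "CUSTOMER_FEEDBACK", "FEEDBACK_TEXT",
    configuration="EXTRACTION_CORE_VOICEOFCUSTOMER",
)
print(stmt)
```

After the index exists, its settings should be visible in the system view SYS.FULLTEXT_INDEXES mentioned on the slide.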
  • 9. Entity Extraction
      • In order to get valuable information out of the data, SAP delivers several configurations
      • These configurations focus on entity and fact extraction under specific aspects
      • Types of extraction:
          ▪ EXTRACTION_CORE
          ▪ EXTRACTION_CORE_ENTERPRISE
          ▪ EXTRACTION_CORE_PUBLIC_SECTOR
          ▪ EXTRACTION_CORE_VOICEOFCUSTOMER
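To illustrate what the extraction result looks like once it is in structured form, the sketch below counts extracted entities per type. It assumes the results land in a table named `$TA_<index name>` with columns such as TA_TOKEN and TA_TYPE; the index name and the sample rows are made up for illustration and are not output from a real system.

```python
from collections import Counter

# Query one might run against the result table of a hypothetical index
# IDX_FEEDBACK (assumption: results are stored in "$TA_IDX_FEEDBACK").
QUERY = 'SELECT TA_TOKEN, TA_TYPE FROM "$TA_IDX_FEEDBACK"'

# Made-up rows shaped like (TA_TOKEN, TA_TYPE), standing in for a result set
rows = [
    ("SAP", "ORGANIZATION/COMMERCIAL"),
    ("Munich", "LOCALITY"),
    ("Christian", "PERSON"),
    ("SAP", "ORGANIZATION/COMMERCIAL"),
]


def count_by_type(rows):
    """Count how many extracted tokens fall into each entity type."""
    return Counter(ta_type for _token, ta_type in rows)


counts = count_by_type(rows)
for ta_type, n in sorted(counts.items()):
    print(ta_type, n)
```

Which entity types actually appear depends on the chosen configuration, so the type names above are only examples.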
  • 12. Custom Dictionary
      • In several use cases you need to extend the dictionary to cover your business domain
      • Structure of a dictionary © SAP SE
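Since the dictionary structure was shown only as an image, here is a hedged sketch that generates a small dictionary document. The element and attribute names (`dictionary`, `entity_category`, `entity_name`, `standard_form`, `variant`) and the namespace URI are assumptions based on the format described in the Extraction Customization Guide and should be verified there; the category name and the entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the dictionary format; verify against the SAP guide.
TA_NAMESPACE = "http://www.sap.com/ta/4.0"


def build_dictionary(category, entries):
    """Return dictionary XML with one entity category.

    entries maps a standard form to a list of variant spellings.
    """
    root = ET.Element("dictionary", {"xmlns": TA_NAMESPACE})
    cat = ET.SubElement(root, "entity_category", {"name": category})
    for standard_form, variants in entries.items():
        entity = ET.SubElement(cat, "entity_name",
                               {"standard_form": standard_form})
        for variant in variants:
            ET.SubElement(entity, "variant", {"name": variant})
    return ET.tostring(root, encoding="unicode")


# Hypothetical category and spellings, for illustration only
xml_text = build_dictionary("SPORT", {"CrossFit": ["Cross Fit", "Crossfit"]})
print(xml_text)
```

Generating the file rather than typing it by hand at least rules out malformed XML, which is one of the typo classes that is hard to spot otherwise.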
  • 13. Text Analysis with HANA – Workflow of Enhancement
      1. Find the extraction configuration that fits you best
      2. Copy the configuration into the target folder
      3. Create a new custom dictionary
      4. Reference the dictionary in your configuration copy
      5. Recreate the fulltext index using your custom configuration
  • 15. Text Analysis with HANA – What’s next?
      • Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
      • A good example of this is sports!
      • We use the example of CrossFit®, as there are some funny facts to extract
      • Question: How can we extract complex entities from a text?
      • Examples:
          ▪ Did somebody attend a CrossFit training?
          ▪ Does somebody want to join a CrossFit box?
  • 16. Text Analysis with HANA – Text Analysis Extraction Rules
      • Extraction rules (CGUL rules): a pattern-based language for pattern matching that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
      • Goal of the rule sets:
          ▪ Extract complex facts based on relations between entities and predicates.
          ▪ Identify entities in domain-specific language and capture facts expressed in new, popular “slang”
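The idea behind such rules, token patterns that combine regular expressions with linguistic attributes, can be illustrated with a toy matcher in Python. This is not CGUL syntax and not how SAP HANA evaluates rules; the pre-tagged tokens, the tag names and the three-token window are all assumptions made for the sketch.

```python
import re

# Hypothetical pre-tagged tokens: (surface text, part of speech, stem)
tokens = [
    ("I", "PRON", "i"),
    ("attended", "VERB", "attend"),
    ("a", "DET", "a"),
    ("CrossFit", "NOUN", "crossfit"),
    ("training", "NOUN", "training"),
]

CROSSFIT = re.compile(r"cross\s?fit", re.IGNORECASE)


def match_attendance(tokens, window=3):
    """Find a verb with stem 'attend' followed by 'CrossFit' within `window` tokens.

    Returns the matched text span, or None if the pattern does not occur.
    """
    for i, (_text, pos, stem) in enumerate(tokens):
        if pos == "VERB" and stem == "attend":
            for j, (text, _pos, _stem) in enumerate(tokens[i + 1:i + 1 + window]):
                if CROSSFIT.fullmatch(text):
                    # Span from the verb up to and including the CrossFit token
                    return " ".join(t[0] for t in tokens[i:i + 2 + j])
    return None


print(match_attendance(tokens))
```

The linguistic attribute (the stem) lets the same pattern cover "attend", "attends" and "attended", while the regular expression handles spelling variants of the entity.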
  • 17. Text Analysis with HANA – Text Analysis Extraction Rules
      Extraction Rule: Regular Expressions, Tokens, Luck, Dictionaries
  • 19. Text Analysis with HANA – “Lessons Learned”
      • Text Analysis on SAP HANA is extremely powerful
      • Besides the delivered content, you have a lot of options to adapt the text analysis so that it extracts the entities and facts that you need
      • This also means you have a lot of options that you can set the wrong way
      • Since SP09, rules get compiled upon activation (no separate compilation necessary)
      • The documentation is mostly OK but has room for improvement when it comes to extraction rules
      • Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
          ▪ No support in the IDE
          ▪ You can usually activate all objects and create the index … but the index remains empty
  • 20. Q&A
  • 21. .consulting .solutions .partnership
      Dr. Christian Lechner, Principal IT Consultant
      +49 (0) 171 7617190
      christian.lechner@msg-systems.com
      http://scn.sap.com/people/christian.lechner
      @lechnerc77
  • 22. Text Analysis with HANA – Resources
      • SAP HANA Search Developer Guide (Fulltext Index Options): help.sap.com -> Search Developer Guide
      • SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide
      • SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide
      • SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide
      • YouTube Playlist of SAP HANA Academy: Text Analysis and Search

Editor's notes

  1. Text analysis in SAP HANA is a suite of natural-language processing capabilities based on linguistic, statistical and machine-learning algorithms that model and structure the information content of textual sources in multiple languages. This technology forms the foundation for advanced text processing for a range of applications including search, business intelligence or exploratory data analysis.
  2. Fulltext index options:
      • LANGUAGE COLUMN <column_name> - Defines the column where the language of a document is specified.
      • LANGUAGE DETECTION ( <string_literal_list> ) - The set of languages to be considered during language detection.
      • MIME TYPE COLUMN <column_name> - Defines the column where the mime type of a document is specified.
      • FUZZY SEARCH INDEX <on_off> - Specifies whether a fuzzy search index should be used.
      • PHRASE INDEX RATIO <index_ratio>, where <index_ratio> ::= <exact_numeric_literal> - Specifies the percentage of the phrase index. The phrase index stores information about the occurrence of words and the proximity of words to one another. If a phrase index is present, phrase searches are sped up (e.g. SELECT * FROM T WHERE CONTAINS(COLUMN1, '"cats and dogs"')). The value must be between 0.0 and 1.0; 1.0 means that the internal phrase index can use 100% of the memory size of the fulltext index.
      • CONFIGURATION <string_literal> - The path to a custom configuration file for text analysis.
      • SEARCH ONLY <on_off> - Defines whether the original document should be stored or only the search results. When set to ON, the original document content is not stored.
      • FAST PREPROCESS <on_off> - If set to ON, fast preprocessing is used, i.e. linguistic searches are not possible.
      • TEXT ANALYSIS <on_off> - Enables text analysis capabilities on the indexed column. Text analysis can extract entities such as persons, products, or places from documents, which are stored in a new table.
      • MIME TYPE <string_literal> - The default mime type used for preprocessing. The value must be a valid mime type.
      • TOKEN SEPARATORS <string_literal> - A set of characters used for token separation. Only ASCII characters are considered.
      • <change_tracking_elem> ::= SYNC[HRONOUS] | ASYNC[HRONOUS] [FLUSH [QUEUE] <flush_queue_elem>] - The type of index to be created.
          ▪ SYNC[HRONOUS] - Creates a synchronous fulltext index.
          ▪ ASYNC[HRONOUS] - Creates an asynchronous fulltext index.
          ▪ FLUSH [QUEUE] <flush_queue_elem>, where <flush_queue_elem> ::= EVERY <integer_literal> MINUTES | AFTER <integer_literal> DOCUMENTS | EVERY <integer_literal> MINUTES OR AFTER <integer_literal> DOCUMENTS - Specifies when to update the fulltext index if an asynchronous index is used. When DOCUMENTS is specified, the fulltext index will be updated after the specified number of changes to the table, including updates and deletes.
      • TEXT MINING <on_off> - Enables text mining capabilities on the indexed column. Text mining provides functionality that can compare documents by examining the terms used within them.
      • TEXT MINING CONFIGURATION <string_literal> - The path to a custom configuration file for text mining. If not specified, DEFAULT.textminingconfig is used.
  3. Entity Extraction is the identification of named entities (persons, organizations etc.), which eliminates the 'noise' in textual data by highlighting salient information. This process transforms unstructured text into structured information.  Fact Extraction is a higher-level semantic processing that links entities as "facts" in domain-specific applications. For example, "Voice of the Customer" classifies sentiments with their corresponding topics.
  4. CGUL - Custom Grouper User Language