TEXT MINING
seminar submitted by:
Ali Abdul_Zahraa
Msc,MathcompUOK
ali.abdulzahraa@gmail.com
Outline
Introduction
Data Mining vs Text Mining
Text Mining Process
Text Mining Applications
Challenges in Text Mining
Conclusion
Introduction
• What is Text Mining?
– Text mining is the analysis of data contained in
natural language text
Introduction
• Why Text Mining?
– A massive amount of new information is being
created: the world’s data doubles every 18 months
(Jacques Vallee, Ph.D.)
– 80-90% of all data is held in various
unstructured formats
– Useful information can be derived from this
unstructured data
Unstructured Data Examples “Ore”
• Email
• Insurance claims
• News articles
• Web pages
• Patent portfolios
• Customer complaint letters
• Contracts
• Transcripts of phone calls with customers
• Technical documents
Reasons for Text Mining
[Bar chart: percentage (axis 0 to 90) of data held as collections of text versus structured data]
How Text Mining Differs from Data Mining
Data Mining
• Identify data sets
• Select features
• Prepare data
• Analyze distribution
Text Mining
• Identify documents
• Extract features
• Select features by algorithm
• Prepare data
• Analyze distribution
Mining
Filtering: remove punctuation and special characters.
Segmentation: split the document into words.
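The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the seminar: the function names `filter_text` and `segment` are hypothetical, lowercasing is an added assumption, and splitting on whitespace only suits languages that separate words with spaces.

```python
import re

def filter_text(text):
    """Filtering: replace punctuation and special characters with spaces.

    Assumption: anything that is not a letter, digit, underscore, or
    whitespace counts as a character to remove.
    """
    return re.sub(r"[^\w\s]", " ", text)

def segment(text):
    """Segmentation: split the filtered text into lowercase words."""
    return filter_text(text).lower().split()

words = segment("Hello, world! Text-mining rocks.")
print(words)
```

Replacing punctuation with a space rather than deleting it keeps hyphenated terms such as "Text-mining" from fusing into a single token.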
Stemming: techniques used to find the root (stem) of a word.
– E.g.,
– user, users, used, using
– engineering, engineered, engineer
• Stem (root): use, engineer
Usefulness
• improving effectiveness of retrieval and text mining
– matching similar words
• reducing index size
– combining words with the same root may reduce index size by as much as 40-50%.
Mining
Basic stemming methods
• remove ending
– if a word ends with a consonant other than s, followed by an s, then delete the s.
– if a word ends in es, drop the s.
– if a word ends in ing, delete the ing unless the remaining word consists of only
one letter or of th.
– if a word ends with ed, preceded by a consonant, delete the ed unless this
leaves only a single letter.
– …...
• transform words
– if a word ends with “ies” but not “eies” or “aies”, then “ies” → “y”.
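The suffix rules listed above can be written as a small rule-based stemmer. The sketch below is illustrative only: the function name `simple_stem` is hypothetical, the order in which the rules are tried is an assumption (the slide does not give one), and the replacement of "ies" by "y" follows the usual convention for this rule. Only the rules shown on the slide are implemented, so the output is cruder than a full stemmer such as Porter's.

```python
VOWELS = set("aeiou")

def simple_stem(word):
    """Apply the basic suffix rules; the first matching rule wins."""
    w = word.lower()
    # Transform rule: "ies" becomes "y" unless the ending is "eies" or "aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # "ing": delete it unless only one letter or "th" would remain.
    if w.endswith("ing"):
        rest = w[:-3]
        return rest if len(rest) > 1 and rest != "th" else w
    # "ed" preceded by a consonant: delete it unless one letter would remain.
    if w.endswith("ed") and len(w) > 2 and w[-3] not in VOWELS:
        rest = w[:-2]
        return rest if len(rest) > 1 else w
    # "es": drop the final s.
    if w.endswith("es"):
        return w[:-1]
    # Consonant other than s, followed by s: delete the s.
    if w.endswith("s") and len(w) > 1 and w[-2] not in VOWELS and w[-2] != "s":
        return w[:-1]
    return w

for w in ["users", "engineering", "engineered", "thing", "ponies"]:
    print(w, "->", simple_stem(w))
```

Rules this simple make mistakes in both directions (for example "using" is cut to "us"), which is why practical systems rely on larger rule sets or dictionaries.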
Mining
Eliminating excessive words (stop words): words that
carry no meaning on their own, such as prepositions,
conjunctions, and conditional particles.
This is done by comparing each word against a list
of these words.
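The list comparison described above amounts to a set lookup. In this sketch the `STOP_WORDS` set is a tiny hypothetical sample chosen for illustration; real lists are language-specific and much longer.

```python
# Hypothetical, deliberately short stop list: a few English prepositions,
# conjunctions, articles, and conditional particles.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
              "and", "or", "but", "if", "unless"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["the", "analysis", "of", "data", "in", "text"]
print(remove_stop_words(tokens))
```

Using a set keeps each lookup at constant time on average, which matters when the collection holds many documents.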
Canonical Names
President Bush
Mr. Bush
George Bush
Canonical Name:
George Bush
• The canonical name is the most explicit, least
ambiguous name constructed from the different
variants found in the document
• Reduces ambiguity of variants
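One simple way to pick a canonical name from a set of variants is sketched below. It is a heuristic assumed for illustration, not the procedure behind the slide: titles from a hypothetical `TITLES` list are stripped, and the variant with the most remaining name tokens is treated as the most explicit. It also assumes the variants have already been grouped as referring to the same person.

```python
# Hypothetical list of titles to ignore when comparing variants.
TITLES = {"president", "mr", "mr.", "mrs", "mrs.", "ms", "ms.", "dr", "dr."}

def canonical_name(variants):
    """Return the most explicit variant after removing titles."""
    stripped = []
    for variant in variants:
        tokens = [t for t in variant.split() if t.lower() not in TITLES]
        stripped.append(" ".join(tokens))
    # Prefer more name tokens; break ties by character length.
    return max(stripped, key=lambda s: (len(s.split()), len(s)))

print(canonical_name(["President Bush", "Mr. Bush", "George Bush"]))
```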
Mining
Clipping: eliminate words that appear with very high
or very low frequency.
o Low-frequency words form small clusters that are
not useful, and high-frequency words appear almost
everywhere, so they are not useful either.
o There are many ways to calculate a word’s
frequency in a document or a collection of documents.
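Clipping can be sketched with document frequency, that is, the number of documents in which a word occurs. This is only one of the possible frequency measures; the function name `clip_vocabulary` and the thresholds `min_df` and `max_df_ratio` are hypothetical values chosen for the example.

```python
from collections import Counter

def clip_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Drop words that occur in too few or too many documents.

    docs is a list of token lists. min_df and max_df_ratio are
    illustrative cut-offs, not recommended settings.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    keep = {w for w, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[w for w in tokens if w in keep] for tokens in docs]

docs = [["the", "cat", "sat"], ["the", "cat", "ran"],
        ["the", "dog", "ran"], ["the", "bird", "flew"]]
print(clip_vocabulary(docs))
```

In this toy collection "the" is removed because it appears in every document, and words seen in only one document are removed as too rare.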
Mining
Clustering: group interrelated
documents based on their topics.
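The slide does not name a clustering algorithm, so the sketch below swaps in a very simple one for illustration: single-pass clustering, where each document joins the first cluster whose first member is similar enough, measured by Jaccard overlap of word sets. The function names and the `threshold` value are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_documents(docs, threshold=0.3):
    """Single-pass clustering; returns lists of document indices."""
    clusters = []
    for i, tokens in enumerate(docs):
        for members in clusters:
            # Compare against the cluster's first document only.
            if jaccard(tokens, docs[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters

docs = [["stock", "market", "price"], ["market", "price", "rise"],
        ["gene", "disease", "protein"], ["gene", "protein", "cell"]]
print(cluster_documents(docs))
```

The result depends on document order and on the threshold; methods such as k-means over weighted term vectors are common alternatives.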
Text Mining: Analysis
• Which words occur most frequently?
• Which words are most interesting?
• Which words help define the document?
• What are the interesting text phrases?
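Two of these questions have standard quantitative answers: raw counts show which words are most present, and a TF-IDF weight is one common way to estimate which words help define a document. The sketch below is one simple variant, assumed for illustration (term frequency divided by document length, times the natural log of the inverse document frequency); the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document (token lists in)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency of each word
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append({w: (c / total) * math.log(n / df[w])
                       for w, c in tf.items()})
    return scores

docs = [["text", "mining", "text"], ["data", "mining"]]
print(Counter(docs[0]).most_common(1))  # most present word in document 0
print(tf_idf(docs)[0])                  # weights for document 0
```

A word that occurs in every document, such as "mining" here, gets weight zero because it does not distinguish one document from another.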
Text mining applications
• Call Center Software.
• Anti-Spam.
• Market Intelligence.
• Web mining.
Actual examples
• A clinical center in the USA was able to
identify a gene responsible for a harmful
disease by processing more than 150,000
newspapers.
• Text mining of the Holy Quran.
• Etc.
Challenges in Text Mining
• Information is in unstructured textual form,
written in natural language (NL).
• It is not readily accessible for use by computers.
• Huge collections of documents must be handled.
• A skilled person is required to choose which documents
to process and to analyze the output.
• More time is required.
• Cost: $50,000 for the software alone.
More information
• The Central Intelligence Agency (CIA) is the strongest
supporter of text mining.
- The events of September 11.
- Mining of e-mail, chat rooms, and social
networks.
- It therefore supports many companies, such as
Attensity, Inxight, and Intelliseek.
More information
• SPSS company statistics: text mining software
has very few users compared with data mining
software.
Conclusion
• Finally, most sources indicate that the field of text
mining is still in the research phase
• and that its applications are still of limited use at
the present time,
• but the possibilities it offers, helping to
understand huge amounts of text and to extract
the important core information, make it
promising in many areas.
MSC STUDENT ALI ABDUL ZAHRAA EXPLAINS TEXT MINING
