SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Text Mining – Techniques & Limitations
Pharmaceutical Industry Perspective
14.9.2013, EC-L4E-WG4

Frank Oellien
Secretary EuCheMS DCC
Overview
• Text Mining Sources
• Text Mining Techniques (Basic Concepts)
– (Incomplete) Software Overview
– Identification and Retrieval
– Content Processing (Information to Knowledge)
 Classification
 Semantics/Ontology
 Data Extraction (e.g. Chemical Objects)

– Delivery and Presentation

• Current Limitations
• Workarounds
• (Ideal) Future Perspective
2
Text Mining Sources

Databases

Sharepoints

Social
Media
Web

BIG DATA
Patents

Papers
Data
Repositories

3

Wikipedia
Text Mining Techniques - Software

4
Text Mining Techniques - Software
Use Accelrys Pipeline Pilot (Text Mining
Collection) to introduce some basic concepts

5
Techniques – Identify and Retrieve
• A lot of readers and
search components are
available
• Simple text queries
• But also specific search
capabilities like “TAC
Universal Query
Language” or “PubMed
Fields syntax”
• But more sophisticated
approaches are
sometimes necessary
→ Semantics/Ontology
6
Techniques – Identify and Retrieve II
Semantics: Expands text query terms by using Concept
Dictionaries (MeSH)
Search: “esophageal neoplasms”

7
Techniques – Identify and Retrieve II
Semantics: Expands text query terms by using Concept
Dictionaries (MeSH)
Search: “esophageal neoplasms”

8
Techniques – Content Processing
How to retrieve Knowledge from Information?
• Search: “RNAi”
• Concept: MeSH
• Specific MeSH
Concept

9
Techniques – Content Processing
How to retrieve Knowledge from Information?
• Search: “RNAi”
• Concept: MeSH
• Specific MeSH
Concept

10
Techniques – Content Processing II
Advanced Classifications, Text Fingerprints

11
Techniques – Content Processing III
Relative Mutual Information (RMI)
 Use Text to Find Anti-AIDS Actives

12
Techniques – Content Processing III
Relative Mutual Information (RMI)
 Use Text to Find Anti-AIDS Actives

13
Techniques – Content Processing III
Relative Mutual Information (RMI)
 Use Text to Find Anti-AIDS Actives

14
Techniques – Content Processing IV
RMI Correlation and Trends
 Chemical Trends in Arthritis

15
Techniques – Content Processing IV
RMI Correlation and Trends
 Chemical Trends in Arthritis

16
Techniques – Content Processing V
Data Mining – Extract Chemical Knowledge/Objects
•
•

Identify and extract chemicals by name, convert them to chemical structures and
store them into a chemical database
Store related literature in text database

17
Techniques – Content Processing V
Data Mining – Extract Chemical Knowledge/Objects
•
•

Identify and extract chemicals by name, convert them to chemical structures and
store them into a chemical database
Store related literature in text database

18
Techniques – Delivery and Presentation

19
Current Limitations
Nature News, 21.03.2013

“Fearful that their content might be freely redistributed, publishers tend to block
programs that they find crawling the full text of articles, making no exceptions for
users who have paid for access. They give permission only on a case-by-case
basis to those who negotiate agreements on access and use.”

20
Full Paper Accessibility Limitations
General Aspects
Old Full Papers (<1990)
• Not available as full text PDF, but as scanned images saved as
PDFs
• OCR software needed to get text

Subscriptions
• Customers do not subscribe to each and every journal
• Only Full Papers that are covered by subscriptions or OA papers are
(could be) available for text mining

21
Full Paper Accessibility Limitations II
Licenses and Copyright
• Almost all publishers have clauses in licenses prohibiting any
robotic, systematic mining of their websites
• This also affects customers with valid subscriptions
• Copyright issues prevent bulk downloading and local storage of full
papers

•
•
•
•

Some progress has already been made by publishers
Intermediary workarounds like CCC will be used
Additional solutions might become available by the end of this year
But currently, we are still facing some limitations

• Even if there are no copyright and license issues like in the case of
Open Access publishers, restrictions still exists that handle if and
how frequently you scrape or make calls to the publisher websites
22
Full Paper Accessibility Limitations III
Hardware Infrastructure
• Publishers common websites and web interfaces are generally not
equipped to handle extreme traffic
• Massive impact on Server infrastructure
• Text Mining activities are interfering with their normal web traffic
– Bulk download of full papers
– On the fly text mining of full papers

Clauses protect publishers also for text mining caused breakdowns of
their common websites

23
Full Paper Accessibility Limitations IV
Improper Distribution and Search Capabilities
• Full text papers are improper distributed for text mining purposes
– Many publishers and sources
– Specific readers for each source needed

• No standardized APIs to perform searches
–
–
–
–

In some cases no API at all
Different APIs force the development of specific readers
No standardized feeds available
Additional post-processing needed

• Limited Search Capabilities
– Only simple text queries are available or simple advanced queries
– Semantic/Concept Dictionary-driven searches are not possible
 More sophisticated search
 Covers more of the relevant information
 Limits searches to the important full paper  limitation of the number of paper
that have to be downloaded)
24
Full Paper Accessibility Limitations V
Financial Aspects
• Many, sophisticated text mining tools are available and will be used
•
•
•

•
•

by customers. However these tools are costly and sometimes
require specialists to operate
Downloading tools or additional services as CCC and QUOSA are
also costly and even increase the financial expenses
Text Mining agreements are often related to additional costs, even if
subscription contracts are already in place
A text mining-driven full paper download has not the same value as
a common, manual full paper download! Most/many papers will be
rejected during the text mining approach. Only few papers contain
the desired information
Text Mining cannot be performed on a financial negotiated case-bycase basis (long-term planning)  text mining tasks are too diverse
and occur as part of the daily work
In the current practice lies another obstacle in times of budgetary
25
constraints
Workarounds – Abstracts, PM Central
Use Public Available Information - Abstracts
• The greatest success has been made in mining publicly available
information such as citations and abstracts from PubMed/Medline
• This has been/can be licensed in as a flat file for text mining purpose
• Limited to Life Science field
• Can be used as starting point to identify promising and valuable full papers

Use PubMed Central (OA Repository)
•
•
•
•

Access to full papers without license restrictions
This has been/can be licensed in as downloaded copy for text mining
Only 2-3 million papers available, 10% of PubMed/Medline
Only Papers after 1990, old articles are not included

Other OA Full Papers from OA Publishers
•
•

No copyright issue
But, all other restrictions (incl. restricting clauses)
26
Workarounds – Customized Solution
Publisher-specific, customized Solutions
• Evaluate customized text mining opportunities in private
projects/contracts between customer and publisher
• Pharma custromers have experimented with having access to a
publisher’s website to mine at a specified rate during off-hours, and
in having feeds or APIs from publishers
– Try to prevent interfering with publishers normal web traffic
– Try to spare publishers infrastructure

• These options come at additional cost and the coverage of
published literature has not been comprehensive
 therefore most of these efforts have been short-lived

27
Workarounds – CCC and QUOSA
QUOSA: Scientific Literature Management Software
(Elsevier)
“QUOSA has been used in a limited fashion by pharma industry at least 7 years
as a downloading tool. It works by linking from specified search results to
download pdf files for mining. The product has been very problematic in our
environment, both in terms of speed and accuracy. PubGet was a competitor,
primarily focused on PubMed search results and had limited
practicality. PubGet was purchased by the Copyright Clearance Center which
plans to develop a new text mining tool, but has nothing to offer at this time.”

Copyright Clearance Center (CCC)
“The Copyright Clearance Center (CCC) in Danvers, Massachusetts, which
works with publishers on rights licensing, is pursuing a more ambitious effort. It
would act as an intermediary, collecting publishers’ terms and content and
storing them on a website for researchers. It is working with six publishers
(including Nature Publishing Group) and with drug and chemical firms eager
to mine the literature.”
Help to handle the copy rights and other issues (e.g. bulk downloading)
with the publishers
28
(Ideal) Future Perspective
• Use of standardized APIs and feeds (over all publishers incl. OA)
• Use mirror sites where tools can operate without interfering with the
•
•
•
•
•
•

primary publisher’s website
Proper distribution / platforms that cover several publishers
(CrossRef)
Availability of semantic searches (concept dictionaries) to allow
advanced searches  limit number of full paper to retrieve
Text Mining should be possible directly between publishers and
customers based on existing subscriptions
Reasonable costs / Adapted pricing schemes
Copyright/License harmonization
No case-by-case basis for access, agreements and licenses but
a general agreement that will allow text mining approaches in
general

It’s about having Access and not about costs
29
Thank You

30

Más contenido relacionado

La actualidad más candente

OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE
 
The Open Access Policy and its Benefits For Knowledge Producers
The Open Access Policy and its Benefits For Knowledge ProducersThe Open Access Policy and its Benefits For Knowledge Producers
The Open Access Policy and its Benefits For Knowledge ProducersWorld Bank Publications
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
20170530_Open Research Data in Horizon 2020
20170530_Open Research Data in Horizon 202020170530_Open Research Data in Horizon 2020
20170530_Open Research Data in Horizon 2020OpenAIRE
 
2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum TalkPaul Bracke
 
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)OpenAIRE
 
FHIR for clinicians
FHIR for clinicians FHIR for clinicians
FHIR for clinicians David Hay
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesOpenAIRE
 
Alma Swan - PASTEUR4OA: Policy alignment and effectiveness
Alma Swan - PASTEUR4OA: Policy alignment and effectivenessAlma Swan - PASTEUR4OA: Policy alignment and effectiveness
Alma Swan - PASTEUR4OA: Policy alignment and effectivenessOpenAIRE
 
Books and Monographs in Horizon Europe- OPERAS 2020
Books and Monographs in Horizon Europe- OPERAS 2020Books and Monographs in Horizon Europe- OPERAS 2020
Books and Monographs in Horizon Europe- OPERAS 2020Victoria Tsoukala
 
Secure data management, analysis, infrastructure and policy in an internation...
Secure data management, analysis, infrastructure and policy in an internation...Secure data management, analysis, infrastructure and policy in an internation...
Secure data management, analysis, infrastructure and policy in an internation...Carolyn Ten Holter
 
Actions and Updates on the Standards and Best Practices Front
Actions and Updates on the Standards and Best Practices FrontActions and Updates on the Standards and Best Practices Front
Actions and Updates on the Standards and Best Practices FrontNASIG
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarFAIRDOM
 
Open Data: Sharing the Main Actor of a Scientific Story - Paola Masuzzo
Open Data: Sharing the Main Actor of a Scientific Story - Paola MasuzzoOpen Data: Sharing the Main Actor of a Scientific Story - Paola Masuzzo
Open Data: Sharing the Main Actor of a Scientific Story - Paola MasuzzoOpenAIRE
 
EOSC-Nordic: FAIR support for data repositories
EOSC-Nordic: FAIR support for data repositoriesEOSC-Nordic: FAIR support for data repositories
EOSC-Nordic: FAIR support for data repositoriesJosefine Nordling
 
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE
 

La actualidad más candente (20)

OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
 
ER&L SUSHI ALI Feb 2015
ER&L SUSHI ALI Feb 2015ER&L SUSHI ALI Feb 2015
ER&L SUSHI ALI Feb 2015
 
The Open Access Policy and its Benefits For Knowledge Producers
The Open Access Policy and its Benefits For Knowledge ProducersThe Open Access Policy and its Benefits For Knowledge Producers
The Open Access Policy and its Benefits For Knowledge Producers
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
20170530_Open Research Data in Horizon 2020
20170530_Open Research Data in Horizon 202020170530_Open Research Data in Horizon 2020
20170530_Open Research Data in Horizon 2020
 
2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk2014 ALA MW SPARC-ACRL Forum Talk
2014 ALA MW SPARC-ACRL Forum Talk
 
Shareable by Design: Making Better Use of your Research
Shareable by Design: Making Better Use of your ResearchShareable by Design: Making Better Use of your Research
Shareable by Design: Making Better Use of your Research
 
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)
OpenAIRE webinar: Open Access to Publications in Horizon 2020 (May 2017)
 
FHIR for clinicians
FHIR for clinicians FHIR for clinicians
FHIR for clinicians
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
 
Uncork Your Licenses!
Uncork Your Licenses!Uncork Your Licenses!
Uncork Your Licenses!
 
Alma Swan - PASTEUR4OA: Policy alignment and effectiveness
Alma Swan - PASTEUR4OA: Policy alignment and effectivenessAlma Swan - PASTEUR4OA: Policy alignment and effectiveness
Alma Swan - PASTEUR4OA: Policy alignment and effectiveness
 
Books and Monographs in Horizon Europe- OPERAS 2020
Books and Monographs in Horizon Europe- OPERAS 2020Books and Monographs in Horizon Europe- OPERAS 2020
Books and Monographs in Horizon Europe- OPERAS 2020
 
Secure data management, analysis, infrastructure and policy in an internation...
Secure data management, analysis, infrastructure and policy in an internation...Secure data management, analysis, infrastructure and policy in an internation...
Secure data management, analysis, infrastructure and policy in an internation...
 
Actions and Updates on the Standards and Best Practices Front
Actions and Updates on the Standards and Best Practices FrontActions and Updates on the Standards and Best Practices Front
Actions and Updates on the Standards and Best Practices Front
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
Open Data: Sharing the Main Actor of a Scientific Story - Paola Masuzzo
Open Data: Sharing the Main Actor of a Scientific Story - Paola MasuzzoOpen Data: Sharing the Main Actor of a Scientific Story - Paola Masuzzo
Open Data: Sharing the Main Actor of a Scientific Story - Paola Masuzzo
 
EOSC-Nordic: FAIR support for data repositories
EOSC-Nordic: FAIR support for data repositoriesEOSC-Nordic: FAIR support for data repositories
EOSC-Nordic: FAIR support for data repositories
 
Building blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositoriesBuilding blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositories
 
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...OpenAIRE-Connect: Open Science as a Service for repositories and research com...
OpenAIRE-Connect: Open Science as a Service for repositories and research com...
 

Destacado

OUTDATED Text Mining 1/5: Introduction
OUTDATED Text Mining 1/5: IntroductionOUTDATED Text Mining 1/5: Introduction
OUTDATED Text Mining 1/5: IntroductionFlorian Leitner
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text MiningMichel Bruley
 
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...Pistoia Alliance
 
Medical data and text mining - Linking diseases, drugs, and adverse reactions
Medical data and text mining - Linking diseases, drugs, and adverse reactionsMedical data and text mining - Linking diseases, drugs, and adverse reactions
Medical data and text mining - Linking diseases, drugs, and adverse reactionsLars Juhl Jensen
 
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوب
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوبتحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوب
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوبHamed Azizi
 
عظيم داده چيست؟
عظيم داده چيست؟عظيم داده چيست؟
عظيم داده چيست؟digidanesh
 
کلان داده کاربردها و چالش های آن
کلان داده کاربردها و چالش های آنکلان داده کاربردها و چالش های آن
کلان داده کاربردها و چالش های آنHamed Azizi
 
Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text miningLars Juhl Jensen
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search EngineJay R Modi
 

Destacado (11)

OUTDATED Text Mining 1/5: Introduction
OUTDATED Text Mining 1/5: IntroductionOUTDATED Text Mining 1/5: Introduction
OUTDATED Text Mining 1/5: Introduction
 
Text mining
Text miningText mining
Text mining
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
 
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...
Pistoia Alliance Debates: Text Mining for Pharma R&D in a Social World (17th ...
 
Medical data and text mining - Linking diseases, drugs, and adverse reactions
Medical data and text mining - Linking diseases, drugs, and adverse reactionsMedical data and text mining - Linking diseases, drugs, and adverse reactions
Medical data and text mining - Linking diseases, drugs, and adverse reactions
 
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوب
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوبتحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوب
تحلیل احساسات شبکه اجتماعی متن کاوی نظرکاوی حامد عزیزی تهران جنوب
 
عظيم داده چيست؟
عظيم داده چيست؟عظيم داده چيست؟
عظيم داده چيست؟
 
کلان داده کاربردها و چالش های آن
کلان داده کاربردها و چالش های آنکلان داده کاربردها و چالش های آن
کلان داده کاربردها و چالش های آن
 
بیگ دیتا
بیگ دیتابیگ دیتا
بیگ دیتا
 
Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text mining
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search Engine
 

Similar a Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)

Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)Ina Smith
 
ufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdfufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdfTeshome Oljira
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG: connecting the knowledge community
 
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG: connecting the knowledge community
 
Text Data Mining: Unlocking the hidden potential from scholarly content.
Text Data Mining: Unlocking the hidden potential from scholarly content.Text Data Mining: Unlocking the hidden potential from scholarly content.
Text Data Mining: Unlocking the hidden potential from scholarly content.Emma Warren-Jones
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...Open Science Fair
 
Practical Steps to Address Piracy
Practical Steps to Address PiracyPractical Steps to Address Piracy
Practical Steps to Address PiracyChris Shillum
 
20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]Frederick Zarndt
 
Open access savvy skills 2011
Open access savvy skills 2011Open access savvy skills 2011
Open access savvy skills 2011Robert Perret
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open DataBrian Hole
 
Metadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsMetadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsChelcie Rowell
 
NISO-STM RA21 Project Update
NISO-STM RA21 Project UpdateNISO-STM RA21 Project Update
NISO-STM RA21 Project UpdateTACNISO
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisMarcus Hanwell
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...redsys
 
Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals KidsintheCloud
 

Similar a Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint) (20)

Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)
 
ufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdfufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdf
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
NISO April 30th RA21 Webinar
NISO April 30th RA21 WebinarNISO April 30th RA21 Webinar
NISO April 30th RA21 Webinar
 
Open Development
Open DevelopmentOpen Development
Open Development
 
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
UKSG 2018 Breakout - TERMS redefined: developing the combination of electroni...
 
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...UKSG webinar - TERMS revisited: developing the combination of electronic reso...
UKSG webinar - TERMS revisited: developing the combination of electronic reso...
 
Text Data Mining: Unlocking the hidden potential from scholarly content.
Text Data Mining: Unlocking the hidden potential from scholarly content.Text Data Mining: Unlocking the hidden potential from scholarly content.
Text Data Mining: Unlocking the hidden potential from scholarly content.
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
 
Practical Steps to Address Piracy
Practical Steps to Address PiracyPractical Steps to Address Piracy
Practical Steps to Address Piracy
 
20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]20140410 ifla digitization workshop [idlc kuala lumpur]
20140410 ifla digitization workshop [idlc kuala lumpur]
 
Open access savvy skills 2011
Open access savvy skills 2011Open access savvy skills 2011
Open access savvy skills 2011
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open Data
 
Metadata Ownership & Metadata Rights
Metadata Ownership & Metadata RightsMetadata Ownership & Metadata Rights
Metadata Ownership & Metadata Rights
 
McCulloch NISO-ICSTI Joint Webinar
McCulloch NISO-ICSTI Joint WebinarMcCulloch NISO-ICSTI Joint Webinar
McCulloch NISO-ICSTI Joint Webinar
 
NISO-STM RA21 Project Update
NISO-STM RA21 Project UpdateNISO-STM RA21 Project Update
NISO-STM RA21 Project Update
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & Analysis
 
II-SDV 2016 RightsDirect
II-SDV 2016 RightsDirectII-SDV 2016 RightsDirect
II-SDV 2016 RightsDirect
 
Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...Improving library services with semantic web technology in the realm of repos...
Improving library services with semantic web technology in the realm of repos...
 
Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals
 

Más de Frank Oellien

Algorithmen und Applikationen zur interaktiven Visualisierung und Analyse ch...
Algorithmen und Applikationen zur interaktiven  Visualisierung und Analyse ch...Algorithmen und Applikationen zur interaktiven  Visualisierung und Analyse ch...
Algorithmen und Applikationen zur interaktiven Visualisierung und Analyse ch...Frank Oellien
 
Cheminformatics – Computergestützte Anwendungen in der Chemie
Cheminformatics – Computergestützte Anwendungen in der ChemieCheminformatics – Computergestützte Anwendungen in der Chemie
Cheminformatics – Computergestützte Anwendungen in der ChemieFrank Oellien
 
Tautomers - Advanced Databases for in-silico Screening
Tautomers - Advanced Databases for in-silico ScreeningTautomers - Advanced Databases for in-silico Screening
Tautomers - Advanced Databases for in-silico ScreeningFrank Oellien
 
Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Frank Oellien
 
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...Frank Oellien
 
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...Frank Oellien
 
Chemische Visualisierung im Internet
Chemische Visualisierung im InternetChemische Visualisierung im Internet
Chemische Visualisierung im InternetFrank Oellien
 
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Frank Oellien
 
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...Frank Oellien
 
Interactive Datamining of Large-Scale Screening Datasets
Interactive Datamining of Large-Scale Screening DatasetsInteractive Datamining of Large-Scale Screening Datasets
Interactive Datamining of Large-Scale Screening DatasetsFrank Oellien
 

Más de Frank Oellien (10)

Algorithmen und Applikationen zur interaktiven Visualisierung und Analyse ch...
Algorithmen und Applikationen zur interaktiven  Visualisierung und Analyse ch...Algorithmen und Applikationen zur interaktiven  Visualisierung und Analyse ch...
Algorithmen und Applikationen zur interaktiven Visualisierung und Analyse ch...
 
Cheminformatics – Computergestützte Anwendungen in der Chemie
Cheminformatics – Computergestützte Anwendungen in der ChemieCheminformatics – Computergestützte Anwendungen in der Chemie
Cheminformatics – Computergestützte Anwendungen in der Chemie
 
Tautomers - Advanced Databases for in-silico Screening
Tautomers - Advanced Databases for in-silico ScreeningTautomers - Advanced Databases for in-silico Screening
Tautomers - Advanced Databases for in-silico Screening
 
Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?
 
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...
Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline ...
 
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
 
Chemische Visualisierung im Internet
Chemische Visualisierung im InternetChemische Visualisierung im Internet
Chemische Visualisierung im Internet
 
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
 
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...
Algorithmen und Applikationen zur interaktiven Visualisierung und visuellen D...
 
Interactive Datamining of Large-Scale Screening Datasets
Interactive Datamining of Large-Scale Screening DatasetsInteractive Datamining of Large-Scale Screening Datasets
Interactive Datamining of Large-Scale Screening Datasets
 

Último

Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiSuhani Kapoor
 
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...CALL GIRLS
 
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...Neha Kaur
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...chandars293
 
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...narwatsonia7
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...aartirawatdelhi
 
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...narwatsonia7
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...chandars293
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableNehru place Escorts
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escortsaditipandeya
 
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...narwatsonia7
 
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomLucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomdiscovermytutordmt
 

Último (20)

Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
 
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
 
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Faridabad Just Call 9907093804 Top Class Call Girl Service Available
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
 
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
 
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Kochi Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
 
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 8250192130 ⟟ Call Me For Gen...
 
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 9907093804 Top Class Call Girl Service Available
 
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel roomLucknow Call girls - 8800925952 - 24x7 service with hotel room
Lucknow Call girls - 8800925952 - 24x7 service with hotel room
 

Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)

  • 1. Text Mining – Techniques & Limitations Pharmaceutical Industry Perspective 14.9.2013, EC-L4E-WG4 Frank Oellien Secretary EuCheMS DCC
  • 2. Overview • Text Mining Sources • Text Mining Techniques (Basic Concepts) – (Incomplete) Software Overview – Identification and Retrieval – Content Processing (Information to Knowledge)  Classification  Semantics/Ontology  Data Extraction (e.g. Chemical Objects) – Delivery and Presentation • Current Limitations • Workarounds • (Ideal) Future Perspective 2
  • 3. Text Mining Sources Databases Sharepoints Social Media Web BIG DATA Patents Papers Data Repositories 3 Wikipedia
  • 4. Text Mining Techniques - Software 4
  • 5. Text Mining Techniques - Software Use Accelrys Pipeline Pilot (Text Mining Collection) to introduce some basic concepts 5
  • 6. Techniques – Identify and Retrieve • A lot of readers and search components are available • Simple text queries • But also specific search capabilities like “TAC Universal Query Language” or “PubMed Fields syntax” • But more sophisticated approaches are sometimes necessary → Semantics/Ontology 6
  • 7. Techniques – Identify and Retrieve II Semantics: Expands text query terms by using Concept Dictionaries (MeSH) Search: “esophageal neoplasms” 7
  • 8. Techniques – Identify and Retrieve II Semantics: Expands text query terms by using Concept Dictionaries (MeSH) Search: “esophageal neoplasms” 8
  • 9. Techniques – Content Processing How to retrieve Knowledge from Information? • Search: “RNAi” • Concept: MeSH • Specific MeSH Concept 9
  • 10. Techniques – Content Processing How to retrieve Knowledge from Information? • Search: “RNAi” • Concept: MeSH • Specific MeSH Concept 10
  • 11. Techniques – Content Processing II Advanced Classifications, Text Fingerprints 11
  • 12. Techniques – Content Processing III Relative Mutual Information (RMI)  Use Text to Find Anti-AIDS Actives 12
  • 13. Techniques – Content Processing III Relative Mutual Information (RMI)  Use Text to Find Anti-AIDS Actives 13
  • 14. Techniques – Content Processing III Relative Mutual Information (RMI)  Use Text to Find Anti-AIDS Actives 14
  • 15. Techniques – Content Processing IV RMI Correlation and Trends  Chemical Trends in Arthritis 15
  • 16. Techniques – Content Processing IV RMI Correlation and Trends  Chemical Trends in Arthritis 16
  • 17. Techniques – Content Processing V Data Mining – Extract Chemical Knowledge/Objects • • Identify and extract chemicals by name, convert them to chemical structures and store them into a chemical database Store related literature in text database 17
  • 18. Techniques – Content Processing V Data Mining – Extract Chemical Knowledge/Objects • • Identify and extract chemicals by name, convert them to chemical structures and store them into a chemical database Store related literature in text database 18
  • 19. Techniques – Delivery and Presentation 19
  • 20. Current Limitations Nature News, 21.03.2013 “Fearful that their content might be freely redistributed, publishers tend to block programs that they find crawling the full text of articles, making no exceptions for users who have paid for access. They give permission only on a case-by-case basis to those who negotiate agreements on access and use.” 20
  • 21. Full Paper Accessibility Limitations General Aspects Old Full Papers (<1990) • Not available as full text PDF, but as scanned images saved as PDFs • OCR software needed to get text Subscriptions • Customers do not subscribe to each and every journal • Only Full Papers that are covered by subscriptions or OA papers are (could be) available for text mining 21
  • 22. Full Paper Accessibility Limitations II Licenses and Copyright • Almost all publishers have clauses in licenses prohibiting any robotic, systematic mining of their websites • This also affects customers with valid subscriptions • Copyright issues prevent bulk downloading and local storage of full papers • • • • Some progress has already been made by publishers Intermediary workarounds like CCC will be used Additional solutions might become available by the end of this year But currently, we are still facing some limitations • Even if there are no copyright and license issues like in the case of Open Access publishers, restrictions still exists that handle if and how frequently you scrape or make calls to the publisher websites 22
  • 23. Full Paper Accessibility Limitations III Hardware Infrastructure • Publishers common websites and web interfaces are generally not equipped to handle extreme traffic • Massive impact on Server infrastructure • Text Mining activities are interfering with their normal web traffic – Bulk download of full papers – On the fly text mining of full papers Clauses protect publishers also for text mining caused breakdowns of their common websites 23
  • 24. Full Paper Accessibility Limitations IV Improper Distribution and Search Capabilities • Full text papers are improper distributed for text mining purposes – Many publishers and sources – Specific readers for each source needed • No standardized APIs to perform searches – – – – In some cases no API at all Different APIs force the development of specific readers No standardized feeds available Additional post-processing needed • Limited Search Capabilities – Only simple text queries are available or simple advanced queries – Semantic/Concept Dictionary-driven searches are not possible  More sophisticated search  Covers more of the relevant information  Limits searches to the important full paper  limitation of the number of paper that have to be downloaded) 24
  • 25. Full Paper Accessibility Limitations V Financial Aspects • Many, sophisticated text mining tools are available and will be used • • • • • by customers. However these tools are costly and sometimes require specialists to operate Downloading tools or additional services as CCC and QUOSA are also costly and even increase the financial expenses Text Mining agreements are often related to additional costs, even if subscription contracts are already in place A text mining-driven full paper download has not the same value as a common, manual full paper download! Most/many papers will be rejected during the text mining approach. Only few papers contain the desired information Text Mining cannot be performed on a financial negotiated case-bycase basis (long-term planning)  text mining tasks are too diverse and occur as part of the daily work In the current practice lies another obstacle in times of budgetary 25 constraints
  • 26. Workarounds – Abstracts, PM Central Use Public Available Information - Abstracts • The greatest success has been made in mining publicly available information such as citations and abstracts from PubMed/Medline • This has been/can be licensed in as a flat file for text mining purpose • Limited to Life Science field • Can be used as starting point to identify promising and valuable full papers Use PubMed Central (OA Repository) • • • • Access to full papers without license restrictions This has been/can be licensed in as downloaded copy for text mining Only 2-3 million papers available, 10% of PubMed/Medline Only Papers after 1990, old articles are not included Other OA Full Papers from OA Publishers • • No copyright issue But, all other restrictions (incl. restricting clauses) 26
  • 27. Workarounds – Customized Solution Publisher-specific, customized Solutions • Evaluate customized text mining opportunities in private projects/contracts between customer and publisher • Pharma custromers have experimented with having access to a publisher’s website to mine at a specified rate during off-hours, and in having feeds or APIs from publishers – Try to prevent interfering with publishers normal web traffic – Try to spare publishers infrastructure • These options come at additional cost and the coverage of published literature has not been comprehensive  therefore most of these efforts have been short-lived 27
  • 28. Workarounds – CCC and QUOSA QUOSA: Scientific Literature Management Software (Elsevier) “QUOSA has been used in a limited fashion by pharma industry at least 7 years as a downloading tool. It works by linking from specified search results to download pdf files for mining. The product has been very problematic in our environment, both in terms of speed and accuracy. PubGet was a competitor, primarily focused on PubMed search results and had limited practicality. PubGet was purchased by the Copyright Clearance Center which plans to develop a new text mining tool, but has nothing to offer at this time.” Copyright Clearance Center (CCC) “The Copyright Clearance Center (CCC) in Danvers, Massachusetts, which works with publishers on rights licensing, is pursuing a more ambitious effort. It would act as an intermediary, collecting publishers’ terms and content and storing them on a website for researchers. It is working with six publishers (including Nature Publishing Group) and with drug and chemical firms eager to mine the literature.” Help to handle the copy rights and other issues (e.g. bulk downloading) with the publishers 28
  • 29. (Ideal) Future Perspective • Use of standardized APIs and feeds (over all publishers incl. OA) • Use mirror sites where tools can operate without interfering with the • • • • • • primary publisher’s website Proper distribution / platforms that cover several publishers (CrossRef) Availability of semantic searches (concept dictionaries) to allow advanced searches  limit number of full paper to retrieve Text Mining should be possible directly between publishers and customers based on existing subscriptions Reasonable costs / Adapted pricing schemes Copyright/License harmonization No case-by-case basis for access, agreements and licenses but a general agreement that will allow text mining approaches in general It’s about having Access and not about costs 29