This presentation introduces text and data mining (TDM) projects in Europe. It shows how TDM can help us understand the past by mining historical books, predict the future by mining newspapers, and save lives by mining scientific publications about diseases. It also outlines current barriers to TDM in Europe: lack of awareness, gaps in skills and tools, and licensing and copyright issues. Two EU projects are highlighted: FutureTDM, which aims to identify TDM barriers and policy solutions, and OpenMinTeD, which builds a collaborative TDM infrastructure.
3. Text and data mining is the future
“Text and data mining (TDM) is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns.” (JISC)
Projects funded by
@openminted_eu
@futuretdm
4. Text and data mining helps us understand the past
Mining historical books: the evolution of language
Source: http://www.sciencemag.org/content/331/6014/176 (Baylor College of Medicine, Houston)
5. Text and data mining predicts the future
Mining newspapers: Predicts revolutions
Source: http://journals.uic.edu/ojs/index.php/fm/article/view/3663/3040 (University of Illinois)
6. Text and data mining saves the future
Mining scientific publications about diseases: Save lives
Source: http://dl.acm.org/citation.cfm?id=2623667 (Baylor College of Medicine, Houston)
7. Text mining – it seems so easy:
STAGE 1: Information Retrieval
STAGE 2: Linguistic Analysis: Entity Recognition
STAGE 3: Information Extraction
STAGE 4: Data Mining: Knowledge Discovery
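To make the four stages concrete, here is a minimal sketch in Python. It is a toy illustration, not how OpenMinTeD or any real TDM platform works: the corpus sentences, the `LEXICON` of entity names, and the function names are all assumptions invented for this example, and dictionary lookup stands in for real linguistic analysis.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus; the sentences are invented for illustration only.
CORPUS = [
    "Aspirin reduces the risk of Stroke. Aspirin may irritate the stomach.",
    "The weather in Houston was mild.",
    "Trials link Aspirin and Stroke prevention. Statins also lower Stroke risk.",
]

# Hypothetical entity dictionary; real systems use trained recognisers.
LEXICON = {"Aspirin", "Stroke", "Statins"}


def retrieve(docs, keyword):
    """Stage 1, information retrieval: keep documents mentioning the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]


def recognise_entities(sentence, lexicon):
    """Stage 2, entity recognition: simple dictionary lookup on word tokens."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return sorted({t for t in tokens if t in lexicon})


def extract_pairs(doc, lexicon):
    """Stage 3, information extraction: entity pairs sharing a sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", doc):
        entities = recognise_entities(sentence, lexicon)
        pairs.extend(combinations(entities, 2))
    return pairs


def mine(docs, keyword, lexicon):
    """Stage 4, data mining: count how often each entity pair co-occurs."""
    counts = Counter()
    for doc in retrieve(docs, keyword):
        counts.update(extract_pairs(doc, lexicon))
    return counts


pair_counts = mine(CORPUS, "stroke", LEXICON)
print(pair_counts.most_common())
```

Each function maps to one stage, so a stage can be swapped out (for example, replacing the dictionary lookup with a trained recogniser) without touching the others.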
8. But it actually poses many challenges…
Where do I find data?
How do I make my texts readable by machines?
Which mining method to use?
STAGE 1 STAGE 2 STAGE 3 STAGE 4
9. Current Barriers in Europe
Awareness across Institutions & Stakeholders
• Lack of awareness among research communities
• Lack of guidance to uncover TDM potential
Skills and Tools
• Availability and accessibility across disciplines
• Gap in skills across various sectors
Licensing & Open Access
• License proliferation and interoperability issues
• License barriers to transparent open access
Copyright and Data Protection
• TDM activities infringing current copyright laws
• Legal and policy limitations and barriers for TDM
10. EU PROJECTS on TDM
FutureTDM: Identify TDM barriers and policy solutions
OpenMinTeD: Build a TDM eInfrastructure
11. Main Objectives of FutureTDM
• ELABORATE a legal and policy framework for future TDM and specify a research agenda to foster the spread of TDM
• BUILD a website: a Collaborative Knowledge Base and an Open Information Hub combined
• ANALYSE current application areas and best practices in TDM
• ASSESS existing studies, legal regulations and policies on TDM
• INVOLVE all key stakeholders to identify practices, requirements, and specific challenges
• INCREASE awareness of TDM to attract new target groups and science domains
This project has received funding from the European Union’s Horizon 2020
Research and Innovation Programme under Grant Agreement No 665940.
13. Objective of OpenMinTeD
[Architecture diagram: three interoperability layers]
Layer 1: Interoperability of text mining services (platforms or components)
Layer 2: Interoperability of language resources & corpora
Layer 3: Interoperability to shared storage and computing resources
Diagram components:
• Mining platforms (proprietary architectures)
• Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting
• Language resources and corpora registry service
• Language resources
• Text corpora: Publisher, OpenAIRE/CORE, PMC, other types of text corpora
• Data centres in public cloud
14. OpenMinTeD brings together:
• ACCESSIBLE CONTENT: Via standardised programmatic interfaces and access rules
• DISCOVERABLE SERVICES: Easily discoverable text mining services and workflows which process, analyse and annotate text
• EFFICIENT PROCESSING: Operate on public e-Infrastructures via standardised APIs
• TDM COMMUNITIES: Different scientific communities have different challenges
• VALUE ADDED APPS: Community-driven applications to illustrate the value of the infrastructure. Engage with industry.
OPENMINTED = The Open Mining Infrastructure for Text and Data
15. Become involved
Follow us on Twitter for the latest updates and blogs
@openminted_eu
@futuretdm
Follow our websites
www.openminted.eu
www.futuretdm.eu
16. THANK YOU
PARTNERS OPENMINTED
• Athena RIC
• Univ. of Manchester (NaCTeM)
• Univ. of Darmstadt
• INRA
• EMBL-EBI
• Agro-Know
• LIBER
• Univ. of Amsterdam
• Open University UK
• EPFL
• CNIO
• Univ. of Sheffield (GATE)
• GESIS
• GRNET
• Frontiers
• Univ. of Stirling
PARTNERS FUTURETDM
• SYNYO GmbH (SYNYO)
• LIBER Europe
• Open Knowledge Foundation LBG (OK/CM)
• Radboud Univ. Nijmegen
• The British Library Board
• Univ. of Amsterdam
• Athena RIC
• Ubiquity Press
• Fundacja Projekt: Polska (FPP)