1. OpenDataMonitor
Horizon 2020
Coordination and Support Action
GARRI-3-2014 Scientific Information in the Digital Age: Text and Data Mining (TDM)
Project number: 665940
Flash presentations/ Demos
FutureTDM
Reducing Barriers and Increasing Uptake of Text and Data Mining for Research Environments
using a Collaborative Knowledge and Open Information Approach
FutureTDM Symposium, Salzburg
June 13th, 2017
4. TDM Use-Case Tutorial
Demoing the three text data mining tutorials from ContentMine
▪ TDM for Pandemics with Zika
▪ Systematic Literature Review
▪ P-Cracking: finding statistical measures
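A minimal sketch of what "finding statistical measures" can look like in practice, assuming the measure of interest is a reported p-value. The pattern and the function name `find_p_values` are illustrative assumptions, not part of the ContentMine tutorials.

```python
import re

# Hypothetical pattern: matches reports such as "p < 0.05", "P = .003", "p<=0.01".
P_VALUE_PATTERN = re.compile(
    r"\bp\s*(<=|>=|<|>|=)\s*(0?\.\d+|1(?:\.0+)?)",
    re.IGNORECASE,
)


def find_p_values(text):
    """Return (comparator, value) pairs for each p-value report found in text."""
    return [
        (match.group(1), float(match.group(2)))
        for match in P_VALUE_PATTERN.finditer(text)
    ]


sample = "The effect was significant (p < 0.05), but the control was not (P = .21)."
print(find_p_values(sample))
```

A real pipeline would also need to handle scientific notation, Unicode comparison signs, and other statistics (t, F, chi-squared), which this sketch deliberately leaves out.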
6. Plazi
Liberating and disseminating biodiversity data from scientific publications
▪ Issue:
▪ A continually growing corpus of 500 million pages of scientific literature describes the world’s living diversity, e.g. more than 17,000 new species descriptions are published every year
▪ Data on these publications is incomplete, and data on the facts they contain is even sparser
▪ Challenge: Provide real-time access, and promote access, to ongoing as well as legacy publications
▪ Solutions:
▪ 1. Provide and maintain a TDM workflow to find articles, extract and disseminate facts (Plazi workflow)
▪ 2. Promote journal production workflows to create semantically enhanced publications upfront (e.g. TaxPub/JATS-based Pensoft workflow)
[Diagram: Treatment Bank (data mining, text extraction & markup) and Biodiversity Literature Repository (store & access)]
Persistent, resolvable identifiers minted for:
• Articles: DOI (if no DOI exists)
• Treatments: HTTP URI
• Illustrations: DOI
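"Resolvable" means that an identifier can be turned into a web address that leads to the object. As a minimal sketch, a DOI resolves through the public `https://doi.org/` resolver. The function name `doi_to_url` and the sample DOI are hypothetical and used only for illustration.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix redirects to the object.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolvable URL from a DOI, accepting an optional "doi:" prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError("not a DOI: " + repr(doi))
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
```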
CORE - Bringing science to all …
10. CORE – Millions of research papers ready to text mine
13. RapidMiner – Unified Open Source Data Science Platform
[Diagram: Modern, Agile Enterprise Platform]
▪ Any data source: data at rest and data in motion
▪ Data Mashup Engine: ingestion, blending, cleansing
▪ Prescriptive Decision Engine: diagnostic relationships, predictive insights, prescriptive actions
▪ Operationalization Engine: high-velocity scoring, honest validation, process integration, automation services
▪ Wisdom of Crowds Advisor: best practice recommendations
▪ Effortless Workflow Designer: unified workflows
▪ Federated Analytics Driver: intelligent utilization
▪ In Hadoop, in-memory on desktop or server, in database
▪ Web services, process scheduler, web apps
▪ Marketplace innovations & extensions
▪ Business processes & applications
14. RapidMiner – Open Source Data Science Platform
▪ Lightning Fast: Visual interface for rapidly building complete analytic workflows
▪ Powerful: Rich library of algorithms and functions to build the strongest possible model for any use case
▪ Open & Extensible: Open source innovation keeps pace with changing business needs
▪ Unified Platform: Seamlessly integrates structured and unstructured data from all types of sources, as well as machine learning algorithms by RapidMiner, R, Python, H2O, Hadoop, Spark, PySpark, SparkR, SparkRM, etc., in a single visual platform, and allows easy deployment on-server, in-Hadoop, in-cloud, as web services, in web apps, via Java API, etc.
16. CLARIN and CLARIN:EL
CLARIN integrates
▪ Language Datasets: digital content of any medium (text, sound, image,
video), raw and annotated, lexica, ontologies, grammars etc.
▪ Language Technology tools: lemmatizers, taggers, term extractors,
sentiment annotators, summarizers, etc.
in a federation of trusted repositories
• available to researchers
▪ through national networks of organizations in each country
(today: 21 member-countries, 42 certified centers)
CLARIN:EL (www.clarin.gr)
• the Greek Language Resources, Tools/Services Infrastructure
• for documenting, sharing and processing language data
19. ALCIDE
Online platform for temporal, geographical, and linguistic analysis of historical documents.
▪ Extract information
▪ State-of-the-art Human Language Technologies
▪ Tint (for Italian)
▪ Stanford CoreNLP (for English)
▪ Visualise data
▪ Intuitive and understandable data representation
20. Flash Presentations / Demos
Find out more!
Join us in the Demo Session
Thank you!