STI Summit 2011 - LS4 LS Khaos


Semantic Technology Institute International


Linked Data and Life Sciences
STI Summit, Riga, 6-8 July 2011

José F. Aldana Montes

Life Sciences Linked Data

Producing Consuming

Producing Life Sciences Linked Data
(Problems)
Most Linked Open Data is created and provided
without the help of the original data provider who

Almost all Linked Open Data in Life Sciences is provided by Bio2RDF

Producing Life Sciences Linked Data
(Problems)

• A database is a biologist’s life’s work, and he or she
wants to publish it
– but without losing control of it
• An RDF dump of the DB is cheap
– but supporting queries and data analysis is expensive
– where is the money coming from?
• They are highly motivated to add value to the data
– but they still lack up-to-date ICT skills
• Help is wanted to kill Bio2RDF

Almost all Linked Open Data in Life Sciences is provided by Bio2RDF
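
To illustrate why the slide calls an RDF dump cheap, here is a minimal sketch that turns table rows into N-Triples lines. It is not the speaker's tooling: the rows, the `example.org` namespaces and the function names are all hypothetical, and it assumes the row ids are already safe to embed in an IRI.

```python
# Hypothetical rows, as they might come out of a relational protein table.
ROWS = [
    {"id": "P1", "name": "Example protein A", "organism": "Homo sapiens"},
    {"id": "P2", "name": 'Protein "B"', "organism": "Mus musculus"},
]

BASE = "http://example.org/protein/"   # assumed namespace, not from the talk
VOCAB = "http://example.org/vocab#"    # assumed vocabulary, not from the talk


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(row: dict) -> list[str]:
    """One triple per non-id column; the id (assumed IRI-safe) names the subject."""
    subject = f"<{BASE}{row['id']}>"
    return [
        f'{subject} <{VOCAB}{column}> "{escape_literal(str(value))}" .'
        for column, value in row.items()
        if column != "id"
    ]


def dump(rows) -> str:
    """Serialize all rows as a single N-Triples document."""
    return "\n".join(line for row in rows for line in row_to_ntriples(row))


if __name__ == "__main__":
    print(dump(ROWS))
```

The dump is a single pass over the table. The expensive part named on the slide (queries and data analysis on top of the dump) is not covered by a sketch like this.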

Consuming Linked Data
• The number of Linked Data repositories will keep growing
• Using Linked Data in Life Sciences means linking the data
with existing tools that are de facto standards in certain
subdomains:
• Pathways
http://sbmm.uma.es

• Proteins

Consuming Linked Data

• Data analysis services (not only queries, but also data
mining, crawling, and reasoning) are needed to engage the
community
– Biomedical uses (pharmaceutical testing, drug screening)

Consuming Linked Data

• Reasoning, removed to make data reuse possible,
should be re-introduced in some cases over real
complex ontologies with large sets of data
– BioPax Level 3 (Level 4 under development)
• OWL Species: DL
• DL Expressivity: SHIF(D)
• Consistent: Yes
– BioPax Level 3 (4 officially identified databases; more DBs publish
data as BioPax Level 3 instances)
• Reactome Database
– 1.54 GB
– 2 980 230 triples
– BioPax Level 2 (9 officially identified databases)
• Data and ontologies should be cleaned up first

Consuming Linked Data
• Reasoning Services over real complex ontologies with
large sets of data
– Cost reduction in experiment design
– Hypothesis demonstration/refutation
– Privacy in reasoning with public + private data

Consuming Linked Data

• Reasoning for classification problems
– Disease classification / diagnosis
– Protein identification
– Pathway alignment

Consuming Linked Data

• Digital Data Curation / cross-validation

Consuming Linked Data
• Domain oriented (customizable) user interfaces

Scalability Issues in Life Sciences

• Real scenarios with rich ontologies are starting to
appear:
– BioPax Level 3/4: a complex OWL ontology (transitive, reflexive,
inverse and functional properties; restrictions on most of the
classes; 70 classes)
– Big data sets in OWL format (from 20MB to 45GB of data)
– Problems with the data:
• undetected ABox (and even TBox) inconsistencies, because
scalable reasoners are lacking
• Lack of SPARQL endpoints to query these data
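
The property kinds listed on this slide (transitive, inverse, functional) can be illustrated with a naive forward-chaining sketch. This is a toy, not a DL reasoner and not the speaker's system. The property names are made up and are not BioPax terms. The functional-property check assumes that different names denote different individuals, which OWL itself does not assume.

```python
def saturate(triples, transitive=(), inverse=None):
    """Naive forward chaining: apply the rules until nothing new is derived."""
    inverse = inverse or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                # p(s, o) implies inverse_p(o, s)
                new.add((o, inverse[p], s))
            if p in transitive:
                # p(s, o) and p(o, o2) imply p(s, o2); this inner scan over
                # all facts makes each round quadratic in the number of triples
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


def functional_violations(facts, functional):
    """Return (subject, property) pairs with more than one distinct value.

    Treating this as an inconsistency assumes unique names (see lead-in).
    """
    seen = {}
    for s, p, o in facts:
        if p in functional:
            seen.setdefault((s, p), set()).add(o)
    return {key for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    data = [("a", "partOf", "b"), ("b", "partOf", "c"),
            ("x", "organism", "human"), ("x", "organism", "mouse")]
    closed = saturate(data, transitive={"partOf"}, inverse={"partOf": "hasPart"})
    print(sorted(closed))
    print(functional_violations(closed, {"organism"}))
```

Each round rescans every pair of facts, so the cost grows quickly with data size. That gives a rough sense of why reasoning over data sets in the size range quoted above needs scalable reasoners.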

Summary: Are we losing the war?

• Producing Linked Data in Life Sciences: some risks and
some needs detected:
– A motivating reward scheme for the data owner
– Support from some specific infrastructure (action, facility, institute, foundation,
private…) could be useful
• to engage data owners,
• to contribute technical capability and
• to share costs

Summary: Are we losing the war?
• Consuming Linked Data in Life Sciences: opportunities
– Linking data with existing tools which are de facto
standards in certain LS subdomains
• to multiply impact
– Not only query services but also data analysis services
(crawling, mining, reasoning, etc.) should be provided to the
community
• but this is expensive for the average DB owner
– Data must be cleaned up, curated and cross-validated
• main threat
– The domain lacks specific user interfaces
• this is related to the connection of LD to (de facto) standard tools
– In this domain it makes sense to reason
• but scalability is still an issue

Linked Data and Life Sciences

José F. Aldana Montes
jfam@lcc.uma.es


