Processing of scientific data
From field capture to web delivery

Hector Quintero Casanova

Postgraduate in e-Science
Why e-Science? Data-intensive
● GMEP ticks all the boxes:
  ✔ Highly multidisciplinary: social, landscape, water, birds, plants...
  ✔ Large volumes of data: covers the whole of Wales.
  ✔ Cross-organisational collaboration: 13 institutions.
Why e-Science? Metadata
● NERC's data policy says it all
  – “It is essential that metadata are submitted”
● Metadata = context information about data
  – Provenance = who, when, where, how
    ● Exposes data relationships → traceability
  – Workflow = how. Essential if using models
    ● Enables reproducing outcome → repeatability
● Exactly what information depends on the stage.
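As an illustration of the provenance fields listed above (who, when, where, how), a metadata record could be sketched in Python roughly as follows. The class name, field names and sample values are hypothetical and are not taken from GMEP or from NERC's data policy.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Context for one dataset. All field names are illustrative assumptions."""
    who: str        # person or institution that produced the data
    when: datetime  # time of capture or processing
    where: str      # site or region the data refers to
    how: str        # method or workflow step used


def describe(record):
    """Return the record as a plain dict, with the timestamp in ISO 8601."""
    out = asdict(record)
    out["when"] = record.when.isoformat()
    return out


# Hypothetical sample record; the values are made up for the example.
example = Provenance(
    who="field surveyor (hypothetical)",
    when=datetime(2014, 1, 1, 9, 30, tzinfo=timezone.utc),
    where="Wales",
    how="manual water sampling (hypothetical)",
)
print(describe(example))
```

Keeping the four fields together in one immutable record is only one possible design; it makes the context travel with the data it describes.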
Data collection
● Raw data from the field
  – Metadata: method, calibration, place, units...
Data analysis
● Information products: e.g. data from models
  – Metadata: name, conditions, where it applies
Data analysis
● Workflow metadata avoids costly reruns
  – Identify model output needed → reuse
● But not enough for cross-organisation collaboration.
  – 13 institutions in Glastir.
  – Differences in storage structure, metadata definitions...
● Need extra layer(s) for seamless access
  – Web already offers tools needed.
Publication: linked data
● HTTP for generic retrieval of resources
● URIs for unique identification of those resources
  – E.g. http://www.ceh.ac.uk
● Both can be used to build web services
  – Amount to remote functions.
  – E.g. seamless recording of workflows across institutions.
● Semantics for automated reasoning
  – Acts as standardised metadata aimed at machines.
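To make the linked-data idea concrete, the sketch below stores metadata as subject/predicate/object triples whose parts are all HTTP URIs, then follows the links to trace where a dataset came from. It uses plain Python tuples rather than an RDF library. The `http://example.org/` URIs, the `producedBy` and `derivedFrom` terms and the statements themselves are hypothetical; only http://www.ceh.ac.uk comes from the slide, and it is used here purely as a sample identifier.

```python
from urllib.parse import urlparse

# Hypothetical vocabulary and resource URIs, for illustration only.
EX = "http://example.org/"
triples = {
    (EX + "dataset/1", EX + "vocab/producedBy", "http://www.ceh.ac.uk"),
    (EX + "dataset/1", EX + "vocab/derivedFrom", EX + "dataset/0"),
    (EX + "dataset/0", EX + "vocab/producedBy", EX + "org/partner"),
}


def is_http_uri(value):
    """True if the value is an HTTP(S) URI, i.e. retrievable in principle."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def objects(subject, predicate):
    """All objects linked to a subject through a given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def lineage(subject):
    """Follow derivedFrom links back to the original dataset (traceability)."""
    chain = [subject]
    while True:
        parents = objects(chain[-1], EX + "vocab/derivedFrom")
        # Stop at the root, or if a link would revisit a dataset (cycle guard).
        if not parents or sorted(parents)[0] in chain:
            return chain
        chain.append(sorted(parents)[0])


print(lineage(EX + "dataset/1"))
```

Because every identifier is a URI, records held by different institutions can point at each other without sharing a storage structure, which is what lets a machine walk the chain automatically.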
… We've come full circle!

¿?
Thank you
www.hqcasanova.com

Hector Quintero Casanova
Postgraduate in e-Science
