Table mining and data curation from biomedical literature
Nikola Milosevic
Supervisors: Dr Goran Nenadic, Robert Hernandez
Why are we doing this?
• Growth of published research
Information growth
Text mining
• Text mining has produced tools and methods to help scientists
• These focus mainly on the body of the article
• Tables and figures are typically ignored
What about tables?
Challenge
• Visually structured text
• May be ungrammatical and ambiguous
• Various layouts
• Value representation types
◦ Numeric
◦ Text
◦ Ranges
◦ Formulas
◦ Complex
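The value representation types listed above can be illustrated with a small classifier. This is only a sketch: the slides name the types but not how they are detected, so the regular expressions, the reading of "formula" as a "mean ± deviation" style value, and the `classify_value` function name are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the slides do not specify detection rules.
NUMERIC = re.compile(r"^[+-]?\d+(\.\d+)?%?$")
# Assumption: "formula" is read here as a "value ± deviation" expression.
FORMULA = re.compile(r"^\d+(\.\d+)?\s*(±|\+/-)\s*\d+(\.\d+)?$")
RANGE = re.compile(r"^[+-]?\d+(\.\d+)?\s*(-|–|to)\s*[+-]?\d+(\.\d+)?$")


def classify_value(cell: str) -> str:
    """Assign a table cell to one of the value types named on the slide."""
    text = cell.strip()
    if NUMERIC.match(text):
        return "numeric"
    if FORMULA.match(text):
        return "formula"
    if RANGE.match(text):
        return "range"
    # Digits mixed with other content, e.g. "45 (12%)", are treated as complex.
    if any(ch.isdigit() for ch in text):
        return "complex"
    return "text"


for example in ["42", "18-65", "12.5 ± 3.1", "45 (12%)", "Male"]:
    print(example, "->", classify_value(example))
```

The order of the checks matters: the narrow patterns run first, and anything that contains digits but fits none of them falls through to "complex".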
Method overview
Table decomposition
• Aim: decompose the table into structures suitable for further processing
• Cell structures that keep information about the navigational path (headers, stubs, etc.)
• Heuristic-based approach
• Cell structure, alignment, content, neighbourhood
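A cell structure that keeps its navigational path might look like the sketch below. It assumes the simplest possible layout: a rectangular grid with a fixed number of header rows and stub columns. The `Cell` class, the `decompose` function and the sample table are hypothetical, and the heuristics the slides mention (alignment, content, neighbourhood) are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    """A data cell together with the headers and stubs that lead to it."""
    value: str
    row: int
    col: int
    headers: List[str] = field(default_factory=list)
    stubs: List[str] = field(default_factory=list)


def decompose(grid: List[List[str]], header_rows: int = 1,
              stub_cols: int = 1) -> List[Cell]:
    """Turn a rectangular grid into data cells with navigational paths.

    Assumption: the first `header_rows` rows are headers and the first
    `stub_cols` columns are stubs; real tables need heuristics to find them.
    """
    cells = []
    for r in range(header_rows, len(grid)):
        for c in range(stub_cols, len(grid[r])):
            headers = [grid[h][c] for h in range(header_rows)]
            stubs = [grid[r][s] for s in range(stub_cols)]
            cells.append(Cell(grid[r][c], r, c, headers, stubs))
    return cells


# Made-up table, for illustration only.
grid = [
    ["", "Placebo", "Drug"],
    ["Patients", "50", "48"],
    ["BMI", "27.1", "26.8"],
]
for cell in decompose(grid):
    print(cell.stubs, cell.headers, cell.value)
```

Storing the path on each cell means later processing can work on a single cell without going back to the table layout.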
Information extraction
• Performed a number of experiments
• Extraction of the number of patients, weight and BMI
• Approaches:
◦ Rules
◦ MetaMap
◦ White and black lists
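The rule and list-based approaches can be sketched for one target, the number of patients. The word lists, the integer-only rule and the `extract_patient_counts` function are assumptions made for illustration, not the lists or rules used in the experiments, and the MetaMap-based approach is not covered.

```python
import re

# Hypothetical lists: a cell is a candidate if its navigational path contains
# a whitelisted word and no blacklisted word.
WHITELIST = {"patients", "participants", "subjects"}
BLACKLIST = {"age", "weight", "bmi", "%"}

INTEGER = re.compile(r"^\d+$")
TOKEN = re.compile(r"[a-z]+|%")


def extract_patient_counts(cells):
    """Return (header, count) pairs from (stub, header, value) tuples."""
    results = []
    for stub, header, value in cells:
        tokens = set(TOKEN.findall(f"{stub} {header}".lower()))
        if not tokens & WHITELIST:
            continue  # nothing in the path suggests a patient count
        if tokens & BLACKLIST:
            continue  # the path points to another quantity or a percentage
        if INTEGER.match(value.strip()):  # rule: a count is a plain integer
            results.append((header, int(value.strip())))
    return results


# Made-up cells, for illustration only.
cells = [
    ("Patients", "Placebo", "50"),
    ("Patients", "Drug", "48"),
    ("BMI", "Placebo", "27.1"),
    ("Patients (%)", "Drug", "52"),
]
print(extract_patient_counts(cells))
```

Matching whole words rather than substrings stops a blacklist entry such as "age" from firing on words like "percentage".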
Results
• Achieved promising results
• Some information classes are easier to extract than others
Conclusion & Future work
• Information extraction from tables is feasible
• Future work:
◦ Value and table type categorisation
◦ Development of a normalisation and extraction engine
◦ Extraction rules
◦ Data storage format (triple store, linked data)
◦ Data curation interface
◦ Data querying interface
Thank you! Q&A 
Email: nikola.milosevic@cs.man.ac.uk
