text mining, machine learning,
NLP and all that (in 10 minutes)
Byron C Wallace
Brown Center for Evidence Based Medicine
#CochraneTech
why do we need this stuff?
[Bastian et al, PLoS Medicine 2010]
PubMed growth
[http://altmetrics.org/wp-content/uploads/2010/10/medline-articles-by-year-lg.png]
[Figure: the systematic review pipeline.
1 formulate question, protocol & query
2 search database (PubMed)
3 screen retrieved citations
4 extract data (2x2 table of treatment by outcome, cells a, b, c, d)
5 synthesize extracted data (forest plot of individual studies, AIMS 1988 through White 1987; x-axis: odds ratio, log scale; overall I^2=19%, P=0.147)]
what can we automate?
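[Figure: the screening loop. The learner draws on the unlabeled data U (abstracts from the PubMed search) and the expert (the doctor conducting the review); the expert's answers become the labeled data L (manually screened abstracts), which train the predictive model (an SVM).]
how does this work?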
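The loop in the diagram above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the loop as pool-based active learning with uncertainty sampling, swaps the SVM for a toy one-dimensional threshold model, and simulates the expert with a hypothetical `oracle` function. None of the names or numbers below come from the talk.

```python
def fit_threshold(labeled):
    """Toy stand-in for the predictive model (assumption, not an SVM).

    Places the decision threshold halfway between the highest-scoring
    irrelevant item and the lowest-scoring relevant item in L.
    Assumes L already contains at least one example of each class.
    """
    highest_irrelevant = max(x for x, y in labeled if y == 0)
    lowest_relevant = min(x for x, y in labeled if y == 1)
    return (highest_irrelevant + lowest_relevant) / 2.0


def active_learning(pool, labeled, oracle, budget):
    """Repeatedly ask the expert about the item the model is least sure of."""
    pool = list(pool)        # unlabeled data U (copied, caller's list is untouched)
    labeled = list(labeled)  # labeled data L
    queried = []
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the item closest to the current threshold.
        x = min(pool, key=lambda item: abs(item - threshold))
        pool.remove(x)
        labeled.append((x, oracle(x)))  # the expert supplies the label
        queried.append(x)
    return fit_threshold(labeled), queried


def oracle(score):
    """Hypothetical expert: an abstract is relevant when its score is >= 62."""
    return 1 if score >= 62 else 0


pool = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # made-up one-number "abstracts"
seed = [(0, 0), (100, 1)]                    # one labeled example per class
threshold, queried = active_learning(pool, seed, oracle, budget=3)
print(threshold, queried)  # 65.0 [50, 70, 60]
```

In this toy run the expert labels three of the nine pooled items, and the resulting threshold already separates the rest of the pool correctly. Real abstracts are high-dimensional, so the saving will not be this clean.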
SVMs
[Figure: two classes of points (o and x) separated by a linear boundary; labels: support vectors, margin]
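To make the picture concrete, here is a minimal sketch of a linear SVM trained on made-up 2-D points. It assumes the soft-margin objective (L2 penalty plus hinge loss) and fits it with plain per-example subgradient steps, chosen here only because it fits in a few lines; the talk does not say which solver was used. The data, `lam`, `lr`, and `epochs` are all illustrative assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss (labels are +1/-1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Regularization step: shrink w, which widens the margin.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            # Hinge step: only points inside the margin (or misclassified)
            # pull on the boundary.
            if margin < 1:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b


def predict(w, b, point):
    """Classify by which side of the boundary w.x + b = 0 the point falls on."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1


# Made-up, linearly separable data: three "x" points (+1), three "o" points (-1).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-3, -2)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
print(predictions == labels)  # True
```

On this data only the two points nearest the boundary, (2, 2) and (-2, -2), ever trigger the hinge step, which is the sense in which a few support vectors determine the boundary.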
bag of words
[Figure 1.4: The (binary) Bag-of-Words (BoW) representation. The text "I am a Nigerian prince writing to you about an inheritance..." is mapped to a 0/1 vector over the vocabulary: ..., dinner 0, about 1, prince 1, call 0, ..., work 0, nigerian 1, yesterday 0, office 0, inheritance 1, ...]
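The mapping in Figure 1.4 can be reproduced with a short sketch. The vocabulary below contains only the words visible in the figure (the elided "..." entries are left out), and the tokenizer is a deliberately crude assumption: lowercase the text and keep runs of letters.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters (crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def binary_bow(text, vocabulary):
    """Return 1 for each vocabulary word that occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Only the vocabulary words shown in the figure.
vocabulary = ["dinner", "about", "prince", "call", "work",
              "nigerian", "yesterday", "office", "inheritance"]
email = "I am a Nigerian prince writing to you about an inheritance..."

vector = binary_bow(email, vocabulary)
print(vector)  # [0, 1, 1, 0, 0, 1, 0, 0, 1]
```

Word order and word counts are both discarded; the vector records only whether each vocabulary word is present.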
special considerations for the case of systematic reviews
• class imbalance: far fewer relevant than irrelevant abstracts
  – asymmetric costs: sensitivity more important than specificity
• reviewer time is scarce and expensive
  – better models, fewer labels: active learning and dual supervision
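The sensitivity/specificity trade-off in the first bullet can be shown with a small worked example. The labels and scores are invented for illustration (2 relevant abstracts out of 10), and moving the decision threshold is just one simple way to act on asymmetric costs; it is not claimed to be the approach used in the talk.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def predict(scores, threshold):
    """Mark an abstract as relevant when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Invented data: 1 = relevant, 0 = irrelevant; scores from some classifier.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0]

default = sensitivity_specificity(y_true, predict(scores, 0.5))
cautious = sensitivity_specificity(y_true, predict(scores, 0.25))
print(default)   # (0.5, 0.875)
print(cautious)  # (1.0, 0.75)
```

Lowering the threshold recovers the relevant abstract that was missed, at the price of one more irrelevant abstract for the reviewer to read. When missing a relevant study costs far more than reading an extra abstract, that is the trade the second setting makes.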
how do we do?
“Towards Modernizing the Systematic Review Pipeline: Efficient Updating via Data Mining”
Genetics in Medicine 2012
beyond citation screening
Questions?
byron_wallace@brown.edu
http://www.cebm.brown.edu/software
www.cebm.brown.edu/byron
