iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
PROBLEM
The very large number of biological Web databases, each with its own
interface, makes it difficult for biologists to search for any given
biological entity (see Fig. 1). Currently, their only option is to
search each of these interfaces individually.

WI Metamodel: We observe that all input Web Interfaces (WIs) share an
underlying global model. We created this global model manually and call
it the "WI Metamodel" (see Fig. 2).
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.
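
To make the idea of a WI as an instance of the metamodel concrete, here is a minimal sketch. It assumes only what the methodology states, namely that a WI is a collection of search entities with their labels (search criteria); the full metamodel of Fig. 2 is not reproduced in the text. The names `SearchEntity`, `WebInterface`, `criteria_for`, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEntity:
    """A searchable concept exposed by a WI (hypothetical example: 'Gene')."""
    name: str
    labels: List[str] = field(default_factory=list)  # search criteria


@dataclass
class WebInterface:
    """One WI, modelled as an instance of a simplified metamodel."""
    url: str
    entities: List[SearchEntity] = field(default_factory=list)

    def criteria_for(self, entity_name: str) -> List[str]:
        """Return the labels of the named entity, or [] if it is absent."""
        for entity in self.entities:
            if entity.name.lower() == entity_name.lower():
                return list(entity.labels)
        return []


# Illustrative instance; the URL and labels are made up for this sketch.
wi = WebInterface(
    url="https://example.org/genes",
    entities=[SearchEntity("Gene", ["Gene Symbol", "Organism", "Chromosome"])],
)
print(wi.criteria_for("gene"))
```

Keeping entities and labels as plain lists mirrors the "search entity with a label set" view used in the methodology steps.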

Fig. 1: Problem - a biologist searching for an entity

FUTURE WORK

In the future, we intend to dynamically update the repository of
biological databases, maintain semantic mappings as the base
databases evolve, translate user queries, and consolidate,
reconcile, and rank query results using data cleansing and
relevance computing algorithms. We also plan to conduct usability
testing of the iBioSearch system with the help of biologists.

We aim to provide a unified search interface capable of searching
multiple (1000+) biological databases. This interface would be a
representation of the biological search interface ontology. To find
this global search ontology, we take a novel approach: we reverse
engineer each individual search interface into a conceptual model, and
then find an integrated model that is consistent with all the
interfaces up to a level of significance.

HYPOTHESIS & ASSUMPTIONS

Fig. 2: WI Metamodel

www.ischool.drexel.edu

OUR SOLUTION

The Global Biological WI Schema (GBWS), or ontology, could be presented
as a meta-search interface in which biologists can search for most
biological entities using the search criteria available across
different databases.
Eventually, we aim to answer further research questions, such as:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams.
5. Extracting relationships between biological search entities.

METHODOLOGY
Which interface to search?
Which database to access?
Which search criteria do I have?
How many sources to consider?

CURRENT AND PREDICTED RESULTS

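Fig. 3: Methodology. Labels in the figure: OLDB (repeated), Information Retrieval, Information Extraction, Reverse Engineering, Mapping WI with Metamodel, Clustering Search Entities and Labels, Generation of Global Biological WI Schema, Meta-Search Interface.

The methodology comprises five steps: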
1. Web Interface (WI) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract the attributes corresponding to
the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI to Metamodel: Map each WI to the WI metamodel to generate
instances of the metamodel. This yields a list of search entities and
their respective criteria (labels). For a given search entity S_i, there is a
label set (l_i1, l_i2, l_i3, ..., l_im).
4. Clustering: Find non-overlapping classes of search entities that represent
synonyms and, for each class, find a list of non-redundant labels.
5. Generation of GBWS: Finally, we generate another conceptual model,
which we call the "Global Biological WI Schema" (GBWS). It would represent
all possible input WIs in a non-redundant manner and capture the matchings
between individual instances of the WI metamodel.
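
The clustering and GBWS generation steps above can be sketched as follows. This is only an illustration under stated assumptions: the poster does not say how synonymous search entities are detected, so a small hand-written synonym table (`SYNONYMS`) stands in for the clustering step, and labels are treated as redundant when they match after lowercasing and whitespace normalization. The function name `build_gbws` and the sample interfaces are hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for the clustering step;
# the poster does not specify how synonyms are actually discovered.
SYNONYMS = {
    "gene": "gene", "genes": "gene", "locus": "gene",
    "protein": "protein", "polypeptide": "protein",
}


def normalize(text):
    """Lowercase and collapse whitespace so trivially different names match."""
    return " ".join(text.lower().split())


def build_gbws(interfaces):
    """Merge per-WI entities into one schema.

    interfaces: {wi_name: {search_entity: [labels]}}
    Returns (schema, mappings) where schema maps each entity class to its
    non-redundant labels and mappings records which WI entity fed each class.
    """
    schema = defaultdict(list)
    mappings = defaultdict(list)
    for wi_name, entities in interfaces.items():
        for entity, labels in entities.items():
            key = normalize(entity)
            canonical = SYNONYMS.get(key, key)   # unknown names form their own class
            mappings[canonical].append((wi_name, entity))
            merged = schema[canonical]
            for label in labels:
                norm = normalize(label)
                if norm not in merged:           # drop redundant labels
                    merged.append(norm)
    return dict(schema), dict(mappings)


# Made-up input: two WIs that name the same concept differently.
interfaces = {
    "wi_a": {"Gene": ["Symbol", "Organism"]},
    "wi_b": {"Locus": ["organism", "Chromosome"], "Protein": ["Sequence"]},
}
schema, mappings = build_gbws(interfaces)
print(schema)
print(mappings)
```

The `mappings` dictionary plays the role of the matchings between individual metamodel instances: it records, for each entity class in the merged schema, the original entity name on each WI.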

