Biothings.api
https://github.com/SuLab/biothings.api
Generalizing MyGene and MyVariant
History
http://biogps.org/ is a user-defined and user-extensible tool to
analyze genes. Given a gene of interest, different people are
interested in different data about the gene. BioGPS allows
you to select and display the data you are interested in about
your gene.
To power the backend queries, a database of gene information
was abstracted from BioGPS. The database contained
information about all genes, aggregated on Entrez Gene ID and
updated weekly.
To access the data seamlessly from BioGPS, a REST API was
implemented, providing an annotation lookup service (/gene/)
and a full-text query service (/query/). The combination of
these (data aggregation plus API front end) became MyGene.info.
MyGene.info
• MyGene.info provides simple-to-use REST web services to
query/retrieve gene annotation data, aggregated on Entrez
Gene ID.
• Examples:
– http://mygene.info/v2/gene/1017
– http://mygene.info/v2/query?q=cdk*&fields=pdb
– http://mygene.info/v2/metadata
• Hosted entirely on AWS cloud computers (three 8 GB, 2-core data
nodes and two 4 GB, 2-core web nodes). Currently serves
millions of requests per month.
MyVariant.info
• MyVariant.info provides simple-to-use REST web services to
query/retrieve variant annotation data, aggregated from
many popular data resources. Aggregated on HGVS ID.
• Examples:
– http://myvariant.info/v1/variant/chr6:g.152708291G%3EA
– http://myvariant.info/v1/query?q=clinvar.chrom:10&fields=clinvar
– http://myvariant.info/v1/query?q=chr1:69000-70000&fields=dbnsfp,dbsnp
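The example URLs above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper names variant_url and query_url are hypothetical and not part of any MyVariant.info client. It mainly shows that the ">" in an HGVS ID has to be percent-encoded as %3E.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the examples above.
MYVARIANT_BASE = "http://myvariant.info/v1"


def variant_url(hgvs_id: str) -> str:
    """Build an annotation URL, percent-encoding characters such as '>'."""
    # ':' is kept readable; '>' becomes '%3E'.
    return f"{MYVARIANT_BASE}/variant/{quote(hgvs_id, safe=':')}"


def query_url(q: str, fields: str = "") -> str:
    """Build a /query URL; ':', ',' and '*' are left unescaped."""
    params = {"q": q}
    if fields:
        params["fields"] = fields
    return f"{MYVARIANT_BASE}/query?{urlencode(params, safe=':,*')}"


if __name__ == "__main__":
    print(variant_url("chr6:g.152708291G>A"))
    print(query_url("clinvar.chrom:10", fields="clinvar"))
    print(query_url("chr1:69000-70000", fields="dbnsfp,dbsnp"))
```

Running the script prints the three example URLs listed above.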
Biothings.api - abstracting web front end
From the point of view of the front end, the
nature of the document is inconsequential,
i.e., whether we serve documents about genes
or variants or chemicals isn’t particularly
important => How much can we abstract out
of mygene and myvariant and apply it to
Motivation
• Isolate the common aspects of MyGene and
MyVariant codebases and make them
available in a separate framework:
biothings.api
• Allows easier development of additional
biothings APIs (Disease, Drug/Chemical, GO,
Species… -> JSON, aggregate on a single field)
• Allows easier maintenance and development
of current biothings (gene, variant).
System Overview
• The Tornado HTTP server consists of handlers that contain the code to run
when a particular URL pattern is matched, e.g. /variant/ or /metadata.
• The biothings codebase essentially contains the connection between the
appropriate Tornado HTTP request handler for a request and the Elasticsearch
query that executes that request. This is conceptually very similar to a
model-controller framework, where the model is the Elasticsearch Python
component and the controller is the Tornado HTTP server.
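The URL-pattern-to-handler idea can be illustrated without Tornado. The sketch below is a standard-library imitation of pattern-based dispatch, not Tornado's own routing code; FAKE_INDEX, the handler functions, and dispatch are all hypothetical names, and the in-memory dictionary merely stands in for the Elasticsearch-backed model.

```python
import re
from typing import Dict

# Hypothetical stand-in for the Elasticsearch-backed "model".
FAKE_INDEX: Dict[str, dict] = {
    "chr6:g.152708291G>A": {"_id": "chr6:g.152708291G>A", "dbsnp": {"chrom": "6"}},
}


def variant_handler(bid: str) -> dict:
    """Controller code for /variant/<id>: look the document up in the model."""
    doc = FAKE_INDEX.get(bid)
    return doc if doc is not None else {"error": "not found"}


def metadata_handler() -> dict:
    """Controller code for /metadata: report simple index statistics."""
    return {"total": len(FAKE_INDEX)}


# Each entry pairs a URL pattern with the handler to run when it matches.
ROUTES = [
    (re.compile(r"/variant/(.+)"), variant_handler),
    (re.compile(r"/metadata/?"), metadata_handler),
]


def dispatch(path: str) -> dict:
    """Run the first handler whose pattern matches the whole path."""
    for pattern, handler in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            # Captured groups become the handler's positional arguments.
            return handler(*match.groups())
    return {"error": "no handler for " + path}


if __name__ == "__main__":
    print(dispatch("/variant/chr6:g.152708291G>A"))
    print(dispatch("/metadata"))
```

In a real deployment the route table and handlers would be Tornado objects; the point here is only the separation between pattern matching (controller) and document lookup (model).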
Biothings – HTTP Handling
• tornado.web.RequestHandler: base tornado class for HTTP request handling. Important class methods:
get/post, get_arguments, write
• biothings.www.helper.BaseHandler: contains methods common to all biothings RequestHandlers.
Important class methods: get_query_params, return_json
• biothings.www.api.handlers.QueryHandler: contains methods to implement the biothings query
endpoint. Important class methods: get, post, _examine_kwargs
• biothings.www.api.handlers.BiothingHandler: contains methods to implement the biothings annotation
endpoint. Important class methods: get, post, _examine_kwargs
• biothings.www.api.handlers.MetaDataHandler: contains methods to implement the metadata endpoint
• biothings.www.api.handlers.StatusHandler: contains methods to implement a status endpoint for AWS
ELB
Biothings – HTTP Handling
• biothings.www.api.handlers.BiothingHandler:
– GET request (e.g. /variant/chr6:g.152708291G>A)
– POST request (e.g. /variant/)
Biothings – HTTP Handling
• biothings.www.api.handlers.QueryHandler:
– GET request (e.g. /query?q=_exists_:dbsnp)
– POST request (e.g. /query/)
Biothings – Elasticsearch query
• biothings.www.api.es.ESQuery – contains the python code
for constructing the elasticsearch query and formatting the resulting data
– query(q, **kwargs) – Contains the elasticsearch query to run with data obtained from a
GET or POST to the /query/ endpoint.
– get_biothing(bid, **kwargs) – Contains the elasticsearch query to run with data
obtained from a GET to the /annotation/ endpoint.
– mget_biothings(bid_list, **kwargs) – Contains the elasticsearch query to run with data
obtained from a POST to the /annotation/ endpoint.
– _cleaned_res(res) – Contains the code to format the return object for get_biothing and
mget_biothings.
– _cleaned_res2(res) – Contains the code to format the return object for query.
– _get_biothingdoc(hit) – Contains the code to format a single biothing object from any
elasticsearch query. Called by _cleaned_res and _cleaned_res2.
– _modify_biothingdoc(doc) – Contains the code to modify a biothing_doc. Called in
_get_biothingdoc. Currently empty -> for overriding.
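How these methods call one another can be sketched as follows. This is an illustration under stated assumptions, not the biothings implementation: FakeES is a hypothetical in-memory stand-in for an Elasticsearch client, the shape of the "not found" result is invented, and only the method names come from the list above.

```python
from typing import Any, Dict, List


class FakeES:
    """Hypothetical in-memory stand-in for an Elasticsearch client."""

    def __init__(self, docs: Dict[str, dict]) -> None:
        self._docs = docs

    def get(self, bid: str) -> dict:
        # Mimic a hit: metadata fields plus the stored '_source'.
        return {"_id": bid, "found": bid in self._docs,
                "_source": self._docs.get(bid, {})}


class ESQuery:
    """Sketch of the query class; method names follow the list above."""

    def __init__(self, es: FakeES) -> None:
        self._es = es

    def get_biothing(self, bid: str, **kwargs: Any) -> dict:
        # GET on the annotation endpoint: one ID in, one document out.
        return self._cleaned_res(self._es.get(bid))

    def mget_biothings(self, bid_list: List[str], **kwargs: Any) -> List[dict]:
        # POST on the annotation endpoint: a list of IDs in, a list out.
        return [self._cleaned_res(self._es.get(bid)) for bid in bid_list]

    def _cleaned_res(self, res: dict) -> dict:
        if not res["found"]:
            return {"query": res["_id"], "notfound": True}  # assumed shape
        return self._get_biothingdoc(res)

    def _get_biothingdoc(self, hit: dict) -> dict:
        doc = dict(hit["_source"])
        doc["_id"] = hit["_id"]
        return self._modify_biothingdoc(doc)

    def _modify_biothingdoc(self, doc: dict) -> dict:
        return doc  # empty hook, intended to be overridden per project


class VariantESQuery(ESQuery):
    """Project-specific subclass overriding only the hook."""

    def _modify_biothingdoc(self, doc: dict) -> dict:
        doc["_biothing"] = "variant"  # hypothetical project-specific tweak
        return doc


if __name__ == "__main__":
    query = VariantESQuery(FakeES({"rs1": {"gene": "CDK2"}}))
    print(query.get_biothing("rs1"))
    print(query.mget_biothings(["rs1", "missing"]))
```

Keeping _modify_biothingdoc as the single override point means a new project can adjust its returned documents without touching the query or formatting code.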
Biothings - Settings
• Problem: Until now, we have left out the problem of how to
refer to things that MUST be project specific (e.g., the name
of the elasticsearch index to search, the type of the
document, etc). How do we do this?
• Solution: We make a settings module in biothings that all
code within biothings refers to. That module looks for an
environment variable called BIOTHING_SETTINGS with the
name of a module that can be imported to set project specific
variables.
– export BIOTHING_SETTINGS='biothings.config'
• Similar to Django.
Biothings – Project template
• At this point, we have the tools necessary to easily create and
subclass 4 types of biothings handlers (BiothingHandler,
QueryHandler, MetaDataHandler, StatusHandler), and the
elasticsearch query class (ESQuery)
• Could definitely stop here and have a useful tool, but we
wanted to make it even easier to create a new project (also
enforces a uniform project structure across all biothings APIs).
• To do this we have a project template folder containing the
project directory structure and some skeleton code:
– config.py,
– URL patterns to Handlers connection
– Handlers to ESQuery connection
Biothings - Project template
• To create the actual project directory from the
template, we wrote a small script: biothings-admin.py
– Usage: biothings-admin.py <path-to-project-directory>
<biothing-object-name>
– Example: biothings-admin.py ~ variant
• Any folder or file in the template directory will be
created in the project directory. The contents of each
file are passed through Python's string.Template
substitution before being written to the project
directory.
Small project structure review
Recreating MyVariant.info using biothings.api
• Recreated current MyVariant.info service using the
biothings.api framework
– Very little extra code required (~100 lines)
– Less than a day of time to create the web front end from start.
– https://github.com/SuLab/myvariant.info/tree/biothings.variant
• Seems disingenuous to gauge the utility of a tool by
recreating a codebase if that tool was itself created from the
codebase => Should try implementing other APIs, especially
MyGene.info (has more varied gene specific query options),
and modify biothings as needed.
MyGene.info v3
• Sebastien reimplemented MyGene using
biothings framework
• Currently live at mygene.info/v3 for testing
purposes
• Some structural changes to data also
• Examples:
– http://mygene.info/v3/gene/1017
– http://mygene.info/v3/query?q=cdk*&fields=pdb
Small Biothing Cluster
• With biothings, new front end frameworks are very easy to
set up => We are limited only by our ability to parse,
aggregate, index etc. new data.
• For small ES indices (<1 or 2 GB), we set up a small biothings
cluster with 1 m4.large data node serving all search requests,
and 1 t2.micro web node per biothing.
• Currently, this consists of:
– small biothing data/master node: m4.large
– Taxonomy web node: t2.micro
– Chemical web node: t2.micro
Taxonomy biothing
• Using a taxonomy parser written by Greg.
Aggregated on NCBI taxonomy ID.
• Currently live at http://52.34.211.113
• Examples:
– http://52.34.211.113/v1/species/9606
– http://52.34.211.113/v1/query?q=human
• Soon to become http://s.biothings.io
Chemicals biothing
• Data from several chemical databases aggregated by Julee
on InChIKey (hash of string representation of chemical)
https://en.wikipedia.org/wiki/International_Chemical_Identifier#InChIKey
• Currently live at http://52.38.192.121/
• Examples:
– http://52.38.192.121/v1/drug/CHEMBL1201666
– http://52.38.192.121/v1/query?q=chembl.pref_name:neo*&fields=chembl.pref_name
• Soon to become http://c.biothings.io
Future work
• Integrate data load and data index functions into biothings
(WIP, large project)
• Documentation! – Projects like this need very good
documentation to be of any use to an API developer (on the
level of tornado’s excellent documentation:
http://www.tornadoweb.org/en/stable/web.html) (also, WIP)
• Host API services for external users' data (essentially possible
without too much work already).
• Auto-generate clients (python client, R client)
• Auto-generate ansible-playbook to create cluster hardware
on AWS
• One-click API…
