Unknown Genes, Community Profiling, & Biotorrents.net

Morgan Langille

1. Unknown Genes, Community Profiling, & Biotorrents.net. Morgan Langille, UC Davis

2. Genes with unknown function

3. Questions
   - If we wanted to start studying a gene of unknown function, which one(s) should we study first?
   - How many un-annotated genes could be annotated?
   - What proportion of unknown genes (hypothetical proteins) are probably not real proteins (e.g. pseudogenes or mis-predicted ORFs)?
   - What proportion of unknown gene families are probably phage-related?
   - Can some of these families (hopefully the top-ranking ones) be characterized using non-similarity-based bioinformatic approaches?

4. Outline of project

5. Community Profiling

6. Phylogenetic profiling (Wu, et al., PLoS Genetics, 2005)
   - For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes
   - Identified many hypothetical proteins that had the same profile as known sporulation proteins
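
The profile comparison on this slide can be illustrated with a minimal sketch. It assumes each gene has already been reduced to a presence/absence vector over a fixed list of genomes; the gene names and values below are invented for illustration and do not come from the talk.

```python
from collections import defaultdict

def group_by_profile(presence):
    """Group genes whose presence/absence pattern across genomes is identical."""
    groups = defaultdict(list)
    for gene, profile in presence.items():
        groups[tuple(profile)].append(gene)
    # Keep only profiles shared by more than one gene.
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]

# Toy data (hypothetical): 1 = homolog present in that genome, 0 = absent.
presence = {
    "spoA": (1, 1, 0, 0, 1),   # stand-in for a known sporulation protein
    "spoB": (1, 1, 0, 0, 1),
    "hyp1": (1, 1, 0, 0, 1),   # hypothetical protein with the same profile
    "rpoX": (1, 1, 1, 1, 1),   # present everywhere, so not informative
}
shared = group_by_profile(presence)
print(shared)
```

A hypothetical protein that lands in the same group as annotated genes becomes a candidate for the same function, which is the reasoning the slide describes.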

7. Community Profiling: KEGG and COG (DeLong, et al., Science, 2006)

8. Community Profiling
   - Look across multiple metagenomic samples
   - Gene families that have similar profiles may have similar function
   - Similar to using co-expression to identify similar functioning genes
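
One way to make "similar profiles" concrete is a correlation coefficient between two families' counts across samples. The sketch below uses Pearson correlation as an assumption (the slides only say a correlation cutoff was used), and the family names and counts are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / (sd_x * sd_y)

# Toy counts per metagenomic sample (hypothetical families and values).
profiles = {
    "PF_A": [10, 20, 30, 40],
    "PF_B": [1, 2, 3, 4],      # rises and falls with PF_A
    "PF_C": [40, 30, 20, 10],  # opposite trend
}
r_ab = pearson(profiles["PF_A"], profiles["PF_B"])
r_ac = pearson(profiles["PF_A"], profiles["PF_C"])
print(round(r_ab, 3), round(r_ac, 3))
```

Families such as PF_A and PF_B, whose counts move together across samples, would be the ones proposed to share a function.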

9. So what have I done?
   - "all metagenomics peptides" from CAMERA
     - 43M sequences (mostly GOS)
   - Searched against 11,000 Pfams using HMMER 3
   - Used "cluster" to group genes and samples

10. Results
   - Figure axes: Metagenomic Samples, Pfams
   - Red = above avg. number of Pfams
   - Green = below avg. number of Pfams
   - Have not normalized
     - For number of sequences per sample
     - For number of Pfams
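
The slide notes that the counts were not normalized. As a hedged illustration only, one simple option for the first item is to divide each Pfam count by the number of sequences in its sample; this is an assumption about what such a normalization could look like, not a step reported in the talk, and all values are invented.

```python
def normalize_by_sample_size(counts, sequences_per_sample):
    """Convert raw Pfam hit counts into hits per sequence for each sample."""
    return {
        pfam: [c / n for c, n in zip(row, sequences_per_sample)]
        for pfam, row in counts.items()
    }

# Toy data (hypothetical): two samples, the second ten times larger.
counts = {"PF_A": [10, 100], "PF_B": [5, 50]}
sequences_per_sample = [1000, 10000]

normalized = normalize_by_sample_size(counts, sequences_per_sample)
print(normalized)
```

After this step, a family that looks ten times more abundant only because one sample is ten times larger shows the same relative frequency in both samples.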

11. Example of phage Pfams clustering together

12. Measuring functional relatedness
   - Need to measure community profiling performance
   - The hierarchical clusters were broken into 575 groups using a correlation cutoff of 0.90 or above
   - Pfams were mapped to GO terms using pfam2GO
     - 1893 Pfams had no associated GO term
       - 695 of these were Domains of Unknown Function (DUFs)
     - 3377 Pfams had one or more associated GO terms and could be used for further analysis
   - Only 67 (of 575) clusters contained 4 or more Pfams with at least one GO term
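
The two filtering steps on this slide can be sketched as follows. Note the substitution: instead of cutting the tree produced by the "cluster" program, this sketch links any pair of Pfams whose correlation is at or above the cutoff and takes connected components, which matches single-linkage behaviour and may differ from the linkage actually used. All names, correlations, and GO labels are placeholders.

```python
def threshold_groups(names, corr, cutoff=0.90):
    """Connected components of the graph joining pairs with corr >= cutoff."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if corr[frozenset((a, b))] >= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

def usable_clusters(groups, pfam_to_go, min_annotated=4):
    """Keep clusters with at least min_annotated Pfams that have a GO term."""
    return [g for g in groups
            if sum(1 for p in g if pfam_to_go.get(p)) >= min_annotated]

# Toy inputs (hypothetical Pfam names, correlations, and GO labels).
names = ["P1", "P2", "P3", "P4", "P5", "P6"]
high = {frozenset(p) for p in [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]}
corr = {frozenset((a, b)): (0.95 if frozenset((a, b)) in high else 0.10)
        for i, a in enumerate(names) for b in names[i + 1:]}
pfam_to_go = {"P1": ["GO:toy1"], "P2": ["GO:toy1"], "P3": ["GO:toy2"],
              "P4": ["GO:toy3"], "P5": ["GO:toy4"], "P6": []}

groups = threshold_groups(names, corr)
usable = usable_clusters(groups, pfam_to_go)
print(groups, usable)
```

The second function mirrors the last bullet: a cluster is only scored if enough of its members carry GO annotation to compare.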

13. Measuring GO similarity
   - G-SESAME
     - Measures the semantic similarity of any two GO terms
     - Not downloadable, so queries had to be made to their web server (not fun)
   - Pair-wise similarity was measured for each pair of GO terms in each cluster
     - Had to check if terms were in the same namespace
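
A minimal sketch of the per-cluster scoring loop, assuming a `similarity(a, b)` function is supplied by the caller. Here it is a small lookup table standing in for a G-SESAME query; the GO labels, namespaces, and the 0.8 value are invented.

```python
from itertools import combinations

def average_pairwise_similarity(terms, namespace, similarity):
    """Mean similarity over all term pairs that share a GO namespace."""
    scores = [
        similarity(a, b)
        for a, b in combinations(sorted(terms), 2)
        if namespace[a] == namespace[b]  # cross-namespace pairs are skipped
    ]
    return sum(scores) / len(scores) if scores else None

# Toy inputs (hypothetical): placeholder term labels and one known score.
namespace = {
    "GO:a": "molecular_function",
    "GO:b": "molecular_function",
    "GO:c": "biological_process",
}
table = {frozenset(("GO:a", "GO:b")): 0.8}

def similarity(a, b):
    # Stand-in for a semantic similarity lookup; not the G-SESAME API.
    return table[frozenset((a, b))]

score = average_pairwise_similarity(["GO:a", "GO:b", "GO:c"], namespace, similarity)
print(score)
```

Only the pair in the same namespace is scored, so the cross-namespace pairs never reach the similarity function.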

14. Results
   - Average G-SESAME scores were calculated for each cluster
   - The average of all cluster averages was 0.484
   - 10 clusters had a score of 0.60 or greater
   - The data was then randomized by using the same GO terms but in different random clusters, giving a score of 0.412-0.420 over 4 iterations
   - Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater
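
The randomization control can be sketched as a shuffle that keeps cluster sizes fixed and reassigns the same items at random. The scoring function below (fraction of pairs sharing a label) is a simple stand-in for the G-SESAME average, and the items, labels, and seed are invented.

```python
import random
from itertools import combinations

def shuffled_clusters(clusters, rng):
    """Reassign the same items to random clusters, preserving cluster sizes."""
    pool = [item for cluster in clusters for item in cluster]
    rng.shuffle(pool)
    out, start = [], 0
    for cluster in clusters:
        out.append(pool[start:start + len(cluster)])
        start += len(cluster)
    return out

def same_label_fraction(cluster):
    """Toy score: share of item pairs whose first character (label) matches."""
    pairs = list(combinations(cluster, 2))
    return sum(a[0] == b[0] for a, b in pairs) / len(pairs)

def mean_cluster_score(clusters, score):
    return sum(score(c) for c in clusters) / len(clusters)

# Toy clusters (hypothetical): items labelled by function "a" or "b".
clusters = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
observed = mean_cluster_score(clusters, same_label_fraction)

rng = random.Random(0)  # fixed seed so the run is repeatable
null = [mean_cluster_score(shuffled_clusters(clusters, rng), same_label_fraction)
        for _ in range(4)]  # 4 iterations, as on the slide
print(observed, null)
```

Comparing the observed average with the shuffled averages is the same logic as comparing 0.484 with 0.412-0.420 on the slide.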

15.

16.

17.

18. BitTorrent
   - A peer-to-peer file sharing protocol
   - ~27-55% of all Internet traffic
     - Mostly illegal file sharing
   - Files are shared in small pieces between several users
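
To illustrate the "small pieces" idea: in the original BitTorrent protocol, a file is split into fixed-length pieces and each piece is identified by its SHA-1 hash, so a downloader can verify pieces received from different peers. The sketch below shows only that splitting and hashing step; the piece length and data are arbitrary choices for the example.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int):
    """Split data into fixed-size pieces and return the SHA-1 hex digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]

# Toy payload (hypothetical): 4000 bytes of repeated sequence text.
data = b"ACGT" * 1000
hashes = piece_hashes(data, piece_length=1024)
print(len(hashes))
```

Because each piece is verified independently, different pieces can be fetched from different users at the same time, which is what makes large dataset downloads faster.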

19. Torrents for Biology
   - Why use torrent technology?
     - Download large datasets much faster
     - Searchable central listing
     - Decentralization of data

20. What is BioTorrents?
   - A legal file sharing website for scientists
   - Users can upload their own research results, data, software
   - Users can browse or search through all datasets
   - Data is not hosted on BioTorrents

21. www.biotorrents.net

22. Browse & Search

23. Details

24. Sign Up

25. Upload

26. Other Features
   - Forum
   - RSS Feed
   - Top 10
   - FAQ
   - Links

27. Who will upload data?
   - Everyone!
   - Realistically:
     - Large organizations (e.g. NCBI, CAMERA): may need some convincing to host their data via torrents in addition to FTP, HTTP, etc.
     - Scientists that really support open science: sharing data before it is formally complete and published

28. Technical Challenges
   - Many institutions frown on BitTorrent technology
   - A port must be opened/forwarded
   - Client program and computer must be left running
   - Ensuring data is legal, virus free, etc.
     - Users that upload many legitimate torrents will provide more confidence to people downloading
   - Making downloading and uploading easy
