Acknowledgements
Thanks go to Elizabeth Smikle, Jireh Agda, Nolan Dickson, Dina Soliman and Megan Milton for programming,
image generation and layout support.
Next Steps
• What types of data need to be included
to make RepeatFUnL useful?
• What formats should be
supported/compatible?
• Please fill out our questionnaire at
www.repeatfunl.org
• Gaining cooperation and support of MGE
and repeat community and private
databases
• Please contact us if you are interested in
furthering this project
• telliott@boldsystems.org
• @TransposableMan
Future Goals
• Provide analysis tools to aid in data
curation and generation for users
• Serve as a platform to enforce
community-developed standards for
MGE and repeat annotation and
classification
• Develop teaching applications to
introduce students to genomic data and
curation
• Understand the impact of MGEs and
repeats on phenotypic variation and
disease across the Tree of Life
• Unravel the evolutionary diversity of
MGEs and other mobile DNA
Value Added by RepeatFUnL
• Aggregate data across sources in a single,
searchable format for easy download
• Build on the expertise and reputation of the
Centre for Biodiversity Genomics in
developing and maintaining mature
sequence databases and NGS analysis
resources (BOLD, mBRAVE)
• Make computationally intensive data
generated by experts more discoverable
and usable to the general scientific
community
• Provide a universal data schema for repeat and
MGE data transactions and storage
RepeatFUnL: Filterable Universal
Library
• RepeatFUnL will aggregate MGE and
repeat information across databases,
supporting and enhancing current databases
rather than replacing them
• The central units of RepeatFUnL are
Repeat Records
• Data are stored in a NoSQL format to aid in
searching and filtering a large distributed
dataset
• Will include data from databases, the primary
literature, user uploads and
de novo generation
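To illustrate the Repeat Record idea, here is a minimal Python sketch of document-style records and a simple filter. The field names taxonomy, references, associated_data and external_ids mirror the labels in the poster's record diagram. Everything else, including record_id, the placeholder values and the get_path and filter_records helpers, is hypothetical and is not the RepeatFUnL schema.

```python
# Hypothetical sketch of document-style (NoSQL-like) Repeat Records.
# Field names follow the poster's diagram labels; all values are placeholders.
from typing import Any

records: list[dict[str, Any]] = [
    {
        "record_id": "RR-0001",  # assumed identifier format
        "taxonomy": {"domain": "Eukaryote", "species": "Example species A"},
        "references": ["placeholder citation 1"],
        "associated_data": {"length_bp": 5000},
        "external_ids": {"source_db": "placeholder-id-1"},
    },
    {
        "record_id": "RR-0002",
        "taxonomy": {"domain": "Bacteria", "species": "Example species B"},
        "references": [],
        "associated_data": {"length_bp": 1200},
        "external_ids": {},
    },
]


def get_path(doc, path):
    """Follow a dotted path such as 'taxonomy.domain' through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def filter_records(docs, **criteria):
    """Return documents whose nested fields equal the given values.

    Keyword names use '__' in place of '.', e.g. taxonomy__domain="Eukaryote".
    """
    wanted = {key.replace("__", "."): value for key, value in criteria.items()}
    return [
        doc for doc in docs
        if all(get_path(doc, path) == value for path, value in wanted.items())
    ]


eukaryote_ids = [
    rec["record_id"] for rec in filter_records(records, taxonomy__domain="Eukaryote")
]
print(eukaryote_ids)
```

Because each record is a nested document rather than a row in a fixed table, records from different sources can carry different associated data and still be filtered by their shared fields.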
Repeat Data Challenges
• Mobile genetic element (MGE) and repeat information is of value for a variety of disciplines
(evolution, ecology, agriculture, medicine, biotechnology)
• MGE and repeat data are difficult to generate and require curation, and few standards exist for storage,
classification and annotation
• Long-read and cheaper sequencing will enable large projects to generate millions of genomes
over the next decade, and managing repeat information will be crucial (Figure 1)
• Many databases exist (Table 1), but these can be hard to search and download from, and data
are duplicated and fragmented across multiple databases
• Repeat information would greatly benefit from better connectivity and searchability
[Diagram labels: Analyze, Download, Upload, Curate, Collaborate, Search]
Tyler A. Elliott and Sujeevan Ratnasingham
Centre for Biodiversity Genomics, University of Guelph, Ontario, Canada
Developing a comprehensive, integrative repeat database for the broad
scientific community
[Diagram labels: Genomes, Databases, MGE/Repeat Community, Literature]
Figure 1. Projected growth in genomes sequenced over the next decade.
Table 1. Current repeat and MGE information. * indicates an underestimate.
MGE/Repeat Statistic                 Number
MGE records in Databases             1.3 million
Accessions with MGEs in GenBank      6 million*
Repeat records in Databases          8 million
Species with MGE/repeat records      ~3000
[Diagram labels: Repeat Records, Taxonomy, References, Associated Data, External IDs]
[Figure 1 plot: Number of Genomes Sequenced. x-axis: Year (2018 to 2028); y-axis: Genomes (millions), 0.0 to 10.0; series: Archaea and Bacteria, Eukaryote, Plasmid, Virus]
