Managing Data Flow Through the
      Barcoding Pipeline

                   Amy Driskell
      Laboratories of Analytical Biology (LAB)
       National Museum of Natural History
             Smithsonian Institution
What is the “pipeline”?


[Diagram: Specimen, LIMS, Data QC, Data Deposition]
Outline
1. BEFORE the LIMS
2. LIMS
  – Data recorded
  – Exploring laboratory success/failure
  – Tracking project completion
3. Data QC
  – Criteria and data requirements
  – Checking for contamination and validity
Critical data management BEFORE
specimen enters the laboratory pipeline

• Data elements (“metadata”) necessary for laboratory
  processing:
   – Taxonomy, collection information, etc.


IMPORTANT!
• Assess laboratory successes/failures in light of this
  information
• Tailor/change lab protocols
Careful Metadata Collection at
        Specimen Collection or Harvest
• Metadata can be formatted at the beginning of a
  project (e.g. at specimen collection) to guarantee a
  smooth information transfer into the LIMS
• Multiple sources for metadata:
   –   Spreadsheets
   –   Field Information Management Systems (FIMS)
   –   Museum databases
   –   Fusion tables
Rockin’ It “Old School” -- Spreadsheets




•   Modified BOLD specimen spreadsheet for use in field/museum
•   Additional fields desired by PIs
•   Modified easily to interface with multiple kinds of databases
•   96-well format – 2D barcoded tubes, extraction plates
•   NOT directly connected to other databases, including LIMS
An Elegant Solution:
                       BiocodeMoorea FIMS
                                      Actively connected to their LIMS




http://biocode.berkeley.edu/
bioValidator – cleaning up the
             collection of metadata
• Many aspects of metadata require specific formats:
  digital lat/long, meters, names
• bioValidator enforces adherence to formatting and
  other rules
• Photo matcher




http://biovalidator.sourceforge.net/
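The kind of format checking described above can be sketched in a few lines. This is not bioValidator's actual rule set or API; the field names, the `SpecimenRecord` class, and the `validate_record` function are all hypothetical, and the rules (decimal latitude/longitude within range, elevation as a plain number of meters) are only an assumed illustration.

```python
from dataclasses import dataclass


@dataclass
class SpecimenRecord:
    # Hypothetical field names; a real FIMS or spreadsheet defines its own.
    specimen_id: str
    latitude: str
    longitude: str
    elevation_m: str


def validate_record(rec: SpecimenRecord) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for name, raw, limit in (("latitude", rec.latitude, 90.0),
                             ("longitude", rec.longitude, 180.0)):
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{name} is not a decimal number: {raw!r}")
            continue
        # "not <=" also rejects NaN, which would slip past a plain ">" test.
        if not abs(value) <= limit:
            problems.append(f"{name} out of range: {value}")
    try:
        float(rec.elevation_m)
    except ValueError:
        problems.append(f"elevation is not a number in meters: {rec.elevation_m!r}")
    return problems
```

Running a check like this at collection time, rather than at LIMS import, is what keeps the later transfer smooth: a record such as latitude "17°32'S" or elevation "40 ft" is flagged while the collector can still correct it.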
Museum Collection Databases
• Sampling directly from existing collections?
• Some museum databases cannot link directly
  to lab-based information systems (LIMS)
• Requires output from collection
  database, input into lab database – no
  automatic updates
Why?
1. Downstream insertion of data into other databases is simplified
2. Because metadata has important uses in the lab
• Determine possible causes of failure: taxonomy, collection
  event, specimen age
• Adjust extraction or amplification protocols
• Design new primers – e.g. smaller fragments
Specimens enter the lab
           Metadata enters the LIMS


[Diagram: Specimen & Metadata, LIMS, Data QC, Data Deposition]
What is a LIMS?
• An electronic lab “notebook” (aka database) to
  replace our traditional paper lab notebooks.
• Tracks a specimen through lab processes from
  extraction through to barcode sequence completion
  (data QC may use external software).
• Records every lab procedure.
• Provides information to guide further lab efforts –
  success rates, “redo” lists
• Records the physical location of extracts, etc.
My requirements for a LIMS
• I want a system that records every piece of
  information about each specimen/extract for
  which I produce a barcode sequence.
• I want my procedures and protocols to be
  transparent enough so that anyone can
  reproduce my results.
• This includes my QC procedures.
• Currently no good place to publish these data.
Data to be recorded
• Extraction: protocol, digestion time, etc.
• PCR: recipes, DNA [ ], cycling parameters, clean-up
  method (PCR machine, brand of enzyme, lot #)
• Gel photos
• Sequencing: recipe, clean-up, machine, etc.

• Bonus: success or failure can be mapped back to any
  of these recorded values. Maybe the Taq was bad?
  Or the PCR machine needs repair?
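Mapping success or failure back to a recorded value amounts to grouping reactions by that value and comparing success rates. A minimal sketch, assuming each reaction is a dictionary with a boolean `success` key; the records, the field names (`taq_lot`, `machine`), and the function `success_rate_by` are invented for illustration and are not taken from any particular LIMS.

```python
from collections import defaultdict


def success_rate_by(reactions, field):
    """Group reaction records by one recorded field; return {value: fraction succeeded}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for rxn in reactions:
        key = rxn[field]
        totals[key] += 1
        if rxn["success"]:
            passed[key] += 1
    return {key: passed[key] / totals[key] for key in totals}


# Made-up records, purely for illustration.
reactions = [
    {"taq_lot": "LOT-A", "machine": "PCR-1", "success": True},
    {"taq_lot": "LOT-A", "machine": "PCR-2", "success": True},
    {"taq_lot": "LOT-B", "machine": "PCR-1", "success": False},
    {"taq_lot": "LOT-B", "machine": "PCR-2", "success": False},
]
rates = success_rate_by(reactions, "taq_lot")
```

In this toy data the failures track the Taq lot and not the PCR machine, which is the kind of pattern that only becomes visible when every reagent and instrument is recorded.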
• A LIMS can be homegrown (like LAB’s barcoding
  LIMS, or SI’s plant barcoding LIMS) – relatively simple
  relational databases
• Sophisticated, commercially produced – Geneious
  plug-in MooreaBiocode LIMS (plug-in is free)



• Software updated and maintained
• Plugs into the Geneious data analysis software




   http://software.mooreabiocode.org
Workflow
Mapping workflow elements to success
Tracking project progress
      & identifying next steps
• Which specimens have
  completed barcodes?
• Which specimens need
  additional labwork?
• Which specimens should be
  abandoned?
• Where are the original DNA
  extracts or tissue samples?
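The questions above reduce to a status report over the specimens in a project. The sketch below assumes a deliberately simple rule: a specimen with a finished barcode is complete, one that has failed a fixed number of attempts is abandoned, and everything else goes on the "redo" list. The cutoff of three attempts and the names `next_step`, `specimens`, `summary`, and `redo_list` are assumptions for illustration, not a rule from any real LIMS.

```python
from collections import Counter


def next_step(has_barcode: bool, failed_attempts: int, max_attempts: int = 3) -> str:
    """Classify one specimen; max_attempts is an arbitrary illustrative cutoff."""
    if has_barcode:
        return "complete"
    if failed_attempts >= max_attempts:
        return "abandon"
    return "redo"


# Made-up specimens: (specimen id, has a finished barcode?, failed attempts so far)
specimens = [("S1", True, 0), ("S2", False, 1), ("S3", False, 3), ("S4", False, 0)]

# Project-level summary and the list of specimens that need more labwork.
summary = Counter(next_step(done, fails) for _, done, fails in specimens)
redo_list = [sid for sid, done, fails in specimens if next_step(done, fails) == "redo"]
```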
Project Progress
Raw data enters the QC process


[Diagram: Specimen, LIMS, Data QC, Data Deposition]
Data QC
• OUTSIDE of LIMS database
• “Clean up” raw data – trim, examine quality
• Assemble passed traces (“contig”) for a
  specimen
• Examine/edit contigs
• Check validity of resulting sequences
My data QC ethos
• All criteria for each step of data analysis are
  recorded
• For raw trace processing: trimming
  criteria, length and quality requirements, binning
  criteria
• For assembly: assembly parameters, product
  length, etc.
• Hand editing is minimized*
• It would be possible for anyone to recreate the
  barcode sequence
Any DNA sequence analysis software can be
               used for data QC

• Sequencher (Gene Codes) & Geneious (Biomatters)
   – Trim ends of raw sequences with adjustable criteria, explore
     effects of trim criteria
   – Discard short or poor sequences
   – Assemble trimmed reads with stringent, but adjustable criteria
   – Output completed sequences
• Geneious LIMS is plugged into the data analysis software
   – direct communication
   – binning*
• Sequencher data must be exported and imported into LIMS
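To make "trim ends with adjustable criteria" and "discard short or poor sequences" concrete, here is a minimal sketch. It is not the trimming algorithm used by Sequencher or Geneious; it assumes per-base Phred-style quality scores and simply removes bases from each end until one meets the threshold. The thresholds (`min_q=20`, `min_length=100`) and the function names are illustrative assumptions.

```python
def trim_ends(bases: str, quals: list[int], min_q: int = 20):
    """Drop low-quality bases from both ends; return the trimmed bases and qualities."""
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


def keep_read(trimmed_bases: str, min_length: int = 100) -> bool:
    """Discard reads that are too short after trimming."""
    return len(trimmed_bases) >= min_length
```

Because both thresholds are parameters, the same read can be re-trimmed under different criteria to explore their effect, and the values actually used can be recorded alongside the result.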
Data analysis



Here are the traces. You can see some FIMS data in the document
fields (e.g. identified by, tissue ID). You will also notice
a binning column (see the following slide).
Binning
Automatic categorization of reads and
              assemblies

• Change binning parameters, examine effects
• Trimming and assembly dialog boxes are similar
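Automatic binning can be pictured as a rule that assigns each read to a category from its quality scores. The sketch below is an assumption for illustration, not the binning rule implemented in the Geneious plug-in: it uses the fraction of bases at or above a quality threshold, and the cutoffs (`high=0.9`, `medium=0.7`) and bin names are hypothetical.

```python
def bin_read(quals: list[int], min_q: int = 20,
             high: float = 0.9, medium: float = 0.7) -> str:
    """Assign a read to a bin from the fraction of bases with quality >= min_q."""
    if not quals:
        return "low"  # an empty read carries no usable data
    fraction = sum(q >= min_q for q in quals) / len(quals)
    if fraction >= high:
        return "high"
    if fraction >= medium:
        return "medium"
    return "low"
```

Changing `high` or `medium` and re-running the function over all reads shows directly how many reads move between bins, which is the "change parameters, examine effects" loop in miniature.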
Final Steps:
  Is it a contaminant? Is it identified correctly?

• A number of procedures for identifying
  contamination or incorrect identification
   – BLASTingdatabase of known contaminants; Genbank;
     BOLD
   – Quick and dirty assembly tests
   – NJ trees
   – Geneious taxonomy verification tool
Verify Taxonomy
• BLASTs your sequences
• Gets the NCBI taxonomy for the best hit(s)
• Compares to the taxonomy from the FIMS
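The comparison step can be sketched as finding the deepest rank at which the two lineages still agree. This assumes the taxonomy of the best BLAST hit has already been retrieved and that both lineages are available as rank-to-name dictionaries; the rank list, the dictionary layout, and the function `deepest_match` are hypothetical and do not reflect how the Geneious tool is implemented.

```python
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def deepest_match(fims: dict, hit: dict):
    """Return the lowest rank at which both lineages agree, or None if none match."""
    matched = None
    for rank in RANKS:
        a, b = fims.get(rank), hit.get(rank)
        # Stop at the first missing or conflicting rank.
        if not a or not b or a.lower() != b.lower():
            break
        matched = rank
    return matched
```

A result of `None` or a match only at phylum level suggests contamination or a sample mix-up, while a match down to family or genus with a species mismatch may point to a misidentification or to sparse reference data.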
Good, clean, barcode sequences
  • Feed back into LIMS*
      – Monitor progress
      – Connect sequences and traces to specimen data
  • Prepare for output to databases: GenBank or
    BOLD upload packages
[Diagram: Specimen, LIMS & Data QC, Data Deposition]
Positive Information Flow from field or
     museum to final data deposition

1. Collect metadata to flow easily into LIMS and
   other databases
2. Record all aspects of all laboratory procedures
   (LIMS)
3. Use LIMS system for reporting and protocol
   investigation, monitoring of project progress
4. Input information and data from QC procedures
   into LIMS*
5. LIMS outputs upload packages for public
   databases

Amy Driskell - Information management and data Quality


Editor's notes

  1. An example workflow. This workflow was very straightforward – everything worked the first time, so we didn’t have to rerun anything. Reaction templates and cocktails on the left, reaction thermocycles on the right.