Sequence Matrix
Gene concatenation made easy
Gaurav Vaidya (1), David Lohman (2), Rudolf Meier (2)

1: NeatCo Asia, Singapore.
2: Department of Biological Sciences, National University of Singapore, Singapore.
Our goals


 ✤   Many powerful tools exist for concatenating sequences.

 ✤   Adding new sequences to an existing dataset is tedious and time consuming.

 ✤   Our initial goal: simple, user-friendly program for concatenating sequences.

 ✤   We also added a few tools to help you look for lab contamination in your dataset.
Sequence Matrix


✤   Written in Java.

    ✤   Graphical user interface libraries.

    ✤   Works on different operating systems.

    ✤   Easy to install: download and run the batch file.
Importing sequences



✤   You can use the sequence names as
    entered in the input file.

✤   Or you can ask Sequence Matrix to try
    to identify the species names.
Importing sequences

✤   Sequences mode:

    ✤   gi|237510679|gb|AY556753.2|Daubentonia madagascariensis voucher WE94001 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence

    ✤   gi|237510678|gb|AY556735.2|Macaca sylvanus voucher OK96022 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence

✤   Species name:

    ✤   Daubentonia madagascariensis

    ✤   Macaca sylvanus
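
A minimal sketch of how a species name might be pulled out of a GenBank-style sequence name like the ones above. This is not Sequence Matrix's actual algorithm; the function name `guess_species_name` and the "first capitalised word followed by a lowercase word" rule are assumptions made for illustration.

```python
import re

# Assumption: a species name looks like "Genus species" (capitalised word,
# then a lowercase word). Real names can be messier than this.
_BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def guess_species_name(header):
    """Return the first 'Genus species' pair in a sequence name, or None."""
    # Keep only the text after the last '|' so accession fields are skipped.
    description = header.rsplit("|", 1)[-1]
    match = _BINOMIAL.search(description)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


header = ("gi|237510679|gb|AY556753.2|Daubentonia madagascariensis "
          "voucher WE94001 5.8S ribosomal RNA gene, partial sequence")
print(guess_species_name(header))  # Daubentonia madagascariensis
```

A rule this simple will misfire on names that do not start with the binomial, which is one reason to keep the option of using the sequence names exactly as entered.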
Importing sequences



✤   A common source of error is forgetting
    to recode leading and trailing gaps as
    missing information.

✤   Sequence Matrix can automatically
    replace such gaps with question marks.
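
The recoding described above can be sketched in a few lines. This is an illustrative version, assuming `-` is the gap character and `?` marks missing data; the function name `recode_terminal_gaps` is hypothetical and not taken from Sequence Matrix.

```python
def recode_terminal_gaps(sequence, gap="-", missing="?"):
    """Replace leading and trailing gaps with the missing-data symbol.

    Gaps inside the sequence are left alone: they may be real indels.
    """
    without_leading = sequence.lstrip(gap)
    leading = len(sequence) - len(without_leading)
    core = without_leading.rstrip(gap)
    trailing = len(without_leading) - len(core)
    return missing * leading + core + missing * trailing


print(recode_terminal_gaps("--AC-GT---"))  # ??AC-GT???
```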
Importing sequences: Naming



✤   Sequences from one dataset are matched up to another dataset by sequence name.

    ✤   Errors in sequence naming need to be fixed.

✤   We recommend naming your files by gene name: ‘coi’, ‘cytb’, ‘28S’ and so on.
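
To show why exact sequence names matter, here is a rough sketch of concatenation by name. It assumes each file has already been read into a dictionary of aligned sequences keyed by sequence name; the `concatenate` function and the data layout are illustrative assumptions, not Sequence Matrix internals.

```python
def concatenate(datasets):
    """Join per-gene alignments into one row per sequence name.

    datasets maps gene name -> {sequence name -> aligned sequence}.
    A taxon with no sequence for a gene gets '?' for that whole gene.
    """
    genes = sorted(datasets)
    taxa = sorted({name for seqs in datasets.values() for name in seqs})
    # Assumes every sequence within one gene has the same aligned length.
    lengths = {g: len(next(iter(datasets[g].values()))) for g in genes}
    matrix = {}
    for taxon in taxa:
        parts = [datasets[g].get(taxon, "?" * lengths[g]) for g in genes]
        matrix[taxon] = "".join(parts)
    return matrix


datasets = {
    "coi": {"Homo sapiens": "ACGT", "Gorilla gorilla": "ACGA"},
    "cytb": {"Homo sapiens": "TT", "Gorila gorilla": "TA"},  # note the typo
}
for name, row in concatenate(datasets).items():
    print(name, row)
```

Because matching is by exact name, the misspelt "Gorila gorilla" ends up as its own row instead of joining "Gorilla gorilla", which is the kind of naming error that has to be fixed by hand.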
Export: Taxonsets


✤   By default, we generate taxonsets on the
    basis of:

    ✤   Combined length.

    ✤   Number of character sets.

    ✤   Information for a particular gene.
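
As a rough illustration of two of these criteria, the sketch below builds one taxonset per gene and one for taxa covered by a minimum number of character sets. The set names (`has_coi`, `charsets_atleast_2`), the `min_charsets` parameter and the function itself are assumptions for this example; the names and thresholds Sequence Matrix actually writes may differ.

```python
def build_taxonsets(datasets, min_charsets=2):
    """Group taxa by which genes they have data for.

    datasets maps gene name -> {taxon name -> sequence}.
    Returns {taxonset name -> sorted list of taxon names}.
    """
    taxonsets = {}
    counts = {}
    for gene, seqs in datasets.items():
        # Taxa with information for this particular gene.
        taxonsets[f"has_{gene}"] = sorted(seqs)
        for taxon in seqs:
            counts[taxon] = counts.get(taxon, 0) + 1
    # Taxa covered by at least min_charsets character sets.
    taxonsets[f"charsets_atleast_{min_charsets}"] = sorted(
        taxon for taxon, n in counts.items() if n >= min_charsets
    )
    return taxonsets
```

A per-gene set such as `has_coi` is what you would use downstream to exclude taxa without data when building a gene tree.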
Gene trees



✤   Two ways to do them:

    ✤   Use the taxonset of taxa having information for a particular gene to exclude other
        taxa.

    ✤   Export the entire dataset with one file per column.
Export features



✤   You can also export the Sequence Matrix table as an Excel-readable text file.

    ✤   Supervisory mode.

    ✤   Keep track of a project as it grows.
Character sets


✤   We can read character sets defined in
    Nexus CHARSET and TNT xgroup
    commands.

✤   These can be “split” into individual
    columns, or imported as a single
    column representing the entire file.
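
A simplified sketch of "splitting" on Nexus character sets. It only understands the plain `CHARSET name = start-end;` form (no step values, no multiple ranges, no TNT xgroup), and the function names are hypothetical rather than Sequence Matrix's own.

```python
import re

# Matches e.g. "CHARSET coi = 1-658;" (case-insensitive, single range only).
_CHARSET = re.compile(
    r"charset\s+(\w+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE
)


def parse_charsets(nexus_text):
    """Return {charset name: (start, end)} with 1-based inclusive positions."""
    return {
        name: (int(start), int(end))
        for name, start, end in _CHARSET.findall(nexus_text)
    }


def split_by_charset(sequence, charsets):
    """Cut one concatenated sequence into one piece per character set."""
    # Nexus positions are 1-based and inclusive; Python slices are 0-based.
    return {
        name: sequence[start - 1:end]
        for name, (start, end) in charsets.items()
    }


sets_block = "BEGIN SETS;\n  CHARSET coi = 1-4;\n  CHARSET cytb = 5-6;\nEND;"
charsets = parse_charsets(sets_block)
print(split_by_charset("ACGTTT", charsets))  # {'coi': 'ACGT', 'cytb': 'TT'}
```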
Excision


✤   Individual sequences can be excised
    from the dataset.

✤   Excised sequences will not be exported.

    ✤   Sequence Matrix will warn you about
        that.
Contamination


✤   You thought you were sequencing Gorilla gorilla

    ✤   but you were really sequencing Homo sapiens.

✤   We have two tools you can use:

    ✤   If Homo sapiens is in your dataset.

    ✤   If Homo sapiens is not in your dataset (experimental!).
H. sapiens in dataset

✤   Looks for pairs of sequences whose
    pairwise distance is very low.

✤   Expected difference depends on gene:

    ✤   28S doesn’t change very much, but

    ✤   COI changes very quickly.

✤   Some interpretation is required.
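
The idea can be sketched as follows, using an uncorrected pairwise distance (proportion of differing sites) for one gene. The distance measure, the function names and the threshold are all assumptions for illustration; the slides do not say which distance Sequence Matrix computes, and a sensible threshold depends on the gene.

```python
from itertools import combinations


def p_distance(a, b, ignore="-?"):
    """Proportion of differing sites, skipping gaps and missing data."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ignore or y in ignore:
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else None


def suspicious_pairs(sequences, threshold):
    """List (name, name, distance) for pairs at or below the threshold."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(
        sorted(sequences.items()), 2
    ):
        distance = p_distance(seq_a, seq_b)
        if distance is not None and distance <= threshold:
            hits.append((name_a, name_b, distance))
    return hits


coi = {
    "Gorilla gorilla": "ACGTACGT",
    "Homo sapiens": "ACGTACGT",
    "Macaca sylvanus": "ACGAACTT",
}
print(suspicious_pairs(coi, threshold=0.01))
# [('Gorilla gorilla', 'Homo sapiens', 0.0)]
```

An identical pair is only a warning sign for a fast-changing gene such as COI; for a slowly changing gene such as 28S the same result may be perfectly normal.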
H. sapiens not present

✤   Use “Pairwise Distance Mode” to look
    for unusual pairwise distances.

✤   Ignore one charset, then sort taxa based
    on their pairwise distance to a
    “reference taxon”.

    ✤   Colour sequences by their individual
        pairwise distances to the reference
        taxon.
H. sapiens not present

✤   Colour pairwise distances on the gene
    in question by their pairwise distance to
    the reference taxon.

✤   Look for colour variation which is
    unusual or out of place.

✤   We would expect sequences from
    different species to be correlated
    together.
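
A rough sketch of the comparison behind this mode, assuming uncorrected pairwise distances and a dictionary of per-gene alignments. The function `rank_against_reference` and its output layout are hypothetical; the real tool shows the result as colours in a table rather than as numbers.

```python
def p_distance(a, b, ignore="-?"):
    """Proportion of differing sites, skipping gaps and missing data."""
    pairs = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if x not in ignore and y not in ignore
    ]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def rank_against_reference(datasets, reference, gene):
    """Return rows of (taxon, distance on other genes, distance on gene).

    datasets maps gene name -> {taxon name -> aligned sequence}.
    Rows are sorted by distance to the reference with `gene` ignored.
    """
    taxa = sorted({t for seqs in datasets.values() for t in seqs} - {reference})
    rows = []
    for taxon in taxa:
        rest_taxon, rest_ref = "", ""
        for name, seqs in datasets.items():
            if name != gene and taxon in seqs and reference in seqs:
                rest_taxon += seqs[taxon]
                rest_ref += seqs[reference]
        on_rest = p_distance(rest_taxon, rest_ref)
        on_gene = None
        if taxon in datasets[gene] and reference in datasets[gene]:
            on_gene = p_distance(datasets[gene][taxon], datasets[gene][reference])
        rows.append((taxon, on_rest, on_gene))
    # Taxa with no comparable data on the other genes go last.
    rows.sort(key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0.0))
    return rows
```

With rows ordered by distance on the other genes, a taxon that sits far from the reference overall but is nearly identical to it on the gene in question is the kind of out-of-place value worth a second look.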
Pairwise distance mode

✤   You need to vary:

    ✤   The gene you are studying.

    ✤   The reference taxon being compared
        against.

✤   Possibly helpful as an alert mechanism.
Summary

✤   Sequence Matrix allows you to assemble and examine multigene, multitaxon datasets.

✤   Taxonsets allow you to analyse subsets of your data in downstream programs.

✤   Excising sequences gives you greater control over which sequences to analyse.

✤   You can look for contamination in two ways:

    ✤   Looking for very low pairwise distances across your entire dataset.

    ✤   Looking for unusual pairwise distances in Pairwise Distance Mode.
Acknowledgements

✤   Rudolf Meier

✤   Zhang Guanyang

✤   Farhan Ali

✤   David Lohman

✤   Everybody at the NUS DBS
    Evolutionary Biology lab.
Question time!
