•Link the read files from /import/sequence/read-archive/
Link Files
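A minimal Python sketch of this step. It assumes the reads are FASTQ files (matched by the hypothetical pattern `*.fastq*`) and that "link" means creating symbolic links in a working directory, so the large archive files are never copied:

```python
import glob
import os


def link_read_files(src_dir, dest_dir, pattern="*.fastq*"):
    """Symlink every file in src_dir matching pattern into dest_dir.

    Returns the list of link paths created by this call; links that
    already exist are left untouched. The default pattern is an assumption.
    """
    os.makedirs(dest_dir, exist_ok=True)
    created = []
    for src in sorted(glob.glob(os.path.join(src_dir, pattern))):
        link = os.path.join(dest_dir, os.path.basename(src))
        if not os.path.lexists(link):
            # Point at the absolute path so the link works from any cwd.
            os.symlink(os.path.abspath(src), link)
            created.append(link)
    return created
```

For example, `link_read_files("/import/sequence/read-archive/", "reads/")` would link every matching file into a hypothetical `reads/` directory. Calling it a second time creates nothing new, so the step is safe to rerun.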
•The goal here is to remove identical reads or read pairs.
•It also helps make the dataset size more manageable.
Redundancy Analysis
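A minimal sketch of exact-duplicate removal, assuming each record is either a read sequence or a `(read1, read2)` tuple for a pair. It only collapses byte-identical records; reverse-complement duplicates are not detected here:

```python
def remove_duplicates(records):
    """Keep the first occurrence of each identical read (or read-pair tuple)."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique
```

Because pairs are compared as whole tuples, two pairs count as duplicates only when both mates match, which is the usual criterion for paired-end data.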
•The goal here is to remove the adaptor sequences from the given datasets.
•The adaptors are identified on the basis of information available from the LIMS.
Adaptor Trimming
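A simplified 3' adaptor trimmer, sketched under the assumption of exact matching (no mismatches tolerated). The adaptor string in the usage note is only an example; in this pipeline the real sequence would come from the LIMS:

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the adaptor, and everything after it, from the 3' end of a read.

    First looks for a full adaptor match anywhere in the read, then for a
    partial adaptor (at least min_overlap bases) running off the 3' end.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest adaptor prefixes first against the read's 3' end.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

For example, with the Illumina-style adaptor prefix `AGATCGGAAGAGC`, the read `ACGTACGTAGATC` is trimmed to `ACGTACGT` because its last five bases match the start of the adaptor. The `min_overlap` cutoff keeps very short chance matches from being trimmed.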
•The goal is to remove all reads whose quality scores fall below a set threshold (15).
Low Quality Score Filtering
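The slide does not say whether the threshold of 15 applies per base or per read. This sketch assumes it applies to the mean Phred score of each read, with Phred+33 quality encoding; both are assumptions:

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 by default)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_quality(records, threshold=15, offset=33):
    """Keep (sequence, quality) records whose mean Phred score is >= threshold."""
    return [
        (seq, qual)
        for seq, qual in records
        if qual and mean_phred(qual, offset) >= threshold
    ]
```

In Phred+33, the character `0` encodes a score of exactly 15 and `/` encodes 14, so a read of all `0` characters passes while a read of all `/` characters is removed.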
•The goal here is to remove all sequences that map to the chloroplast and mitochondrial databases based on their insert size.
Contamination Analysis
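The step above relies on read mapping. As a lightweight stand-in, which is not the mapping approach described and does not model the insert-size criterion, a shared k-mer screen can flag reads that match organelle references. The parameters `k=25` and `min_hits=1` are hypothetical defaults:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(references, k):
    """Collect every k-mer from each reference and its reverse complement."""
    index = set()
    for ref in references:
        for strand in (ref, reverse_complement(ref)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def is_contaminant(read, index, k, min_hits=1):
    """True if at least min_hits of the read's k-mers occur in the index."""
    hits = sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in index)
    return hits >= min_hits


def remove_contaminants(reads, references, k=25, min_hits=1):
    """Drop reads that share k-mers with any reference (e.g. chloroplast, mitochondria)."""
    index = build_kmer_index(references, k)
    return [r for r in reads if not is_contaminant(r, index, k, min_hits)]
```

Indexing both strands means a read is caught whether it came from the forward or reverse strand of the organelle genome.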
•The goal is to calculate the GC content of the reads in a particular lane and plot it as a graph.
GC-content Analysis
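A minimal sketch of the calculation, assuming GC content is measured over unambiguous bases only (N and other codes are ignored) and that the graph is a histogram of per-read GC fractions:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A, C, G, T bases; other characters are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def lane_gc(reads) -> float:
    """Overall GC fraction of all reads in a lane, pooled base by base."""
    gc = acgt = 0
    for read in reads:
        read = read.upper()
        gc += read.count("G") + read.count("C")
        acgt += sum(read.count(base) for base in "ACGT")
    return gc / acgt if acgt else 0.0


def gc_histogram(reads, bins: int = 10):
    """Count reads per GC-fraction bin; bin i starts at i/bins."""
    counts = [0] * bins
    for read in reads:
        # A GC fraction of exactly 1.0 goes into the last bin.
        idx = min(int(gc_fraction(read) * bins), bins - 1)
        counts[idx] += 1
    return counts
```

The bin counts can be passed to a plotting library, for example matplotlib's `bar()`. A single sharp peak is typical of a clean lane, while a second peak can hint at contamination.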
•The goal here is to determine the distribution of k-mer frequencies across the dataset and estimate the genome size from that distribution.
K-mer Analysis
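A common way to estimate genome size from a k-mer spectrum is G ≈ (Σ d·n_d) / d_peak, where n_d is the number of distinct k-mers seen d times and d_peak is the depth of the main peak. The sketch below follows that formula. The `min_depth` cutoff for error k-mers is a hypothetical parameter, and k-mers are not merged with their reverse complements, for simplicity:

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Histogram mapping k-mer multiplicity -> number of distinct k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(spectrum, min_depth=2):
    """Genome size ~ (total k-mers at depth >= min_depth) / (peak depth)."""
    # Low-multiplicity k-mers are mostly sequencing errors, so skip them.
    usable = {d: n for d, n in spectrum.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(usable, key=usable.get)
    total = sum(d * n for d, n in usable.items())
    return total / peak
```

For example, for the spectrum `{1: 100, 10: 50, 11: 80, 12: 30}` the depth-1 k-mers are discarded, the peak is at depth 11, and the estimate is (500 + 880 + 360) / 11 ≈ 158 bases.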
•The goal is to join reads that share a short overlap, if present, into longer reads, making it possible to span gaps or repeats in the genome.
Join Reads
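A minimal sketch of joining one read pair. It assumes standard paired-end orientation, where read 2 is sequenced from the opposite strand and must be reverse-complemented before the overlap is found. It uses exact matching only and takes the longest overlap of at least `min_overlap` bases (a hypothetical default of 10):

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10):
    """Join a read pair whose ends overlap; return None if no overlap is found."""
    r2rc = reverse_complement(r2)
    # Try the longest possible overlap first, down to min_overlap.
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        if r1[-ov:] == r2rc[:ov]:
            return r1 + r2rc[ov:]
    return None
```

Pairs that return `None` are left unjoined and can still be used as ordinary pairs in assembly. Tools built for this task usually tolerate a few mismatches in the overlap, which this sketch does not.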
•The goal here is to align and merge fragments of a longer DNA sequence to reconstruct the original sequence.
•This is done using the command-line CLC Bio de novo assembler.
Assembly
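To illustrate the idea of merging overlapping fragments, here is a toy greedy overlap assembler. It is not the algorithm used by the CLC Bio assembler. It assumes error-free reads from a single strand and repeatedly merges the pair with the longest suffix/prefix overlap:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for ov in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-ov:] == b[:ov]:
            return ov
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair of contigs until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained inside another read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_ov, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    ov = overlap(a, b, min_len)
                    if ov > best_ov:
                        best_ov, best_i, best_j = ov, i, j
        if best_ov == 0:
            break  # remaining contigs do not overlap
        merged = contigs[best_i] + contigs[best_j][best_ov:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, the four reads `ATTAGACCTG`, `AGACCTGCCG`, `CCTGCCGGAA` and `GCCGGAATAC` assemble into the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is easy to follow but can mis-assemble repeats, which is one reason production assemblers use more elaborate graph-based methods.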