Rfam is an open access database (hosted at the Wellcome Trust Sanger Institute) containing information on RNA families and annotations for millions of RNA genes. Designed to work in a similar way to the Pfam database of protein families, Rfam uses a similar model for annotation and display and is built on the same principle of open access to the data. Each entry in the Rfam database includes a multiple sequence alignment, a secondary structure and a probabilistic model known as a covariance model (CM), which can handle an RNA sequence and its structure simultaneously. In conjunction with the Infernal software package, Rfam CMs can be used to search genomes or other DNA sequence databases for homologs of known structural RNA families. You can find out more about Rfam at http://rfam.janelia.org/
2. Non-coding RNA genes encode a functional RNA
product rather than a protein.
3. Non-coding RNA genes encode a functional RNA
product rather than a protein.
Families of functional RNAs:
Biological function | RNA family
Involved in protein synthesis | tRNA, rRNA, SRP RNA, tmRNA
Post-transcriptional modification or DNA replication | snRNA, snoRNA, SmY, scaRNA, gRNA, RNase P, RNase MRP, Y RNA, telomerase RNA
Regulatory RNAs | aRNA, NAT, crRNA, long ncRNA, miRNA, piRNA, siRNA, tasiRNA, rasiRNA, 7SK
Parasitic RNA | Retrotransposon, viroid, satellite RNA
4. The majority of functional RNAs fold into stable
structures that are essential for their biological
activity.
[Figure: example secondary structures of a micro-RNA precursor, a tRNA, the U2 spliceosomal RNA and part of a riboswitch]
5. Unlike protein-coding genes, functional RNAs often
show no significant sequence similarity but preserve a
base-paired secondary structure.
ncRNA_1 AAAAAAGGGGTTTTTT
ncRNA_2 AAATAAGGGGTTATTT
Struct  ((((((....))))))
This makes it very difficult to find those genes
by looking only for sequence similarity (e.g. with
BLAST or FASTA).
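The toy alignment on this slide can be checked with a few lines of Python. This is only a minimal sketch: the helper names (`base_pairs`, `fits_structure`) are made up for illustration, the sequences are written in the DNA alphabet as on the slide, and G-T is accepted as the DNA spelling of the G-U wobble pair.

```python
# Pairs accepted as "base-paired" (assumption: Watson-Crick plus G-T wobble).
ALLOWED_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def base_pairs(structure):
    """Return (i, j) index pairs for matching parentheses in a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def fits_structure(sequence, structure):
    """True if every paired position in the structure holds an allowed pair."""
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in base_pairs(structure))


ncrna_1 = "AAAAAAGGGGTTTTTT"
ncrna_2 = "AAATAAGGGGTTATTT"
struct = "((((((....))))))"

# Pairs where both partners changed between the two sequences (covariation).
covarying = [(i, j) for i, j in base_pairs(struct)
             if ncrna_1[i] != ncrna_2[i] and ncrna_1[j] != ncrna_2[j]]

print(fits_structure(ncrna_1, struct), fits_structure(ncrna_2, struct))
print(covarying)
```

Both sequences fit the same structure, and the one pair that differs has changed on both sides (A-T became T-A). A sequence-only comparison counts that as two mismatches, while a structure-aware model counts it as a conserved pair.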
6. In the Rfam database, a functional RNA family is
represented by a multiple sequence alignment and a
covariance model.
The model takes into account both sequence and
structure and can be used to scan a genomic
sequence to detect new members of the same family.
7. The Rfam Seed alignment for the U12 minor spliceosomal RNA family.
8. Only one sequence,
up to 10 kb
Search methodology
The query sequence is scanned against a library of Rfam sequences using WU-BLAST, with an E-value threshold of 1.0.
Any matches to this are then scanned against the corresponding covariance model using the hand-curated threshold for
that family.
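The two-stage logic described above can be sketched as follows. This is not the Rfam server code: the function name, the family names and all numbers are hypothetical, and whether the E-value cutoff is inclusive is an assumption.

```python
BLAST_EVALUE_CUTOFF = 1.0  # E-value threshold of the BLAST pre-filter quoted above


def two_stage_search(blast_evalues, cm_bit_score, family_thresholds):
    """Return families that pass the BLAST filter and then the CM threshold.

    blast_evalues: {family: best BLAST E-value for the query}
    cm_bit_score: callable giving the CM bit score of the query for a family
    family_thresholds: {family: curated bit-score threshold}
    """
    # Stage 1: cheap, sequence-only filter.
    candidates = [fam for fam, evalue in blast_evalues.items()
                  if evalue <= BLAST_EVALUE_CUTOFF]
    # Stage 2: expensive CM scoring, run only for families that passed stage 1.
    return [fam for fam in candidates
            if cm_bit_score(fam) >= family_thresholds[fam]]


# Hypothetical numbers for three made-up families.
blast_evalues = {"family_A": 1e-12, "family_B": 0.3, "family_C": 25.0}
cm_scores = {"family_A": 88.0, "family_B": 12.0, "family_C": 60.0}
thresholds = {"family_A": 40.0, "family_B": 35.0, "family_C": 45.0}

hits = two_stage_search(blast_evalues, cm_scores.get, thresholds)
print(hits)
```

In this made-up data, family_C would pass its covariance model threshold but is never scored because the BLAST filter rejects it first. That is the price paid for the speed of the pre-filter.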
10. Bit score: how well the sequence matches your model.
The score reflects whether the sequence is a better match to the profile model (positive score) or to the null model of
nonhomologous sequences (negative score).
E-value: expected number of false positives with bit scores at least as high as that of your hit.
The value depends on the size of the database used for the search.
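A small numeric sketch may help with the two definitions above. It assumes the usual log-odds form of a bit score (log base 2 of the ratio between the probability under the family model and under the null model) and assumes that the E-value grows in proportion to the size of the database searched. The probabilities and sizes are invented for illustration.

```python
import math


def bit_score(p_model, p_null):
    """Log-odds score in bits: positive if the model explains the sequence better."""
    return math.log2(p_model / p_null)


def rescale_evalue(evalue, old_db_size, new_db_size):
    """Assumed linear scaling of the E-value with database size."""
    return evalue * new_db_size / old_db_size


print(bit_score(0.5, 0.125))    # model is 4 times more likely than null
print(bit_score(0.125, 0.5))    # null is 4 times more likely than model
print(rescale_evalue(0.25, old_db_size=1_000_000, new_db_size=4_000_000))
```

The same hit with the same bit score becomes less significant when the database is larger, because more chance matches are expected.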
11. I Predicted secondary structure
“<> [ ] { }” base pairs
“_” hairpin loop
“-” bulge and interior loop
“,” single-stranded multifurcation loop
“:” external single-stranded residues
“.” insertion relative to the consensus
II Consensus of the query model
III Alignment to the model and scoring system
Capital letter: maximum score. “:” and “+”: score >= 0 for base pairs and single-stranded residues. “ ” (blank): negative score
IV Target sequence
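The legend for the structure line (row I) can be turned into a small lookup table. The sketch below only covers the symbols listed on this slide, and the example annotation string is invented, not taken from a real Rfam hit.

```python
from collections import Counter

# Symbol meanings as listed in the legend above.
SYMBOL_MEANING = {}
for symbol in "<>[]{}":
    SYMBOL_MEANING[symbol] = "base pair"
SYMBOL_MEANING.update({
    "_": "hairpin loop",
    "-": "bulge or interior loop",
    ",": "multifurcation loop",
    ":": "external single strand",
    ".": "insertion",
})


def summarize_structure_line(line):
    """Count how many columns of a structure line fall into each category."""
    return Counter(SYMBOL_MEANING.get(symbol, "unknown") for symbol in line)


example = "::<<<-<<____>>>>>::"  # hypothetical hairpin with a one-residue bulge
summary = summarize_structure_line(example)
for meaning, count in summary.items():
    print(f"{meaning}: {count}")
```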
12. Going to the family information
A Wikipedia summary of the family is shown together with the information stored in the database.
13. Going to the family information
The sequences belonging to the family can be viewed (if there are not too many).
14. Going to the family information
Both seed and full alignments of members can be displayed.
16. Going to the family information
The secondary structure can be viewed.
18. Going to the family information
The tree of genomes containing members of the family can also be browsed.
19. Going to the family information
If a PDB entry is available, the three-dimensional structure can also be viewed.
21. Going to the family information
Publications about the family are also linked.
22. Problems in searching sequences
- To speed up the search, a filtering step based on a BLAST search is necessary. This decreases the sensitivity in
finding true homologues of the functional RNA family.
- The genomes of higher eukaryotes contain many ncRNA-derived pseudogenes and repeats that look like structured
functional RNAs.
Gardner PP, et al. Bateman A. Rfam: updates to the RNA families database. Nucleic Acids Res. 2009
23. Batch search
You can upload a file containing several sequences in FASTA format. Generally a job takes 48 hours.
Files must have fewer than 100,000 lines and fewer than 1000 sequences, each shorter than
200,000 nucleotides.
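Before uploading, a file can be checked against these limits locally. The sketch below is a hypothetical helper, not part of Rfam. It reads the length limit as applying to each individual sequence, which is an interpretation of the wording above.

```python
MAX_LINES = 100_000
MAX_SEQUENCES = 1000
MAX_SEQUENCE_LENGTH = 200_000


def check_batch_limits(fasta_text):
    """Return a list of problems; an empty list means the file is within the limits."""
    problems = []
    lines = fasta_text.splitlines()
    if len(lines) >= MAX_LINES:
        problems.append(f"{len(lines)} lines (must be fewer than {MAX_LINES})")

    lengths = []  # one entry per sequence, in file order
    for line in lines:
        if line.startswith(">"):
            lengths.append(0)            # a header line starts a new sequence
        elif lengths:
            lengths[-1] += len(line.strip())

    if len(lengths) >= MAX_SEQUENCES:
        problems.append(f"{len(lengths)} sequences (must be fewer than {MAX_SEQUENCES})")
    too_long = [n for n, length in enumerate(lengths, start=1)
                if length >= MAX_SEQUENCE_LENGTH]
    if too_long:
        problems.append(f"sequences too long (by position in file): {too_long}")
    return problems


small = ">seq1\nAAAAAAGGGGTTTTTT\n>seq2\nAAATAAGGGGTTATTT\n"
print(check_batch_limits(small))
```

To check a real file, read it first, for example with `open(path).read()`, and pass the text to `check_batch_limits`.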
28. Running a complete search for a whole genome.
You can install the Infernal program locally; it is available at
http://infernal.janelia.org/.
To speed up the search you can also install the rfam_scan.pl script,
available at ftp://ftp.sanger.ac.uk/pub/databases/Rfam/tools/, which relies
on the BLAST program.
29. Running a complete search for a whole genome.
Typical usage of Infernal:
cmsearch -o output.aln --tabfile output.tab Rfam.cm infile.fna
Typical usage of rfam_scan.pl:
perl rfam_scan.pl -blastdb Rfam.fasta -o outfile.out Rfam.cm infile.fna
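For a genome split into several FASTA files, the cmsearch call can be scripted. The sketch below only builds the command lines and prints them. The flags are the ones used on this slide and may differ between Infernal versions, and the order of the arguments (model file before sequence file) is an assumption to verify with `cmsearch -h`. The chromosome file names are invented.

```python
import shlex
from pathlib import Path


def cmsearch_command(cm_file, fasta_file, alignment_out, table_out):
    """Build the argument list for one cmsearch run (flags as on the slide)."""
    return ["cmsearch", "-o", alignment_out, "--tabfile", table_out,
            cm_file, fasta_file]


# Hypothetical per-chromosome files of one genome.
fasta_files = ["chr1.fna", "chr2.fna"]

commands = []
for fasta in fasta_files:
    stem = Path(fasta).stem
    commands.append(cmsearch_command("Rfam.cm", fasta, f"{stem}.aln", f"{stem}.tab"))

for command in commands:
    print(shlex.join(command))  # pass `command` to subprocess.run to execute it
```

Building the command as a list keeps file names with spaces safe when it is later handed to `subprocess.run(command, check=True)`.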