MicroRNA identification based on sequence and structure alignment

MicroRNA identification based on sequence and structure alignment Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li Presented by - Neeta Jain

Outline • Introduction • Motivation • Experiment • Materials • Methods • Results • Conclusion

Introduction • What are miRNAs and why are they important? • miRNAs are ~22 nt long non-coding RNAs • They are derived from their ~70 nt precursors, which typically have a hairpin structure Importance of miRNAs: • They are found to regulate the expression of target genes via complementary base pair interactions.

Motivation • Since miRNAs are short (~22 nt), conventional sequence alignment methods can only find relatively close homologues • It has been reported that miRNA genes are more conserved in their secondary structure than in their primary structure • This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment • The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN

Experiment • Materials • Reference sets • The reference data consist of 1298 miRNAs from 12 species, of which 1054 are animal miRNAs. • The 1054 animal miRNAs and their precursors (1104) make up the raw training set Train_All. • Train_Sub_1: All animal miRNAs except those from C. briggsae • Train_Sub_2: All animal miRNAs except those from C. briggsae and C. elegans • Genomic sequences • Sequences of 6 species were used.

Methods • Preprocessing • Known precursors from the training set are used to BLAST against the genome • Potential regions are cut from the genome with 70 nt of flanking sequence on each end • These regions are scanned using a 100 nt window with a 10 nt step • Sequences overlapping with repeat sequences are discarded.
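The cut-and-scan steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `cut_with_flanks` and `sliding_windows` are hypothetical, coordinates are assumed to be 0-based and half-open, and the BLAST search and repeat filtering are left out.

```python
def cut_with_flanks(genome, hit_start, hit_end, flank=70):
    """Cut a BLAST hit region plus `flank` nt on each side (clipped at the ends).

    Returns (start, subsequence), where `start` is the 0-based genome offset.
    """
    start = max(0, hit_start - flank)
    end = min(len(genome), hit_end + flank)
    return start, genome[start:end]


def sliding_windows(region, window=100, step=10):
    """Yield (offset, subsequence) for each window over `region`.

    A region shorter than one window is yielded whole. A short tail is not
    covered when (len(region) - window) is not a multiple of `step`.
    """
    if len(region) <= window:
        yield 0, region
        return
    for offset in range(0, len(region) - window + 1, step):
        yield offset, region[offset:offset + window]


if __name__ == "__main__":
    genome = "ACGT" * 75                      # toy 300 nt "genome"
    start, region = cut_with_flanks(genome, 100, 150)
    windows = list(sliding_windows(region))
    print(start, len(region), len(windows))
```

Each yielded window would then be passed on to the secondary structure prediction step.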

Methods (contd) • miRAlign • Secondary structure prediction • Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins. • Only hairpins with an MFE (minimum free energy) lower than -20 kcal/mol are retained. • Pairwise sequence alignment • Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set. • The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW. • If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis.
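The strand handling and MFE filter described above might look like the sketch below. It assumes the folding step is supplied as a callable, `fold`, that returns a dot-bracket structure and an MFE in kcal/mol (for example a wrapper around RNAfold); that callable, and the names `reverse_complement` and `stable_hairpins`, are illustrative assumptions. Only the energy cutoff is shown; checking that the structure is actually a hairpin is omitted.

```python
# Complement table for RNA; DNA input is converted to RNA first.
_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq):
    """Return the reverse complement of an RNA (or DNA) sequence as RNA."""
    rna = seq.replace("T", "U").replace("t", "u")
    return rna.translate(_COMPLEMENT)[::-1]


def stable_hairpins(candidates, fold, mfe_cutoff=-20.0):
    """Fold both strands of each candidate and keep those below the cutoff.

    `fold` is a caller-supplied function: sequence -> (dot_bracket, mfe).
    Returns a list of (strand_sequence, dot_bracket, mfe) tuples.
    """
    kept = []
    for seq in candidates:
        for strand in (seq, reverse_complement(seq)):
            structure, mfe = fold(strand)
            if mfe < mfe_cutoff:          # "lower than -20 kcal/mol"
                kept.append((strand, structure, mfe))
    return kept
```

Passing the folding routine in as a parameter keeps the sketch runnable without any external RNA package installed.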

Methods (contd) • Checking the miRNA's position on the stem-loop • 3 properties of the miRNA's position are considered: • It should not lie on the terminal loop of the hairpin • It should lie on the same arm of the hairpin as its known homologue • The position of the potential miRNA on the hairpin should not differ too much from that of its known homologues Position difference of miRNA on precursors A and B:
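The first two position checks can be illustrated with a small sketch over dot-bracket structures. It rests on several assumptions that are not from the slides: the precursor folds into a single stem-loop, miRNA spans are 0-based half-open intervals, and any overlap with the terminal loop counts as a failure. The function names are hypothetical, and the position-difference check is left out because its formula is not preserved in this transcript.

```python
def terminal_loop(structure):
    """Return the half-open (start, end) span of the terminal loop.

    Assumes a single stem-loop in dot-bracket notation, e.g. "(((...)))".
    """
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        raise ValueError("expected a single stem-loop in dot-bracket notation")
    return last_open + 1, first_close


def arm_of(mirna_start, mirna_end, structure):
    """Return "5p" or "3p" for the arm holding the miRNA, or None if it
    overlaps the terminal loop."""
    loop_start, loop_end = terminal_loop(structure)
    if mirna_end <= loop_start:
        return "5p"
    if mirna_start >= loop_end:
        return "3p"
    return None


def passes_arm_checks(cand_span, cand_structure, known_span, known_structure):
    """True if the candidate avoids the loop and shares the homologue's arm."""
    cand_arm = arm_of(*cand_span, cand_structure)
    known_arm = arm_of(*known_span, known_structure)
    return cand_arm is not None and cand_arm == known_arm
```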

Methods (contd) • RNA secondary structure alignment • RNAforester computes a pairwise structure alignment and gives a similarity score • The score is the sum of the scores of all base and base-pair matches, insertions, and deletions. • The normalized similarity score of structures C and m is given as: where C is the candidate sequence; m is a known pre-miRNA; sigma_local(C,m) is the raw local alignment score between C and m; sigma(m,m) is the self-alignment score of m
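The formula itself is not preserved in this transcript. Given that only two quantities are defined, one plausible reading is that the raw local score is divided by the self-alignment score of the known precursor; the sketch below encodes that reading as an explicit assumption, with a hypothetical function name, and should be checked against the original paper.

```python
def normalized_similarity(sigma_local_cm, sigma_mm):
    """Assumed normalization: sigma_local(C, m) / sigma(m, m).

    sigma_local_cm: raw local structure alignment score between C and m.
    sigma_mm: self-alignment score of the known pre-miRNA m (must be > 0).
    """
    if sigma_mm <= 0:
        raise ValueError("self-alignment score must be positive")
    return sigma_local_cm / sigma_mm
```

Under this reading, a candidate whose structure aligns to m as well as m aligns to itself would score 1.0.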

Methods (contd) • Total similarity score After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence. Where, C- candidate sequence ; R – set composed of all C’s

Methods (contd) Summary -

Results • Application on C. briggsae • Detection of miRNA homologues - miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded. • Identification of miRNAs in distantly related species - miRAlign was applied to the C. briggsae data with training set Train_Sub_2 (which also excludes C. elegans), and sensitivity and specificity were recorded.
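For reference, sensitivity and specificity can be computed from prediction counts as sketched below. The slides do not state which definitions the authors used, so the standard ones are assumed here; the counts in the usage lines are made-up illustrative values, not results from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of known miRNAs that were recovered: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-miRNAs correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Illustrative counts only.
    print(sensitivity(true_pos=8, false_neg=2))
    print(specificity(true_neg=90, false_pos=10))
```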

Results (contd) Graph 1 -

Results (contd) Graph 2 -

Results (contd) Comparison of miRAlign with BLAST -

Results (contd) Comparison of miRAlign with ERPIN -

Results (contd) Other results: • miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was validated when 38 A. gambiae miRNAs were reported in the MicroRNA registry 6.0 and 37 of them were covered by miRAlign • miRAlign was also applied to a plant, Zea mays, and detected 28 out of 40 known Zea mays miRNAs.

Conclusion • By combining sequence and structure alignments, miRAlign performs better than previously reported homologue search methods • Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that miRAlign can be applied to plants. Further investigation of this is underway.

THANK YOU Questions ??
