
Improving Intergenic miRNA Target Genes Prediction

Improving Intergenic miRNA Target Genes Prediction. Rikky Wenang Purbojati. MicroRNA (miRNA) is a class of RNA believed to play important roles in gene regulation. miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.


Presentation Transcript


  1. Improving Intergenic miRNA Target Genes Prediction
     Rikky Wenang Purbojati

  2. miRNA
     • MicroRNA (miRNA) is a class of RNA believed to play important roles in gene regulation.
     • miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.

  3. miRNA Characteristics
     • Short (22-25 nt)
     • miRNA plays a major role in the RNA-induced silencing complex (RISC).
     • miRNAs control the expression of large numbers of genes by:
       • mRNA degradation
       • Translational repression
     • Expression of a miRNA reduces the expression of its target genes
     • An intergenic miRNA gene is located outside gene bodies

  4. Basic miRNA problem
     • Finding the true target genes of a miRNA is not a trivial task
     • One approach is to make a computational prediction before validating it in wet-lab experiments
     • A basic challenge: given a miRNA sequence, what are its target genes?

  5. miRNA sequence target prediction
     • Several requirements for matching:
       • Strong Watson-Crick base pairing of the 5′ seed (nucleotides 2-8)
       • Conservation of the miRNA binding site across species
       • Local miRNA-mRNA interaction with a favorable minimum free energy
     • Available tools for target gene prediction: PicTar, TargetScan, miRanda, microT, etc.
     • The predictions of most tools do not complement each other, because the tools use different criteria
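
The seed-pairing requirement can be illustrated with a minimal sketch. It assumes uppercase RNA sequences written 5′ to 3′ and looks only for perfect Watson-Crick matches to miRNA positions 2-8; conservation and free energy are not modeled. The function names `seed_site` and `find_seed_matches` and the toy sequences are hypothetical and are not taken from any of the tools listed above.

```python
# Watson-Crick complements for RNA (assumes uppercase A/C/G/U input).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence that pairs perfectly with miRNA positions 2-8."""
    seed = mirna[1:8]  # positions 2-8, 1-based, counted from the 5' end
    # The pairing site on the mRNA is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start offsets of perfect seed sites in a 3' UTR."""
    site = seed_site(mirna)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy input: a 22-nt miRNA and a short, made-up UTR fragment.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAA"
matches = find_seed_matches(toy_mirna, toy_utr)
```

A perfect seed match alone yields many candidate sites, which is consistent with the slide's point that further criteria are needed.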

  6. Problem and Opportunity
     • Problem: purely computational target gene prediction produces a lot of candidates
       • Most of them are not validated
       • The common assumption is that most of them are false positives
       • Can we shorten the list to include only the strong candidates?
     • Opportunity: many publicly available experimental datasets, e.g. cDNA microarray, miRNA microarray, etc.
       • Use these datasets to computationally invalidate some of the target genes

  7. Assumptions
     • miRNA works by silencing target genes, thus a miRNA gene and its target genes should be anti-correlated
     • Intragenic miRNAs are expressed along with their host gene
       • A host gene should therefore be anti-correlated with a target gene
     • An intergenic miRNA does not have a host gene, but its real target genes should be correlated with one another
       • The real target genes should be down-expressed whenever the intergenic miRNA is expressed

  8. How to invalidate a target gene prediction
     • A target gene prediction can be invalidated by using a set of microarray datasets
     • For an intragenic miRNA target gene:
       • If a target gene's expression has no correlation with the host gene's expression, we assume that the target gene is not influenced by the host gene
     • For an intergenic miRNA target gene:
       • If a target gene behaves inconsistently compared to other target genes, we assume that it might not be affected by the miRNA gene
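
For the intragenic case, one way to picture the check is a Pearson correlation between host-gene and target-gene expression across samples. This is only a sketch under assumptions: the slides do not say which correlation measure or cutoff was used, so `pearson`, `is_invalidated`, and the `threshold` value of -0.3 are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def is_invalidated(host_expr, target_expr, threshold=-0.3):
    """Invalidate a prediction unless the target is anti-correlated with the host.

    threshold is an illustrative cutoff, not a value from the slides.
    """
    return pearson(host_expr, target_expr) > threshold


# Toy expression values across four samples.
host = [1.0, 2.0, 3.0, 4.0]
anti_target = [4.0, 3.0, 2.0, 1.0]   # falls as the host rises
co_target = [1.0, 2.0, 3.0, 4.0]     # rises with the host
```

With these toy values, `anti_target` is kept and `co_target` is invalidated.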

  9. Filtering Intergenic miRNA Target Gene Prediction
     • Use a combination of 8 prediction tools to produce the initial predictions (union & intersection)
     • Use a collection of 190 microarray datasets to invalidate some of the predictions
     • Use a greedy method to approximate the final subset of high-confidence target genes
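
Combining tool outputs by union or intersection can be sketched by counting how many tools support each gene. The data layout (a dict from tool name to a set of predicted gene identifiers) and the names `tool_support` and `candidates` are assumptions for illustration; the gene identifiers are made up.

```python
from collections import Counter


def tool_support(predictions: dict[str, set[str]]) -> Counter:
    """Count, for every gene, how many tools predict it as a target."""
    return Counter(gene for genes in predictions.values() for gene in genes)


def candidates(predictions: dict[str, set[str]], min_tools: int = 1) -> set[str]:
    """Genes predicted by at least min_tools tools.

    min_tools=1 gives the union; min_tools=len(predictions) gives the intersection.
    """
    support = tool_support(predictions)
    return {gene for gene, count in support.items() if count >= min_tools}


# Toy predictions for one miRNA from three tools.
toy_predictions = {
    "TargetScan": {"g1", "g2"},
    "PicTar": {"g2", "g3"},
    "miRanda": {"g2"},
}
union = candidates(toy_predictions, min_tools=1)
intersection = candidates(toy_predictions, min_tools=len(toy_predictions))
```

Intermediate values of `min_tools` give the "supported by N tools" sets between the two extremes.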

  10. Consistent Target Genes
     • We need to establish the meaning of consistent target genes
     • In this context, target gene A and target gene B are consistent if:
       • In every microarray dataset in which gene A is down-regulated, gene B is also down-regulated
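
The definition can be written as a small predicate. The sketch assumes each gene is summarized by one call per dataset (-1 down-regulated, 0 unchanged, 1 up-regulated); that encoding and the name `is_consistent` are hypothetical, since the slides do not specify how expression calls are represented.

```python
DOWN = -1  # assumed encoding: -1 down-regulated, 0 unchanged, 1 up-regulated


def is_consistent(calls_a, calls_b):
    """True if gene B is down-regulated in every dataset where gene A is."""
    return all(b == DOWN for a, b in zip(calls_a, calls_b) if a == DOWN)


# Toy calls across three datasets.
gene_a = (DOWN, 0, DOWN)
gene_b = (DOWN, DOWN, DOWN)
```

Read literally, the definition is one-directional: here `gene_b` follows `gene_a`, but `gene_a` does not follow `gene_b` in the second dataset. The algorithm on slide 13 instead compares whole signatures for equality, which is symmetric.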

  11. Greedy Method
     • Given a set of target gene predictions and a collection of microarray datasets, we want to find:
       • The largest subset of consistent target genes
       • The highest number of down-regulated target genes in the subset

  12. Reasoning
     • Why do we want to find the largest subset of consistent target genes?
       • Target genes that are consistent across a large number of microarray datasets from different experiments might be affected by a common factor, which may be a microRNA
       • The largest subset ensures a high probability of including the true target genes
     • Why do we want the highest number of down-regulated target genes in the subset?
       • Since miRNA works by down-regulating target genes, it is desirable to find the largest subset of consistently down-regulated target genes

  13. Current Algorithm
      for i = 0 to K
          A <- G[i]
          SigA <- signature(A)
          Temp_Subset <- {SigA}
          down <- countDownExpressedMicroarray(A)
          for j = 0 to K
              B <- G[j]
              SigB <- signature(B)
              if SigA == SigB
                  Temp_Subset <- Temp_Subset U {SigB}
              end if
          end for
          if (length(Temp_Subset) > length(Subset)) && (down > downexpr_cnt)
              Subset <- Temp_Subset
              downexpr_cnt <- down
          end if
      end for
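
A runnable reading of the pseudocode on slide 13 might look like the sketch below. Several details are assumptions rather than the author's implementation: a signature is taken to be the tuple of per-dataset calls (-1 down, 0 unchanged, 1 up), the subset collects gene names rather than signatures, and the running best starts at an empty subset with a down count of -1. The names `signature`, `count_down`, and `best_subset` are hypothetical.

```python
DOWN = -1  # assumed encoding: -1 down-regulated, 0 unchanged, 1 up-regulated


def signature(gene, calls):
    """Expression signature of a gene: its calls across all datasets."""
    return tuple(calls[gene])


def count_down(gene, calls):
    """Number of datasets in which the gene is down-regulated."""
    return sum(1 for call in calls[gene] if call == DOWN)


def best_subset(genes, calls):
    """Try every gene as pivot and keep the best group sharing its signature."""
    best, best_down = [], -1  # assumed starting values
    for pivot in genes:
        pivot_sig = signature(pivot, calls)
        down = count_down(pivot, calls)
        # Every gene (the pivot included) whose signature equals the pivot's.
        group = [g for g in genes if signature(g, calls) == pivot_sig]
        # Same acceptance rule as the slide: both criteria must improve.
        if len(group) > len(best) and down > best_down:
            best, best_down = group, down
    return best, best_down


# Toy calls for four predicted targets across three datasets.
toy_calls = {
    "g1": [DOWN, DOWN, 0],
    "g2": [DOWN, DOWN, 0],
    "g3": [0, 1, 0],
    "g4": [DOWN, DOWN, 0],
}
subset, down_count = best_subset(["g1", "g2", "g3", "g4"], toy_calls)
```

Because a new group is accepted only when both the size and the down count improve, a larger group with fewer down-regulated datasets is never adopted, which is one way the result depends on pivot order.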

  14. Algorithm Limitations
     • The result might be biased by the expression signature of the first pivot gene:
       • The search might get stuck in a local maximum
       • This can be mitigated by prioritizing, by sorting target genes on their down-expression value, or by random selection of the pivot gene
     • The subset is an approximation of the high-confidence target genes, but it doesn't necessarily include all real target genes (because of limitations in the supporting data)
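
The two pivot-ordering remedies mentioned above could be sketched as a helper that reorders the genes before the greedy pass. This is an illustration only: `pivot_order`, its `strategy` and `seed` parameters, and the -1/0/1 call encoding are hypothetical and do not come from the slides.

```python
import random

DOWN = -1  # assumed encoding: -1 down-regulated, 0 unchanged, 1 up-regulated


def count_down(gene, calls):
    """Number of datasets in which the gene is down-regulated."""
    return sum(1 for call in calls[gene] if call == DOWN)


def pivot_order(genes, calls, strategy="sorted", seed=None):
    """Return the genes in the order they should be tried as pivots."""
    if strategy == "sorted":
        # Most frequently down-regulated genes first; ties keep input order.
        return sorted(genes, key=lambda g: count_down(g, calls), reverse=True)
    if strategy == "random":
        shuffled = list(genes)
        random.Random(seed).shuffle(shuffled)  # seed makes runs repeatable
        return shuffled
    raise ValueError(f"unknown strategy: {strategy}")


# Toy calls for three genes across three datasets.
toy_calls = {"g1": [0, 0, DOWN], "g2": [DOWN, DOWN, DOWN], "g3": [DOWN, 0, DOWN]}
ordered = pivot_order(["g1", "g2", "g3"], toy_calls)
shuffled = pivot_order(["g1", "g2", "g3"], toy_calls, strategy="random", seed=0)
```

Running the greedy pass over several random orders and keeping the best result is one common way to reduce dependence on the first pivot.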

  15. Benchmarking
     • Compare the performance with other prediction tools, based on:
       • Number of correct predictions (based on validated target genes)
       • Number of predictions
     • The algorithm uses initial target predictions supported by 2, 3, and 4 prediction tools
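
The two benchmark counts, and the sensitivity and specificity derived from them, can be computed as in the sketch below. It rests on a strong assumption that is not stated in the slides: every gene in the candidate universe that is not experimentally validated is treated as a true negative, even though unvalidated does not mean untargeted. The function name `benchmark` and the toy gene sets are hypothetical.

```python
def benchmark(predicted: set, validated: set, universe: set) -> dict:
    """Compare a prediction set against validated targets.

    Assumes predicted and validated are both subsets of universe, and that
    unvalidated genes count as negatives (an illustrative simplification).
    """
    tp = len(predicted & validated)             # correct predictions
    fp = len(predicted - validated)             # predicted but not validated
    fn = len(validated - predicted)             # validated but missed
    tn = len(universe - predicted - validated)  # neither predicted nor validated
    return {
        "predictions": len(predicted),
        "correct": tp,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy example with six candidate genes.
result = benchmark(
    predicted={"a", "c"},
    validated={"a", "b"},
    universe={"a", "b", "c", "d", "e", "f"},
)
```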

  16. Performance Comparison

  17. Sensitivity-Specificity Comparison

  18. Conclusion
     • In general, the approximation method shows better sensitivity than the other prediction tools
     • Specificity can be improved by including only target genes that are supported by more than 2 prediction tools

  19. Further Work
     • Adjusting the scoring function to find the optimum balance between the length of the subset and the number of down-regulated target genes
     • Implementing a threshold on target gene signaturing to further improve the specificity
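
One possible shape for such a scoring function is a weighted sum of subset length and down-regulation count. The slides do not define the function, so this is purely a hypothetical sketch: `subset_score`, `pick_best`, and the `weight` parameter are illustrative names.

```python
def subset_score(subset_size: int, down_count: int, weight: float = 0.5) -> float:
    """Weighted balance between subset length and down-regulation count.

    weight=1.0 considers only subset length; weight=0.0 only the down count.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * subset_size + (1.0 - weight) * down_count


def pick_best(candidates, weight: float = 0.5):
    """Choose the (subset, down_count) pair with the highest score."""
    return max(candidates, key=lambda c: subset_score(len(c[0]), c[1], weight))


# Two toy candidate subsets: a longer one and a more down-regulated one.
toy_candidates = [(["g1", "g2", "g3"], 1), (["g4", "g5"], 4)]
balanced = pick_best(toy_candidates, weight=0.5)
length_only = pick_best(toy_candidates, weight=1.0)
```

A single score lets a longer subset with fewer down-regulated datasets compete with a shorter, more down-regulated one, instead of requiring both criteria to improve at once.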
