Tecniche di Intelligenza Artificiale in Bioinformatica

Università degli Studi di Ferrara ENDIF – Dipartimento di Ingegneria. Tecniche di Intelligenza Artificiale in Bioinformatica. Giacomo Gamberoni. Data Mining in Bioinformatics. Genetic data from comparative experiments (normal-cancer)

Presentation Transcript


  1. Università degli Studi di Ferrara • ENDIF – Dipartimento di Ingegneria • Tecniche di Intelligenza Artificiale in Bioinformatica (Artificial Intelligence Techniques in Bioinformatics) • Giacomo Gamberoni

  2. Data Mining in Bioinformatics • Genetic data from comparative experiments (normal-cancer) • Data provided by Dipartimento di morfologia ed embriologia – Università di Ferrara (Dott. Stefano Volinia) • Software used: • Weka • Matlab • MySQL

  3. Microarray Experiments • The slide is prepared by fixing base sequences (ESTs) at specific points (spots) on the glass • Two mRNA samples from two cell populations, labelled with different fluorescent dyes, are hybridized to the slide • Scanning the slide, we measure the fluorescence intensities of the two channels in each spot

  4. Dataset normalization • Keep only spots with good intensity in at least 75% of the samples • Compute the log ratio of the two channels for each spot • Subtract the median of the ratios in each spot • Divide by the standard deviation (SD) of each spot • Keep only spots with at least one significantly expressed sample (log ratio > 1.5)
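The normalization steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name and parameters are hypothetical, the logarithm is taken in base 2, `None` marks a measurement without good intensity, the SD is the sample standard deviation, and the 1.5 threshold is applied to the absolute value of the normalized log ratio.

```python
import math
import statistics


def normalize_spots(ratios, min_valid_fraction=0.75, threshold=1.5):
    """Normalize per-spot channel ratios (hypothetical helper).

    ratios: dict mapping spot id -> list of channel ratios, one per sample;
            None marks a sample without good intensity (assumption).
    Returns a dict with the spots that pass both filters, mapped to their
    median-centred, SD-scaled log ratios.
    """
    normalized = {}
    for spot, values in ratios.items():
        # Log ratio per sample (base 2 is an assumption).
        logs = [math.log2(v) if v is not None and v > 0 else None for v in values]
        present = [x for x in logs if x is not None]
        # Keep only spots measured well in at least 75% of the samples.
        if len(present) < 2 or len(present) < min_valid_fraction * len(values):
            continue
        median = statistics.median(present)
        sd = statistics.stdev(present)
        if sd == 0:
            continue  # constant spot: cannot be scaled
        scaled = [None if x is None else (x - median) / sd for x in logs]
        # Keep only spots with at least one strongly expressed sample.
        if any(x is not None and abs(x) > threshold for x in scaled):
            normalized[spot] = scaled
    return normalized


example = {
    "spot_a": [1, 2, 4, 8, 64],        # one sample far from the median
    "spot_b": [1, 1, 1, 1, 1],         # constant, dropped
    "spot_c": [1, None, None, 2, 4],   # only 60% valid, dropped
    "spot_d": [1, 2, 4, 8, 16],        # no sample beyond the threshold
}
kept = normalize_spots(example)
```

In this toy input only `spot_a` survives both filters.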

  5. Datasets analyzed • Hepatocellular carcinoma • Reference: artificial mRNA pool • 7449 ESTs for 161 samples • 95 cancer • 82 HBV+, 3 HCV+, 10 with no hepatitis antibodies • 66 normal • 47 HBV+, 5 HCV+, 14 with no hepatitis antibodies • Larynx squamous cell carcinoma • Reference: normal larynx • 7626 ESTs for 22 samples • 11 lymph node negative (N0) • 11 lymph node positive (N+)

  6. Supervised/unsupervised learning • Supervised learning • Decision tree • Support vector machines • Unsupervised learning • Hierarchical clustering

  7. Results • Decision tree:

       358885 <= 0.719385542
       |   740476 <= 0.856739394
       |   |   626619 <= 0.552788235
       |   |   |   451711 <= -0.84774
       |   |   |   |   786690 <= -0.116917241: HBV+ (5.0)
       |   |   |   |   786690 > -0.116917241: HBV- (4.0)
       |   |   |   451711 > -0.84774: HBV+ (107.0/1.0)
       |   |   626619 > 0.552788235
       |   |   |   310406 <= -0.162467: HBV- (6.0)
       |   |   |   310406 > -0.162467: HBV+ (12.0/1.0)
       |   740476 > 0.856739394
       |   |   344648 <= 0.051885057: HBV- (10.0)
       |   |   344648 > 0.051885057: HBV+ (7.0/1.0)
       358885 > 0.719385542: HBV- (10.0/1.0)

     • Clustering dendrogram
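The tree on this slide can be read as a set of nested rules. The sketch below transcribes it into a plain Python function to show how a sample would be classified. The thresholds and attribute identifiers are copied from the slide. The function name and the choice of a dictionary keyed by attribute identifier are assumptions made for illustration.

```python
def classify_hbv(sample):
    """Classify one sample with the decision tree shown on the slide.

    sample: dict mapping attribute identifier (as a string) to its
            normalized expression value (hypothetical input layout).
    """
    if sample["358885"] > 0.719385542:
        return "HBV-"
    if sample["740476"] > 0.856739394:
        return "HBV-" if sample["344648"] <= 0.051885057 else "HBV+"
    if sample["626619"] > 0.552788235:
        return "HBV-" if sample["310406"] <= -0.162467 else "HBV+"
    if sample["451711"] > -0.84774:
        return "HBV+"  # largest leaf in the tree: 107 samples, 1 misclassified
    return "HBV+" if sample["786690"] <= -0.116917241 else "HBV-"


# A made-up sample with every attribute at 0.0 ends in the largest leaf.
attributes = ["358885", "740476", "626619", "451711", "786690", "310406", "344648"]
all_zero = {name: 0.0 for name in attributes}
label = classify_hbv(all_zero)
```

The leaf counts in the tree add up to 161, the number of samples in the hepatocellular carcinoma dataset.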

  8. Gene correlation • Analysis of the correlation between the expression of different genes • Study of the expression of every possible pair of genes • Computational complexity • Integration with extra knowledge • Genetic annotation (Gene Ontology) • Chromosome location
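To make the complexity point concrete, here is a minimal sketch of an all-pairs correlation analysis. It assumes Pearson correlation as the measure (the slide does not name one), and every function name is hypothetical. With n genes there are n(n-1)/2 pairs, so the 7449 ESTs of the hepatocellular carcinoma dataset give about 27.7 million pairs.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(expression):
    """Correlate every unordered pair of genes.

    expression: dict mapping gene id -> list of expression values.
    """
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in itertools.combinations(sorted(expression), 2)
    }


def pair_count(n_genes):
    """Number of unordered gene pairs: n(n-1)/2."""
    return n_genes * (n_genes - 1) // 2


toy = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [6, 4, 2]}
correlations = pairwise_correlations(toy)
```

The quadratic growth in the number of pairs is one reason to restrict the analysis with extra knowledge, such as annotations or chromosome location.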

  9. Intra-gene relations • Purpose • Studying intra-gene relations we can obtain useful results for: • Quality control • Different ESTs from the same UniGene Cluster (UGC) should be equally expressed • A poor correlation between these ESTs may be due to experimental error • Chromosomal aberration • We can highlight parts of genes that lose correlation
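The quality-control idea can be illustrated with a short sketch: group the ESTs by cluster, correlate every pair inside a cluster, and report the pairs that fall below a cutoff. The Pearson measure, the 0.5 cutoff, the function names, and the cluster labels in the example are all assumptions chosen for illustration, not values from the presentation.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def flag_inconsistent_ests(expression, est_to_cluster, min_r=0.5):
    """Return (cluster, est_a, est_b, r) for same-cluster pairs with r below min_r.

    expression:     dict EST id -> list of expression values
    est_to_cluster: dict EST id -> cluster id (hypothetical mapping)
    """
    by_cluster = {}
    for est, cluster in est_to_cluster.items():
        if est in expression:
            by_cluster.setdefault(cluster, []).append(est)

    flagged = []
    for cluster, ests in by_cluster.items():
        for a, b in itertools.combinations(sorted(ests), 2):
            r = pearson(expression[a], expression[b])
            if math.isnan(r) or r < min_r:
                flagged.append((cluster, a, b, r))
    return flagged


# Made-up data: est3 disagrees with the other two ESTs of its cluster.
expression = {
    "est1": [1, 2, 3, 4],
    "est2": [2, 4, 6, 8],
    "est3": [4, 1, 3, 2],
    "est4": [1, 0, 1, 0],
}
est_to_cluster = {"est1": "Hs.1", "est2": "Hs.1", "est3": "Hs.1", "est4": "Hs.2"}
flagged = flag_inconsistent_ests(expression, est_to_cluster)
```

A flagged pair is only a candidate: as the slide notes, it may point to an experimental error or to a genuine loss of correlation within the gene.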

  10. Relations in Processes • Purpose • Study relations between the genes involved in the same biological processes • Biological processes as defined by the Gene Ontology • Highlight differences in gene correlations between normal and cancer
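One way to highlight such differences is to compute, for every gene pair in a process, the correlation in the normal samples and in the cancer samples and then take the difference. The sketch below assumes Pearson correlation and a simple list of gene identifiers per process. The function names and the example data are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_shift(normal, cancer, process_genes):
    """For each gene pair of one process, return r_cancer - r_normal.

    normal, cancer: dict gene id -> expression values in that sample group
    process_genes:  gene ids annotated to the same biological process
    """
    shifts = {}
    for a, b in itertools.combinations(sorted(process_genes), 2):
        r_normal = pearson(normal[a], normal[b])
        r_cancer = pearson(cancer[a], cancer[b])
        shifts[(a, b)] = r_cancer - r_normal
    return shifts


# Made-up data: g1 and g2 move together in normal tissue and oppositely in cancer.
normal = {"g1": [1, 2, 3], "g2": [2, 4, 6]}
cancer = {"g1": [1, 2, 3], "g2": [6, 4, 2]}
shifts = correlation_shift(normal, cancer, ["g1", "g2"])
```

Pairs with a large absolute shift are the ones whose relationship changes most between the two conditions.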

  11. Present Activities • Development of a web-based interface to make several algorithms available to biologists (PHP, Java) • Implementation of some algorithms as plug-ins for an open source analysis suite (Java) • Extension of our algorithms to analyze other data sources: • SAGE data • Affymetrix data

  12. Publications • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Exploiting supervised and unsupervised learning techniques for profiling cancer data. Presented at Workshop: Data Mining in Functional Genomics and Proteomics in ECAI 2004. • Giacomo Gamberoni and Sergio Storari. Supervised and unsupervised learning techniques for profiling SAGE results. Presented at Discovery Challenge in ECML/PKDD 2004.

  13. Publications • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Correlation of expression between different IMAGE clones from the same UniGene Cluster. Presented in ISBMDA 2004; published in Biological and Medical Data Analysis, Lecture Notes in Computer Science 3337, Springer.