Data Quality Issues: Traps & Pitfalls
Ashok Kolaskar, Vice-Chancellor, University of Pune, Pune 411 007, India
Cancer cell growth appears to be related to evolutionary development of plump fruits and vegetables.
Cornell University News, July 2000
A single gene, ORFX, that is responsible for the QTL has sequence and structural similarity to the human oncogene c-H-ras p21.
Fruit size alterations, imparted by fw2.2 alleles, are most likely due to changes in regulation rather than in the sequence/structure of the protein.
Private sector holds data of more than 100 finished & unfinished
Value of Genome Sequence Data quantitative variation
D. Nucleic acid sequence submitted to EMBL Data Library with no associated publication: protein sequence NOT displayed in paper:
When data from different sources are ‘merged’ into a single entry, any differences in the reported sequences are explicitly shown unless they are too extensive.
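As a rough illustration of what "explicitly showing" differences between merged reports could look like, the sketch below lists the positions at which two reported sequences disagree. It assumes the two sequences are already aligned and of equal length; the function name `sequence_differences` and the example sequences are hypothetical and are not taken from any database's actual merging procedure.

```python
def sequence_differences(seq_a, seq_b):
    """Return (position, residue_a, residue_b) for every mismatch.

    Assumption: both sequences are pre-aligned and equally long.
    Positions are 1-based, as is usual in sequence annotation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical example: two reports of the same short peptide.
report_1 = "MKTAYIAK"
report_2 = "MKSAYIAR"
for pos, a, b in sequence_differences(report_1, report_2):
    print(f"position {pos}: {a} in report 1, {b} in report 2")
```

A merged entry would carry one sequence plus a short list of such conflicts; when the list becomes too long, listing every difference stops being practical, which matches the "unless they are too extensive" caveat.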
Signal to noise ratio
Tends to have been ignored in the excitement? Cost?
Becomes important when dealing with more subtle effects
Lab effects and scanning effects
- Other ways of visualizing the data which can also use information about rows and columns
- Local clustering which is not restricted to “rectangles”
- Genes in more than one cluster
- Clustering with prior information
- Analysis of experimental designs where the response is a vector of microarray data
                                       Low resolution  . . .  High resolution
Resolution (Å)                         4.0   3.5   3.0   2.5   2.0   1.5
Ratio of observations to parameters    0.3   0.4   0.6   1.1   2.2   3.8
The median resolution of structures in the Protein Data Bank is about 2.0 Å.
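One way to read the resolution figures is to ask at which resolutions there are fewer observations than refined parameters. The sketch below encodes the slide's values and flags those cases. The threshold of 1.0 and the names `RATIO_BY_RESOLUTION` and `underdetermined_resolutions` are illustrative assumptions, not part of the original slides.

```python
# Ratio of observations to parameters at each resolution (Å),
# copied from the table in the slides.
RATIO_BY_RESOLUTION = {
    4.0: 0.3,
    3.5: 0.4,
    3.0: 0.6,
    2.5: 1.1,
    2.0: 2.2,
    1.5: 3.8,
}


def underdetermined_resolutions(table, threshold=1.0):
    """Return resolutions whose observation/parameter ratio is below threshold.

    Assumption: a ratio below 1.0 means there are fewer observations than
    parameters, so the data alone cannot determine the model.
    """
    return sorted(
        (res for res, ratio in table.items() if ratio < threshold),
        reverse=True,  # lowest resolution (largest Å value) first
    )


print(underdetermined_resolutions(RATIO_BY_RESOLUTION))
print(RATIO_BY_RESOLUTION[2.0])  # ratio at the PDB median resolution
```

With these values, the structures at 3.0 Å and worse fall below the threshold, while a structure at the median resolution of about 2.0 Å has roughly two observations per parameter.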
Proper assessment of outliers (as features or errors) requires access to the experimental data. Sometimes, outliers warn of more serious problems and may require careful inspection of the electron-density maps and even model rebuilding by an experienced crystallographer. Unfortunately, not all errors can be fixed, even by appeal to structure factors and maps; some regions are fatally disordered.
Two factors dominate current developments in Bioinformatics: