Visualization and Classification of DNA sequences using Pareto learning Self Organizing Maps based on Frequency and Correlation Coefficient. Hiroshi Dozono Saga University. Introduction (1).
T. Abe, T. Ikemura, et al., Informatics for unveiling hidden genome signatures, Genome Res., vol. 13, pp. 693-702
Correlation Coefficients (CC) of DNA sequences
Binary indicator sequence of each base:
A 1000010010
C 0101001000
G 0010000001
T 0000100100

Correlation coefficients:
ρAA(n): CC between A and n-shifted A
ρAC(n): CC between A and n-shifted C
...
ρTT(n): CC between T and n-shifted T
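
The indicator sequences and a single coefficient can be sketched in Python as follows. This is only an illustration: the slide does not give the exact formula, so the sketch assumes a Pearson correlation over the overlapping part of the two sequences (position i of the first base paired with position i+n of the second, no wrap-around) and treats an undefined correlation as 0. The example sequence ACGCTACTAG is the one implied by the four indicator rows; the function names are hypothetical.

```python
from math import sqrt

BASES = "ACGT"

def indicator(seq, base):
    """Binary indicator sequence: 1 where seq has `base`, else 0."""
    return [1 if ch == base else 0 for ch in seq]

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant sequence gives correlation 0
    return cov / sqrt(vx * vy)

def cc(seq, x, y, n):
    """rho_xy(n): correlation between base x and base y shifted by n (1 <= n < len(seq))."""
    ix, iy = indicator(seq, x), indicator(seq, y)
    # Assumed pairing: x at position i with y at position i + n.
    return pearson(ix[:-n], iy[n:])

seq = "ACGCTACTAG"
for b in BASES:
    print(b, "".join(map(str, indicator(seq, b))))
print("rho_AC(1) =", cc(seq, "A", "C", 1))
```

A circular shift or a different normalization would give different values; the sketch is meant only to make the pairing of a base with a shifted base concrete.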
For every ordered pair of the bases A, C, G, T and every shift from 1 to n, a correlation coefficient is calculated. The resulting 4x4xn coefficients are used as the input vector of the SOM.
Compared with the dimension of n-tuple frequencies (4^n), the dimension of the CC vector (4x4xn = 16n) grows only linearly in n and is much smaller for larger n.
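
A minimal sketch of assembling the full input vector and comparing its size with the n-tuple dimension is given below. As an assumption not stated on the slide, each coefficient is a Pearson correlation over the overlapping, non-circular part of the indicator sequences, with undefined values set to 0; the ordering of the entries (shift first, then base pair) and the name cc_vector are likewise illustrative choices.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant (assumption)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def cc_vector(seq, max_shift):
    """Return the 4 x 4 x max_shift correlation coefficients as a flat list."""
    ind = {b: [1 if ch == b else 0 for ch in seq] for b in BASES}
    vec = []
    for n in range(1, max_shift + 1):           # shifts 1..max_shift
        for x, y in product(BASES, repeat=2):   # all 16 ordered base pairs
            vec.append(pearson(ind[x][:-n], ind[y][n:]))
    return vec

vec = cc_vector("ACGCTACTAG", 3)
print(len(vec))  # 4 * 4 * 3 = 48 entries

# Dimension of the CC vector (16n) versus n-tuple frequencies (4^n).
for n in (3, 5, 8):
    print(n, 4 * 4 * n, 4 ** n)
```

For n = 8, for example, the CC vector has 128 entries while the 8-tuple frequency vector has 65536.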