
What is Time-Frequency Approach (TFA)?

Time-frequency approach (TFA) for fast robust DNA sequence comparison. YangQuan Chen 1,2, Huifang Dou 1,2,4, Dong Chen 3 and Anhong Zhou 4. 4160 Old Main Hill, Logan, Utah State University. E: yqchen@ece.usu.edu, W: www.csois.usu.edu, T: (435)797-0148; F: (435)797-3054.


Presentation Transcript


  1. Counting the number of occurrences of “ACT”, and similarly for the other 63 codons, a discrete signal of length 64 can be formed as [28,22,16,23,15,27,13,20,19,21,9,0,15,11,36,12,24,20,9,7,21,32,16,18,31,20,10,2,19,25,11,11,23,13,15,34,13,19,26,18,7,19,8,0,1,0,3,2,14,19,9,10,11,9,8,10,28,17,7,17,2,6,2]. (0 for no occurrence)

What is Time-Frequency Approach (TFA)?

TFA uses time-frequency representations (TFRs) to map a signal (1D) to a time-frequency image (2D) displaying how the frequency content of the signal changes over time. (from http://www-dsp.rice.edu/software/tfa-background.shtml)

Figure: Time-frequency representations of a bat echolocation chirp (left): Spectrogram (1), Wigner TFR (2), Optimal radially Gaussian kernel TFR (3). Horizontal axis: time (0-2.5 ms); vertical axis: frequency (0-70 kHz).

Figure: Wigner-Ville TFR. Codon time-frequency signal unique to DNA sequence “gi|31982837”. (TIME-FREQ.)

Figure: Codon time signal unique to DNA sequence “gi|31982837”. (TIME)

Figure: Codon spectrum signal unique to DNA sequence “gi|31982837”. (FREQUENCY)

Example: Compare DNA Sequences.

Let us use DNA sequence #1 (“gi|31982837”) as the baseline DNA sequence. Using the correlation matrix, we can tell how strongly it is cross-correlated with the other 4 DNA sequences. The following matrix is for DNA sequences #1, #2 (gi|397468), #3 (gi|52661), #4 (gi|3005102) and #5 (gi|37577257).

          #1      #2      #3      #4      #5
    #1  1.0000  0.9818  0.7546  0.6807  0.3144
    #2  0.9818  1.0000  0.7183  0.6943  0.3240
    #3  0.7546  0.7183  1.0000  0.3502  0.1487
    #4  0.6807  0.6943  0.3502  1.0000  0.4335
    #5  0.3144  0.3240  0.1487  0.4335  1.0000

The rank of similarity to the baseline is #1, #2, #3, #4, #5, which is consistent with the known results! From the spectra and TFRs, we get consistent results, too.
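The ranking step in the example can be reproduced directly from the correlation matrix. The following is a minimal sketch, assuming NumPy; the matrix values are copied from the slide, while the variable names (`labels`, `corr`, `baseline`, `ranking`) are only illustrative.

```python
import numpy as np

# Correlation matrix as given on the slide (rows/columns: #1 .. #5).
labels = ["#1", "#2", "#3", "#4", "#5"]
corr = np.array([
    [1.0000, 0.9818, 0.7546, 0.6807, 0.3144],
    [0.9818, 1.0000, 0.7183, 0.6943, 0.3240],
    [0.7546, 0.7183, 1.0000, 0.3502, 0.1487],
    [0.6807, 0.6943, 0.3502, 1.0000, 0.4335],
    [0.3144, 0.3240, 0.1487, 0.4335, 1.0000],
])

# Sequence #1 is the baseline, so its row holds the similarity scores.
baseline = 0
order = np.argsort(-corr[baseline], kind="stable")  # descending correlation
ranking = [labels[i] for i in order]
print(ranking)  # ['#1', '#2', '#3', '#4', '#5']
```

Changing `baseline` to another row index ranks the sequences against a different reference sequence.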
Genomic Signal Processing: Symbolic to Digital?

Although the entire human genome sequence, roughly three billion characters in length, has been deciphered, a meaningful interpretation of that sequence is still outstanding. Interpreting genome sequences is believed to be one of the most exciting challenges today. Genomic information is inherently discrete, since the DNA alphabet contains a finite number of nucleotides: adenine (A), thymine (T), cytosine (C), and guanine (G). It is therefore natural to convert the symbolic DNA sequence into a discrete digital signal so that the rich set of digital signal processing (DSP) techniques can be applied. Examples of such efforts can be found in [1, 2, 3]. However, the numerical assignments used in [1, 2, 3] to convert the symbolic DNA sequence into numerical values are somewhat arbitrary: a subjective assignment inevitably imposes a mathematical structure on the symbolic DNA sequence that does not exist naturally. The key question remains: what is the right numerical assignment, so that we can harness DSP results in DNA sequence analysis?

In this work, we build on a new numerical assignment scheme [4] and introduce the time-frequency approach (TFA) [5] for genomic signal processing. Unlike existing work, which uses only the time domain or only the frequency domain, we propose, for the first time, to use the joint time-frequency domain [5]. The new method opens research opportunities in DNA sequence analysis by borrowing the various characterization techniques available in TFA. We believe this paper offers a new perspective on decoding the information within any given DNA sequence, with wide-ranging biological implications for genomic research.

The genetic code consists of 64 triplets of nucleotides. These triplets are called codons.
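To make the joint time-frequency idea concrete, here is a minimal NumPy sketch of a discrete Wigner-Ville distribution. It is not the TFTB [5] implementation; the function name `wigner_ville` and the toy chirp are assumptions for illustration, and the real-valued signal is used directly rather than its analytic form.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a 1D signal (illustrative sketch).

    Returns a real (N, N) array W where W[k, n] is the value at frequency
    bin k and time index n. Because the lag kernel x[n+m] * conj(x[n-m])
    advances in steps of two samples, bin k corresponds to k / (2N) cycles
    per sample.
    """
    x = np.asarray(x, dtype=complex)
    n_samples = x.size
    kernel = np.zeros((n_samples, n_samples), dtype=complex)
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the signal at time n.
        max_lag = min(n, n_samples - 1 - n)
        lags = np.arange(-max_lag, max_lag + 1)
        # Negative lags wrap to the end of the column, as the FFT expects.
        kernel[lags % n_samples, n] = x[n + lags] * np.conj(x[n - lags])
    # The kernel is Hermitian in the lag variable, so its FFT is real.
    return np.fft.fft(kernel, axis=0).real

# Toy chirp used only to exercise the function; not data from this work.
t = np.arange(64)
toy_signal = np.cos(2 * np.pi * (0.05 * t + 0.002 * t**2))
tfr = wigner_ville(toy_signal)
print(tfr.shape)  # (64, 64)
```

Summing `tfr` over the frequency axis recovers the instantaneous energy of the signal (scaled by the number of samples), which is a quick sanity check on the implementation.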
Construct a 64-point digital signal from the given sequence, as illustrated below (query codon “ACT” for “Seq_1, gi|31982837”):

ATCACCCTTGCTAATCACTCCTCACAGTGACCTCAAGTCCTGCAGGCATGTACAGCATGCAGCTCGCATCCTGTGTCACATTGACACTTGTGCTCCTTGT
CAACAGCGCACCCACTTCAAGCTCCACTTCAAGCTCTACAGCGGAAGCACAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCACCTGGAGCAGCTG
TTGATGGACCTACAGGAGCTCCTGAGCAGGATGGAGAATTACAGGAACCTGAAACTCCCCAGGATGCTCACCTTCAAATTTTACTTGCCCAAGCAGGCCA
CAGAATTGAAAGATCTTCAGTGCCTAGAAGATGAACTTGGACCTCTGCGGCATGTTCTGGATTTGACTCAAAGCAAAAGCTTTCAATTGGAAGATGCTGA
GAATTTCATCAGCAATATCAGAGTAACTGTTGTAAAACTAAAGGGCTCTGACAACACATTTGAGTGCCAATTCGATGATGAGTCAGCAACTGTGGTGGAC
TTTCTGAGGAGATGGATAGCCTTCTGTCAAAGCATCATCTCAACAAGCCCTCAATAACTATGTACCTCCTGCTTACAACACATAAGGCTCTCTATTTATT
TAAATATTTAACTTTAATTTATTTTTGGATGTATTGTTTACTATCTTTTGTAACTACTAGTCTTCAGATGATAAATATGGATCTTTAAAGATTCTTTTTG
TAAGCCCCAAGGGCTCAAAAATGTTTTAAACTATTTATCTGAAATTATTTATTATATTGAATTGTTAAATATCATGTGTAGGTAGACTCATTAATAAAAG
TATTTAGATGATTCAAATATAAATAAGCTCAGATGTCTGTCATTTTTAGGACAGCACAAAGTAAGCGCTAAAATAACTTCTCAGTTATTCCTGTGAACTC
TATGTTAATCAGTGTTTTCAAGAAATAAAGCTCTCCTCT

Observations:
• This DNA sequence comparison method is very robust with respect to wrong characters within the sequences. For example, some portions of a DNA sequence may be missing, or some characters may not be reliably obtained; this can be emulated by randomly removing a certain percentage of the characters within the sequence.

Remarks:
• The codon characteristic signal is bp-length independent!
• Fast and reliable comparison of DNA sequences, where wrong characters in the DNA sequences can be regarded as noise in a digital signal. Many filtering (smoothing, interpolation, extrapolation) techniques can be applied in a brand new context for robust genomic information processing.
• TFRs of the codon characteristic signal provide good visualization and can also provide further information, such as the Renyi information measure. More treasures in TFTB [5] can be explored for specific biological research motivations.
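The construction of the codon signal and the random-removal robustness check described above can be sketched as follows. Several details are assumptions, since the slides do not state them: codons are counted with an overlapping sliding window (stride 1), the 64 codons are ordered lexicographically over "ACGT", and similarity is the Pearson correlation from `np.corrcoef`. The names `CODONS`, `INDEX`, `codon_signal` and `drop_random` are hypothetical.

```python
import itertools
import random

import numpy as np

# All 64 codons; the lexicographic order over "ACGT" is an assumption.
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}

def codon_signal(seq):
    """Count every codon with an overlapping sliding window (assumed stride 1)."""
    counts = np.zeros(len(CODONS))
    seq = seq.upper()
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing non-ACGT characters
            counts[idx] += 1
    return counts

def drop_random(seq, fraction, rng):
    """Remove each character independently with probability `fraction`."""
    return "".join(ch for ch in seq if rng.random() >= fraction)

# First block of the sequence shown on the slide (gi|31982837).
FRAGMENT = ("ATCACCCTTGCTAATCACTCCTCACAGTGACCTCAAGTCCTGCAGGCATGTACAGCATGC"
            "AGCTCGCATCCTGTGTCACATTGACACTTGTGCTCCTTGT")

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = drop_random(FRAGMENT, 0.05, rng)
clean_signal = codon_signal(FRAGMENT)
noisy_signal = codon_signal(noisy)

# Pearson correlation between the clean and the degraded codon signals.
similarity = np.corrcoef(clean_signal, noisy_signal)[0, 1]
print(f"correlation after removing ~5% of characters: {similarity:.4f}")
```

Running the same comparison on full-length sequences and for several removal percentages gives a rough picture of how quickly the correlation degrades as characters are lost.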
Possible biological applications include, but are not limited to, the following:
• Disease identification in the time, frequency, or time-frequency domain;
• A similar concept of “eigen-protein” could be proposed for protein comparison and motif (3D structure) analysis;
• Identifying the homologies between evolutionarily similar/dissimilar species;
• DNA forensic analysis: “DNA fingerprint for dummies”.

References:
• [1]. John A. Berger, Sanjit K. Mitra, Marco Carli and Alessandro Neri, “New Approaches to Genome Sequence Analysis Based on Digital Signal Processing”, in Proc. of the Workshop on Genomic Signal Processing and Statistics (GENSIPS), in cooperation with IEEE Signal Processing Society, Holiday Inn Hotel – Brownstone, Raleigh, North Carolina, USA, Oct. 11-13, 2002.
• [2]. D. Anastassiou, “Frequency-Domain Analysis of Biomolecular Sequences”, Bioinformatics, vol. 16, no. 12, December 2000, pp. 1073-1081.
• [3]. Jaakko Astola, Edward Dougherty, Ilya Shmulevich and Ioan Tabus, “Genomic signal processing”, Signal Processing (Elsevier), 83 (2003) 691-694.
• [4]. Pandhir Korlapati and E. G. Rajan, “A “signature” spectral characterization of DNA sequences and its biological implications”, in Proc. of the 7-th Joint Conference on Information Sciences, Sept. 26-30, 2003, Research Triangle Park, NC, USA, pp. 967-972.
• [5]. François Auger, Patrick Flandrin, Olivier Lemoine, Paulo Gonçalvès, “TFTB - Time-frequency Toolbox for MATLAB”, http://crttsn.univ-nantes.fr/~auger/tftb.html
