680 likes | 824 Views
This document provides detailed information about the 14-3-3 Protein Beta/Alpha (14-3-3B) derived from Homo sapiens. The given sequence is presented in FASTA format, highlighting key features such as specific amino acid residues and their respective frequencies within the protein structure. Furthermore, the analysis portrays essential characteristics of the protein important in cellular regulation and signal transduction pathways. The sequence data is primed for further bioinformatics applications including alignment, structural studies, or protein interaction assessments.
E N D
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN
A sequence record in FASTA format >sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN
seq = ">sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens \ MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS\ WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY\ LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY\ YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD\ AGEGEN" for i in seq: print i
seq = open("SingleSeq.fasta") for line in seq: print line
seq = open("SingleSeq.fasta") seq_2 = open("SingleSeq-2.fasta") for line in seq: seq_2.write(line) seq_2.close()
Writing different things depending on a condition Read a sequence in FASTA format and print only the header of the sequence >sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN
seq = open("SingleSeq.fasta") for line in seq: if line[0] == '>': print line >sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN
Making choices: The if/elif/else statements if <condition 1>: if expression in <condition1> is TRUE <statements 1> execute statements 1 [elif <condition 2>]: else if exp in <condition2> is TRUE <statements 2>] execute statements 2.... [elif <condition 3>]: etc... pass] … [else: <statements N>]
Write different things depending on a condition >>> s="MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAE" >>> s_len = float(len(s)) >>> G_num = s.count('G’) >>> A_num= s.count('A’) >>> freq_G = G_num/s_len >>> freq_A = A_num/s_len >>> print freq_G 0.116666666667 >>> print freq_A 0.216666666667
Write different things depending on a condition >>> s="MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAE" >>> s_len = float(len(s)) >>> G_num = s.count('G’) >>> A_num = s.count('A’) >>> freq_G = G_num/s_len >>> freq_A = A_num/s_len >>> print freq_G 0.116666666667 >>> print freq_A 0.216666666667 >>> if freq_G > freq_A: ... print "Gly is more frequent than Ala" ... eliffreq_G < freq_A: ... print "Ala is more frequent than Gly" ... else: ... print "The frequency of Gly and Ala is the same" ... Ala is more frequent than Glycines
The if/elif/else construct produces different effects compared with the use of a series ofif conditions
seq = open("SingleSeq.fasta") for line in seq: if line[0] == '>': print line seq = open("SingleSeq.fasta") for line in seq: if line[0] != '>': print line
seq = open("SingleSeq.fasta") for line in seq: if line[0] != '>': print line == != => <= > <
Exercises 1, 2, and 3 1) Read a file in FASTA format and write to a new file only the header of the record. 2) Read a file in FASTA format and write to a new file only the sequence (without the header). 3) Merge 1) and 2). In other words, read a file in FASTA format and write the header to a file and the sequence to a different one.
fasta = open('SingleSeq.fasta') header = open('header.txt', 'w’) for line in fasta: if line[0] == '>': header.write(line) header.close()
fasta = open('SingleSeq.fasta') seq= open('seq.txt','w') for line in fasta: if line[0] != '>': seq.write(line) seq.close()
fasta = open('SingleSeq.fasta') header = open('header.txt', 'w') seq = open('seq.txt','w') for line in fasta: if line[0] == '>': header.write(line) else: seq.write(line) header.close() seq.close()
seq_fasta = open("SingleSeq.fasta") seq = '' for line in seq_fasta: if line[0] == '>': header = line else: seq = seq + line.strip() num_cys = seq.count("C") print header, seq, num_cys
Exercise 4 4) Read a file in FASTA format. Print or write the record to a file only if the sequence is from Homo sapiens.
seq_fasta = open("SingleSeq.fasta") seq = '' header = '' for line in seq_fasta: if line[0] == '>': if "Homo sapiens" in line: header = line else: if header: seq = seq + line if header: print header + seq else: print "The record is not from H. sapiens"
SwissProt-Human.fasta >sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN ... ... ...
Read the records from a file and write them to a new file fasta = open('SwissProt-Human.fasta') fasta_2 = open('SwissProt-Human_2.fasta', 'w') for line in fasta: fasta_2.write(line) this must be a string
Strings can be concatenated >>> print "ACTGGTA" + "ATGTAACTT" ACTGGTAATGTAACTT Strings can be indexed and sliced >>> s = "ACTGGTA" >>> s[0] 'A' >>> s[1:3] 'CT' String elements cannot be re-assigned >>> s[2] = 'Z' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment
Read the sequences from a file and write them to a new file Number the lines starting from 1 fasta = open('SwissProt-Human.fasta') fasta_2 = open('SwissProt-Human_2.fasta', 'w') n = 0 for line in fasta: n = n + 1 l_n = str(n) fasta_2.write(l_n + "\t" + line) fasta_2.close()
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN ... ... ...
Exercise 5 5) Download a Uniprot multiple sequence FASTA file. Write the record headers to a new file.
fasta = open('SwissProt-Human.fasta') headers = open('headers.txt', 'w') for line in fasta: if line[0] == '>': headers.write(line) headers.close()
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN ... ... ...
Exercise 6 6) Read a multiple sequence FASTA file and write the sequences to a new file separated by a blank line
fasta = open('SwissProt-Human.fasta.fasta') seqs = open('seqs.txt', 'w') for line in fasta: if line[0] == '>’: seqs.write('\n') elifline[0] != '>': seqs.write(line) seqs.close() seqs.write(line.strip() + '\n’)
Exercise 7 7) Read a multiple sequence FASTA file and write to a new file only the records from Homo sapiens.
fasta = open('sprot_prot.fasta') output = open('homo_sapiens.fasta', 'w') seq = '' for line in fasta: if line[0] == '>' and seq == '': header = line elif line[0] != '>': seq = seq + line elif line[0] == '>' and seq != '': if "Homo sapiens" in header: output.write(header + seq) header = line seq = '' if "Homo sapiens" in header: output.write(header + seq) output.close()
Exercise 8 8) Read FASTA records from a file and count the cysteine residues in each sequence.
Read the records from a file and count the cysteine residues in each sequence fasta = open('sprot_prot.fasta') seq = '' for line in fasta: if line[0] == '>' and seq == '': header = line[4:10] elif line[0] != '>': seq = seq + line.strip() elif line[0] == '>' and seq != '': cys_num = seq.count('C') print header, ': ', cys_num header = line[4:10] seq = '' print header, ': ', cys_num
Exercises 9, 10, and 11 9) Read a multiple sequence file in FASTA format and write to a new file only the records of the sequences starting with a methionine ('M'). 10) Read a multiple sequence file in FASTA format and write to a new file only the records of the sequences having at least two tryptophan residues ('W'). 11) Read a multiple sequence file in FASTA format and write to a new file only the records the sequences of which start with a methionine ('M') and have at least two tryptophans ('W').
outfile = open('SwissProtHuman-Filtered.fasta','w') fasta = open('SwissProtHuman.fasta','r') seq = '' for line in fasta: if line[0:1] == '>' and seq == '': header = line elif line [0:1] != '>': seq= seq + line elif line[0:1] == '>' and seq != '': TRP_num= seq.count('W') if seq[0] == 'M' and TRP_num > 1: outfile.write(header + seq) seq= '' header = line TRP_num = seq.count('W') if seq[0] == 'M' and TRP_num > 1: outfile.write(header + seq) outfile.close()
In many cases you will need to compare data from different files
SwissProt-Human.fasta >sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN >sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVF YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE EQNKEALQDVEDENQ >sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARRSSW RVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCNDFQYESK VFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPIRLGLALNFS VFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDE EAGEGN ... ... ... cancer-expressed.txt
1) Read 10 SwissProt ACs from a file 2) Store them into a data structure cancer_file = open('cancer-expressed.txt') cancer_list = [] for line in cancer_file: AC = line.strip() cancer_list.append(AC) print cancer_list
List data structure A list is a mutableordered collection of objects The elements of a list can be any kind of object: numbers strings tuples lists dictionaries function calls etc. L = [1, [2,3], 4.52, ‘DNA’] L = [] The empty list
>>> L = [1,”hello”,12.1,[1,2,”three”],”seq”,(1,2)] >>> L[0] # indexing 1 >>> L[3] # indexing [1, 2, ’three'] >>> L[3][2] # indexing ‘three’ >>> L[-1] # negative indexing (1, 2) >>> L[2:4] # slicing [12.1, [1, 2, ‘three’]] >>> L[2:] # slicing shorthand [12.1, [1, 2, ‘three’], ‘seq’, (1, 2)] >>>
The elements of a list can be changed/replaced after the list has been defined These operations CHANGE the list >>> l = [2,3,5,7,8,['a','b'],'a','b','cde'] >>> l[0] = 1 >>> l [1, 3, 5, 7, 8, ['a', 'b'], 'a', 'b', 'cde'] >>> l[0:3] = 'DNA' >>> l ['D', 'N', 'A', 7, 8, ['a', 'b'], 'a', 'b', 'cde'] >>> del l[0:5] >>> l [['a', 'b'], 'a', 'b', 'cde'] >>> l.append('DNA') >>> l [['a', 'b'], 'a', 'b', 'cde', 'DNA'] >>> l.extend('dna') >>> l [['a', 'b'], 'a', 'b', 'cde', 'DNA', 'd', 'n', 'a'] >>>
The elements of a list can be changed/replaced after the list has been defined >>> l = [1,3,5,7,8,['a','b'],'a','b','cde'] >>> l.count(‘a’) >>> l 1 >>> l.index(8) 4 >>> l.insert(4, 80) >>> l [1, 3, 5, 7, 80, 8, [‘a’, ‘b’], ‘a’, ‘b’, ‘cde’] >>> l.pop(4) 80 >>> l [1, 3, 5, 7, 8, [‘a’, ‘b’], ‘a’, ‘b’, ‘cde’] >>> l.pop() ‘cde’ >>> l [1, 3, 5, 7, 8, [‘a’, ‘b’], ‘a’, ‘b’] >>> l.remove(8) [1, 3, 5, 7, [‘a’, ‘b’], ‘a’, ‘b’]
The elements of a list can be changed/replaced after the list has been defined >>> l = [4, 3, 2, 1, 5, 6, 7, 8] >>> l.reverse() >>> l [8, 7, 6, 5, 1, 2, 3, 4] >>> new = sorted(l) >>> new [1, 2, 3, 4, 5, 6, 7, 8] >>> l [8, 7, 6, 5, 1, 2, 3, 4] >>> l.sort() >>> l [1, 2, 3, 4, 5, 6, 7, 8]
Putting together lists and loops range()andxrange() built-in functions >>> range(10) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> range(1, 11) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] >>> range(0, 30, 5) [0, 5, 10, 15, 20, 25] >>> range(0, 10, 3) [0, 3, 6, 9] >>> range(0, -10, -1) [0, -1, -2, -3, -4, -5, -6, -7, -8, -9] >>> range(0) [] >>> range(1, 0) [] # the xrange()method ismore commonly used in for loops than range() >>>for i in xrange(5): … print i … 0,1,2,3,4 Thexrange()method generates the values upon call, i.e. it does not store them into a variable