
Searching (and manipulating) your data


adolfo

Presentation Transcript


  1. Searching (and manipulating) your data

  2. >A06662 Synthetic nucleotide sequence of the human GSH transferase pi gene. : Location:1..1000 UGGGACCAGUCAGCAGAGGCAGCGUGUGUGCGCGUGCGUGUGCGUGUGUGUGCGUGUGUG UGUGUACGCUUGCAUUUGUGUCGGGUGGGUAAGGAGAUAGAGAUGGGCGGGCAGUAGGCC CAGGUCCCGAAGGCCUUGAACCCACUGGUUUGGAGUCUCCUAAGGGCAAUGGGGGCCAUU GAGAAGUCUGAACAGGGCUGUGUCUGAAUGUGAGGUCUAGAAGGAUCCUCCAGAGAAGCC AGCUCUAAAGCUUUUGCAAUCAUCUGGUGAGAGAACCCAGCAAGGAUGGACAGGCAGAAU GGAAUAGAGAUGAGUUGGCAGCUGAAGUGGACAGGAUUUGGUACUAGCCUGGUUGUGGGG AGCAAGCAGAGGAGAAUCUGGGACUCUGGUGGUCUGGCCUGGGGCAGACGGGGGUGUCUC AGGGGCUGGGAGGGAUGAGAGUAGGAUGAUACAUGGUGGUGUCUGGCAGGAGGCGGGCAA GGAUGACUAUGUGAAGGCACUGCCCGGGCAACUGAAGCCUUUUGAGACCCUGCUGUCCCA GAACCAGGGAGGCAAGACCUUCAUUGUGGGAGACCAGGUGAGCAUCUGGCC UGG : W > A06662_protein W

  3. (same A06662 record and sequence as on slide 2) GAC : D > A06662_protein WD

  4. (same A06662 record and sequence as on slide 2) CAG : Q > A06662_protein WDQ

  5. (same A06662 record and sequence as on slide 2) UCA : S > A06662_protein WDQS

  6. (same A06662 record and sequence as on slide 2) GCA : A > A06662_protein WDQSA

  7. (same A06662 record and sequence as on slide 2) GAG : E > A06662_protein WDQSAE

  8. (same A06662 record and sequence as on slide 2) GCA : A > A06662_protein WDQSAEA

  9. (same A06662 record and sequence as on slide 2) GCG : A > A06662_protein WDQSAEAA

  10. (same A06662 record and sequence as on slide 2) UGU : C > A06662_protein WDQSAEAAC
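
The codon-by-codon walk on slides 2 to 10 can be sketched in a few lines. This is a minimal sketch for Python 3, not the presenter's code: `CODONS` is a hypothetical mini-table that holds only the codons occurring in the first 27 nucleotides of A06662, and `first_27` is copied from the start of the sequence on slide 2.

```python
# Hypothetical mini codon table: only the codons needed for this example.
CODONS = {'UGG': 'W', 'GAC': 'D', 'CAG': 'Q', 'UCA': 'S',
          'GCA': 'A', 'GAG': 'E', 'GCG': 'A', 'UGU': 'C'}

# First 27 nucleotides (9 codons) of the A06662 sequence from slide 2.
first_27 = 'UGGGACCAGUCAGCAGAGGCAGCGUGU'

protein = ''
for i in range(0, len(first_27), 3):
    codon = first_27[i:i + 3]          # current triplet
    protein += CODONS[codon]           # look up its amino acid
    print(codon, ':', CODONS[codon], '>', protein)
```

Each pass through the loop corresponds to one slide of the animation; after the last pass `protein` holds `WDQSAEAAC`, the string shown on slide 10.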

  11. codonAMINO = {
          'GCU':'A','GCC':'A','GCA':'A','GCG':'A',
          'CGU':'R','CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
          'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S','AGC':'S',
          'AUU':'I','AUC':'I','AUA':'I',
          'UUA':'L','UUG':'L','CUU':'L','CUC':'L','CUA':'L','CUG':'L',
          'GGU':'G','GGC':'G','GGA':'G','GGG':'G',
          'GUU':'V','GUC':'V','GUA':'V','GUG':'V',
          'ACU':'T','ACC':'T','ACA':'T','ACG':'T',
          'CCU':'P','CCC':'P','CCA':'P','CCG':'P',
          'AAU':'N','AAC':'N',
          'GAU':'D','GAC':'D',
          'UGU':'C','UGC':'C',
          'CAA':'Q','CAG':'Q',
          'GAA':'E','GAG':'E',
          'CAU':'H','CAC':'H',
          'AAA':'K','AAG':'K',
          'UUU':'F','UUC':'F',
          'UAU':'Y','UAC':'Y',
          'AUG':'M',
          'UGG':'W',
          'UAG':'STOP','UGA':'STOP','UAA':'STOP'
      }

  12. codonAMINO = {
          'GCU':'A','GCC':'A','GCA':'A','GCG':'A',
          'CGU':'R','CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
          'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S','AGC':'S',
          'AUU':'I','AUC':'I','AUA':'I',
          'UUA':'L','UUG':'L','CUU':'L','CUC':'L','CUA':'L','CUG':'L',
          'GGU':'G','GGC':'G','GGA':'G','GGG':'G','AAU':'N','AAC':'N',
          'GUU':'V','GUC':'V','GUA':'V','GUG':'V','GAU':'D','GAC':'D',
          'ACU':'T','ACC':'T','ACA':'T','ACG':'T','UGU':'C','UGC':'C',
          'CCU':'P','CCC':'P','CCA':'P','CCG':'P','CAA':'Q','CAG':'Q',
          'GAA':'E','GAG':'E','CAU':'H','CAC':'H','AAA':'K','AAG':'K',
          'UUU':'F','UUC':'F','UAU':'Y','UAC':'Y','AUG':'M','UGG':'W',
          # 'AUG' appears twice: the later entry ('START') overrides 'M'
          'AUG':'START','UAG':'STOP','UGA':'STOP','UAA':'STOP'
      }
      >>> codonAMINO['GCU']
      'A'
      >>> codonAMINO['AUG']
      'START'
      >>> for k in codonAMINO.keys():
      ...     print k, codonAMINO[k]
      ...
      GUC V
      AUA I
      GUA V
      GUG V
      ACU T
      AAC N
      etc.

  13. Dictionaries
      Dictionaries are unordered collections of objects (in Python 2; since Python 3.7 they preserve insertion order).
      Dictionaries are structures for mapping immutable objects (keys) to arbitrary objects (values):
      d = {key1:value1, key2:value2, …, keyN:valueN}
      Lists and dictionaries cannot be used as dictionary keys, because they are mutable.
      Keys must be unique, i.e. the same key cannot be associated with more than one value.
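
The two rules about keys can be checked directly. The following is a minimal Python 3 sketch; the peptide strings are made-up short placeholders, and `list_key_error` is a name introduced here only to capture the error message.

```python
d = {}
d['pep1'] = 'MGSNK'          # a string is immutable, so it can be a key
d[('pep', 2)] = 'RSLEP'      # a tuple of immutable items works too

try:
    d[['pep', 3]] = 'ASADG'  # a list is mutable and therefore unhashable
except TypeError as err:
    list_key_error = str(err)

# Assigning to an existing key replaces the old value;
# no second 'pep1' entry is created.
d['pep1'] = 'MGSNKSKPK'
print(len(d), d['pep1'])
```

The failed list assignment leaves the dictionary untouched, so it still holds exactly two entries at the end.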

  14. >>> d = {'pep1':'MGSNKSKPKDASQRRRSLEPAENVHGAGG',
      ...      'pep2':'RSLEPAENVHGAGGGAFPASQTPS'}
      >>> len(d)
      2
      >>> d['pep1']
      'MGSNKSKPKDASQRRRSLEPAENVHGAGG'
      >>> d['pep3'] = 'ASADGHRGPSAAFAPAAA'
      >>> d
      {'pep1': 'MGSNKSKPKDASQRRRSLEPAENVHGAGG', 'pep2': 'RSLEPAENVHGAGGGAFPASQTPS', 'pep3': 'ASADGHRGPSAAFAPAAA'}

  15. >>> del d['pep2']
      >>> d
      {'pep1': 'MGSNKSKPKDASQRRRSLEPAENVHGAGG', 'pep3': 'ASADGHRGPSAAFAPAAA'}
      >>> d.clear()
      >>> d
      {}

  16. >>> dict = {"a":1, "b":2, "c":3}
      >>> dict.keys()        #list of dictionary keys
      ['a', 'c', 'b']
      >>> keys = dict.keys()
      >>> keys.sort()        #sort keys in place (returns None, so nothing is printed)
      >>> keys
      ['a', 'b', 'c']
      >>> dict.values()      #list of dictionary values
      [1, 3, 2]
      >>> dict.items()       #list of (key, value) tuples
      [('a', 1), ('c', 3), ('b', 2)]
      >>> dict.has_key("a")  #True if dict has key "a", else False (Python 2 only; "a" in dict works in Python 2 and 3)
      True
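
The session above is Python 2. For readers on Python 3, where `keys()` returns a view without a `sort()` method and `has_key()` no longer exists, a rough equivalent might look like this. The variable name `counts` is an assumption, chosen here to avoid shadowing the built-in `dict`.

```python
counts = {'a': 1, 'b': 2, 'c': 3}

keys = sorted(counts)                 # sorted list of keys
values = [counts[k] for k in keys]    # values in key order
pairs = sorted(counts.items())        # list of (key, value) tuples
has_a = 'a' in counts                 # replaces counts.has_key('a')
has_z = 'z' in counts

print(keys, values, pairs, has_a, has_z)
```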

  17. Exercise: Using the codonAMINO dictionary from tgac.py, translate the sequence in rna_seq.fasta. Start with a single reading frame, then try all reading frames.

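  18. from tgac import codonAMINO
      F = open('rna_seq.fasta')
      Out = open('protein_seq.fasta','w')
      seq = ''
      # Read the FASTA file: write the new header, collect the sequence lines
      for line in F:
          if line[0] == '>':
              header = line.split()
              geneID = header[0]
              Out.write(geneID + '_protein\n')
          else:
              seq = seq + line.strip()
      # Translate reading frame 0, one codon (3 nucleotides) at a time
      prot = ''
      for i in range(0,len(seq),3):
          codon = seq[i:i+3]
          if codon in codonAMINO:
              prot = prot + codonAMINO[codon]
          else:
              prot = prot + '*'    # unknown or incomplete codon
      Out.write(prot + '\n')
      F.close()
      Out.close()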

  19. from tgac import codonAMINO
      F = open('rna_seq.fasta')
      Out = open('protein_seq.fasta','w')
      seq = ''
      # Read the FASTA file: write the new header, collect the sequence lines
      for line in F:
          if line[0] == '>':
              header = line.split()
              geneID = header[0]
              Out.write(geneID + '_protein\n')
          else:
              seq = seq + line.strip()
      # Translate the three forward reading frames (offsets 0, 1 and 2)
      for j in range(3):
          Out.write(str(j) + "-frame\n")
          prot = ''
          for i in range(j,len(seq),3):
              codon = seq[i:i+3]
              if codon in codonAMINO:
                  prot = prot + codonAMINO[codon]
              else:
                  prot = prot + '*'    # unknown or incomplete codon
          Out.write(prot + '\n')
      F.close()
      Out.close()
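
The frame loop can also be wrapped in a function so it can be tried without any input files. This is a sketch under stated assumptions, not the presenter's solution: `translate`, `MINI_TABLE` and `rna` are names introduced here, the table covers only a handful of codons, and, unlike the slide, a trailing incomplete codon is skipped instead of being written as `*`.

```python
def translate(seq, table, frame=0):
    """Translate seq codon by codon, starting at offset frame (0, 1 or 2)."""
    residues = []
    # Stopping at len(seq) - 2 skips a trailing codon shorter than 3 bases.
    for i in range(frame, len(seq) - 2, 3):
        # Codons missing from the table are marked with '*'.
        residues.append(table.get(seq[i:i + 3], '*'))
    return ''.join(residues)

# Hypothetical mini-table; the real exercise uses the full codonAMINO.
MINI_TABLE = {'UGG': 'W', 'GAC': 'D', 'CAG': 'Q', 'GGG': 'G', 'ACC': 'T'}

rna = 'UGGGACCAG'    # first 9 nucleotides of the A06662 sequence
frames = [translate(rna, MINI_TABLE, f) for f in range(3)]
for f, prot in enumerate(frames):
    print(str(f) + '-frame', prot)
```

Frame 0 reads UGG GAC CAG, frame 1 reads GGG ACC, and frame 2 reads GGA CCA, which are absent from the mini-table and therefore come out as `*`.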

  20. Remove redundancy

  21. How many different objects? How many unique objects?

  22. Are the two groups identical? What is the intersection of the two groups?

  23. Q5XXA6 Q9Y5P2 Q14667 O75387 Q8WV07 Q8CH62 Q9GZY1 Q9NQQ7 Q8VCX2 Q7Z769 Q8CH62 Q14667 Q9NQQ7 Q14667 Q9Y5P2 Q7Z769 Q8CH62 Q9GZY1 Q9NQQ7 Q14667 Q5XXA6 Q9Y5P2 Q14667 O75387 Q9Y5P2 Q8WV07 Q8VCX2 Q8CH62 Q14667 Q9NQQ7

  24. Sets
      Sets are unordered collections of unique objects: they are not sequence-like objects and they cannot contain identical elements.
      • Sets do not support indexing and slicing
      • The in and not in operators can be used to test an element for membership in a set
      • Sets are useful for removing duplicates
      • Set operations: intersection, union, difference, symmetric difference
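
As an illustration of removing duplicates, the identifiers listed on slide 23 can be loaded into a set. This is a minimal Python 3 sketch; the names `ids` and `unique_ids` are introduced here for the example.

```python
# The 30 identifiers from slide 23, split on whitespace into a list.
ids = """
Q5XXA6 Q9Y5P2 Q14667 O75387 Q8WV07 Q8CH62 Q9GZY1 Q9NQQ7 Q8VCX2 Q7Z769
Q8CH62 Q14667 Q9NQQ7 Q14667 Q9Y5P2 Q7Z769 Q8CH62 Q9GZY1 Q9NQQ7 Q14667
Q5XXA6 Q9Y5P2 Q14667 O75387 Q9Y5P2 Q8WV07 Q8VCX2 Q8CH62 Q14667 Q9NQQ7
""".split()

unique_ids = set(ids)      # duplicates collapse automatically

print(len(ids), 'identifiers,', len(unique_ids), 'unique')
print('Q14667' in unique_ids)      # membership test with "in"
```

Because a set is unordered, the original order of the identifiers is lost; `sorted(unique_ids)` gives a reproducible listing if one is needed.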

  25. Create a new set
      To create a set, call the built-in set(x), where x is a sequence-like object (string, tuple, list).
      S.add(x) adds the single element x to the set S; S.update(x) adds every element of the sequence x to S.
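
A short sketch of the three calls, written for Python 3; the element values are made up for illustration.

```python
s = set('GATTACA')             # from a string: one element per distinct character
t = set(['pep1', 'pep2'])      # from a list
empty = set()                  # note: {} creates an empty dictionary, not a set

t.add('pep3')                  # add a single element
t.update(['pep3', 'pep4'])     # add every element of a list; 'pep3' is already there

print(sorted(s), sorted(t), len(empty))
```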

  26. S1.union(S2) or S1 | S2
      The union of two sets S1 and S2 creates a new set with the elements from both S1 and S2.
      >>> S1 = set(['a','b','c'])
      >>> S2 = set(['c','d','e'])
      >>> S1.union(S2)
      set(['a', 'c', 'b', 'e', 'd'])
      >>> S1 | S2
      set(['a', 'c', 'b', 'e', 'd'])

  27. S1.intersection(S2) or S1 & S2
      The intersection of two sets S1 and S2 creates a new set with the elements common to S1 and S2.
      >>> S1 = set(['a','b','c'])
      >>> S2 = set(['c','d','e'])
      >>> S1.intersection(S2)
      set(['c'])
      >>> S1 & S2
      set(['c'])

  28. S1.symmetric_difference(S2) or S1 ^ S2
      The symmetric difference of two sets S1 and S2 creates a new set with the elements in either S1 or S2 but not both.
      >>> S1 = set(['a','b','c'])
      >>> S2 = set(['c','d','e'])
      >>> S1.symmetric_difference(S2)
      set(['a', 'b', 'e', 'd'])
      >>> S1 ^ S2
      set(['a', 'b', 'e', 'd'])

  29. S1.difference(S2) or S1 - S2
      The difference of two sets S1 and S2 creates a new set with the elements in S1 but not in S2.
      >>> S1 = set(['a','b','c'])
      >>> S2 = set(['c','d','e'])
      >>> S1.difference(S2)
      set(['a', 'b'])
      >>> S1 - S2
      set(['a', 'b'])
      >>> S2 - S1
      set(['e', 'd'])
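
The questions from slide 22 (are two groups identical, and what do they share?) map directly onto these operators. The sketch below is for Python 3 and uses two hypothetical groups, `group1` and `group2`, built from a few of the identifiers on slide 23; the `{...}` set literal is equivalent to `set([...])`.

```python
group1 = {'Q5XXA6', 'Q9Y5P2', 'Q14667', 'O75387'}
group2 = {'Q14667', 'Q9Y5P2', 'Q8WV07'}

identical = group1 == group2           # same elements, regardless of order?
common = group1 & group2               # intersection
only_in_1 = group1 - group2            # difference
only_in_2 = group2 - group1            # difference is not symmetric
either_not_both = group1 ^ group2      # symmetric difference
everything = group1 | group2           # union

print(identical, sorted(common), sorted(either_not_both), len(everything))
```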
