File formats and conversions

PowerPoint Slideshow about 'File formats and conversions' - jabir


Presentation Transcript
Important formats
  • How
  • Fasta
  • Raw/Peptide
  • Tab
How
  • One or more entries
    • First line
      • Length of sequence (6 digits right aligned)
      • Name of sequence
    • Next lines
      • Sequence, usually 80 characters per line
    • Last lines
      • Assignments of the positions in the sequence
How file

   553 ATP0_BOVIN_1E79.C
MLSVRVAAAVARALPRRAGLVSKNALGSSFIAARNLHASNSRLQKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDG
IARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDG
KGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKK
KLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVA
YRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELF
YKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMA
IEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLSKIRTDGKISEESDAKLKEIVTNFLAGFEA
-------------------------------------------------------------...SS.TTTEEEEEEEETT
EEEEEE.TT.BTTEEEEETTS.EEEEEEE.SS.EEEEESS.GGG..TT.EEEEEEEESEEE.SGGGTT.EE.TTS.B.SS
S.....S.EEETT.....STTB....SB...S.HHHHHHS..BTT.B.EEEESTTSSHHHHHHHHHHHTHHHHSSS.GGG
..EEEEEEES..HHHHHHHHHHHHHHT.GGGEEEEEE.TTS.HHHHHHHHHHHHHHHHHHHHTT.EEEEEEETHHHHHHH
HHHHHHHTT....GGGS.TTHHHHHHHHHTT..BB.GGGTS.EEEEEEEEE.STT.TTSHHHHHHHTTSSEEEEE.HHHH
HHT.SS.B.TTT.EESSGGGGS.HHHHHHHTTHHHHHHHHHHHHHHHTT.....HHHHHHHHHHHHHHHHT...SS....
HHHHHHHHHHHHTSTTTTS.GGGHHHHHHHHHHHHHHH.HHHHHHHHHHTS..HHHHHHHHHHHHHHHHHHH.
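
The How layout described above (a header with length and name, then the sequence, then an equally long assignment string) can be read with a few lines of Python. This is only a minimal sketch, not the reader used by any existing tool: the names `HowEntry` and `parse_how` are made up for illustration, and it assumes that assignment lines never contain spaces and that blank lines can be ignored.

```python
from typing import Iterable, Iterator, NamedTuple


class HowEntry(NamedTuple):
    name: str
    sequence: str
    assignment: str


def _take(rows: Iterator[str], length: int) -> str:
    """Join successive rows until exactly `length` characters are collected."""
    chunks, total = [], 0
    while total < length:
        row = next(rows, None)
        if row is None:
            raise ValueError("file ended in the middle of an entry")
        chunks.append(row)
        total += len(row)
    if total != length:
        raise ValueError(f"expected {length} characters, got {total}")
    return "".join(chunks)


def parse_how(lines: Iterable[str]) -> Iterator[HowEntry]:
    """Yield one HowEntry per entry; the header length drives the reading."""
    # Assumption: surrounding whitespace and blank lines carry no information.
    rows = (row for row in (line.strip() for line in lines) if row)
    for header in rows:
        fields = header.split(None, 1)
        length = int(fields[0])
        name = fields[1] if len(fields) > 1 else ""
        sequence = _take(rows, length)    # sequence lines come first
        assignment = _take(rows, length)  # then the per-position assignments
        yield HowEntry(name, sequence, assignment)


# Tiny made-up entry, wrapped at 4 characters per line to keep it short.
demo = ["     7 DEMO_1", "ACDE", "FGH", ".HHH", "EE."]
for entry in parse_how(demo):
    print(entry.name, entry.sequence, entry.assignment)
```

Reading by the declared length rather than by line count means the sketch does not depend on the 80-characters-per-line convention.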

Fasta
  • One or more entries
    • First line
      • The character “>”
      • The name
      • Optional description (not read by all readers)
    • Rest of lines
      • The sequence, usually 50-80 characters per line
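
As a rough illustration of the Fasta layout above, the following Python sketch reads entries and writes them back with a fixed line width. The function names `read_fasta` and `format_fasta` and the default width of 60 are assumptions made for this example; the name is taken to be the first whitespace-delimited word after ">", with the rest of the line treated as the optional description.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, description, sequence) for each entry."""
    name, description, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, description, "".join(chunks)
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if name is not None:
        yield name, description, "".join(chunks)


def format_fasta(name: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Return one entry as text, wrapping the sequence at `width` characters."""
    header = f">{name} {description}".rstrip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *body]) + "\n"
```

A reader that needs only the name can simply discard the description field, which matches the remark that not all readers use it.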
Raw/peptide
  • Short sequences
  • One peptide per line
Tab format
  • One or more entries
    • One entry per line
    • Tab delimited fields
      • Name
      • Sequence
      • Assignments/features
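
Because each Tab entry is a single line with tab-delimited fields, it maps directly onto a split on the tab character. The sketch below assumes the field order listed above (name, sequence, then any number of assignment/feature fields); `parse_tab_line` and `format_tab_line` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_tab_line(line: str) -> Tuple[str, str, List[str]]:
    """Split one entry into name, sequence and trailing feature fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("a tab entry needs at least a name and a sequence")
    return fields[0], fields[1], fields[2:]


def format_tab_line(name: str, sequence: str, *features: str) -> str:
    """Join the fields of one entry into a single tab-delimited line."""
    return "\t".join([name, sequence, *features])


# Made-up entry: a name, a short sequence and one assignment string.
name, sequence, features = parse_tab_line("DEMO_1\tACDEFGH\t.HHHEE.\n")
print(name, sequence, features)
```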
Converters
  • Saco_convert
    • From/To
      • How
      • Fasta
      • Tab
  • Makefsa
    • Raw peptides to fasta peptides
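
To make the raw-to-Fasta conversion concrete, here is a small Python sketch of what such a step might look like. It is not the Makefsa program and does not claim to reproduce its output: the function name `peptides_to_fasta` and the `pep_1`, `pep_2`, ... naming scheme are invented for illustration, since a raw peptide file carries no names of its own.

```python
from typing import Iterable


def peptides_to_fasta(lines: Iterable[str], prefix: str = "pep") -> str:
    """Turn one-peptide-per-line input into Fasta text with generated names."""
    records = []
    count = 0
    for line in lines:
        peptide = line.strip()
        if not peptide:
            continue  # skip blank lines
        count += 1
        # Hypothetical naming: prefix plus a running number.
        records.append(f">{prefix}_{count}\n{peptide}\n")
    return "".join(records)


print(peptides_to_fasta(["SIINFEKL", "GILGFVFTL"]), end="")
```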
Databases - ready for BLAST
  • SwissProt
  • PDB
  • GenBank
  • nr
    • Non-redundant set of proteins from the above plus TREMBL, PIR and others
  • sptr_nrdb
    • Non-redundant set of proteins from SwissProt and TREMBL
BLAST routines - single search
  • blastp
    • aadb aaquery
  • blastn
    • ntdb ntquery
  • blastx
    • aadb ntquery
  • tblastn
    • ntdb aaquery
  • tblastx
    • ntdb ntquery
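
The list above pairs each program with a database type and a query type (aa for amino acids, nt for nucleotides). The same information can be written as a lookup table, sketched here in Python; `BLAST_PROGRAMS` and `programs_for` are illustrative names only. Where a nucleotide side meets a protein comparison, the program translates it: blastx translates the query, tblastn the database, and tblastx both.

```python
from typing import Dict, List, Tuple

# Keys are (database type, query type), as on the slide.
BLAST_PROGRAMS: Dict[Tuple[str, str], List[str]] = {
    ("aa", "aa"): ["blastp"],
    ("nt", "nt"): ["blastn", "tblastx"],  # tblastx compares translations of both
    ("aa", "nt"): ["blastx"],             # query is translated
    ("nt", "aa"): ["tblastn"],            # database is translated
}


def programs_for(database_type: str, query_type: str) -> List[str]:
    """Return the BLAST programs that fit the given database and query types."""
    try:
        return BLAST_PROGRAMS[(database_type, query_type)]
    except KeyError:
        raise ValueError("types must be 'aa' or 'nt'") from None


print(programs_for("aa", "nt"))
```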
Blastpgp - iterative blast
  • Repeated (iterative) searches with an AA query through an AA database
  • Results in hits plus an optional position-specific scoring matrix
The actual search
  • Query is a single file in FASTA format
  • Custom databases must first be formatted from sequence sets in FASTA format
    • Use the setdb program for protein sequence databases (i.e., blastp and blastx)
    • Use the pressdb program for nucleotide sequence databases (i.e., blastn, tblastn and tblastx)
    • Use formatdb for blastpgp (PSI-BLAST)
Conversion exercise
  • Convert the file A1.rsee.test to fasta format
  • Convert the file ss_sub300.how to fasta format
Blast
  • Take the first entry in ss_sub300.how and blastp it against ss_sub300.how and PDB
  • Make a position-specific scoring matrix for the entry using psiblast and nr, and save the profile both as a binary matrix and as a readable matrix
  • Use the binary matrix to search against PDB and ss_sub300.how
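
One possible way to organise the PSI-BLAST part of this exercise is to build the command lines in Python before running them. The sketch below assumes the legacy NCBI `formatdb` and `blastpgp` programs with their usual options (`-i` input, `-d` database, `-p T` protein, `-j` number of passes, `-C` binary checkpoint output, `-Q` readable matrix output, `-R` restart from a checkpoint); check these against the local installation. All file names, the database names and the choice of three passes are placeholders, not part of the exercise text.

```python
import shlex
from typing import List


def formatdb_command(fasta_path: str) -> List[str]:
    """Format a protein Fasta file as a database for blastpgp."""
    return ["formatdb", "-i", fasta_path, "-p", "T"]


def build_profile_command(query: str, database: str, checkpoint: str,
                          readable_matrix: str, passes: int = 3) -> List[str]:
    """Iterative search that saves the profile in binary and readable form."""
    return ["blastpgp", "-i", query, "-d", database, "-j", str(passes),
            "-C", checkpoint, "-Q", readable_matrix]


def search_with_profile_command(query: str, database: str, checkpoint: str) -> List[str]:
    """Search another database starting from a saved binary profile."""
    return ["blastpgp", "-i", query, "-d", database, "-R", checkpoint]


# Placeholder file and database names, assumed for illustration only.
commands = [
    formatdb_command("ss_sub300.fsa"),
    build_profile_command("first_entry.fsa", "nr", "first_entry.chk", "first_entry.mtx"),
    search_with_profile_command("first_entry.fsa", "pdb", "first_entry.chk"),
    search_with_profile_command("first_entry.fsa", "ss_sub300.fsa", "first_entry.chk"),
]
for command in commands:
    print(shlex.join(command))
```

Each list can be handed to `subprocess.run` once the paths match the local setup. Note that the How file has to be converted to Fasta before it can be formatted as a database.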