Learn about protein sequences, domains, and structural elements. Discover how to distinguish protein families and analyze structural constraints, such as bond angles and hydrogen bonds, shaping complex 3D structures like alpha-helices and beta-sheets. Explore methods for predicting tertiary structures and searching structure databases.
CSE182-L6 Protein structure basics Protein sequencing CSE182
Announcements
• Midterm 1: Nov 1, in class.
• Assignment 2: online, due October 20.
Distinguishing between families
• Assignment 2
Profiles
• Start with an alignment of strings of length $m$ over an alphabet $A$.
• Build an $|A| \times m$ matrix $F = (f_{ki})$.
• Each entry $f_{ki}$ represents the frequency of symbol $k$ in position $i$.
[Figure: example profile matrix]
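A minimal Python sketch of building $F$ from an alignment, assuming an ungapped alignment and, for brevity, the DNA alphabet `ACGT` in place of the 20 amino acids; `build_profile` is a hypothetical helper name, not from the course. Rows are indexed by symbol $k$ and columns by position $i$, matching $F = (f_{ki})$; symbols outside the alphabet (e.g. gaps) are simply not counted.

```python
from collections import Counter


def build_profile(alignment: list[str], alphabet: str) -> list[list[float]]:
    """Return the |A| x m profile F, where F[k][i] is the frequency of alphabet[k] in column i."""
    m = len(alignment[0])
    if any(len(s) != m for s in alignment):
        raise ValueError("every aligned string must have the same length m")
    F = [[0.0] * m for _ in alphabet]
    for i in range(m):
        # Count the symbols that occur in column i of the alignment.
        counts = Counter(s[i] for s in alignment)
        for k, symbol in enumerate(alphabet):
            F[k][i] = counts[symbol] / len(alignment)
    return F


alignment = ["ACGT", "ACGA", "TCGT", "ACCT"]
F = build_profile(alignment, "ACGT")
print(F[0][0], F[3][0])  # frequencies of A and T in column 0: 0.75 0.25
```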
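Scoring Profiles
• Given a profile $F$ of length $m$ and a sequence $s$ of length $m$, score $s$ column by column against $F$ using a scoring matrix $S$:
$$\text{Score}(s) = \sum_{i=1}^{m} \sum_{k \in A} f_{ki}\, S(k, s_i)$$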
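The scoring formula can be sketched in Python as follows. The +1/-1 match/mismatch matrix `S` is a hypothetical stand-in for a real substitution matrix such as BLOSUM62, the hard-coded profile is made up, and `score_sequence` is an illustrative name.

```python
def score_sequence(F: list[list[float]], alphabet: str, s: str,
                   S: dict[str, dict[str, float]]) -> float:
    """Score(s) = sum over positions i and symbols k of F[k][i] * S(k, s_i)."""
    m = len(F[0])
    if len(s) != m:
        raise ValueError("sequence length must equal the profile length m")
    return sum(
        F[k][i] * S[symbol][s[i]]
        for i in range(m)
        for k, symbol in enumerate(alphabet)
    )


# Hypothetical +1/-1 match/mismatch matrix standing in for a real substitution matrix.
ALPHABET = "ACGT"
S = {a: {b: (1.0 if a == b else -1.0) for b in ALPHABET} for a in ALPHABET}

# Profile of the two-row alignment ["AC", "AG"]: rows A, C, G, T; columns 0 and 1.
F = [[1.0, 0.0],
     [0.0, 0.5],
     [0.0, 0.5],
     [0.0, 0.0]]

print(score_sequence(F, ALPHABET, "AC", S))  # 1.0
print(score_sequence(F, ALPHABET, "AT", S))  # 0.0
```

Column 1 of the profile is split evenly between C and G, so a C there earns only half a match, offset by half a mismatch against G.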
Psi-BLAST idea
• Multiple alignments are important for capturing remote homology.
• Profile-based scores are a natural way to handle this.
• Q: What if the query is a single sequence?
• A: Iterate:
  • Find homologs using BLAST on the query.
  • Discard very similar homologs.
  • Align, build a profile, and search with the profile.
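The "discard very similar homologs" step can be sketched as a greedy identity filter. Both the 90% identity cutoff and the assumption that hits are already aligned to equal length are illustrative choices, not PSI-BLAST's actual purging rule; `purge_redundant` and `fraction_identity` are hypothetical names.

```python
def fraction_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length (ungapped) aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty strings of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def purge_redundant(hits: list[str], max_identity: float = 0.9) -> list[str]:
    """Greedily keep a hit only if it is below max_identity to every hit kept so far."""
    kept: list[str] = []
    for h in hits:
        if all(fraction_identity(h, k) < max_identity for k in kept):
            kept.append(h)
    return kept


hits = ["ACDE", "ACDE", "ACDF", "WWWW"]
print(purge_redundant(hits))  # ['ACDE', 'ACDF', 'WWWW']
```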
Psi-BLAST speed
• Two time-consuming steps:
  • Multiple alignment of homologs
  • Searching with profiles
• Does the keyword search idea work?
Pigeonhole principle again:
• If a profile of length $m$ must score $\ge T$,
• then, splitting it into $m/l$ disjoint sub-profiles of length $l$, at least one sub-profile must score $\ge lT/m$.
• Generate all $l$-mers that score at least $lT/m$ against a sub-profile.
• Search using an automaton.
• Multiple alignment:
  • Use ungapped multiple alignments only.
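A brute-force Python sketch of the pigeonhole filter, under stated assumptions: $l$ divides $m$, candidate $l$-mers are enumerated exhaustively ($|A|^l$ of them, so only small $l$ is practical), and a hash-set lookup stands in for the automaton (an Aho-Corasick automaton would match all seeds in one pass). The profile, scoring matrix, and function names are hypothetical.

```python
from itertools import product


def column_score(F, alphabet, S, i, b):
    """Score of placing symbol b against profile column i: sum_k F[k][i] * S(k, b)."""
    return sum(F[k][i] * S[a][b] for k, a in enumerate(alphabet))


def pigeonhole_seeds(F, alphabet, S, l, T):
    """All l-mers scoring >= l*T/m against at least one of the m/l disjoint sub-profiles."""
    m = len(F[0])
    if m % l != 0:
        raise ValueError("this sketch assumes l divides the profile length m")
    threshold = l * T / m
    seeds = set()
    for start in range(0, m, l):  # the m/l disjoint sub-profiles
        for word in product(alphabet, repeat=l):  # brute force over |A|^l candidates
            score = sum(column_score(F, alphabet, S, start + j, b) for j, b in enumerate(word))
            if score >= threshold:
                seeds.add("".join(word))
    return seeds


def find_seed_hits(db_seq, seeds, l):
    """Stand-in for the automaton: slide an l-wide window and look it up in a hash set."""
    return [(p, db_seq[p:p + l]) for p in range(len(db_seq) - l + 1) if db_seq[p:p + l] in seeds]


ALPHABET = "ACGT"
S = {a: {b: (1.0 if a == b else -1.0) for b in ALPHABET} for a in ALPHABET}
consensus = "ACGT"  # hypothetical one-sequence profile: F[k][i] = 1 where symbol k matches
F = [[1.0 if a == c else 0.0 for c in consensus] for a in ALPHABET]

seeds = pigeonhole_seeds(F, ALPHABET, S, l=2, T=4.0)
print(sorted(seeds))                        # ['AC', 'GT']
print(find_seed_hits("TTACGTT", seeds, 2))  # [(2, 'AC'), (4, 'GT')]
```

Because the $m/l$ blocks are disjoint and their scores add up to the total, any database window that scores $\ge T$ against the full profile contains at least one seed, so windows without a seed hit can be skipped.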
Protein Domains
• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: the zinc finger domain is a DNA-binding domain.
• What is a domain?
  • Part of a sequence that can fold independently and is present in other sequences as well.
Domain review
• What is a domain?
• How are domains represented?
  • Motifs (regular expressions and others)
  • Multiple alignments
  • Profiles
  • Profile HMMs
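Motifs written in PROSITE-style syntax translate mechanically into regular expressions. The sketch below handles only a common subset (`x`, single residues, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors); `prosite_to_regex` is a hypothetical helper, and the zinc-finger-like pattern is illustrative rather than quoted from a database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if elem.startswith("<"):  # anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):    # anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"  # illustrative C2H2-like motif
regex = prosite_to_regex(pattern)
demo = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
print(regex)                               # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(re.search(regex, demo) is not None)  # True
```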
Domain databases
• Can you speed up HMM search?
CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
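From this view, a protein is just a header line plus a string. A minimal FASTA reader, assuming the UniProtKB-style `db|accession|entry_name` header shown above; `parse_fasta` and `split_uniprot_header` are hypothetical names.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; sequence lines are concatenated."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header: str) -> tuple[str, str, str, str]:
    """Split 'db|accession|entry_name description' into its four parts."""
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    return db, accession, entry_name, description


BPTI = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCK
ARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
"""

header, seq = parse_fasta(BPTI)[0]
db, accession, entry_name, _ = split_uniprot_header(header)
print(db, accession, entry_name, len(seq))  # sp P00974 BPT1_BOVIN 100
```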
Protein structure basics
Side chains determine amino-acid type
• Residues differ in the physicochemical properties of their side chains.
• Aspartic acid (D) and glutamic acid (E) are acidic residues.
Various constraints determine 3D structure
• Constraints:
  • Structural constraints due to physicochemical properties
  • Constraints due to bond angles
  • H-bond formation
• Surprisingly, only a few conformations are seen over and over again.
Alpha-helix
• 3.6 residues per turn.
• H-bonds between the backbone C=O of residue $i$ and the N-H of residue $i+4$ stabilize the structure.
• First proposed by Linus Pauling.
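The helix geometry can be made concrete with a small Python sketch. It assumes an ideal helix with 3.6 residues per turn and the standard rise of 1.5 Å per residue; the helper names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # from the slide
RISE_PER_RESIDUE = 1.5   # angstroms along the helix axis (standard alpha-helix value)


def backbone_hbond_pairs(n: int) -> list[tuple[int, int]]:
    """(i, i+4) pairs: the C=O of residue i accepts an H-bond from the N-H of residue i+4."""
    return [(i, i + 4) for i in range(n - 4)]


def wheel_angle(i: int) -> float:
    """Angular position of residue i on a helical wheel (360/3.6 = 100 degrees per residue)."""
    return (i * 360.0 / RESIDUES_PER_TURN) % 360.0


n = 8
print(backbone_hbond_pairs(n))                    # [(0, 4), (1, 5), (2, 6), (3, 7)]
print([round(wheel_angle(i)) for i in range(n)])  # [0, 100, 200, 300, 40, 140, 240, 340]
print(n * RISE_PER_RESIDUE)                       # helix length in angstroms: 12.0
```

Residues $i$ and $i+4$ sit only 40 degrees apart on the wheel, i.e. on nearly the same face of the helix, which is why these hydrogen bonds run roughly parallel to the helix axis.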
Beta-sheet
• A single strand is extended, with about 2 residues per turn, and is not stable by itself.
• Adjacent strands hydrogen-bond to form stable beta-sheets, which can be parallel or anti-parallel.
• Beta-sheets are stabilized by long-range interactions (paired strands can be far apart in sequence), while alpha-helices are stabilized by local interactions.
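Whether two H-bonded strands are parallel or anti-parallel can be read off their residue pairings. A Python sketch, assuming the cross-strand pairs are already known (e.g. from a solved structure); the pair lists and the function name are hypothetical.

```python
def classify_strand_pairing(pairs: list[tuple[int, int]]) -> tuple[str, int]:
    """Classify two paired strands from (i, j) cross-strand residue pairs.

    Returns the orientation and the smallest sequence separation |j - i| among the pairs.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two residue pairs to infer an orientation")
    ordered = sorted(pairs)
    js = [j for _, j in ordered]
    steps = [b - a for a, b in zip(js, js[1:])]
    if all(d > 0 for d in steps):
        orientation = "parallel"       # partner index increases along the first strand
    elif all(d < 0 for d in steps):
        orientation = "antiparallel"   # partner index decreases along the first strand
    else:
        orientation = "irregular"
    return orientation, min(abs(j - i) for i, j in ordered)


# Hypothetical cross-strand pairings (residue indices are made up).
print(classify_strand_pairing([(10, 40), (12, 38), (14, 36)]))  # ('antiparallel', 22)
print(classify_strand_pairing([(5, 60), (7, 62), (9, 64)]))     # ('parallel', 55)
```

The sequence separations (22 and 55 here) illustrate the long-range character of sheet interactions, in contrast to the $i \to i+4$ contacts within a helix.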
Domains
• The basic secondary-structure elements (helix, strand, loop) combine to form complex 3D structures.
• Certain combinations recur often: there are many sequences, but only a few folds.
3D structure
• Predicting tertiary structure is an important problem in bioinformatics.
• Premise: clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate, tractable goals.
• The Protein Data Bank (PDB) is a compendium of experimentally determined structures.
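PDB entries are distributed in a fixed-column text format. A minimal Python sketch that extracts C-alpha coordinates from `ATOM` records using the standard column positions; the two demo records are synthesized by a hypothetical helper with made-up coordinates, and real files need extra care (alternate locations, `HETATM` records, multiple models).

```python
import math


def parse_ca_atoms(lines):
    """Extract (res_name, chain, res_seq, (x, y, z)) for C-alpha atoms from PDB ATOM records.

    Uses the fixed PDB columns (1-based): atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, and x/y/z in 31-38, 39-46, 47-54.
    """
    atoms = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[17:20], line[21], int(line[22:26]), xyz))
    return atoms


def demo_ca_record(serial, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a minimal ATOM line (columns 1-54 only) for the demo."""
    return (f"{'ATOM':<6}{serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


lines = [demo_ca_record(2, "ARG", "A", 1, 0.0, 0.0, 0.0),
         demo_ca_record(9, "PRO", "A", 2, 3.8, 0.0, 0.0)]
ca = parse_ca_atoms(lines)
print(ca[0][:3])                       # ('ARG', 'A', 1)
print(math.dist(ca[0][3], ca[1][3]))   # distance between the two C-alpha atoms
```

Consecutive C-alpha atoms in a protein backbone are about 3.8 Å apart, which is why the demo places them at that distance.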
Searching structure databases
• Threading aligns a sequence to known 3D structures; other 3D alignment methods align structures to each other.
• Database filtering is possible through geometric hashing.
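Geometric hashing indexes rotation- and translation-invariant geometric features so that a query can be screened against many structures quickly. The Python sketch below is a simplified, distance-based variant in that spirit (it hashes triples of points by their binned side lengths), not the classical reference-frame algorithm; all names and coordinates are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def triangle_keys(coords, bin_size=1.0):
    """Hash every triple of points by its sorted, binned side lengths.

    Side lengths do not change under rotation or translation, so a rigidly
    moved copy of a structure produces exactly the same keys.
    """
    keys = Counter()
    for a, b, c in combinations(range(len(coords)), 3):
        sides = sorted(math.dist(coords[i], coords[j]) for i, j in ((a, b), (a, c), (b, c)))
        keys[tuple(round(s / bin_size) for s in sides)] += 1
    return keys


def shared_triangles(query, target, bin_size=1.0):
    """Filter score: number of triangle keys the query and target have in common."""
    return sum((triangle_keys(query, bin_size) & triangle_keys(target, bin_size)).values())


query = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 5)]
# The same points rotated 90 degrees about z and translated: (x, y, z) -> (10 - y, 10 + x, 10 + z).
moved = [(10 - y, 10 + x, 10 + z) for x, y, z in query]
other = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]

print(shared_triangles(query, moved))  # 4
print(shared_triangles(query, other))  # 0
```

Fixed-width bins can split nearly equal distances across a bin boundary, so a practical filter would need some tolerance (for example, also checking neighboring bins) and would pass the surviving candidates on to a full 3D alignment.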
Trivia Quiz
• What research won the Nobel Prize in Chemistry in 2004?
• In 2002?
Nobel Citation, 2002
Mass Spectrometry
Sample Preparation
• Enzymatic digestion (trypsin) + fractionation
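Trypsin is commonly modeled as cutting after lysine (K) or arginine (R), except when the next residue is proline (P). A minimal in-silico digest under that simplified rule, with no missed cleavages; `trypsin_digest` is a hypothetical name.

```python
def trypsin_digest(seq: str) -> list[str]:
    """In-silico trypsin digest: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide that does not end in K/R
    return peptides


print(trypsin_digest("AKPGRAAKDD"))  # ['AKPGR', 'AAK', 'DD']
```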
Single-Stage Mass Spectrometry (MS)
• LC-MS: 1 MS spectrum per second
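In single-stage MS each peptide appears at its mass-to-charge ratio $m/z = (M + z \cdot m_{H^+})/z$. A Python sketch computing $M$ from standard monoisotopic residue masses (unmodified residues assumed) and the resulting $m/z$ for a few charge states; the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values; C is unmodified cysteine).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of the charge-carrying proton


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion observed in a single-stage MS spectrum."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


for z in (1, 2, 3):
    print(z, round(mz("AAK", z), 4))
```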
Tandem MS
• Ionized parent peptide
• Secondary fragmentation
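In tandem MS the selected parent ion breaks along the peptide backbone; the N-terminal (b) and C-terminal (y) fragments are the ion types most commonly used for interpretation. The slide does not name ion types, so this sketch of singly charged b/y ladders is a standard illustration rather than the course's method; it reuses monoisotopic masses for unmodified residues, and `fragment_ladders` is a hypothetical name.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b- and y-ion m/z values for each backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # b_i: first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: last i residues plus water
    return b_ions, y_ions


b, y = fragment_ladders("PEPTIDE")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

Each $b_i$ and its complementary $y_{n-i}$ sum to the neutral parent mass plus two protons, which is one way to pair up peaks in a tandem spectrum.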