Computational method on biochemistry

Computational method on biochemistry 정진원

Outline • Protein Structure and Dynamics • Bioinformatics • Comparative modeling • Other method

Protein structure and dynamics • Time scale in biological phenomena • Newtonian mechanics • Force field • CHARMM • AMBER • Energy minimization • Molecular Dynamics • Example

Time scale in biological phenomena • [Figure: time axis from fs (10^-15 s) through ps, ns, µs, ms, and s, up to ~hr]

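Force field • Defines an energy from the coordinates (positions) of each atom in a given molecule. • This value is a numerical construct for modeling the state of the molecule, so it has no direct relation to the energy in the actual phenomenon.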

Newtonian mechanics • $F = ma$ • $v = v_0 + at = f(t)$ • $s = v_0 t + at^2/2 = g(t)$ • $E = mv^2/2$ • When a force acts and time passes, an object's position, velocity, and energy change.

Energy minimization

Energy minimization • Optimizes the structure!
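As a minimal sketch of what energy minimization does, the snippet below runs steepest descent on the distance between a single pair of atoms. The Lennard-Jones potential, the reduced units (epsilon = sigma = 1), the starting distance, and the adaptive step rule are all illustrative assumptions, not the force field or minimizer used in the slides.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy (assumed toy potential, reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize(r, step=0.1, tol=1e-5, max_iter=10000):
    """Steepest descent: move against the gradient, accept only downhill moves."""
    for _ in range(max_iter):
        grad = lj_gradient(r)
        if abs(grad) < tol:
            break
        trial = r - step * grad
        if trial > 0 and lj_energy(trial) < lj_energy(r):
            r = trial          # energy went down: accept and try a longer step
            step *= 1.2
        else:
            step *= 0.5        # energy went up: shorten the step and retry
    return r


r_min = minimize(1.5)
print(f"optimized distance: {r_min:.4f}, energy: {lj_energy(r_min):.4f}")
```

The distance should settle near 2^(1/6) sigma, the analytic minimum of this potential. A real structure optimization does the same thing over all atomic coordinates at once.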

Molecular Dynamics

Molecular Dynamics • $E_{tot} = E_{pot} + E_{kin}$
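A short sketch of how molecular dynamics combines Newton's equation with the force field: forces from the potential update the velocities and positions step by step, and the total energy (potential plus kinetic) stays nearly constant. The one-dimensional harmonic "bond", the velocity Verlet integrator, and all parameter values are assumptions chosen for illustration; the slides do not say which integrator was used.

```python
def simulate(x=1.0, v=0.0, m=1.0, k=1.0, dt=0.01, steps=1000):
    """Velocity Verlet on a 1-D harmonic oscillator (toy system)."""
    energies = []
    f = -k * x                       # F = -dE_pot/dx
    for _ in range(steps):
        v += 0.5 * dt * f / m        # half-step velocity update, a = F/m
        x += dt * v                  # full-step position update
        f = -k * x                   # force at the new position
        v += 0.5 * dt * f / m        # second half-step velocity update
        e_pot = 0.5 * k * x * x
        e_kin = 0.5 * m * v * v
        energies.append(e_pot + e_kin)
    return x, v, energies


x_end, v_end, energies = simulate()
drift = max(abs(e - energies[0]) for e in energies)
print(f"total energy: {energies[0]:.6f}, largest deviation: {drift:.2e}")
```

The potential and kinetic terms trade off during the run while their sum changes only slightly, which is a common sanity check on the time step.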

CHemistry at HARvard Macromolecular Mechanics • CHARMm forcefields • CHARMm, which derives from CHARMM (CHemistry at HARvard Macromolecular Mechanics), is a highly flexible molecular mechanics and dynamics program originally developed in the laboratory of Dr. Martin Karplus at Harvard University. It was parameterized on the basis of ab initio energies and geometries of small organic models. • Applicability • CHARMm performs well over a broad range of calculations and simulations, including calculation of geometries, interaction and conformation energies, local minima, barriers to rotation, time-dependent dynamic behavior, free energy, and vibrational frequencies (Momany & Rone, 1992). CHARMm is designed to give good (but not necessarily "the best") results for a wide variety of modelled systems, from isolated small molecules to solvated complexes of large biological macromolecules; however, it is not applicable to organometallic complexes.

Assisted Model Building with Energy Refinement • AMBER forcefield • The standard AMBER forcefield (Weiner et al. 1984, 1986) is parameterized to small organic constituents of proteins and nucleic acids. Only experimental data were used in parameterization. • However, AMBER has been widely used not only for proteins and DNA, but also for many other classes of models, such as polymers and small molecules. For the latter classes of models, various authors have added parameters and extended AMBER in other ways to suit their calculations. The AMBER forcefield has also been made specifically applicable to polysaccharides (Homans 1990, and see Homans' carbohydrate forcefield). • AMBER is used mainly for modeling proteins and nucleic acids. It is generally lower in accuracy and has a limited range of applicability. The use of AMBER is recommended mainly for those customers who are familiar with AMBER and have developed their own AMBER-specific parameters. It generally gives reasonable results for gas-phase model geometries, conformational energies, vibrational frequencies, and solvation free energies.

Application • protein motion • protein folding • enzyme mechanism • model optimization

In silico protein folding • 1 µs = 1,000,000,000 fs (or steps) • 644 steps/sec on a 256-CPU CRAY machine
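The figures on this slide imply a wall-clock cost that is easy to work out. The sketch below takes the slide's numbers at face value and assumes one step per femtosecond, as the "fs (or step)" wording suggests.

```python
STEPS_PER_MICROSECOND = 1_000_000_000   # 1 µs = 1e9 fs, one step per fs (assumed)
STEPS_PER_SECOND = 644                  # throughput quoted on the slide

wall_seconds = STEPS_PER_MICROSECOND / STEPS_PER_SECOND
wall_days = wall_seconds / 86_400       # seconds per day

print(f"{wall_seconds:,.0f} s of computing, about {wall_days:.1f} days")
```

So a single microsecond of simulated folding costs about 18 days of continuous running on that machine.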

Simulation of the travel of potassium

Bioinformatics • Introduction • Sequence alignment • Pairwise sequence alignment • BLAST • Multiple sequence alignment • CLUSTALW • T-COFFEE • Scoring matrix • Structure Alignment • Example

Pairwise alignment • Smith-Waterman Algorithm • BLAST – local alignment (heuristic) • FASTA – local alignment (heuristic)

 A T C T C G T A T G A T G  0 0 0 0 0 0 0 0 0 0 0 0 0 0 G 0 T 0 0 2 1 2 1 1 4 3 2 1 1 3 2 C 0 0 1 4 3 4 3 3 3 2 1 0 2 2 T 0 0 2 3 6 5 4 5 4 5 4 3 2 1 A 0 2 2 2 5 5 4 4 7 6 5 6 5 4 T 0 1 4 3 4 4 4 6 5 9 8 7 8 7 C 0 0 3 6 5 6 5 5 5 8 8 7 7 7 A 0 2 2 5 5 5 5 4 7 7 7 10 9 8 C 0 1 1 4 4 7 6 5 6 6 6 9 9 8 A T C T C G T A T G A T G G T C T A T C A C Smith-Waterman Algorithm Align S1=ATCTCGTATGATGS2=GTCTATCAC 0 0 0 0 0 0 2 1 0 0 2 1 0 2 2 =1, =1 4 3 5 7 9 8 10

BLAST • Basic Local Alignment Search Tool • Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Journal of Molecular Biology v. 215, 1990, pp. 403-410 • Used to search sequence databases for local alignments to a query

BLAST algorithm • Keyword search: take all words of length w from the query of length n and look them up in a database of length m, keeping those with score above threshold • w = 11 for nucleotide queries, 3 for proteins • Do local alignment extension for each found keyword • Extend result until longest match above threshold is achieved • Running time O(nm)

BLAST algorithm (cont’d) • Query: KRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRD • Keyword: GVK • Neighborhood words (score): GVK 18, GAK 16, GIK 16, GGK 14, GLK 13, GNK 12, GRK 11, GEK 11, GDK 11 • Neighborhood score threshold: T = 13 • Extension to a High-scoring Pair (HSP):
Query:  22 VLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK 60
           +++DN +G +   IR L    G+K I+ L+ E+ RG++K
Sbjct: 226 IIKDNGRGFSGKQIRNLNYGIGLKVIADLV-EKHRGIIK 263

Original BLAST • Dictionary • All words of length w • Alignment • Ungapped extensions until score falls below some threshold • Output • All local alignments with score > statistical threshold

Original BLAST: Example • w = 4 • Exact keyword match of GGTC • Extend diagonals with mismatches until score is under 50% • Output result: GTAAGGTCC / GTTAGGTCC • Sequences shown on the figure axes: ACGAAGTAAGGTCCAGT and CTGATCCTGGATTGCGA • From lectures by Serafim Batzoglou (Stanford)
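The seed-and-extend idea in this example can be sketched as follows. Two things are assumptions here. First, the second axis sequence is read in reverse (AGCGTTAGGTCCTAGTC), since that is the orientation in which the slide's output GTTAGGTCC appears. Second, the stopping rule is a simple drop-off threshold (`x_drop`, a hypothetical parameter) with +1/-1 scoring, swapped in for the slide's "under 50%" rule, which is not specified precisely enough to implement.

```python
def extend(query, db, qi, di, step, x_drop):
    """Ungapped extension in one direction; returns how many letters to keep."""
    score = best = 0
    best_len = n = 0
    while 0 <= qi < len(query) and 0 <= di < len(db):
        score += 1 if query[qi] == db[di] else -1
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= x_drop:
            break                      # fell too far below the best score
        qi += step
        di += step
    return best_len


def find_hsps(query, db, w=4, x_drop=2):
    """Index all query words of length w, scan the database, extend each hit."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hsps = set()
    for d in range(len(db) - w + 1):
        for q in index.get(db[d:d + w], []):
            left = extend(query, db, q - 1, d - 1, -1, x_drop)
            right = extend(query, db, q + w, d + w, +1, x_drop)
            hsps.add((q - left, d - left, left + w + right))
    return hsps


query = "ACGAAGTAAGGTCCAGT"
db = "AGCGTTAGGTCCTAGTC"              # second sequence, reversed (assumption)
hsps = find_hsps(query, db)
for q_start, d_start, length in sorted(hsps):
    print(query[q_start:q_start + length], db[d_start:d_start + length])
```

All three seed words on the shared diagonal (AGGT, GGTC, GTCC) extend to the same segment pair, GTAAGGTCC against GTTAGGTCC.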

ClustalW • Popular multiple alignment tool today • Several heuristics to improve accuracy: • Sequences are weighted by relatedness • Scoring matrix can be chosen “on the fly” • Position-specific gap penalties

ClustalW (cont’d) • Often used for protein alignment • ‘W’ stands for ‘weighted’ • Different parts of alignment are weighted. • Position/residue specific gap penalties. • Three-step process 1.) Pairwise alignment 2.) Build Guide Tree 3.) Progressive Alignment

Step 1: Pairwise Alignment • Aligns each sequence against every other, giving a distance matrix • Distance = exact matches / sequence length (percent identity) (.17 means 17% identical)
      S1   S2   S3   S4
S1    -
S2   .17   -
S3   .87  .28   -
S4   .59  .33  .62   -
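The identity measure defined on this slide can be sketched directly. The sequences below are hypothetical, already aligned to equal length, and are not the S1 to S4 of the slide, whose sequences are not given.

```python
from itertools import combinations


def percent_identity(a, b):
    """Exact matches divided by alignment length, for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


# Hypothetical pre-aligned toy sequences.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTTCGA",
    "C": "TCGAACGT",
}

matrix = {
    (m, n): percent_identity(toy[m], toy[n])
    for m, n in combinations(sorted(toy), 2)
}
for pair, value in matrix.items():
    print(pair, f"{value:.2f}")
```

In ClustalW itself the pairwise alignments are computed first. This sketch only covers the scoring step that fills the matrix.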

Step 2: Guide Tree • Create Guide Tree using the distance matrix • ClustalW uses the neighbor-joining method • Guide tree roughly reflects evolutionary relations

Step 2: Guide Tree (cont’d) • Guide tree leaf order: S1, S3, S4, S2 • Calculate: s1,3 = consensus(s1, s3); s1,3,4 = consensus((s1,3), s4); s1,2,3,4 = consensus((s1,3,4), s2)
      S1   S2   S3   S4
S1    -
S2   .17   -
S3   .87  .28   -
S4   .59  .43  .62   -
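The joining order shown on this slide can be recovered from the matrix with a small greedy clustering sketch. This is not neighbor-joining, which ClustalW uses. It is a simpler average-linkage procedure, swapped in because it is short and yields the same order for these values. The matrix entries are the ones printed on this slide.

```python
from itertools import combinations

# Pairwise identities as printed on this slide.
similarity = {
    frozenset(("S1", "S2")): 0.17,
    frozenset(("S1", "S3")): 0.87,
    frozenset(("S2", "S3")): 0.28,
    frozenset(("S1", "S4")): 0.59,
    frozenset(("S2", "S4")): 0.43,
    frozenset(("S3", "S4")): 0.62,
}


def average_similarity(a, b):
    """Mean pairwise similarity between two clusters of sequence names."""
    total = sum(similarity[frozenset((x, y))] for x in a for y in b)
    return total / (len(a) * len(b))


def guide_order(names):
    """Repeatedly merge the two most similar clusters; return the merges."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


merges = guide_order(["S1", "S2", "S3", "S4"])
for a, b in merges:
    print("join", "+".join(a), "with", "+".join(b))
```

The merges come out as S1 with S3, then S4, then S2, matching the consensus steps listed on the slide.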

Step 3: Progressive Alignment • Align the two most similar sequences • Following the guide tree, add in the next sequences, aligning to the existing alignment • Insert gaps as necessary • Sample output:
FOS_RAT     PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD
FOS_MOUSE   PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD
FOS_CHICK   SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD
FOSB_MOUSE  PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ
FOSB_HUMAN  PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ
Conservation line: . . : ** . :.. *:.* * . * **:
Dots and stars show how well-conserved a column is.

Scoring Matrix • BLOSUM • PAM • PSSM

PAM • Percentage of Acceptable point Mutations per 10^8 years • Scores are set from the probability that a given amino acid is replaced by any other amino acid • PAM matrices are based on global alignments of closely related proteins. PAM 1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Scores are derived from a mutation probability matrix where each element gives the probability of the amino acid in column X mutating to the amino acid in row Y after a particular evolutionary time, for example after 1 PAM, or 1% divergence. A PAM matrix is specific for a particular evolutionary distance, but may be used to generate matrices for greater evolutionary distances by multiplying it repeatedly by itself. However, at large evolutionary distances the information present in the matrix is essentially degenerated. It is rare that a PAM matrix would be used for an evolutionary distance any greater than 256 PAMs.
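The "multiply it repeatedly by itself" step can be illustrated with a toy mutation probability matrix. The three-letter alphabet and the probabilities below are invented for illustration (each column sums to 1, with a 1% chance of change per step). They are not Dayhoff's values.

```python
def matmul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, n):
    """Multiply m by itself n times, starting from the identity."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


# Hypothetical "PAM 1" for a 3-letter alphabet: entry [Y][X] is the
# probability that letter X (column) becomes letter Y (row) in one step.
pam1 = [
    [0.990, 0.004, 0.002],
    [0.006, 0.990, 0.008],
    [0.004, 0.006, 0.990],
]

pam250 = matrix_power(pam1, 250)
for row in pam250:
    print(" ".join(f"{p:.3f}" for p in row))
```

After many steps the columns become nearly identical, which is the loss of information at large evolutionary distances that the slide describes.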

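BLOSUM • Developed for use in local alignment • BLOcks SUbstitution Matrix • Sequences with a certain degree of similarity are collected and aligned, and the scoring matrix is built from how often substitutions occur within them • BLOSUM 62 is built from sequences grouped at 62% similarity or higher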

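Position Specific Scoring Matrix • Scores whether a particular amino acid appears at a particular position, based on the sequence alignment of similar proteins • The method used by PSI-BLAST • Well suited to database-wide searches for proteins that carry characteristic sequences or residues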
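A position-specific scoring matrix can be sketched as per-column log-odds scores. The toy alignment, the uniform background frequency, and the pseudocount of 1 are simplifying assumptions. PSI-BLAST uses sequence weighting and observed background frequencies, which this sketch omits.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """One dict of log2-odds scores per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)          # uniform background (assumed)
    pssm = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(pssm, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical three-residue motif alignment.
toy_alignment = ["GVK", "GAK", "GIK", "GLK"]
pssm = build_pssm(toy_alignment)
print(f"GVK: {score(pssm, 'GVK'):.2f}")
print(f"WWW: {score(pssm, 'WWW'):.2f}")
```

Residues seen at a position score positively and unseen ones negatively. This is why the same amino acid can score differently at different positions, unlike with BLOSUM or PAM.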

Homology/Comparative modeling • Introduction • Method • Example

Introduction • Proteins with similar functions have similar structures. • Ex) hemoglobin/myoglobin, ubiquitin/ubiquitin-like proteins, serine proteases, thioredoxin/glutaredoxin

Method • Search for proteins with 30% or higher homology that have a known structure • Pairwise or multiple sequence alignment • Using the alignment as a guide, copy the structure or generate distance constraints • Optimize the model

Example: Modeling of malonyl-CoA synthetase

Firefly luciferase Malonyl-CoA synthetase

Other Methods • Simulated Annealing • Monte Carlo method • Docking
