Introduction to Bioinformatics


... Taking biology to a new dimension. Introduction to Bioinformatics. Benny Shomer, November 2005.

Exponential Growth Rate

Over the last two decades, nucleic acid data has accumulated in the EMBL database at an exponential rate, currently totaling ~110 Gbases from 62M entries.

~200,000 Protein Entries

Currently stored in the UniProt database, with 70M amino acids.

ERK2 MAP Kinase

slide6

The whole genomes of over 1,500 viruses and 775 bacteria have been completely sequenced or are in progress…

Salmonella sp.

Bacteriophage T4

Haemophilus influenzae

slide7

Trypanosoma brucei

Plasmodium falciparum

Leishmania major

Schizosaccharomyces pombe

…as well as some 400 eukaryotic genomes, of which 135 are of parasites, fungi and other lower forms.

slide8

Mitochondrion 3D CAT

More than 500 organelle genomes are in the databases

Mitochondria

Chloroplast

slide9

About 80 plants are undergoing genome/EST sequencing or genetic mapping

Arabidopsis thaliana

slide11

3.2 Gb

~30,000 genes.

slide13

"We shouldn't blow things out of proportion,

and we need to keep our head on our hips"

Alon Mizrahi

slide14

Same Size Genome ~3 Gb

  • About the same number of genes (30,000)
  • Same gene content
  • 85-90% similarity between genes (up to 98% similarity with apes)
From Sequence to Biology

Human

Zebrafish

HoxB4 local alignment

             1460      1470      1480      1490      1500
AF3071 TGGGCAATTCCCAGAAATTAATGGCTATGAGTTCTTTTTTGATCAACTCA
       :: :::::::  ::::::::::::: :::::::: : ::::::::::::
AF0712 TGTGCAATTCAAAGAAATTAATGGCCATGAGTTCCTATTTGATCAACTCC
              180       190       200       210       220

             1510      1520      1530      1540      1550
AF3071 AACTATGTCGACCCCAAGTTCCCTCCATGCGAGGAATATTCACAGAGCGA
       :::::::: ::::: ::::: :: :: :::::::::::::: ::::::::
AF0712 AACTATGTGGACCCTAAGTTTCCACCCTGCGAGGAATATTCCCAGAGCGA
              230       240       250       260       270

             1560      1570      1580      1590      1600
AF3071 TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC
        :::::::::::    ::::: :: :   :::::  : ::: ::::::::
AF0712 CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC
              280          290          300       310

             1610      1620      1630      1640      1650
AF3071 GAGAGAGCAGCTTCCAGCCGGAGGCGGGCTTCGGGCGGCGCGCGGCGTGC
        :::   :   :::::::  ::: ::  :: :   : :::  :::  :::
AF0712 AAGACCCCTCGTTCCAGCATGAGTCGATCTACCACCAGCGGTCGGGCTGC
          320       330       340       350       360

Local, Global, Multiple…
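
The HoxB4 comparison above is a local alignment. As a rough illustration of how such an alignment can be computed, here is a minimal Smith-Waterman sketch in Python. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration; the slides do not say which program or parameters produced the alignment shown.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Illustrative sketch only: linear gap penalty, assumed scoring values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences (not taken from the slides): only the shared core aligns.
    print(smith_waterman("AAACGTAAA", "TTTCGTTTT"))
```

A global alignment (Needleman-Wunsch) differs mainly in dropping the zero floor and tracing back from the bottom-right corner of the matrix instead of from the best-scoring cell.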

slide19

>gi|28558768|sp|P53601|A4_MACFA Amyloid beta A4 protein precursor (APP) (ABPP)
    (Alzheimer's disease amyloid protein homolog) [Contains: Soluble APP-alpha
    (S-APP-alpha); Soluble APP-beta (S-APP-beta); C99; Beta-amyloid protein 42
    (Beta-APP42); Beta-amyloid protein 40 (Beta-APP40); C83; P3(42); P3(40);
    Gamma-CTF(59) (Gamma-secretase C-terminal fragment 59);
    Gamma-CTF(57) (Gamma-secretase C-terminal fragment 57);
    Gamma-CTF(50) (Gamma-secretase C-terminal fragment 50); C31]
    Length = 770

 Score = 1277 bits (3305), Expect = 0.0
 Identities = 642/752 (85%), Positives = 643/752 (85%)

Query: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78
           EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP
Sbjct: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78

Query: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138
           ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ
Sbjct: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138

Query: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX 198
           ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPL
Sbjct: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS 198

Query: 199 XXXXXXXXXXWWGGADTDYADGSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 258
                     WWGGADTDYADGS
Sbjct: 199 ADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDEDDEDGDEVEEEA 258

Query: 259 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXCSEQAETGPCRAMISRWYFDVTEGKCAP 318
                                           CSEQAETGPCRAMISRWYFDVTEGKCAP
Sbjct: 259 EEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYFDVTEGKCAP 318
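
The runs of X in the query are positions masked by BLAST's low-complexity filter, which is why the Identities figure is below 100% even though every unmasked residue shown matches. The short Python sketch below counts identical and masked positions in one aligned pair of lines; the function name and return layout are hypothetical, and the example strings are the Query/Sbjct segment for residues 139-198 copied from the output above.

```python
def alignment_stats(query, subject, mask="X"):
    """Count identical and masked columns in a pair of aligned sequences.

    Returns (identical, aligned_length, masked). Hypothetical helper for
    illustration; it is not part of BLAST.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # A column is an identity when both residues match and it is not a gap.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # Masked query positions can never count as identities.
    masked = query.count(mask)
    return identical, len(query), masked


QUERY = "ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX"
SBJCT = "ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS"

if __name__ == "__main__":
    identical, length, masked = alignment_stats(QUERY, SBJCT)
    print(identical, length, masked)   # 51 identities, 60 columns, 9 masked
    # The report's overall figure: 642 identities over 752 aligned columns.
    print(f"{100 * 642 / 752:.1f}%")   # about 85.4%, shown as 85% in the report
```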

slide25

Genome Level Annotation

Chromosome Oriented

Focus Position

Chromosome

Slider

Focus Area Overview

slide26

Genome Level Annotation

Focus Area Detailed View

slide27

Genome Level Annotation

Focus Area Basepair View

slide29

Protein properties

EX33 inflammation related GPCR analysis

slide30

Protein properties

EX33 inflammation related GPCR analysis

slide31

Secondary Structure Prediction

Garnier

. 10 . 20 . 30 . 40 . 50

MWNSSDANFSCYHESVLGYRYVAVSWGVVVAVTGTVGNVLTLLALAIQPK

helix HHHHHHH

sheet E E EEEEEEEEEEEEEEE EEEE E

turns TT TTTTT TTTT TTT T

coil CC CCCC

. 60 . 70 . 80 . 90 . 100

LRTRFNLLIANLTLADLLYCTLLQPFSVDTYLHLHWRTGATFCRVFGLLL

helix HHHHHHHH H

sheet EE EEEEEE EEEEEEEE E EEEE EEEEEEEE

turns T TT T TTTTTTT

coil C

. 110 . 120 . 130 . 140 . 150

FASNSVSILTLCLIALGRYLLIAHPKLFPQVFSAKGIVLALVSTWVVGVA

helix HH HHHHH HHHHHH

sheet EEEEEEE EEEE EEE EEEEEEEEEEEEEE

turns T TT T

coil C C CC C

. 160 . 170 . 180 . 190 . 200

SFAPLWPIYILVPVVCTCSFDRIRGRPYTTILMGIYFVLGLSSVGIFYCL

helix

sheet EEEEEEEEEEEE EEEEEEEEEEEE EEEEEE

turns T TTTTTTT T TT

coil CCCC C CC CC

Plotstructure

PredictProtein

slide33

Pattern and Motif Analysis

ID   GATA_ZN_FINGER_1; PATTERN.
AC   PS00344;
DT   NOV-1990 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   GATA-type zinc finger domain.
PA   C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NR   /RELEASE=41.18,131945;
NR   /TOTAL=99(61); /POSITIVE=99(61); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=14; /PARTIAL=0;
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=2;
CC   /SITE=1,zinc; /SITE=4,zinc; /SITE=15,zinc; /SITE=18,zinc;
DR   O13412, AREA_ASPNG, T; O13415, AREA_ASPOR, T; P17429, AREA_EMENI, T;
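
The PA line of a PROSITE entry is a compact pattern language: `x` is any residue, `[DN]` is a choice, `{..}` is an exclusion, and `(4,5)` is a repeat range. As a sketch of how such a pattern can be used to scan a sequence, the Python below translates the GATA pattern from this entry into a regular expression. The function name is hypothetical, only the syntax elements listed here are handled (no `<` or `>` anchors), and the test peptide is a constructed example that satisfies the pattern, not a database record.

```python
import re

# One pattern element: x, a single residue, a [choice] or an {exclusion},
# optionally followed by a repeat count such as (3) or (4,5).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE PA pattern string into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


GATA_PATTERN = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C."

if __name__ == "__main__":
    regex = prosite_to_regex(GATA_PATTERN)
    print(regex)
    # Constructed peptide containing one occurrence of the pattern.
    hit = re.search(regex, "MKCVNCGATATPLWRRDRTGHYLCNACGLYHK")
    print(hit.start(), hit.group())
```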

Protein Families

slide36

Protein-Protein interaction

Data Sources:

Yeast Two Hybrid system

Triclosan - FabI

slide37

Protein-Protein interaction

Data Sources:

Surface Plasmon Resonance

Triclosan - FabI

slide38

Protein-Protein interaction

Data Sources:

Natural Language Processing

slide39

DNA Microarray

& Expression Analysis

slide40

Cloning,

Restriction

& Mapping

slide42

Linguistics &

Information systems