
Introduction to Bioinformatics



...taking biology to a new dimension. Introduction to Bioinformatics. Benny Shomer, November 2005. Exponential Growth Rate. Over the last two decades, nucleic acid data has accumulated in the EMBL database at an exponential rate, currently totaling ~110 Gbases from 62M entries. ~200,000 Protein Entries.


Presentation Transcript



...taking biology to a new dimension

Introduction to Bioinformatics

Benny Shomer,

November 2005



Exponential Growth Rate

Over the last two decades, nucleic acid data has accumulated in the EMBL database at an exponential rate, currently totaling ~110 Gbases from 62M entries.



~200,000 Protein Entries

Currently stored in the UniProt database, with 70M amino acids.
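As a rough sanity check on the database figures quoted above, the average entry length can be worked out directly. This is only a back-of-the-envelope sketch: the constants are the approximate slide figures, and the function and variable names are illustrative.

```python
# Approximate figures quoted on the slides (November 2005).
EMBL_BASES = 110e9          # ~110 Gbases of nucleic acid data
EMBL_ENTRIES = 62e6         # 62M entries
UNIPROT_RESIDUES = 70e6     # 70M amino acids
UNIPROT_ENTRIES = 200_000   # ~200,000 protein entries


def mean_length(total_units: float, entries: float) -> float:
    """Average number of bases (or residues) per database entry."""
    return total_units / entries


avg_nt = mean_length(EMBL_BASES, EMBL_ENTRIES)
avg_aa = mean_length(UNIPROT_RESIDUES, UNIPROT_ENTRIES)

print(f"EMBL: about {avg_nt:.0f} bases per entry")
print(f"UniProt: about {avg_aa:.0f} residues per entry")
```

With these inputs the averages come out near 1,800 bases per nucleotide entry and 350 residues per protein entry.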

ERK2 MAP Kinase



The whole genomes of over 1,500 viruses and 775 bacteria have been completely sequenced or are in progress…

Salmonella sp.

Bacteriophage T4

Haemophilus influenzae



Trypanosoma brucei

Plasmodium falciparum

Leishmania major

Schizosaccharomyces pombe

…as well as some 400 eukaryotic genomes, of which 135 are of parasites, fungi and other lower forms.



Mitochondrion 3D CAT

More than 500 organelle genomes are in the databases

Mitochondria

Chloroplast



About 80 plants are being genome/EST sequenced or genetically mapped

Arabidopsis thaliana



There are currently ~170 genome projects of Metazoa



3.2 Gb

~30,000 genes.


"Owning a store of knowledge is no small happiness"

Socrates


"There is no need to lose our sense of proportion,

and we need to keep our head on our hips"

Alon Mizrahi



  • Same genome size (~3 Gb)

  • About the same number of genes (30,000)

  • Same gene content

  • 85-90% similarity between genes (up to 98% similarity with apes)



Genes & Development Vol. 14, No. 20, pp. 2551-2569, October 15, 2000



The Basis for Bioinformatics



From Sequence to Biology

Human

Zebrafish

HoxB4 local alignment

1460 1470 1480 1490 1500
AF3071 TGGGCAATTCCCAGAAATTAATGGCTATGAGTTCTTTTTTGATCAACTCA
       :: :::::::  ::::::::::::: :::::::: : ::::::::::::
AF0712 TGTGCAATTCAAAGAAATTAATGGCCATGAGTTCCTATTTGATCAACTCC
180 190 200 210 220

1510 1520 1530 1540 1550
AF3071 AACTATGTCGACCCCAAGTTCCCTCCATGCGAGGAATATTCACAGAGCGA
       :::::::: ::::: ::::: :: :: :::::::::::::: ::::::::
AF0712 AACTATGTGGACCCTAAGTTTCCACCCTGCGAGGAATATTCCCAGAGCGA
230 240 250 260 270

1560 1570 1580 1590 1600
AF3071 TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC
        :::::::::::    ::::: :: :   :::::  : ::: ::::::::
AF0712 CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC
280 290 300 310

1610 1620 1630 1640 1650
AF3071 GAGAGAGCAGCTTCCAGCCGGAGGCGGGCTTCGGGCGGCGCGCGGCGTGC
        :::   :   :::::::  ::: ::  :: :   : :::  :::  :::
AF0712 AAGACCCCTCGTTCCAGCATGAGTCGATCTACCACCAGCGGTCGGGCTGC
320 330 340 350 360
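The ':' rows in the alignment above mark columns where the two sequences carry the same base. A minimal sketch of how such a match row and a simple percent identity can be derived from two already-aligned rows follows. The function names are illustrative, and identity is computed here over all alignment columns including gaps, which is only one of several conventions in use.

```python
def match_line(row_a: str, row_b: str, gap: str = "-") -> str:
    """Return ':' under identical aligned residues and ' ' elsewhere."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        ":" if a == b and a != gap else " "
        for a, b in zip(row_a, row_b)
    )


def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Identical columns as a percentage of all alignment columns."""
    marks = match_line(row_a, row_b, gap)
    return 100.0 * marks.count(":") / len(marks)


# Third block of the HoxB4 alignment, rows as shown on the slide.
af3071 = "TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC"
af0712 = "CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC"

print(match_line(af3071, af0712))
print(f"{percent_identity(af3071, af0712):.0f}% identity in this block")
```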

Local, Global, Multiple…
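To make the idea of a local alignment concrete, here is a minimal Smith-Waterman sketch. It is a textbook dynamic-programming formulation with a linear gap penalty, not the program that produced the slide output. The scoring values (match 2, mismatch -1, gap -2) and the two test strings are arbitrary assumptions chosen for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; clamping at 0 is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


result = smith_waterman("TTTGATCAACTT", "CCGATCAACGG")
print(result)
```

A global (Needleman-Wunsch) variant differs mainly in dropping the clamp at zero, initialising the first row and column with gap penalties, and tracing back from the bottom-right corner.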



Elongation Factor 1 alpha



>gi|28558768|sp|P53601|A4_MACFA Amyloid beta A4 protein precursor (APP)

(ABPP) (Alzheimer's disease

amyloid protein homolog) [Contains: Soluble APP-alpha

(S-APP-alpha); Soluble APP-beta (S-APP-beta); C99;

Beta-amyloid protein 42 (Beta-APP42); Beta-amyloid

protein 40 (Beta-APP40); C83; P3(42); P3(40);

Gamma-CTF(59) (Gamma-secretase C-terminal fragment 59);

Gamma-CTF(57) (Gamma-secretase C-terminal fragment 57);

Gamma-CTF(50) (Gamma-secretase C-terminal fragment 50);

C31]

Length = 770

Score = 1277 bits (3305), Expect = 0.0

Identities = 642/752 (85%), Positives = 643/752 (85%)

Query: 19 EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78

EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP

Sbjct: 19 EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78

Query: 79 ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138

ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ

Sbjct: 79 ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138

Query: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX 198

ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPL

Sbjct: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS 198

Query: 199 XXXXXXXXXXWWGGADTDYADGSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 258

WWGGADTDYADGS

Sbjct: 199 ADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDEDDEDGDEVEEEA 258

Query: 259 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXCSEQAETGPCRAMISRWYFDVTEGKCAP 318

CSEQAETGPCRAMISRWYFDVTEGKCAP

Sbjct: 259 EEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYFDVTEGKCAP 318
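The summary lines of a BLAST hit like the one above can be read programmatically. The sketch below parses the Identities and Positives counts with a regular expression written for exactly this text layout. It is an assumption-laden illustration, not a general BLAST parser. The runs of X in the Query rows are consistent with low-complexity masking of the query, which would explain why those columns are not counted as identities.

```python
import re

# Summary lines copied from the hit shown above.
SUMMARY = """\
Length = 770
Score = 1277 bits (3305), Expect = 0.0
Identities = 642/752 (85%), Positives = 643/752 (85%)
"""

# Matches e.g. "Identities = 642/752 (85%)".
COUNT = re.compile(r"(Identities|Positives) = (\d+)/(\d+) \((\d+)%\)")


def parse_counts(text: str) -> dict:
    """Map each label to (matches, alignment_length, reported_percent)."""
    return {
        label: (int(hits), int(total), int(pct))
        for label, hits, total, pct in COUNT.findall(text)
    }


counts = parse_counts(SUMMARY)
for label, (hits, total, pct) in counts.items():
    print(f"{label}: {hits}/{total} = {100 * hits / total:.1f}% (reported {pct}%)")
```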



SNP



RNA secondary structure prediction



TF Analysis



Gene Analysis



Genome Level Annotation



Genome Level Annotation

Chromosome Oriented

Focus Position

Chromosome

Slider

Focus Area Overview



Genome Level Annotation

Focus Area Detailed View



Genome Level Annotation

Focus Area Basepair View



Genome Level Annotation

Gene Oriented



Protein properties

EX33 inflammation related GPCR analysis




Secondary Structure Prediction

Garnier

. 10 . 20 . 30 . 40 . 50

MWNSSDANFSCYHESVLGYRYVAVSWGVVVAVTGTVGNVLTLLALAIQPK

helix HHHHHHH

sheet E E EEEEEEEEEEEEEEE EEEE E

turns TT TTTTT TTTT TTT T

coil CC CCCC

. 60 . 70 . 80 . 90 . 100

LRTRFNLLIANLTLADLLYCTLLQPFSVDTYLHLHWRTGATFCRVFGLLL

helix HHHHHHHH H

sheet EE EEEEEE EEEEEEEE E EEEE EEEEEEEE

turns T TT T TTTTTTT

coil C

. 110 . 120 . 130 . 140 . 150

FASNSVSILTLCLIALGRYLLIAHPKLFPQVFSAKGIVLALVSTWVVGVA

helix HH HHHHH HHHHHH

sheet EEEEEEE EEEE EEE EEEEEEEEEEEEEE

turns T TT T

coil C C CC C

. 160 . 170 . 180 . 190 . 200

SFAPLWPIYILVPVVCTCSFDRIRGRPYTTILMGIYFVLGLSSVGIFYCL

helix

sheet EEEEEEEEEEEE EEEEEEEEEEEE EEEEEE

turns T TTTTTTT T TT

coil CCCC C CC CC

Plotstructure

PredictProtein



3D Structure analysis



Pattern and Motif Analysis

ID GATA_ZN_FINGER_1; PATTERN.

AC PS00344;

DT NOV-1990 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INFO UPDATE).

DE GATA-type zinc finger domain.

PA C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.

NR /RELEASE=41.18,131945;

NR /TOTAL=99(61); /POSITIVE=99(61); /UNKNOWN=0(0); /FALSE_POS=0(0);

NR /FALSE_NEG=14; /PARTIAL=0;

CC /TAXO-RANGE=??E??; /MAX-REPEAT=2;

CC /SITE=1,zinc; /SITE=4,zinc; /SITE=15,zinc; /SITE=18,zinc;

DR O13412, AREA_ASPNG, T; O13415, AREA_ASPOR, T; P17429, AREA_EMENI, T;
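The PA line of the PROSITE entry above uses PROSITE pattern syntax: `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, and `(n)` or `(n,m)` gives a repeat count. A minimal sketch of translating such a pattern into a Python regular expression follows. The function name and the test peptide are hypothetical, and the `<` and `>` terminus anchors are deliberately not handled.

```python
import re

# One pattern element: a residue spec plus an optional repeat like (2) or (4,5).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE PA pattern into an equivalent Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, repeat = m.groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # forbidden residues
        parts.append(residue + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)


# PA line of PS00344 (GATA-type zinc finger) as shown above.
PA = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C."
regex = prosite_to_regex(PA)
print(regex)

# Synthetic peptide built to satisfy the pattern; not a real protein.
peptide = "MKCANCAAAATAAWHRAAAGAAACNACLL"
print(bool(re.search(regex, peptide)))
```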

Protein Families



Pathway Analysis




Protein-Protein interaction

Data Sources:

Yeast Two Hybrid system

Triclosan - FabI



Protein-Protein interaction

Data Sources:

Surface Plasmon Resonance

Triclosan - FabI



Protein-Protein interaction

Data Sources:

Natural Language Processing



DNA Microarray & Expression Analysis



Cloning, Restriction & Mapping



PCR Design
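One of the simplest calculations in primer design is a first-pass melting temperature estimate. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a rough approximation usually applied only to short oligonucleotides. The slide does not say which method or tool was presented, so both the choice of rule and the example primer (the first 20 bases of the AF3071 row in the HoxB4 alignment) are assumptions for illustration.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) with the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer may only contain A, C, G and T")
    return 2 * at + 4 * gc


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


# Illustrative 20-mer only; not a validated primer.
primer = "TGGGCAATTCCCAGAAATTA"
print(f"Tm ~ {wallace_tm(primer)} C, GC = {gc_percent(primer):.0f}%")
```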



Linguistics & Information systems

