
Gene Discovery by use of MySQL

Thomas Nordahl Petersen

Presentation Transcript


  1. Gene Discovery by use of MySQL • Background – myself • NsGene – DTU satellite • Parkinson Disease (Affymetrix GeneChip) • Analysis of fetal brain tissue • Search for new protein families • MySQL & bioinformatic tools

  2. Background • Thomas Nordahl Petersen • Chemist, Ph.D. in protein crystallography, University of Copenhagen • Computational scientist, SBI-AT (Hørsholm) • Prediction of protein structure: secondary structure, fold recognition, homology modeling • Bioinformatics: gene discovery, NsGene • NsGene develops novel cell- and gene-based products for the treatment of neurological diseases.

  3. ECT Products • ECT for Parkinson’s Disease • Growth of cells in a capsule matrix • The therapeutic protein is released directly in the relevant brain area • Safe delivery across the blood-brain barrier • The Michael J. Fox Foundation granted US $3 million to support a clinical “proof-of-concept” study (May 2004)

  4. Factor Products • Identification of novel genes by use of bioinformatics • NBN (GDNF family – potent neuroprotective effects) • Scanning the human genome or assembled protein sets for different features of interest

  5. A case study • Search for Parkinson related gene(s) • Affymetrix GeneChip experiments • Fetal brain tissue

  6. Parkinson Disease • Degenerative central nervous system (CNS) disorder

  7. Parkinson Disease • Loss of dopamine-producing brain cells

  8. Parkinson’s Disease • Dopamine from the substantia nigra activates neurons in the striatum/basal ganglia • Important for the initiation of movement

  9. Cure for Parkinson’s Disease? • Parkinson’s disease may be cured provided that new dopamine-producing cells replace the dead ones • Dopamine-producing brain cells from aborted foetuses have been transplanted into the brains of Parkinson patients and in some cases cured the disease. Brain tissue from approx. 6 foetuses was needed per patient. • Major ethical problems! • Search for a protein drug is the only valid option

  10. Parkinson Disease • Dopamine-producing cells • Dopaminergic neurons can be found in the ventral part of the mesencephalon (VM) from approximately 6 weeks of development • No dopaminergic neurons can be found in the neighbouring dorsal part (DM) • Study dopaminergic differentiation by using GeneChips to compare the expression profiles of VM and DM

  11. Fetal brain tissue • Midbrain (mesencephalon): Dm (- dopamine-producing cells) and Vm (+ dopamine-producing cells) • Aborted fetus brain tissue – Karolinska Hospital • Fetuses aged 6-10 weeks, 2 cases

  12. Midbrain mesencephalon • Dm (- dopamine-producing cells) / Vm (+ dopamine-producing cells); dopamine-producing cells at the interface? • Isolate the two samples (Vm/Dm) • RNA purification + amplification • Affymetrix GeneChip analysis

  13. GenePublisher (program by Steen Knudsen) • Scale and normalize the Affymetrix GeneChip experiments

      A1     A2     A2     B1     B2     B2     P-value
      319    315    314    44     48     38     1.26e-07
      314    334    327    443    434    444    6.55e-05
      1980   1974   1973   1801   1785   1763   6.77e-05
      123    123    126    87     88     93     8.01e-05
      103    101    104    77     78     73     0.000112
      107    107    111    79     77     82     0.000124
      128    123    117    189    184    196    0.000142
      179    179    186    145    147    149    0.000191
      78     77     79     86     87     87     0.000202
      96     90     93     136    129    138    0.000215
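The slide does not describe GenePublisher's own scaling and normalization method. The sketch below shows the general idea under one assumption: a simple global scaling, where each chip is multiplied by a constant so that its 2% trimmed mean matches a common target. This is similar in spirit to MAS5-style scaling. `trimmed_mean` and `global_scale` are hypothetical helpers, not GenePublisher functions.

```python
from statistics import mean


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return mean(kept)


def global_scale(chips, target=None, trim=0.02):
    """Rescale each chip so that its trimmed mean equals `target`.

    chips:  list of lists, one list of probe-set intensities per chip.
    target: common intensity to scale to. The default (the average of the
            chips' own trimmed means) is an assumption for illustration.
    """
    centres = [trimmed_mean(chip, trim) for chip in chips]
    if target is None:
        target = mean(centres)
    # One multiplicative factor per chip: target / centre.
    return [[value * target / centre for value in chip]
            for chip, centre in zip(chips, centres)]
```

For example, a chip that is uniformly twice as bright as another ends up with the same scaled values, because each chip gets its own factor.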

  14. Volcano plot (P-value vs. log2 fold change)
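A volcano plot places each probe set at (log2 fold change, -log10 P-value), so genes that are both strongly changed and significant end up in the upper corners. The sketch below computes those coordinates for the first three rows of the GenePublisher table and flags points beyond two cut-offs. Three choices are illustrative assumptions, not values from the study: the fold-change direction (mean of group B over group A), the cut-off |log2 FC| ≥ 1, and the cut-off P ≤ 0.001.

```python
import math
from statistics import mean

# First three rows of the GenePublisher table: (A replicates, B replicates, P-value).
EXAMPLE_ROWS = [
    ((319, 315, 314), (44, 48, 38), 1.26e-07),
    ((314, 334, 327), (443, 434, 444), 6.55e-05),
    ((1980, 1974, 1973), (1801, 1785, 1763), 6.77e-05),
]


def volcano_point(group_a, group_b, p_value):
    """(log2 fold change, -log10 P) for one probe set.

    Assumption: fold change = mean(B) / mean(A); swap the groups if needed.
    """
    log2_fc = math.log2(mean(group_b) / mean(group_a))
    return log2_fc, -math.log10(p_value)


def flag_hits(points, min_abs_log2_fc=1.0, max_p=1e-3):
    """Indices of points that pass both illustrative cut-offs."""
    min_y = -math.log10(max_p)
    return [i for i, (x, y) in enumerate(points)
            if abs(x) >= min_abs_log2_fc and y >= min_y]


points = [volcano_point(a, b, p) for a, b, p in EXAMPLE_ROWS]
print(flag_hits(points))  # prints [0]
```

Only the first row passes. The other two are significant at P ≤ 0.001 but change less than two-fold.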

  15. Assigning Affymetrix GeneChip probes to a protein sequence • Affymetrix probe → UniGene sequence (cDNA, 5'-3') → BLAST → inferred IPI protein sequence • ~20,000 probes on each of the A/B Affymetrix chips • The probes are normally not part of a protein sequence, hence the detour via the cDNA and its BLAST hit
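The probe-to-protein assignment can be sketched as two look-ups: probe → UniGene cDNA, then cDNA → best-scoring protein from a BLAST search. This is a minimal sketch only. The identifiers in the example are hypothetical, and the rule of keeping the lowest-E-value hit below a 1e-10 cut-off is an assumption, not necessarily the rule used here.

```python
def best_protein_per_cdna(hits, max_evalue=1e-10):
    """Lowest-E-value protein per cDNA, ignoring hits above max_evalue.

    hits: iterable of (cdna_id, protein_id, evalue) tuples from a BLAST run.
    """
    best = {}
    for cdna, protein, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to infer a protein from
        if cdna not in best or evalue < best[cdna][1]:
            best[cdna] = (protein, evalue)
    return {cdna: protein for cdna, (protein, _) in best.items()}


def map_probes_to_proteins(probe_to_cdna, hits, max_evalue=1e-10):
    """Probe set -> inferred protein (None when its cDNA has no good hit)."""
    best = best_protein_per_cdna(hits, max_evalue)
    return {probe: best.get(cdna) for probe, cdna in probe_to_cdna.items()}
```

For example, with hypothetical IDs, `map_probes_to_proteins({"probe_1": "cdna_A"}, [("cdna_A", "prot_X", 1e-50)])` returns `{"probe_1": "prot_X"}`.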

  16. Internal database

  17. Signal Peptide prediction

  18. Conclusion – so far • The most up-regulated genes include several ‘known’ genes such as the dopamine transporter (a good positive control) • The most interesting genes are the ‘unknowns’ that were up-regulated in Vm; further analysis is ongoing • Roland JR et al., Exp Neurol (2006) 198(2):427-437: “Identification of novel genes regulated in the developing human ventral mesencephalon”

  19. A new growth factor family • Criteria: ‘unknown’ family of protein sequences; growth-factor-like (Cys-Cys, SigP) • Data source: assembled protein set/genomic data • Search criteria are dynamic • Use of MySQL

  20. MySQL – a relational database system • Data are stored in tables as a ‘black box’ • Data are physically separated from the user • The query language (SQL) is easy to read and understand • Complex search queries • Combine data in different tables/databases • Results can be obtained in seconds • Search criteria can be changed

  21. Parsing BLAST files (preparing data for MySQL)

      # Qname        Dname       Mlen  Alen  Qlen  %a_id  %q_id  e-value  Qfrom  Qto  Dlen  Dfrom  Dto
      IPI00000001.1  STAU_HUMAN  577   577   577   100.0  100.0  0.0      1      577  577   1      577
      IPI00000005.1  RASN_HUMAN  189   189   189   100.0  100.0  e-106    1      189  189   1      189
      IPI00000006.1  RASH_HUMAN  189   189   189   100.0  100.0  e-106    1      189  189   1      189
      IPI00000009.1  RASK_HUMAN  189   189   189   100.0  100.0  e-106    1      189  189   1      189
      IPI00000010.1  RASL_HUMAN  188   188   188   100.0  100.0  e-105    1      188  188   1      188
      IPI00000012.3  ZNT1_MOUSE  86    261   240   33.0   35.8   1e-32    1      230  503   248    500
      IPI00000013.1  CSL2_HUMAN  334   334   334   100.0  100.0  0.0      1      334  334   1      334
      IPI00000015.2  SFR4_HUMAN  494   494   494   100.0  100.0  0.0      1      494  494   1      494
      IPI00000016.1  LMA3_MOUSE  114   145   145   78.6   78.6   9e-62    1      145  3333  1521   1665
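Below is a minimal parser for rows of the tabular BLAST summary above. It assumes whitespace-separated fields in the header's order. Judging from the rows, %a_id is Mlen/Alen and %q_id is Mlen/Qlen; for ZNT1_MOUSE, 86/261 gives 33.0% and 86/240 gives 35.8%. Two details are assumptions inferred from the values in the MySQL output, not a documented rule:

- the conversion to the `minus_ln_e` column (-ln E, rounded to two decimals);
- the cap of 999.00 for an E-value of 0.0.

```python
import math

# Column order of the tabular BLAST summary (header line of the file).
COLUMNS = ["qname", "dname", "mlen", "alen", "qlen", "perc_a_id", "perc_q_id",
           "evalue", "qfrom", "qto", "dlen", "dfrom", "dto"]
INT_COLUMNS = ("mlen", "alen", "qlen", "qfrom", "qto", "dlen", "dfrom", "dto")

# Assumption: value stored in minus_ln_e when BLAST reports E = 0.0.
E_ZERO_CAP = 999.0


def parse_evalue(text):
    """Legacy BLAST writes tiny E-values as e.g. 'e-106', meaning 1e-106."""
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def minus_ln_e(evalue):
    """-ln(E) rounded to 2 decimals (fits float(6,2)); capped for E = 0."""
    if evalue <= 0.0:
        return E_ZERO_CAP
    return round(min(-math.log(evalue), E_ZERO_CAP), 2)


def parse_line(line):
    """Turn one whitespace-separated result row into a dict of typed values."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    rec = dict(zip(COLUMNS, fields))
    for key in INT_COLUMNS:
        rec[key] = int(rec[key])
    rec["perc_a_id"] = float(rec["perc_a_id"])
    rec["perc_q_id"] = float(rec["perc_q_id"])
    rec["evalue"] = parse_evalue(rec["evalue"])
    rec["minus_ln_e"] = minus_ln_e(rec["evalue"])
    return rec


def parse_lines(lines):
    """Parse all result rows, skipping the '#' header and blank lines."""
    return [parse_line(l) for l in lines if l.strip() and not l.startswith("#")]
```

Each returned dict maps almost one-to-one onto a row of the MySQL table on the next slide.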

  22. Storing data from BLAST alignments

      Field           Type
      query_db        enum('hs_2_18','hs_2_23','affym','mm_1_11','affym_mouse')
      query_acc       varchar(20)
      target_db       enum('swissp','mm_1_11','sid','sid_mouse')
      target_acc      varchar(20)
      align_len       smallint(6)
      match_len       smallint(6)
      query_len       smallint(6)
      perc_align_len  float(5,1)
      perc_query_len  float(5,1)
      minus_ln_e      float(6,2)
      query_from      smallint(6)
      query_to        smallint(6)
      target_from     smallint(6)
      target_to       smallint(6)
      target_len      int(11)
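SQLite is used here only so the example is self-contained, and it has no ENUM type. The sketch below emulates the two enum columns of the table above with CHECK constraints, so rows from an unknown data set are rejected. The index on (query_db, query_acc) is an assumption aimed at the self-joins on query_acc in the MySQL example. Unlike MySQL, SQLite does not enforce VARCHAR lengths.

```python
import sqlite3

QUERY_DBS = ("hs_2_18", "hs_2_23", "affym", "mm_1_11", "affym_mouse")
TARGET_DBS = ("swissp", "mm_1_11", "sid", "sid_mouse")


def in_list(values):
    """Render a tuple of names as a SQL IN-list: 'a', 'b', ..."""
    return ", ".join(f"'{v}'" for v in values)


SCHEMA = f"""
CREATE TABLE blastdb (
    query_db       TEXT NOT NULL CHECK (query_db IN ({in_list(QUERY_DBS)})),
    query_acc      VARCHAR(20),
    target_db      TEXT NOT NULL CHECK (target_db IN ({in_list(TARGET_DBS)})),
    target_acc     VARCHAR(20),
    align_len      INTEGER,
    match_len      INTEGER,
    query_len      INTEGER,
    perc_align_len REAL,
    perc_query_len REAL,
    minus_ln_e     REAL,
    query_from     INTEGER,
    query_to       INTEGER,
    target_from    INTEGER,
    target_to      INTEGER,
    target_len     INTEGER
);
CREATE INDEX idx_query ON blastdb (query_db, query_acc);
"""


def create_db():
    """In-memory database with the BLAST results table and its index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```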

  23. MySQL example

      SELECT a.query_db, a.query_acc, a.target_db, a.target_acc,
             a.perc_align_len, a.minus_ln_e,
             b.target_db, b.target_acc,
             c.cleavage_site
      FROM blastdb AS a, blastdb AS b, signalp AS c
      WHERE a.query_db = 'hs_2_23'
        AND a.target_db = 'mm_1_11'
        AND a.target_acc != 'NULL'
        AND b.target_db = 'swissp'
        AND a.query_acc = b.query_acc
        AND b.target_acc = 'NULL'
        AND c.query_db = 'hs_2_23'
        AND c.query_acc = a.query_acc
        AND c.cleavage_site >= 15
        AND c.cleavage_site <= 45;
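The query selects human proteins (hs_2_23) that meet three conditions:

- they have a mouse homologue;
- they have no SwissProt homologue;
- they carry a predicted signal peptide with a cleavage site between residues 15 and 45.

The sketch below runs the same logic with explicit JOIN syntax in SQLite, on a hypothetical toy data set (HS_A, MM_1, SP_1 and the other accessions are made up). It assumes, as the query's `'NULL'` comparisons suggest, that a missing hit is stored as the string 'NULL' rather than as an SQL NULL. With real NULLs, `IS NULL` / `IS NOT NULL` would be needed, because `= 'NULL'` never matches an SQL NULL.

```python
import sqlite3

CANDIDATE_QUERY = """
SELECT a.query_acc, a.target_acc, c.cleavage_site
FROM blastdb AS a
JOIN blastdb AS b ON b.query_acc = a.query_acc
JOIN signalp AS c ON c.query_acc = a.query_acc
WHERE a.query_db = 'hs_2_23'
  AND a.target_db = 'mm_1_11' AND a.target_acc != 'NULL'  -- has a mouse hit
  AND b.target_db = 'swissp'  AND b.target_acc = 'NULL'   -- no SwissProt hit
  AND c.query_db = 'hs_2_23'
  AND c.cleavage_site BETWEEN 15 AND 45                   -- signal peptide window
ORDER BY a.query_acc
"""


def build_toy_db():
    """Hypothetical toy data: only HS_A should satisfy all criteria."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE blastdb (query_db TEXT, query_acc TEXT,
                              target_db TEXT, target_acc TEXT);
        CREATE TABLE signalp (query_db TEXT, query_acc TEXT,
                              cleavage_site INTEGER);
    """)
    conn.executemany("INSERT INTO blastdb VALUES (?, ?, ?, ?)", [
        ("hs_2_23", "HS_A", "mm_1_11", "MM_1"),
        ("hs_2_23", "HS_A", "swissp", "NULL"),
        ("hs_2_23", "HS_B", "mm_1_11", "MM_2"),
        ("hs_2_23", "HS_B", "swissp", "SP_1"),   # known SwissProt homologue
        ("hs_2_23", "HS_C", "mm_1_11", "MM_3"),
        ("hs_2_23", "HS_C", "swissp", "NULL"),
    ])
    conn.executemany("INSERT INTO signalp VALUES (?, ?, ?)", [
        ("hs_2_23", "HS_A", 25),
        ("hs_2_23", "HS_B", 30),
        ("hs_2_23", "HS_C", 60),                 # cleavage site outside 15-45
    ])
    return conn
```

`BETWEEN 15 AND 45` is inclusive, so it is equivalent to the `>= 15 AND <= 45` pair in the original query.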

  24. Output from MySQL

      a.query_db  a.query_acc  a.target_db  a.target_acc  a.perc_align_len  a.minus_ln_e  b.target_db  b.target_acc  c.cleavage_site
      hs_2_23     IPI00000111  mm_1_11      IPI00223686   48.6              999.00        swissp       NULL          35
      hs_2_23     IPI00000183  mm_1_11      IPI00108107   74.0              999.00        swissp       NULL          26
      hs_2_23     IPI00000381  mm_1_11      IPI00128682   78.5              206.13        swissp       NULL          21
      hs_2_23     IPI00001001  mm_1_11      IPI00221700   91.7              173.39        swissp       NULL          45
      hs_2_23     IPI00001443  mm_1_11      IPI00221913   60.0              17.73         swissp       NULL          30
      hs_2_23     IPI00001578  mm_1_11      IPI00122466   88.8              207.93        swissp       NULL          38
      hs_2_23     IPI00001719  mm_1_11      IPI00120961   83.1              52.27         swissp       NULL          44
      hs_2_23     IPI00001952  mm_1_11      IPI00225921   76.0              999.00        swissp       NULL          44
      hs_2_23     IPI00002173  mm_1_11      IPI00112960   85.4              999.00        swissp       NULL          42

  25. Clustering of protein sequences • Tribe-MCL • Store in MySQL: 1) cluster size: 47306 sequences in 13130 clusters; 2) Cys-Cys: 230 16 (3) 2, e.g. ACPGICSKSCCPF / LTPALCSRTCCPY
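Tribe-MCL groups sequences by applying the Markov cluster (MCL) algorithm to a graph of BLAST similarities. The following is a toy, pure-Python sketch of MCL itself. It alternates expansion (squaring the matrix) with inflation (raising each entry to a power, then renormalizing), and it assumes a small dense similarity matrix. Tribe-MCL's E-value-based weighting is not reproduced. A data set of 47,306 sequences would need a sparse implementation such as the mcl program.

```python
def _normalize_columns(m):
    """Scale every column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _square(m):
    """Dense matrix product m @ m."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a symmetric, non-negative similarity matrix.

    Returns a list of clusters, each a sorted list of node indices.
    """
    n = len(similarity)
    # Add self-loops, then make every column a probability distribution.
    m = _normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                             for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = _square(m)                                  # expansion
        inflated = _normalize_columns([[x ** inflation for x in row]
                                       for row in expanded])   # inflation
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep non-zero mass are attractors; their columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For two disconnected triangles plus one isolated node, the sketch returns the two triangles and the singleton as three clusters.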

  26. Conserved Cys-Cys • Many growth factor families have their own specific Cys pattern, e.g. the TGF-β family • Transforming growth factor-β is a multifunctional peptide that controls proliferation, differentiation and other functions in many cell types • Search for Cys patterns without any a priori knowledge
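A search for a Cys pattern without a priori knowledge can start from the alignment itself. First find the columns where every sequence has a cysteine, then write their spacing as a PROSITE-style pattern. This minimal sketch assumes the sequences are already aligned to equal length, with gaps written as '-'. The spacing is counted in alignment columns, so gaps widen the x(n) terms. For the two fragments on slide 25 it yields columns 5, 9 and 10, i.e. C-x(3)-C-C.

```python
def conserved_cys_columns(aligned):
    """Alignment columns in which every sequence has a cysteine."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if all(s[i] == "C" for s in aligned)]


def cys_pattern(columns):
    """PROSITE-style pattern, e.g. [5, 9, 10] -> 'C-x(3)-C-C'."""
    if not columns:
        return ""
    parts = ["C"]
    for prev, cur in zip(columns, columns[1:]):
        gap = cur - prev - 1
        if gap:
            parts.append(f"x({gap})")
        parts.append("C")
    return "-".join(parts)


cols = conserved_cys_columns(["ACPGICSKSCCPF", "LTPALCSRTCCPY"])
print(cols, cys_pattern(cols))  # prints [5, 9, 10] C-x(3)-C-C
```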

  27. Search criteria • Family cluster size > 1 • No SwissProt homologues • Cys count > 4 • Signal peptide • Mouse homologue/orthologue • Result: 48 families • Manual inspection of alignments (excluding isoforms) • Upload remaining sequences to the internal database
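The criteria are meant to change between runs, so it can help to keep each threshold as a parameter. The sketch below encodes the slide's criteria for one family of sequences. The `Member`/`Family` records are hypothetical, as are the per-member rules:

- no member may have a SwissProt hit;
- every member needs more than four Cys and a signal peptide;
- at least one member needs a mouse homologue.

These are illustrative choices, not the author's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    acc: str
    cys_count: int
    has_signal_peptide: bool
    has_mouse_hit: bool
    has_swissprot_hit: bool


@dataclass
class Family:
    family_id: int
    members: list = field(default_factory=list)


def passes(family, min_size=2, min_cys=5):
    """Apply the slide's criteria with hypothetical per-member rules."""
    if len(family.members) < min_size:                     # cluster size > 1
        return False
    if any(m.has_swissprot_hit for m in family.members):   # no SwissProt homologue
        return False
    every_member_ok = all(m.cys_count >= min_cys and m.has_signal_peptide
                          for m in family.members)         # Cys > 4, SigP
    return every_member_ok and any(m.has_mouse_hit for m in family.members)
```

Tightening or loosening a criterion then means changing an argument, e.g. `passes(family, min_cys=8)`.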

  28. Internal database

  29. Tissue-specific expression (gel; lanes: 100 bp ladder, Thymus, Thyroid gland, Trachea, Uterus, Colon, Small Intestine, Spinal Cord, Fetal Liver, Fetal brain, Pancreas, Neurosphere, ctrl, dH2O, 100 bp ladder; 100 bp ladder, Universal ref, Whole brain, Heart, Kidney, Liver, Lung, Placenta, Prostate, Salivary gland, Skeletal muscle, Spleen, Testis, 100 bp ladder)

  30. Outcome from Gene Search • Family of 5 sequences • At least 8 Cys • Predicted as growth factors/hormones • ~125-140 amino acids

  31. Outcome from Gene Search • Family of 2 sequences, approx. 30% sequence identity • 11 of 16 Cys are conserved • Effect on cultured neural cells
