Mining the Genome
Filip Železný
ČVUT FEL, Prague
Dept. of Cybernetics
Gerstner Laboratory
Intro
Research at ČVUT FEL Dept. of Cybernetics
Nature Inspired Technologies
machine learning
evolutionary computation
Agent Computing
Robotics
Computer Vision
EU Projects (6 FP)
IF size = small & luxury = low THEN affordable
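In the attribute-value setting each example is one flat tuple, and a rule is a conjunction of tests on its attributes. A minimal Python sketch of the size/luxury rule, assuming examples are plain dicts; the car data are made up for illustration.

```python
from typing import Dict, List

# An example is a single flat attribute tuple, represented here as a dict.
Example = Dict[str, str]


def affordable(example: Example) -> bool:
    """IF size = small & luxury = low THEN affordable."""
    return example["size"] == "small" and example["luxury"] == "low"


# Hypothetical toy examples.
cars: List[Example] = [
    {"size": "small", "luxury": "low"},
    {"size": "small", "luxury": "high"},
    {"size": "large", "luxury": "low"},
]
print([affordable(c) for c in cars])  # [True, False, False]
```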
Plethora of paradigms
Learning = optimization in structure / parameter space
Learning = search
AI techniques employed (gradient descent, heuristic search)
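One simple way to make "learning = search" concrete is a greedy, general-to-specific search over conjunctions of attribute = value tests: start from the empty (most general) rule and keep adding the test that most improves precision until no negative example is covered. This is an illustrative hill-climbing sketch under those assumptions, not the specific learner used in the talk; the function names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Dict[str, str]
Condition = Tuple[str, str]  # (attribute, value) test, e.g. ("size", "small")


def covers(rule: List[Condition], ex: Example) -> bool:
    # A conjunctive rule covers an example if every test holds.
    return all(ex.get(a) == v for a, v in rule)


def precision(rule: List[Condition], pos: List[Example], neg: List[Example]) -> float:
    p = sum(covers(rule, e) for e in pos)
    n = sum(covers(rule, e) for e in neg)
    return p / (p + n) if p + n else 0.0


def learn_rule(pos: List[Example], neg: List[Example]) -> List[Condition]:
    """Greedy general-to-specific search (hill climbing) for one conjunctive rule."""
    rule: List[Condition] = []  # empty rule: most general, covers everything
    # Sorted so that ties are broken deterministically.
    candidates = sorted({(a, v) for e in pos + neg for a, v in e.items()})
    while any(covers(rule, e) for e in neg):
        current = precision(rule, pos, neg)
        best = max(
            (c for c in candidates if c not in rule),
            # Heuristic: precision first, then number of positives still covered.
            key=lambda c: (precision(rule + [c], pos, neg),
                           sum(covers(rule + [c], e) for e in pos)),
            default=None,
        )
        # Stop at a local optimum: no refinement improves precision.
        if best is None or precision(rule + [best], pos, neg) <= current:
            break
        rule.append(best)
    return rule


pos = [{"size": "small", "luxury": "low"}, {"size": "small", "luxury": "low"}]
neg = [{"size": "small", "luxury": "high"},
       {"size": "large", "luxury": "low"},
       {"size": "large", "luxury": "high"}]
print(learn_rule(pos, neg))  # [('luxury', 'low'), ('size', 'small')]
```

Each loop iteration is one search step: the candidate refinements are the neighbours of the current rule, and the precision heuristic decides which one to move to.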
What if examples have a structure?
Not an attribute tuple!
Description spread in multiple tables of a relational database
carcinogenic(Compound) IF has_atom(Compound, Atom1) & type(Atom1, carbon) & charge(Atom1, Charge) & Charge > 0.0133 & has_atom(Compound, Atom2) & double_bond(Atom1, Atom2)
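Because the rule's variables range over rows of several tables, evaluating it means searching for a binding of the body variables rather than reading a single tuple. A minimal sketch, assuming each table is a Python set of tuples and reading the first atom variable consistently as A1 and the second as A2; the compounds, atoms, and charges are invented toy data, and `atom_type` stands in for the rule's `type` predicate to avoid shadowing Python's built-in.

```python
from typing import Set, Tuple

# Toy relational database (hypothetical data): one set of tuples per table.
has_atom: Set[Tuple[str, str]] = {("c1", "a1"), ("c1", "a2"), ("c2", "a3"), ("c2", "a4")}
atom_type: Set[Tuple[str, str]] = {("a1", "carbon"), ("a2", "oxygen"),
                                   ("a3", "carbon"), ("a4", "carbon")}
charge: Set[Tuple[str, float]] = {("a1", 0.02), ("a2", -0.3), ("a3", 0.01), ("a4", 0.0)}
double_bond: Set[Tuple[str, str]] = {("a1", "a2"), ("a3", "a4")}


def carcinogenic(compound: str) -> bool:
    """carcinogenic(C) IF has_atom(C, A1) & type(A1, carbon) & charge(A1, Ch)
    & Ch > 0.0133 & has_atom(C, A2) & double_bond(A1, A2).

    Body variables are existentially quantified: the rule fires if ANY
    binding of A1, Ch, A2 satisfies all literals (a nested-loop join)."""
    for c, a1 in has_atom:
        if c != compound or (a1, "carbon") not in atom_type:
            continue
        for a, ch in charge:
            if a != a1 or not ch > 0.0133:
                continue
            for c2, a2 in has_atom:
                if c2 == compound and (a1, a2) in double_bond:
                    return True
    return False


print(carcinogenic("c1"), carcinogenic("c2"))  # True False
```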
Intersection of three hot fields
How does a cell know what to do?
Chromosomes get copied during mitosis
They carry the assembly instructions?
Chromosomes = proteins + DNA
Where is the information?
1953: Jim Watson & Francis Crick discover the structure of DNA.
That is where the information is.
Guanine, Adenine, Cytosine, Thymine
Two common secondary structures
Primary structure determines secondary structure.
Computational problem: given the primary structure, predict β-sheet or α-helix
NOBODY CAN DO THAT!
Using ILP (inductive logic programming), rules such as the following were obtained:
alpha0(A,B) ... position(A,D,O) & not_aromatic(O) & small_or_polar(O) & position(A,B,C) & very_hydrophobic(C) & not_aromatic(C) ...etc
All human genes sequenced
Celera vs. NIH race
Annotate the genes
IMPOSSIBLE (TOO MUCH DATA)
Expression of almost the entire genome (tens of thousands of genes)
IF gene_20056 EXPRESSED
AND gene_23984 NOT_EXPRESSED
THEN cancer_class = AML
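Applying such a rule to one sample takes two steps: turn measured expression levels into EXPRESSED / NOT_EXPRESSED, then test the conjunction. A minimal sketch, assuming a single global threshold for discretization and treating genes missing from a profile as not expressed; the threshold and measurement values are hypothetical.

```python
from typing import Dict

Profile = Dict[str, bool]  # gene id -> expressed in this sample?


def discretize(levels: Dict[str, float], threshold: float) -> Profile:
    # Assumption: a gene counts as EXPRESSED when its level exceeds the threshold.
    return {gene: level > threshold for gene, level in levels.items()}


def predict_aml(profile: Profile) -> bool:
    """IF gene_20056 EXPRESSED AND gene_23984 NOT_EXPRESSED THEN cancer_class = AML.
    Genes missing from the profile are treated as not expressed."""
    return profile.get("gene_20056", False) and not profile.get("gene_23984", False)


sample = {"gene_20056": 850.0, "gene_23984": 40.0}  # made-up measurements
print(predict_aml(discretize(sample, threshold=100.0)))  # True
```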
Combining expression & gene annotation data
Rule-Based Model
expressed_in_all(Gene) IF has_location(Gene, integral_to_membrane) & has_function(Gene, receptor_activity)
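Here the examples are genes, and the body of the rule refers only to annotation facts. A minimal sketch of evaluating that body, assuming annotations are stored as sets of (gene, term) pairs with term names copied from the rule; the genes g1 to g3 and their annotations are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical annotation facts: (gene, term) pairs.
has_location: Set[Tuple[str, str]] = {
    ("g1", "integral_to_membrane"), ("g2", "nucleus"), ("g3", "integral_to_membrane")}
has_function: Set[Tuple[str, str]] = {
    ("g1", "receptor_activity"), ("g2", "receptor_activity"), ("g3", "transporter_activity")}


def rule_body(gene: str) -> bool:
    """expressed_in_all(Gene) IF has_location(Gene, integral_to_membrane)
    & has_function(Gene, receptor_activity)."""
    return ((gene, "integral_to_membrane") in has_location
            and (gene, "receptor_activity") in has_function)


genes = sorted({g for g, _ in has_location | has_function})
print([g for g in genes if rule_body(g)])  # ['g1']
```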
Expression of genes coding for proteins located in the "integral to membrane" cell component, whose functions include receptor activity, has a high correlation with the BCR class of acute lymphoblastic leukemia (ALL) and a low correlation with other classes of ALL.