http://creativecommons.org/licenses/by-sa/2.0/

Bioinformatics

Prof: Rui Alves

[email protected]

973702406

Dept. Ciencies Mediques Basiques,

1st Floor, Room 1.08

Website: http://web.udl.es/usuaris/pg193845/testsite/

Course website: http://web.udl.es/usuaris/pg193845/Courses/Bioinfo_Biotech_2011/


Language of the course

  • Mine: English

  • Slides: English

  • Webpage: English

  • Yours: Whichever you choose as long as I understand it. ALWAYS ASK WHEN YOU DON’T UNDERSTAND SOMETHING!!


Web Page of the course

http://web.udl.es/usuaris/pg193845/Bioinfo_Biotech_2011/

  • There, you will find all the information about your tasks, links to bioinformatics resources, and the lectures.

  • It will be online from tomorrow onwards.


Goals of this course

  • Give you an integrated view of how to use computers and informatics to gain a systemic understanding of biological systems at the molecular level.

  • Integrate bioinformatics, mathematical modelling and other areas of computational biology to save lab work and to address problems that cannot yet be solved in the lab.


What this course will be

  • A course to teach you how to think about problems, not a course to teach you how to use programs.


Course Plan

  • First part of the course (2-3 weeks): broad introduction to bioinformatics and computational biology in molecular biology.

  • Second part of the course: problems for you to solve in groups at home, plus in-depth lectures about the different subjects you need to solve the problems.


Evaluation Plan

  • Five tasks in groups of four. At the end of each task, each group delivers a paper (together, the tasks account for 70% of the final grade).

  • Final exam (with two sections) where a problem will be posed to each of you and you will have to outline how you would solve it (20%).

  • My discretion (10%).

  • CAUTION: YOU NEED TO HAVE AT LEAST 6 IN EACH TASK, AND IN EACH SECTION OF THE FINAL EXAM.
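The weighting above can be checked with a little arithmetic. The sketch below assumes grades on a 0-10 scale, that the five tasks count equally within the 70%, and that the two exam sections count equally within the 20%; none of this is stated on the slide. It only reports whether every minimum of 6 is met, since the slide does not say what happens otherwise.

```python
from statistics import mean

# Weights from the evaluation plan. The 0-10 scale and the equal weighting of
# tasks and of exam sections are assumptions made for this sketch.
TASK_WEIGHT = 0.70
EXAM_WEIGHT = 0.20
DISCRETION_WEIGHT = 0.10
MINIMUM = 6.0


def final_grade(tasks: list[float], exam_sections: list[float],
                discretion: float) -> tuple[float, bool]:
    """Return (weighted final grade, whether every task and exam section is >= 6)."""
    if len(tasks) != 5 or len(exam_sections) != 2:
        raise ValueError("expected 5 task grades and 2 exam-section grades")
    grade = (TASK_WEIGHT * mean(tasks)
             + EXAM_WEIGHT * mean(exam_sections)
             + DISCRETION_WEIGHT * discretion)
    meets_minimums = all(g >= MINIMUM for g in tasks + exam_sections)
    return round(grade, 2), meets_minimums


print(final_grade([7, 8, 6.5, 9, 7.5], [7, 6], 8))  # (7.42, True)
```

Tasks averaging 7.6 contribute 5.32, exam sections averaging 6.5 contribute 1.3, and a discretion mark of 8 contributes 0.8, for 7.42 in total.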


Index

  • Why bioinformatics?

  • Ontologies & Classification schemes

  • Databases and servers


Why Bioinformatics? Or: things to do when it is raining and you want an integrated view of biological systems…



What obvious problems do large-scale sets create?

  • Imagine the 6 500 000 000 human beings born within the last 130 years and still alive.

  • By and large, the majority of them have had an education.

  • What problems need solving to ensure that education?

1 – Organize Knowledge

2 – Organize its transmission



First problem: organizing knowledge

  • We do not need to know all there is to know in order to be productive in society.

  • Furthermore, we cannot learn everything at the same time.

  • Problem: How to organize knowledge into bite-sized packages that can be parceled out one after another, and that one can build upon?


Organizing knowledge

[Figure: Communication (read, write, count); Humanities; Sciences]


Second problem: organizing the transmission of knowledge

  • The school system is a way to train the largest number of people with the least societal effort.

Not effective


Schools and books are the servers and databases for educating people

[Figure labels: Database; Server; Users; New server: You]


Understanding biological systems

You’re WRONG!!!!!

I need more data!!! How do I plan what to do now?

Hey, it’s raining!!! Why don’t we try and figure out how all the little molecular pieces in a cell work together?!?!?!


The “omics” revolution in molecular biology

  • Over many decades, a huge amount of biological data has accumulated.

  • Unlike the “KNOWLEDGE” we discussed before, this data is not well organized and the connections between the different parcels of data are obscure.

  • The omics revolution has compounded this problem 1000 fold because data now accumulates faster than ever.


What is the “omics” revolution in molecular biology?

  • The omics revolution is a period of about ten years in which several different technologies were developed that can be applied to study the complete molecular landscape of cells!!!

    • Genomics

    • Proteomics

    • Metabolomics

    • Et caeteromics


Understanding biological systems

I need more data!!! Why don’t they give it to me?


The “omics” revolution in molecular biology

  • (We!!) Biologists want the data to make sense and they (we) want it now!!!


Comparison between the two problems

People organized the knowledge transmission system and its connections over millennia of trial and error.

It is impossible for people to organize the biological knowledge brought about by omics in the 20 years that have passed since the beginning of the omics era.


Why?

  • Data is not well classified.

  • Data is not well connected.

  • Data is not well understood.

  • Not enough people to do it in a short amount of time.


New types of servers and databases are required for very fast organization and data mining

[Figure labels: Database; Server; Users; BIOINFORMATICS!!]


What is Bioinformatics?

  • Development and application of computational/informatic tools to the solution of biological problems

  • The standard of Internet bioinformatics: LAMP

    • L: Linux (operating system)

    • A: Apache (Internet server)

    • M: MySQL (database server)

    • P: Perl, PHP, Python (programming languages)


The standards are changing

  • Java lets servers launch fewer processes by using the clients’ machines for computation, which allows a larger number of simultaneous connections.

  • Tomcat “talks” very well with Java.

  • LTMJ:

    • L: Linux (operating system)

    • T: Tomcat (Internet server)

    • M: MySQL (database server)

    • J: Java (programming language)


What does a computer need to be effective?

  • Well classified data

    • Ontologies, Classification schemes

  • Well organized data

    • Databases, servers

  • Good users


Index

  • Why bioinformatics?

  • Ontologies & Classification schemes

  • Databases and servers


Ontologies and classification schemes for data



Biological Classification Schemes

  • What is an Ontology (in the Biological sense)?

    A set of definitions of controlled vocabulary terms with hierarchical relationships to one another, which computers can easily process.
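To see why such a structure suits computers, here is a minimal sketch of an ontology stored as a directed acyclic graph in which a term may have more than one parent. The term names are loosely modeled on GO process terms, but the structure is simplified and hypothetical; the point is only that once parent links are recorded, finding every more general term is a simple graph walk.

```python
# Hypothetical miniature ontology: each term maps to its parent terms.
# Names are illustrative; this is not the real GO graph and uses no GO IDs.
PARENTS: dict[str, set[str]] = {
    "biological_process": set(),
    "metabolic process": {"biological_process"},
    "catabolic process": {"metabolic process"},
    "carbohydrate metabolic process": {"metabolic process"},
    # Two parents: this is why ontologies are DAGs rather than trees.
    "carbohydrate catabolic process": {"catabolic process", "carbohydrate metabolic process"},
}


def ancestors(term: str) -> set[str]:
    """Return every term reachable by following parent links upward."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("carbohydrate catabolic process")))
# ['biological_process', 'carbohydrate metabolic process', 'catabolic process', 'metabolic process']
```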


What are Bio-Ontologies?

Biological Ontologies (Bio-ontologies) can be defined as a complex hierarchical structure in which biological concepts are described by their meanings (definitions) and relationships to each other.

There are many Bio-Ontologies available and in use by databases. The Plant Ontology, along with other ontologies such as the Gene Ontology, is included in the open source Open Biological Ontologies project at Sourceforge.

http://obofoundry.org/


The Gene Ontology

The most well-known example of a bio-ontology is the Gene Ontology (GO; http://www.geneontology.org) which describes three biological domains: cellular component (where the gene product locates), molecular function (what the gene product does) and biological process (the cellular, developmental or physiological events the gene product is involved in).

GO terms are used to describe gene products. Because these descriptions are independent of species-specific nomenclature and uniformly applied, it is possible to make meaningful and efficient comparisons of genes across diverse taxa.
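A minimal sketch of such a cross-species comparison, assuming each gene is reduced to the set of GO terms it is annotated with. The gene variables and their annotations are hypothetical, and Jaccard similarity (shared terms divided by all distinct terms) is used here only as the simplest possible measure; many tools use ontology-aware semantic-similarity measures instead.

```python
# Hypothetical GO annotations for one gene in each of two species.
human_gene = {"DNA repair", "protein binding", "nucleus"}
yeast_gene = {"DNA repair", "nucleus", "cytoplasm"}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms divided by all distinct terms (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard(human_gene, yeast_gene))  # 0.5 (2 shared terms out of 4 distinct)
```

The comparison works across species only because both genes are described with the same controlled vocabulary rather than species-specific names.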


Three “Super Categories” of GO

  • Molecular Function (what)

    • Tasks performed at the molecular level

  • Biological Process (why)

    • How it pertains to the organism

  • Cellular Component (where)

    • Its location


Example

  • Gene Name: BRCA1

  • Molecular Function: protein binding

  • Biological Process: DNA Replication and Chromosome Cycle

  • Cellular Component: nucleus


Structure of GO

  • How to define the relationship between concepts?

  • Example: How to relate the terms: “cell” “nucleus” “membrane”
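One way a computer can represent the relationship between “cell”, “nucleus” and “membrane” is as typed edges. GO does use relation types such as is_a and part_of, but the fragment below is simplified and hypothetical (real GO places intermediate terms between these), so it illustrates the idea rather than GO’s actual graph.

```python
# Simplified, hypothetical fragment: (child term, relation, parent term).
EDGES = [
    ("nucleus", "part_of", "cell"),
    ("plasma membrane", "is_a", "membrane"),
    ("plasma membrane", "part_of", "cell"),
    ("nuclear membrane", "is_a", "membrane"),
    ("nuclear membrane", "part_of", "nucleus"),
]


def related(term: str, relation: str) -> set[str]:
    """All terms reachable from `term` by following only edges of one relation type."""
    found: set[str] = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, rel, parent in EDGES:
            if child == current and rel == relation and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(related("nuclear membrane", "part_of")))  # ['cell', 'nucleus']
print(sorted(related("nuclear membrane", "is_a")))     # ['membrane']
```

Following a single relation type is a simplification: GO’s documentation also describes how relations combine (for example, is_a followed by part_of implies part_of), which this sketch ignores.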


How is GO Annotated?

  • Manual

    • Humans sifting through primary literature

  • Electronic

    • Assign GO Terms using already existing information in databases.


Evidence Code for GO Annotation

IEA Inferred from Electronic Annotation

ISS Inferred from Sequence Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available

Detailed info available from: http://www.geneontology.org/doc/GO.evidence.html
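Evidence codes matter in practice because analyses often filter on them. The sketch below assumes annotations are available as (gene, term, evidence code) tuples; the records are invented, and excluding IEA and ND is one possible policy chosen for this example, not a GO recommendation.

```python
# Hypothetical annotation records: (gene, GO term, evidence code).
annotations = [
    ("geneA", "DNA repair", "IDA"),
    ("geneA", "nucleus", "IEA"),
    ("geneB", "protein binding", "IPI"),
    ("geneB", "DNA repair", "ISS"),
    ("geneC", "biological_process", "ND"),
]

# Policy assumed for this example: drop electronic-only annotations (IEA)
# and annotations recording that no biological data is available (ND).
EXCLUDED = {"IEA", "ND"}

kept = [(gene, term, code) for gene, term, code in annotations if code not in EXCLUDED]
for gene, term, code in kept:
    print(f"{gene}\t{term}\t{code}")
```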


How to use GO in data analysis

  • Simple Queries

  • Find over-represented GO categories in a list of genes

    • Search Biological “Themes”

  • Binning

    • Obtain a broad view of the distribution of major GO terms in a list of genes.

  • Clustering Genes on GO terms

    • Group together functionally related genes based on GO terms.
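Finding over-represented GO categories is commonly framed as a one-sided hypergeometric test: how surprising is it to see k genes carrying a term in a list of n genes, when K of the N background genes carry it? A minimal sketch using only Python’s `math.comb` follows; the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: genes in the background (e.g. the whole genome)
    K: background genes annotated with the GO term
    n: genes in the list of interest
    k: genes in the list annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing exactly i annotated genes, for i = k..upper.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical numbers: 6000 background genes, 60 carry the term (1%);
# a 100-gene list contains 8 of them, where about 1 would be expected by chance.
p = enrichment_p_value(N=6000, K=60, n=100, k=8)
print(f"{p:.2e}")
```

Because a gene list is usually tested against hundreds or thousands of GO terms, the resulting p-values need a multiple-testing correction (for example Bonferroni or Benjamini-Hochberg) before any term is called over-represented.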


GO Tools

  • NetAffx – Get GO Annotation

  • AmiGO – Browser and Simple Queries

  • GoTermMapper – Binning(Go Slim)

  • GeneToolBox –

    • Finding over-represented GO categories

    • Clustering based on similar GO terms

    • Query for genes with similar function.


GO is not very good

  • EC numbers

  • Protein classification schemes

  • TF classification schemes

  • Transport proteins classification schemes

  • Etc.


The EC number database


The BRENDA database


The TF classification database


The signal transduction classification database


The transport proteins classification database

All these classifications are reminiscent of the Dewey classification system for books!!!! (Remember public libraries?)
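The Dewey comparison can be made concrete with EC numbers, which have four dot-separated levels running from the broad enzyme class down to the individual enzyme. The helpers below (hypothetical names, a sketch rather than a full validator) count how many leading levels two EC numbers share, much as two library books can share the leading digits of their Dewey numbers.

```python
def ec_levels(ec: str) -> tuple[str, ...]:
    """Split an EC number such as '2.7.1.1' into its four hierarchical levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    return tuple(parts)


def shared_depth(ec_a: str, ec_b: str) -> int:
    """Number of leading levels (0-4) that two EC numbers have in common."""
    depth = 0
    for a, b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        if a != b:
            break
        depth += 1
    return depth


# 2.7.1.1 (hexokinase) and 2.7.1.2 (glucokinase) are both transferases (2)
# moving phosphorus-containing groups (2.7) to an alcohol acceptor (2.7.1).
print(shared_depth("2.7.1.1", "2.7.1.2"))  # 3
print(shared_depth("2.7.1.1", "3.1.1.1"))  # 0
```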


A general protein classification database


How close are we to having good, comprehensive & universally used classifications?

  • Far!!!!!

  • BMC Bioinformatics and Bioinformatics publish papers proposing new ontologies and classifications for one area or another of molecular biology almost every month.

  • Wet-lab molecular biologists have still not been won over to the cause of a single name for a single entity…

  • There is hope! The situation is much better than 5 years ago!!!


What does a computer need to be effective?

  • Well classified data

    • Ontologies, Classification schemes

  • Well organized data

    • Databases, servers


Index

  • Why bioinformatics?

  • Ontologies & Classification schemes

  • Databases and servers


Databases & Servers



What is a Database?

  • A database is a collection of data organized so that it is easy to store in a computer and to mine with appropriate software.

  • A database is usually organized as a set of tables in which information about an object is stored

  • The tables are related to each other in different ways.


What does database technology allow?

  • Making information useful

  • Avoiding “accidental disorganisation”

  • Making information easily accessible and integrated with the rest of our work


S(tructured) Q(uery) L(anguage)

  • ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems.

  • SQL statements are used to retrieve and update data in a database.

  • Includes:

    • Data Manipulation Language (DML)

    • Data Definition Language (DDL)
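A minimal sketch of related tables, DDL and DML in one place, using Python’s built-in `sqlite3` module as a stand-in for a database server such as MySQL. The table and column names, gene names and evidence codes are all hypothetical. The `?` placeholder style belongs to `sqlite3`; other database drivers may use a different placeholder syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

# DDL: define two tables related through gene.id.
con.executescript("""
CREATE TABLE gene (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL
);
CREATE TABLE annotation (
    gene_id  INTEGER NOT NULL REFERENCES gene(id),
    go_term  TEXT NOT NULL,
    evidence TEXT NOT NULL
);
""")

# DML: insert and update rows, passing values as parameters.
con.executemany("INSERT INTO gene (id, symbol) VALUES (?, ?)", [(1, "geneA"), (2, "geneB")])
con.executemany(
    "INSERT INTO annotation (gene_id, go_term, evidence) VALUES (?, ?, ?)",
    [(1, "nucleus", "IEA"), (1, "protein binding", "IPI"), (2, "nucleus", "IDA")],
)
con.execute("UPDATE annotation SET evidence = ? WHERE gene_id = ? AND go_term = ?",
            ("IDA", 1, "nucleus"))

# DML query: a JOIN follows the relation between the two tables.
rows = con.execute("""
    SELECT g.symbol, a.go_term, a.evidence
    FROM gene AS g JOIN annotation AS a ON a.gene_id = g.id
    WHERE a.go_term = ?
    ORDER BY g.symbol
""", ("nucleus",)).fetchall()
print(rows)  # [('geneA', 'nucleus', 'IDA'), ('geneB', 'nucleus', 'IDA')]
con.close()
```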


Web Databases

  • Data is accessible through the Internet

  • They have different underlying database models

  • Example: biological databases

    • Molecular data: NCBI, Swiss-Prot, PDB, KEGG, GO

    • Protein interaction: DIP, BIND

    • Organism-specific: Mouse, Worm, Yeast

    • Literature: PubMed

    • Disease: OMIM


How to make databases useful

  • Attach them to a server

  • Let people use them to mine for knowledge
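Attaching a database to a server can be sketched with Python’s standard library alone: `sqlite3` plays the database and `http.server` plays the web server, roles that MySQL and Apache fill in a LAMP or WAMP setup. Everything here is a hypothetical, minimal example; the Python documentation itself warns that `http.server` is not recommended for production use.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical one-table database standing in for MySQL.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, component TEXT)")
DB.execute("INSERT INTO gene VALUES (?, ?)", ("BRCA1", "nucleus"))


def lookup(symbol: str) -> Optional[dict]:
    """Fetch one gene record from the database, or None if it is unknown."""
    row = DB.execute(
        "SELECT symbol, component FROM gene WHERE symbol = ?", (symbol,)
    ).fetchone()
    return {"symbol": row[0], "component": row[1]} if row else None


class GeneHandler(BaseHTTPRequestHandler):
    """Answers GET /gene?symbol=<symbol> with the matching record as JSON."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        symbol = parse_qs(url.query).get("symbol", [""])[0]
        record = lookup(symbol) if url.path == "/gene" else None
        body = json.dumps(record if record else {"error": "not found"}).encode("utf-8")
        self.send_response(200 if record else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve on localhost until interrupted with Ctrl+C."""
    HTTPServer(("127.0.0.1", port), GeneHandler).serve_forever()


print(lookup("BRCA1"))  # {'symbol': 'BRCA1', 'component': 'nucleus'}
```

Calling `serve()` and then opening http://127.0.0.1:8000/gene?symbol=BRCA1 in a browser should return the BRCA1 record as JSON; any other symbol gets a 404 response.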


An example of WAMP (LAMP with Windows instead of Linux)

  • A simple bioinformatics class server


  • The bioinformatics class server

Wireless


How close are we to having good, comprehensive & universally used data repositories?

  • Not far at all!!!!!

  • NCBI, KEGG, Protein Data Bank, SGD, UniProt, …

  • Problems:

    • Redundant data spread over many databases…

    • Conflicting information due to the use of different data sources, standards, and classifications
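Keeping provenance is one way to cope with redundant and conflicting records. The sketch below merges two hypothetical sources field by field, remembers which source said what, and lists the fields where the sources disagree; the source names, gene names and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records about the same genes from two sources (values invented).
SOURCES = {
    "source_A": {"geneX": {"location": "nucleus", "length_aa": 500}},
    "source_B": {"geneX": {"location": "nucleus", "length_aa": 512},
                 "geneY": {"location": "cytoplasm"}},
}


def integrate(sources: dict) -> dict:
    """Merge records into gene -> field -> {source: value}, keeping provenance."""
    merged: dict = defaultdict(lambda: defaultdict(dict))
    for source, records in sources.items():
        for gene, fields in records.items():
            for field, value in fields.items():
                merged[gene][field][source] = value
    return merged


def conflicts(merged: dict) -> list:
    """Return (gene, field, {source: value}) wherever the sources disagree."""
    return [
        (gene, field, dict(by_source))
        for gene, fields in merged.items()
        for field, by_source in fields.items()
        if len(set(by_source.values())) > 1
    ]


warehouse = integrate(SOURCES)
print(conflicts(warehouse))  # [('geneX', 'length_aa', {'source_A': 500, 'source_B': 512})]
```

Redundant values (both sources agreeing on the location of geneX) collapse into one field, while disagreements are surfaced for a curator instead of being silently overwritten.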


A glimpse at a useful present

[Figure: Data Sources; Data warehouse; Relational tools; Online analytical processing tools; Applications]


A glimpse of a useful present


A glimpse of possible futures



The future

  • Cloud computing

  • Distributed computation

  • Artificial intelligence methods to facilitate data search, analysis and mining


Summary

  • Why bioinformatics:

    • Because there is simply too much data out there for human beings to deal with without computer assistance.

    • Because many of the calculations to extract knowledge from the data would take too long without computers.

  • How to do bioinformatics:

    • Organize data well using appropriate classification systems.

    • Use databases and server technology.

