Bioinformatics lectures at Rice University

Li Zhang

Lecture 1

Department of Bioinformatics and Computational Biology

MD Anderson Cancer Center

March-April, 2012


Contact information

  • Li Zhang

  • Phone: 713-563-4298 (office)

    713-962-6661 (cell)

  • Email: [email protected]

  • URL: http://odin.mdacc.tmc.edu/~llzhang/RiceCourse/

  • Office location: FCT4.5034. Pickens Tower, 4th floor, MD Anderson Cancer Center.


Homework

  • There will be 2-3 assignments posted online.

  • All students are required to complete the assignments. Homework will be submitted at the beginning of class on the due date.

  • If circumstances beyond the student’s control arise and an assignment cannot be submitted on the due date, an instructor should be contacted prior to the due date. With an instructor’s permission, late homework may be accepted within one week of the due date.

  • All decisions will be made on an individual student basis and the final decision rests with the instructor assigning the homework. A penalty of 10 percentage points will be applied to late homework.


What is bioinformatics?

  • Bioinformatics is the application of computer science and information technology to the fields of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software engineering, data mining, image processing, modeling and simulation, signal processing, discrete mathematics, control and system theory, circuit theory, and statistics, for generating new knowledge of biology and medicine, and for improving and discovering new models of computation (e.g. DNA computing, neural computing, evolutionary computing, immuno-computing, swarm-computing, cellular-computing).

  • Commonly used software tools and technologies in this field include Java, XML, Perl, C, C++, Python, R, MySQL, NoSQL, CUDA, MATLAB, and Microsoft Excel.


Focus area of this course

  • Reference material: Pierre Baldi and Søren Brunak’s “Bioinformatics: The Machine Learning Approach” and a few key papers.

  • Introduction to the high-throughput technologies that provide the data.

  • Machine learning algorithms and models to visualize and explore large datasets and to identify patterns and relationships.

  • Computing language: R/Perl.

  • Database: non-relational (NoSQL) databases.

  • Not focused on web applications; no structural biology.


Why should we study bioinformatics?

Why is it important to study bioinformatics?


Let us see a few growth charts …


Growth of PDB (Protein Structures)

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. Most structures are determined by X-ray diffraction, but about 15% of structures are determined by NMR.

Large-scale organized efforts by the Structural Genomics Initiative and the International Structural Genomics Consortium have greatly accelerated the pace of growth.


Growth chart of GEO (RNA etc.)

The Gene Expression Omnibus (GEO) database holds over 10 000 experiments comprising 300 000 samples and 16 billion individual abundance measurements, for over 500 organisms, submitted by 5000 laboratories from around the world. The database typically receives over 60 000 query hits and 10 000 bulk FTP downloads per day, and has been cited in over 5000 manuscripts.


GenBank growth chart (DNA sequences)

There are 126 billion bases in 135 million sequence records in the traditional GenBank divisions and 191 billion bases in 62 million sequence records in the WGS division as of April 2011.
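The GenBank figures above can be turned into an average record length with a little arithmetic. The following is a minimal Python sketch, not part of the original slides; the variable and function names are illustrative, and the only inputs are the four numbers quoted on this slide.

```python
# Figures quoted on the slide (GenBank, April 2011).
TRADITIONAL_BASES = 126_000_000_000   # traditional GenBank divisions
TRADITIONAL_RECORDS = 135_000_000
WGS_BASES = 191_000_000_000           # whole-genome shotgun (WGS) division
WGS_RECORDS = 62_000_000


def mean_record_length(total_bases, total_records):
    """Average number of bases per sequence record."""
    return total_bases / total_records


traditional_mean = mean_record_length(TRADITIONAL_BASES, TRADITIONAL_RECORDS)
wgs_mean = mean_record_length(WGS_BASES, WGS_RECORDS)
total_bases = TRADITIONAL_BASES + WGS_BASES

print(f"Traditional divisions: {traditional_mean:.0f} bases per record")
print(f"WGS division:          {wgs_mean:.0f} bases per record")
print(f"Combined:              {total_bases / 1e9:.0f} billion bases")
```

On these numbers a traditional record averages roughly 933 bases and a WGS record roughly 3,081 bases, so the WGS division holds more sequence in fewer, longer records.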


A brief history of the big bang of the digital universe


The age of big data

“The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. It’s a revolution. We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”

-- Steve Lohr, “The Age of Big Data”, The New York Times, 2012.


What is big data?

  • The 3 Vs of big data:

  • High volume

  • High velocity

  • High variety

  • Source: a definition of big data from Gartner, Inc.

Simply put, it is big and complex.


The big value of big data

The value of big data is that its analysis can lead to:

  • enhanced decision making,

  • insight discovery, and

  • process optimization.

In business, big data can help identify unknown needs, customize advertising, and monitor and evaluate operations, which leads to large profits and large savings. In science, big data is a huge resource for scientific discovery.


A brief introduction to molecular biology


James Watson and Francis Crick

DNA


Next generation sequencing


The cost of sequencing has fallen 100,000-fold in the past 12 years
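To put the 100,000-fold figure in perspective, it can be converted into a cost-halving time. The sketch below is an illustration rather than part of the original slides: it assumes the decline was a steady exponential over the whole 12 years, which the chart itself may not show, and the function name is hypothetical.

```python
import math

FOLD_REDUCTION = 100_000  # from the slide
YEARS = 12                # from the slide


def halving_time_years(fold_reduction, years):
    """Time for the cost to halve, ASSUMING a constant exponential decline."""
    # The number of halvings needed to reach the fold reduction is
    # log2(fold_reduction); spread them evenly over the period.
    return years / math.log2(fold_reduction)


sequencing_halving = halving_time_years(FOLD_REDUCTION, YEARS)
print(f"Cost halves about every {sequencing_halving * 12:.1f} months")
```

Under that assumption the cost halves roughly every 8.7 months.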


The little USB drive could do it

Oxford Nanopore, long the sleeper project to watch in the field of mapping DNA, just announced two products that could dramatically change the field of DNA sequencing: a new DNA sequencer that may be able to handle a human genome in 15 minutes, and a USB thumb drive DNA sequencer that can read DNA directly from blood with no prep work.

“‘Game changer’ is an understatement,” says George Church of Harvard University. (Church was one of the inventors on one of the patents licensed to Oxford Nanopore that led to the device.) He ticks off the device’s specs: Tiny instruments for $900. Able to read DNA in 10,000-letter stretches, compared to a couple hundred for current technologies. Able to sequence a human genome in fifteen minutes (although you’d need 20 of the server-size devices coming in 2013, not just the USB stick).


Nanopore sensing


Data explosion in the era of genomics

A long series of breakthroughs in microelectronics and nanoelectronics has produced instruments that quantify and/or characterize large numbers of biological molecules in parallel using very small amounts of biomaterial.

Such technical advances have made it possible to comprehensively characterize and quantify the building blocks (DNA, RNA, protein) of a biological system.


Think Google …


Or, think Netflix.


Bioinformatics is the key in genomics


Genome, genomics and the post-genomic era

List of sequenced genomes of mammals:


Large Projects

  • TCGA: The Cancer Genome Atlas

  • 1000 Genomes Project

  • 1001 Genomes Project

  • ICGC: International Cancer Genome Consortium

  • The International HapMap Project


Data → Information → Knowledge/power

Bioinformatics provides tools to catalyze the transformations


Ion semiconductor sensing

Ion Torrent

