
Accelerating bioinformatics with GRID computing

Accelerating bioinformatics with GRID computing. Angel Merino, Centro Nacional de Biotecnología, Unidad de Biocomputación. What to cover: electron microscopy (EM), what EM is, and what the workflow looks like.


Presentation Transcript


  1. Accelerating bioinformatics with GRID computing. Angel Merino, Centro Nacional de Biotecnología, Unidad de Biocomputación

  2. What to cover • Electron microscopy (EM) • What EM is. • What the workflow looks like. • What is being solved with the Grid: processes/applications that have been "gridified" • Maximum Likelihood • CTF estimation • Overcoming the potential barrier • Web portal • Web/Grid Services & Workflows • Other applications in the field

  3. What is EM (I) • EM is a structural analysis technique. • It lets us explore the molecular environment of the particles under study.

  4. What is the workflow • Sample preparation. • Image acquisition. • Image processing and computation of 3D volumes.

  5. What is EM (II) • Biological material: high H2O content; elevated radiation damage. • Negative staining: dehydration; structural changes / flattening; the image comes from the metal cast. • Cryomicroscopy: hydrated, biologically friendly; fewer distortions; the image comes from the biological specimen.

  6. What is EM (III) Negative staining. Cryomicroscopy.

  7. CTF estimation (I) Aberrations in the microscope optics blur the experimental images. This effect can be described by the contrast transfer function (CTF), and estimating the CTF allows the blurred images to be corrected. CTF estimation in Xmipp may take up to half a day per micrograph, and a user processes about 100 micrographs per experiment. Grid computing is therefore necessary.
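The figures on this slide can be turned into a rough wall-clock estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes each micrograph is processed independently, takes the slide's upper bound of half a day per micrograph, and ignores queueing and data-transfer overheads. The function name and the worker counts are illustrative, not part of Xmipp or of any grid middleware.

```python
import math


def wall_clock_days(n_micrographs, days_per_micrograph, n_workers):
    """Estimate elapsed days when independent jobs are spread over workers.

    Each worker handles one micrograph at a time, so the work proceeds in
    ceil(n_micrographs / n_workers) rounds of equal length.
    """
    rounds = math.ceil(n_micrographs / n_workers)
    return rounds * days_per_micrograph


# Slide figures: about 100 micrographs, up to 0.5 days each.
serial_days = wall_clock_days(100, 0.5, n_workers=1)
grid_days = wall_clock_days(100, 0.5, n_workers=100)
print(f"one CPU: {serial_days} days, 100 grid workers: {grid_days} days")
```

Under these assumptions a single CPU needs about 50 days per experiment, while one worker per micrograph brings the estimate down to the duration of a single job.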

  8. CTF estimation (II)

  9. CTF estimation (III) Per micrograph

  10. 1000x Maximum-Likelihood

  11. Maximum-Likelihood (I) "Slow" run, 1 iteration

  12. Maximum-Likelihood (II) "Fast" run (MPI)

  13. Maximum-Likelihood development using the EGEE Grid vs. a local cluster • Using the EGEE Grid: during last November, 17160 CPU hours were consumed (almost 2 years!), equivalent to 23 CPUs running full time. • Actual usage time = 50% of the total time, because of the development activity that was under way. • Grid: 46 CPUs! • Using our local cluster (jumilla.cnb.uam.es) at 50% for the same activity: 20 CPUs.
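The numbers on this slide can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: November is taken as 30 days, a year as 365 days, and the 50% figure is read as the fraction of time the CPUs were actually busy. The helper names are illustrative.

```python
HOURS_PER_DAY = 24


def cpu_years(cpu_hours):
    """Convert CPU hours into years of a single CPU running non-stop."""
    return cpu_hours / (365 * HOURS_PER_DAY)


def equivalent_cpus(cpu_hours, days):
    """CPUs that would have to run full time for `days` to burn `cpu_hours`."""
    return cpu_hours / (days * HOURS_PER_DAY)


def cpus_at_duty_cycle(full_time_cpus, duty_cycle):
    """CPUs needed when each one is busy only `duty_cycle` of the time."""
    return full_time_cpus / duty_cycle


november_hours = 17160  # figure quoted on the slide
print(round(cpu_years(november_hours), 2))            # about 1.96 years
print(round(equivalent_cpus(november_hours, 30), 1))  # about 23.8 CPUs
print(cpus_at_duty_cycle(23, 0.5))                    # 46.0 CPUs
```

With a 30-day month the full-time equivalent comes out slightly above the 23 CPUs quoted on the slide; doubling the slide's own figure for a 50% duty cycle reproduces its 46 CPUs.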

  14. Overcoming the potential barrier: 4 simple steps to run all the jobs you need for your experiment • 1. Select your application • 2. Log in to the UI • 3. Upload the necessary files • 4. Submit your experiment, giving a notification e-mail address and your certificate password

  15. Overcoming the potential barrier (I): the portal engine • Input from the Grid portal: JDLs. • For each JDL: a C++ object, the three required scripts and the required input tar files. • First script: submit the job and publish the data (first time). • Second script: run the job and publish the output data when the job finishes. • Third script: get the output and retrieve the output data. • Status checking: on "Done (success)" or "Aborted or not submitted", send an e-mail to the notification e-mail address.
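The engine described on this slide can be pictured as a small job description plus a polling loop. The following is a minimal sketch, not the portal's actual code (which the slide says is built around C++ objects and scripts). The JDL attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard in EGEE job descriptions, but the file names, the `Status` values other than those on the slide, and the `poll_status` and `notify` callables are assumptions, and the grid backend is simulated.

```python
import time
from enum import Enum


class Status(Enum):
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    DONE = "Done (success)"
    ABORTED = "Aborted"


# States after which polling stops and the user is e-mailed.
TERMINAL = {Status.DONE, Status.ABORTED}


def render_jdl(executable, input_tar):
    """Return JDL text for one job; the file names are hypothetical."""
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{input_tar}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{"{executable}", "{input_tar}"}};',
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    return "\n".join(lines)


def track_job(job_id, poll_status, notify, poll_interval=0.0, max_polls=1000):
    """Poll until the job reaches a terminal state, then notify once."""
    for _ in range(max_polls):
        status = poll_status(job_id)
        if status in TERMINAL:
            notify(job_id, status)
            return status
        time.sleep(poll_interval)
    return None  # gave up without seeing a terminal state


# Simulated backend standing in for the real grid middleware.
_fake_states = iter([Status.SUBMITTED, Status.RUNNING, Status.DONE])
sent_mail = []
final = track_job(
    "job-1",
    poll_status=lambda job_id: next(_fake_states),
    notify=lambda job_id, status: sent_mail.append((job_id, status)),
)
jdl_text = render_jdl("run_ctf.sh", "micrograph_001.tar")
print(jdl_text)
print(final, sent_mail)
```

Passing `poll_status` and `notify` as callables keeps the loop independent of any particular middleware command or mail system, so the same logic can be exercised with a fake backend as done here.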

  16. Overcoming the potential barrier (II) Workflows & Grid Services

  17. Other applications: Grid Protein Sequence Analysis (GPS@) • Scientific objectives: Bioinformatic analysis of the data produced by complete genome sequencing projects is one of the major challenges of the coming years. Integrating up-to-date databanks and relevant algorithms is a clear requirement of such an analysis. Grid computing, such as the infrastructure provided by the EGEE European project, would be a viable solution for distributing data, algorithms, computing and storage resources for genomics. Providing bioinformaticians with a good interface to the grid infrastructure is a further challenge that has to be met. The GPS@ web portal, Grid Protein Sequence Analysis, aims to be such a user-friendly interface to these genomic resources on the EGEE grid. • Method: A well-known web interface eases access to the algorithms offered. Protein databases are stored on grid storage as flat files. Most protein sequence analysis tools are reference legacy code that is run unchanged. These tools are wrapped in grid jobs to be executed on grid resources. The algorithms' output is analysed and displayed in graphic format through the web interface.

  18. Other applications (I) In silico Drug Discovery • Scientific objectives • Provide docking information to help in the search for new drugs. • Biological goal: propose new inhibitors (drug candidates) targeting neglected diseases. • Bioinformatics goal: in silico virtual screening of drug candidate databases. • Grid goal: demonstrate the relevance of grid infrastructures to the research communities active in drug discovery through the deployment of a compute-intensive application. • Method • Large-scale molecular docking on malaria, evaluating millions of potential drugs with several software packages and parameter settings. • Docking is about computing the binding energy of a protein target to a library of potential drugs using a scoring algorithm.
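The docking method above amounts to scoring every compound in a library against one target and keeping the best candidates. The sketch below illustrates only that ranking step. The scoring function is a stand-in lookup table with made-up energies, not a real docking code, and it assumes the usual convention that a lower (more negative) binding energy means stronger predicted binding. All names are hypothetical.

```python
def screen(target, library, score, top_n):
    """Score every ligand against the target and return the best top_n.

    `score(target, ligand)` is assumed to return a binding energy where
    lower values mean stronger predicted binding.
    """
    scored = [(score(target, ligand), ligand) for ligand in library]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_n]


# Placeholder energies, invented purely so the example runs.
TOY_ENERGIES = {"lig_a": -7.2, "lig_b": -9.1, "lig_c": -5.0}


def toy_score(target, ligand):
    """Stand-in for a docking program's scoring algorithm."""
    return TOY_ENERGIES[ligand]


hits = screen("target_protein", list(TOY_ENERGIES), toy_score, top_n=2)
print(hits)
```

Because each ligand is scored independently of the others, the library can be split into chunks and each chunk sent to a different grid node, which is what makes this workload a natural fit for a grid.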

  19. Other applications (II) Genome evolution modeling • Scientific objectives: Study human evolutionary genetics and answer questions such as the geographic origin of modern human populations, the genetic signature of expanding populations, the genetic contacts between modern humans and Neanderthals, and the expected null distributions of genetic statistics applied to genome-wide data sets. • Method: Simulate the past demography (growth and migrations) of human populations in a geographically realistic landscape, taking into account the spatial and temporal heterogeneity of the environment. Generate the molecular diversity of several samples of genes drawn at any location within the current human range, and compare it to the observed contemporary molecular diversity. SPLATCHE uses a region-sampling Bayesian framework that requires 10^5 independent demographic and genetic simulations.
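Since the simulations this method requires are independent, they can be divided into batches and each batch submitted as a separate grid job. The sketch below shows one way to plan such a split. It is not taken from SPLATCHE: the batch size, the seed scheme and the total of 100,000 simulations (reading the slide's figure as 10^5) are assumptions made for illustration.

```python
def plan_jobs(n_simulations, sims_per_job, base_seed=0):
    """Split independent simulations into jobs with non-overlapping seeds.

    Each job gets a contiguous seed range, so no two simulations share a
    seed and any failed job can be rerun on its own.
    """
    jobs = []
    for first in range(0, n_simulations, sims_per_job):
        count = min(sims_per_job, n_simulations - first)
        jobs.append({"first_seed": base_seed + first, "count": count})
    return jobs


jobs = plan_jobs(n_simulations=100_000, sims_per_job=500)
print(len(jobs), jobs[0], jobs[-1])
```

Choosing the batch size is a trade-off: larger batches reduce scheduling overhead per job, while smaller ones spread the work over more nodes and lose less when a job is aborted.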

  20. For more information • Xmipp web page: www.cnb.uam.es/~bioinfo • Unit web page: http://biocomp.cnb.uam.es • NA4 EGEE biomed applications home: http://egee-na4.ct.infn.it/biomed/index.php • aj.merino@cnb.uam.es

  21. Thank you
