Genomic Data Analysis Services Available for PL-Grid Users - PowerPoint PPT Presentation

Presentation Transcript

  1. Genomic Data Analysis Services Available for PL-Grid Users
     Tomasz Waller, Tomasz Gubała, Kazimierz Murzyn
     Academic Computer Centre Cyfronet AGH, Klaster LifeScience Kraków
     Recent Advances in Omics Research, Kraków, October 2014

  2. ACC Cyfronet AGH and PL-Grid Infrastructure
     Academic Computer Centre Cyfronet AGH
     • Established in 1973 (40 years of experience)
     • Provides network, computational power and data storage capabilities for Polish science
     • ~374 TFlops (Zeus, no. 175 on the TOP500 list), 2.5 PB (disks) and 3.5 PB (tapes)
     • 1.7 PFlops (Prometheus) with 10 PB of disks, expected in the first half of 2015
     • Regular and bigmem nodes, vSMP, GPGPU, FPGA, MPI over InfiniBand
     • Details:
     PL-Grid Infrastructure for Polish science
     • Five computing centers with Cyfronet as the consortium leader
     • Total: ~588 TFlops and ~5.6 PB (disks), but soon to grow considerably (see above)
     • Available free of charge to all Polish scientists and their foreign collaborators
     • Details:

  3. Using PL-Grid Infrastructure
     • Register at
       • User verification process based on the Polish OPI number
       • Assistants and foreigners are confirmed by Polish PIs
     • Variety of basic and higher-level services available after login
       • Local SSH access, cloud computing, middleware
     • Considerable library of installed applications
       • GATK, MACS, SAMTools, Picard, TopHat, Bowtie, (p)BWA, R/Bioconductor, AutoDock/AutoGrid, BLAST, Clustal, CPMD, Gromacs, NAMD, Matlab, Mathematica …
       • Users are free to compile and install their own applications using the shell login
       • Possibility to use own commercial licenses on HPC resources
     • Specific services dedicated to the Life Science domain

  4. DNA Microarray Integromics Analysis Platform (1/2)
     • For people who perform biological investigations using DNA microarrays
     • Goal: help to analyze gene expression information and correlate it with other clinical data
     • Analyses available now: normalization, clustering, SAM, T-test, GO-based enrichment, ANNs, PCA, panel filtering
     • ‘Integromics’ analyses in ‘beta’ (testing) stage
       • CCA, PLS (gene expression and lipidomics)
       • Roleswitch, TargetScore (gene expression and miRNA)
     • Still in continuous development (Pathways, EBI export etc.)
     • Supported models: some Affymetrix, Agilent SurePrint (adding support for others is possible if there is demand)
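As an illustration of the T-test analysis listed above, the following is a minimal sketch of a per-gene two-sample comparison. It uses Welch's unequal-variance t statistic, which is an assumption made here for illustration; the slides do not say which variant the platform applies. The gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values: gene -> (control samples, treated samples)
expression = {
    "GENE_A": ([7.0, 7.2, 6.8, 7.1], [9.0, 9.3, 8.9, 9.1]),
    "GENE_B": ([5.0, 5.4, 4.9, 5.2], [5.1, 5.0, 5.3, 5.2]),
}


def rank_genes(data):
    """Return (gene, t, df) tuples sorted by |t|, largest first."""
    rows = [(gene, *welch_t(ctrl, trt)) for gene, (ctrl, trt) in data.items()]
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)
```

Calling `rank_genes(expression)` puts the gene with the largest absolute t statistic first. A real analysis would also convert t to a p-value and correct for multiple testing across thousands of probes, which this sketch leaves out.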

  5. DNA Microarray Integromics Analysis Platform (2/2)
     • Notable features
       • Integration with EBI ArrayExpress (import, MIAME)
       • Sharing experiments with others
       • Importing own data for further analysis
       • Supported languages: PL, EN
     • Manual:
     • Cooperation
       • Jagiellonian University Medical College, Kraków
       • Medical University of Silesia, Katowice
       • Institute of Oncology, Gliwice

  6. Agilent GeneSpring GX
     • RDP:
     • Used with Windows Remote Desktop
     • Integrated with the DNA Integromics Platform for uniform microarray files management
     • 5-year, single-seat license for all registered Polish scientists
     • Manual:

  7. Galaxy NGS Server (1/4)

  8. Galaxy NGS Server (2/4)
     “Galaxy is an open, web-based platform for data intensive biomedical research.”
     • Goal: deploy a high-performance, high-throughput NGS data analysis solution on top of HPC resources for PL-Grid users
     • Needs a lot of adjustments and in-house add-on development
     • Work started in December 2013 and is still at a beta stage, but it is accessible to anyone willing to test and to help
     • Planned integrated tools (list not closed): GATK, SAMtools, Bowtie, TopHat, BWA, bedtools, Cufflinks, Picard, SnpEff/SnpSift, Flexbar, FastQC, MACS
     • Targeted platforms: Illumina *Seq, Ion Proton, Roche 454

  9. Galaxy NGS Server (3/4)
     • Notable features
       • Full integration with Zeus cluster and disk arrays
       • PBS and MQ system for effective job queuing
       • Secured environment (open for all PL-Grid users, not “public”)
       • All major Galaxy features (history, sharing, viewers)
       • Well documented workflows designed by NGS experts
         • Basics (alignment and quality control, trimming, filtering)
         • DNA-Seq, RNA-Seq, variant calling, SNP calling, methylation, exome analysis with annotations
     • Manual:
     • Cooperation
       • Institute of Pharmacology, Polish Academy of Sciences, Kraków
       • OMICRON, Jagiellonian University Medical College, Kraków
       • National Research Institute of Animal Production, Kraków-Balice
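To make the "trimming, filtering" basics mentioned above concrete, here is a minimal sketch of 3' quality trimming followed by a length filter. It assumes Phred+33 encoded quality strings; the function names, thresholds and example reads are hypothetical, and this is not the algorithm of any particular tool on the planned list.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq, qual, min_q=20):
    """Cut bases from the 3' end while their quality is below min_q."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(reads, min_q=20, min_len=4):
    """Trim each (seq, qual) read, then drop reads shorter than min_len."""
    kept = []
    for seq, qual in reads:
        s, q = trim_3prime(seq, qual, min_q)
        if len(s) >= min_len:
            kept.append((s, q))
    return kept


# Hypothetical reads as (sequence, quality string) pairs
reads = [
    ("ACGTACGT", "IIIIII##"),  # two low-quality bases at the 3' end
    ("ACGT", "I###"),          # one good base left after trimming; filtered out
]
```

With these inputs, `trim_and_filter(reads)` keeps only the first read, shortened to its six high-quality bases. Production trimmers typically use sliding windows and adapter removal as well, which are omitted here.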

  10. Galaxy NGS Server (4/4)
     • Current challenges
       • Some security issues in the Galaxy code prevent the production deployment
       • Cluster integration is there, yet rather unstable and prone to fail (quite an intricate contraption, it is)
       • Broad variety of integrated tools and wrappers does not help
     • Call to action – who is needed
       • Users: the bigger the community, the easier it is to make us visible
       • Early adopters: tell us what you need, help us test and integrate the tools and workflows you use
       • Programmers: if you’d like to help us bring a dedicated HPC-powered Galaxy to Polish scientists, any assistance is greatly appreciated
     • Contact:

  11. Links, Contact, Partners
     • These resources, services and tools (and much more) are available after registering to PL-Grid
     • PL-Grid User Manual
       • (PL)
       • (EN)
     • Questions, problems, requests about PL-Grid
       •
     • Contact for LifeScience domain services
       •