Can Grids Deliver the Vision for Future Hypothesis Driven Life Science Research?

Professor Richard Sinnott
Technical Director, National e-Science Centre
University of Glasgow
9th May 2006

Grids and e-Research
  • Classical characteristics
    • HPC, data deluge, …
  • More recent push
    • Security, virtual organisations, usability, …

(Diagram: the Grid connecting computers, software, sensor nets, instruments, shared data archives and colleagues)

E-Health Future Drivers
  • The big questions
    • Why do people who eat less tend to live longer?
    • Is there a genetic reason why Scotland has such a high incidence rate of cardiovascular disease? How significant are social, cultural and occupational factors in this?
  • Tailored e-Health
    • Wouldn't it be wonderful to know what measures you could take to stave off or prevent the onset of disease?
    • Wouldn't it be a relief to know that you are not allergic to the drugs your doctor has just prescribed?
    • Wouldn't it be a comfort to know that the treatment regimen you are undergoing has a good chance of success because it was designed just for you?
The Big Picture…
(Diagram: the Grid, underpinned by security, linking data at every scale: nucleotide sequences and structures, gene expressions, protein structures and functions, protein-protein interactions (pathways), cell signalling, cells, tissues, organs, physiology, organisms, populations and epidemiology, plus social, lifestyle, occupational, environmental, … factors)

There are still issues to be resolved
  • OGSA definition and delivery
    • Standards: OGSI, WSRF, …
    • Technologies: GT2, GT3, GT4, EGEE, OMII, …
  • What about the science drivers?
    • What data sets, what services, accessed by whom, …
      • Longevity of systems…?
      • If I build a Grid infrastructure for you, do you promise not to change your requirements (completely!)?

NeSC in the UK
(Map: NeSC with sites at Glasgow, Edinburgh, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London and Southampton; HPC(x), CSAR, the White Rose Grid and the core National Grid Service marked; callouts read "Challenges/Opportunities?" and "The next Grid software")

BRIDGES Project
(Diagram: the BRIDGES service stack combining VO authorisation, an Information Integrator, OGSA-DAI, the MagnaVista service, the SyntenyService and BLAST)
MagnaVista
(Screenshot of the MagnaVista service; www.nesc.ac.uk)

BRIDGES Security
  • Used PERMIS (www.permis.org) to provide fine-grained security (authorisation)
    • XML-based policies are digitally signed (tamperproof) and used to make authorisation decisions when users invoke services
      • (XACML-based policies coming…)
    • SAML callouts transparently link Grid services and policies
  • Data policies
    • Only members of CFG can access all public and locally warehoused data
    • Guest users can only access remote genome databases
      • Security at DB level!
  • Computational policies
    • CFG members can run BLAST across the NGS, Glasgow clusters and Condor pools
    • Guest users only get access to the Condor pool
      • Users do not need their own X.509 certificates – all hidden behind the portal!
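The data and computational policies above reduce to a role-to-resource lookup that denies anything not explicitly granted. The minimal Python sketch below shows that decision logic. It assumes two hypothetical roles ("CFG" and "guest") and made-up resource labels. It is not the PERMIS policy format or API, and it omits the signature checking that makes real PERMIS policies tamperproof.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical policy tables mirroring the slide; role and resource names are illustrative only.
DATA_POLICY: Dict[str, Set[str]] = {
    "CFG": {"public_data", "local_warehouse", "remote_genome_db"},
    "guest": {"remote_genome_db"},
}
COMPUTE_POLICY: Dict[str, Set[str]] = {
    "CFG": {"NGS", "glasgow_clusters", "condor_pool"},
    "guest": {"condor_pool"},
}
POLICIES = {"read_data": DATA_POLICY, "run_blast": COMPUTE_POLICY}


@dataclass(frozen=True)
class Request:
    role: str    # role asserted for the user by the portal (users hold no X.509 certificate)
    action: str  # "read_data" or "run_blast"
    target: str  # data source or compute resource being requested


def authorise(req: Request) -> bool:
    """Grant only what the role's policy explicitly lists; anything else is denied."""
    policy = POLICIES.get(req.action)
    if policy is None:
        return False  # unknown action: deny by default
    return req.target in policy.get(req.role, set())


if __name__ == "__main__":
    print(authorise(Request("CFG", "run_blast", "NGS")))                  # True
    print(authorise(Request("guest", "run_blast", "NGS")))                # False
    print(authorise(Request("guest", "read_data", "remote_genome_db")))   # True
```

Keeping the policy as data rather than code mirrors the slide's point. Access changes are made by editing (and re-signing) the policy, not the services.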
BRIDGES data
  • Originally planned to have many different types of data with different security requirements
    • Public data: data from public sources
    • Processed public data: public data with additional annotation or indexing to support the analyses needed by CFG
    • Sensitive data: data about individuals in the patient cohorts, and data derived from animal experiments
    • Special experimental data: such as quantitative trait loci (QTL) or microarray data
    • Personal research data: data specific to a researcher, resulting from experiments or analyses that researcher is performing
    • Team research data: data shared by the team members at a site
    • Consortium research data: data produced by one site or a combination of sites that is now available to the whole consortium
    • Personalisation data: metadata collected and used by the bioinformatics tools pertinent to individual users
      • …but scientists were reluctant to share their data!
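One way to make these categories enforceable is to tag each data set with its class and derive a visibility rule from the category definitions. The Python sketch below does this under stated assumptions. The rules for personal, team and consortium data follow the definitions above. The rules for processed public, sensitive and special experimental data, and the `sensitive_clearance` flag, are hypothetical and do not come from BRIDGES.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    PUBLIC = "public"
    PROCESSED_PUBLIC = "processed public"
    SENSITIVE = "sensitive"
    SPECIAL_EXPERIMENTAL = "special experimental"
    PERSONAL = "personal research"
    TEAM = "team research"
    CONSORTIUM = "consortium research"
    PERSONALISATION = "personalisation"


@dataclass(frozen=True)
class User:
    name: str
    site: str
    consortium_member: bool
    sensitive_clearance: bool = False  # hypothetical flag, not described on the slide


@dataclass(frozen=True)
class Dataset:
    data_class: DataClass
    owner: str  # researcher who produced the data
    site: str   # site where the data was produced


def can_read(user: User, ds: Dataset) -> bool:
    """Assumed visibility rules derived from the category definitions above."""
    c = ds.data_class
    if c is DataClass.PUBLIC:
        return True
    if c in (DataClass.PERSONAL, DataClass.PERSONALISATION):
        return user.name == ds.owner
    if not user.consortium_member:
        return False  # every remaining class is treated as consortium-only
    if c is DataClass.TEAM:
        return user.site == ds.site
    if c in (DataClass.PROCESSED_PUBLIC, DataClass.CONSORTIUM):
        return True
    # SENSITIVE and SPECIAL_EXPERIMENTAL: assumed to need an explicit clearance
    return user.sensitive_clearance
```

Each rule is a small, auditable predicate. That matters when, as the slide notes, scientists need convincing before they will share data at all.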
JDSS Project
  • Openness of public data resources
    • Often they cannot be queried directly, and their schemas are hard or impossible to find (and they change… often!)
    • The Joint Data Standards Study investigated this
      • Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC and DTI, and involved
        • Digital Archiving Consultancy
        • NeSC (Edinburgh and Glasgow)
        • Bioinformatics Research Centre (Glasgow)
      • Looked at the technical, political, social, ethical and other issues involved in accessing and using public life science resources
      • Final report completed September 2005 and available at:
        • www.mrc.ac.uk/prn/pdf-jdss_final_report.pdf
          • (to also appear as a NeSC technical report)
Grid Enabled Microarray Expression Profile Search (GEMEPS)
  • 1-year project, (just) started 1st March 2006
    • Funded by BBSRC
      • Involves Glasgow, Cornell University (US) and the Riken Institute (Japan)
    • Aims to provide tools for discovery, comparison and analysis of microarray data sets
      • How does my data compare to others?
      • How do these experiments compare?
      • Can we improve the way we establish how genes in different species are linked?
    • Microarray experiments are expensive and produce potentially important (valuable) data sets
      • Fine-grained security is essential (as is the willingness of researchers to collaborate)!
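"How does my data compare to others?" can be framed as a similarity search. The query expression profile is correlated against each stored experiment over the genes they share, and the results are ranked. The Python sketch below uses Pearson correlation as an assumed similarity measure. The slides do not say which measure GEMEPS uses, and the `min_shared` threshold and experiment names are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, float]  # gene identifier -> expression value


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def similarity(query: Profile, other: Profile, min_shared: int = 3) -> Optional[float]:
    """Correlate two profiles over the genes they share; None if too few overlap."""
    shared = sorted(query.keys() & other.keys())
    if len(shared) < min_shared:
        return None
    return pearson([query[g] for g in shared], [other[g] for g in shared])


def rank_experiments(query: Profile, repository: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """Return (experiment, score) pairs, most similar first (ties broken by name)."""
    results = []
    for name, profile in repository.items():
        score = similarity(query, profile)
        if score is not None:
            results.append((name, score))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results
```

In a Grid setting, each repository would run `similarity` locally and return only scores. This is one reason fine-grained security and willingness to collaborate matter.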
Grid Enabled Microarray Expression Profile Search (GEMEPS)
  • Why bother…?
    • Major journals require experimental data to be published
    • Minimal Information About a Microarray Experiment (MIAME) standard
      • Does not provide sufficient information for scientists to repeat an experiment, compare results, …
    • Scientists are often unwilling to spend time providing additional metadata
      • …experiences from BRIDGES
    • Scientists are also now questioning the sensitivity of microarray data results
      • Gene names and expression values vs the ordering of gene expression values
      • Initial prototypes support both of these, but gene naming remains an issue
        • Entrez, UniGene, GO, …
    • Work on searching/mining public repositories is ongoing
      • including GEO, ArrayExpress, …
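The "ordering of gene expression values" option can still support comparison. Two labs exchange only ranked gene lists and compute Spearman's rank correlation, which for tie-free rankings is ρ = 1 − 6Σd²/(n(n² − 1)). Here d is each shared gene's rank difference and n is the number of shared genes. The Python sketch below first maps lab-specific labels to a common identifier to address the naming issue. The alias table and gene labels are invented for illustration and are not real Entrez or UniGene entries.

```python
from typing import Dict, List

# Hypothetical alias table mapping lab-specific gene labels to one common identifier.
# Real mappings would come from resources such as Entrez Gene or UniGene; these are invented.
ALIASES: Dict[str, str] = {
    "geneA_v1": "GENE_A", "GA": "GENE_A",
    "geneB": "GENE_B", "GB": "GENE_B",
    "geneC": "GENE_C", "GC": "GENE_C",
}


def to_ordering(profile: Dict[str, float]) -> List[str]:
    """Share only the order of genes (highest expression first), not the values.

    Labels are first mapped to common identifiers; if two labels map to the
    same identifier, the later one silently wins in this sketch.
    """
    normalised = {ALIASES.get(name, name): value for name, value in profile.items()}
    return sorted(normalised, key=lambda gene: (-normalised[gene], gene))


def spearman_from_orderings(a: List[str], b: List[str]) -> float:
    """Spearman's rho computed from two tie-free orderings over their shared genes."""
    in_b = set(b)
    shared_a = [g for g in a if g in in_b]
    n = len(shared_a)
    if n < 2:
        raise ValueError("need at least two shared genes")
    rank_a = {g: i for i, g in enumerate(shared_a)}
    # Re-rank b over the shared genes only, so both rankings run 0..n-1
    rank_b = {g: i for i, g in enumerate(g for g in b if g in rank_a)}
    d_squared = sum((rank_a[g] - rank_b[g]) ** 2 for g in shared_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

With this design, the raw expression values never leave the lab. This is one possible answer to the sensitivity concern, though an ordering still reveals some information.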
Grid Enabled Occupational Data Environment (GEODE)
  • GEODE
    • Funded by ESRC, led by the University of Stirling with NeSC Glasgow
      • Two-year project aiming to develop a Grid-enabled portal for occupational data
        • includes integration of various existing classification schemes
    • Many occupational classification schemes exist
      • Used by different researchers/sociologists
        • Linkage to national and international census data sets
    • When is a plumber not a plumber?
    • When they are a water transport technician…?
      • How many plumbers had a heart attack in Scotland in the last 2 years?
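Before the plumber question can be answered, titles from different classification schemes must be mapped to a shared code and only then counted. The Python sketch below shows this crosswalk step. The scheme names, titles and codes are invented for illustration and are not SOC, ISCO or any classification GEODE actually integrates.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical crosswalk: (scheme, occupation title) -> common occupation code.
# Scheme names and codes are invented; they are not SOC, ISCO or any real classification.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("schemeA", "plumber"): "OCC_PLUMBER",
    ("schemeB", "water transport technician"): "OCC_PLUMBER",
    ("schemeA", "electrician"): "OCC_ELECTRICIAN",
}


def harmonise(scheme: str, title: str) -> str:
    """Map a scheme-specific title to the common code, or 'UNMAPPED'."""
    return CROSSWALK.get((scheme, title.strip().lower()), "UNMAPPED")


def count_events(records: Iterable[Tuple[str, str, bool]], occupation: str) -> int:
    """Count records whose harmonised occupation matches and where the event occurred.

    Each record is (scheme, occupation title, had_event).
    """
    return sum(
        1
        for scheme, title, had_event in records
        if had_event and harmonise(scheme, title) == occupation
    )
```

Titles from unknown schemes are reported as `UNMAPPED` rather than guessed. Without a crosswalk, a count by title alone would silently miss the "water transport technicians".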
VOTES
  • Virtual Organisations for Trials and Epidemiological Studies
    • 3-year, MRC-funded (£2.8M) project expected to start imminently
    • Plans to develop Grid infrastructure to address key components of a clinical trial/observational study
      • Recruitment of potentially eligible participants
      • Data collection during the study
      • Study administration and coordination
    • Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
    • Prototypes available now, building on SCIStore, GPASS, the consent DB and existing trials repositories
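Recruitment of potentially eligible participants is essentially a screening query. Each candidate must meet every inclusion criterion and must have agreed to be contacted. The Python sketch below expresses criteria as composable predicates. The `Patient` fields, the consent flag and the example criteria are all hypothetical. They are not the VOTES, SCIStore or GPASS data models.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: int
    diagnoses: FrozenSet[str]
    consented_to_contact: bool  # hypothetical field, e.g. as recorded in a consent DB


Criterion = Callable[[Patient], bool]


def screen(patients: Iterable[Patient], criteria: List[Criterion]) -> List[str]:
    """Return IDs of patients who have consented and meet every criterion."""
    return [
        p.patient_id
        for p in patients
        if p.consented_to_contact and all(rule(p) for rule in criteria)
    ]


# Hypothetical inclusion criteria for an imaginary trial
EXAMPLE_CRITERIA: List[Criterion] = [
    lambda p: 40 <= p.age <= 75,
    lambda p: "hypertension" in p.diagnoses,
]
```

The consent check is applied before any criterion, so non-consenting patients are never returned regardless of how the criteria are written.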
Distributed Data Framework
(Diagram: a portal with user authentication sends requests to Grid servers. Each Grid server runs a Globus/OGSA-DAI service container, with authorisation driven by an access matrix and security policies. The Grid servers reach data servers at Glasgow and at remote/other Grid DB nodes, each governed by local trust and transfer policies. Data sources shown: SCI Store 1 and SCI Store 2 (SQL Server), GPASS, a Consent DB (Oracle 10g), RCB Test and a Trials DB (SQL Server).)
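A key idea in this framework is that each data server enforces its own local policy, rather than trusting the portal alone. The portal fans a query out and merges whatever each site agrees to release. The Python sketch below models this with a hypothetical per-server access matrix (role to readable tables). It stands in for, and does not reproduce, the Globus/OGSA-DAI services and policy files in the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, object]


@dataclass
class DataServer:
    name: str
    # Local access matrix: role -> tables that role may read (hypothetical structure)
    access_matrix: Dict[str, Set[str]]
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def query(self, role: str, table: str) -> Optional[List[Row]]:
        """Enforce the local access matrix; None means the request was refused."""
        if table not in self.access_matrix.get(role, set()):
            return None
        # Tag each row with its origin so the portal can show provenance
        return [dict(row, source=self.name) for row in self.tables.get(table, [])]


def federated_query(servers: List[DataServer], role: str, table: str) -> Tuple[List[Row], List[str]]:
    """Ask every server; returns (merged rows, names of servers that refused)."""
    rows: List[Row] = []
    refused: List[str] = []
    for server in servers:
        result = server.query(role, table)
        if result is None:
            refused.append(server.name)
        else:
            rows.extend(result)
    return rows, refused
```

Reporting refusals separately from empty results lets the portal tell a user "no matching data" apart from "a site declined to share".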

Generation Scotland Scottish Family Health Study
  • Five-year (2+3) proposal (£4.6M), started January 2006
    • Funded by the Health Department and the Department for Enterprise and Lifelong Learning
      • Involves Glasgow, Dundee, Edinburgh, Aberdeen
        • focus on genetics as applied to healthcare
        • first two years: emphasis on providing a platform for research into the genetic basis of common complex diseases in Scotland
          • Mental health, cardiovascular, …
          • Plan to establish a 15,000-person, family-based, intensively phenotyped cohort recruited from the East and West of Scotland
        • basis for neutralising heritable (genetic) risk factors in disease surveillance, treatment optimisation, avoidance of adverse drug events and prediction of response to therapy, health care planning and drug discovery, …
        • Recruitment process has started already!
Security Related Projects
  • GLASS
    • JISC funded, started January 2006
      • Exploring early adoption of Shibboleth
        • Working with Computer Services directly
      • Scenarios based on teaching and access to NHS resources/data
      • Builds on the university-wide unified account management system being rolled out (based on Novell nSure technology)
  • ESP-Grid
    • JISC/Oxford University funded
      • Developing a demonstrator to show how Grid resources can be accessed and used via Shibboleth technology
        • Initial prototypes already available
  • Grid Security Report
    • JISC/JCSR funded
      • Focus on Grid security practices, middleware and outlook
        • Contact me if you want a copy!
DyVOSE Project
(Diagram: Glasgow SoA using Edinburgh DIS. Glasgow creates new ACs for Glasgow users/roles; LDAP directories and Education VO policies are held at both Glasgow and Edinburgh. PERMIS-based authorisation checks/decisions protect a Grid BLAST service (job scheduling/data management, Condor pool) and a Grid BLAST data service over a nucleotide + protein sequence DB, both implemented by students. A Grid-data client submits data input, and protein/nucleotide sequence data is returned based on the student team and the Edinburgh policy.)
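The DyVOSE scenario involves two separate checks at the resource side. First, is the credential genuine? Second, is its issuer trusted for that role under the local policy? The Python sketch below separates the two. It uses an HMAC over a shared key purely as a stand-in: PERMIS works with X.509 attribute certificates signed with public-key cryptography, not shared secrets. The issuer name, team roles and permission strings are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class AttributeAssertion:
    issuer: str     # e.g. the Glasgow source of authority
    holder: str     # user the role is assigned to
    role: str       # e.g. a student team
    signature: str  # hex MAC over (issuer, holder, role); stand-in for a real AC signature


def _payload(issuer: str, holder: str, role: str) -> bytes:
    # JSON array encoding keeps the three fields unambiguous
    return json.dumps([issuer, holder, role]).encode("utf-8")


def issue(issuer: str, key: bytes, holder: str, role: str) -> AttributeAssertion:
    sig = hmac.new(key, _payload(issuer, holder, role), hashlib.sha256).hexdigest()
    return AttributeAssertion(issuer, holder, role, sig)


# Hypothetical Edinburgh-side policy: which issuers are trusted for which roles,
# and what each role may do with the Grid BLAST services.
TRUSTED_ISSUERS: Dict[str, Set[str]] = {"glasgow": {"team1", "team2"}}
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "team1": {"blast:protein"},
    "team2": {"blast:nucleotide"},
}


def decide(assertion: AttributeAssertion, issuer_keys: Dict[str, bytes], action: str) -> bool:
    key = issuer_keys.get(assertion.issuer)
    if key is None:
        return False  # unknown issuer
    expected = hmac.new(
        key, _payload(assertion.issuer, assertion.holder, assertion.role), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, assertion.signature):
        return False  # assertion was altered or signed with a different key
    if assertion.role not in TRUSTED_ISSUERS.get(assertion.issuer, set()):
        return False  # issuer is not an authority for this role
    return action in ROLE_PERMISSIONS.get(assertion.role, set())
```

Glasgow only issues roles, and Edinburgh alone decides what those roles may do. This separation is what lets a new virtual organisation be set up without either site handing over control of its own resources.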

Future
  • The Grid is not a magic wand
    • Your data quality issues won’t go away
    • We can, however, identify what they are
      • e.g. SCIStore schema incompatibilities
  • Ethics and legal aspects essential
    • Working closely with the NHS
  • Consent crucial
    • Scenarios now implemented looking at patient consent via GPASS
The Future…
Hypothesis driven systems biology + Personalised e-Health + ...
(Diagram: the Grid + Security linking nucleotide sequences and structures, gene expressions, protein structures and functions, protein-protein interactions (pathways), cell signalling, cells, tissues, organs, physiology, organisms and populations)
Needs research pull, not Grid middleware push...