Structural Proteomics Automatic Target Selection

Gordon Whamond

Project Overview
• Aim:
  • Provide a resource that facilitates the automatic selection of potential targets for protein structure determination, while minimising human interaction with the software (if required).
• Input:
  • Raw amino acid sequence
  • UniProt accession number
  • UniProt accession number and a sequence range
• Output:
  • Query sequence showing possible domains
  • All candidates for structure determination
  • Recommendation for which sequence to use
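The three input types listed above can be told apart before any service is called. The following is a minimal sketch, not the project's actual front end. The accession pattern is the regular expression published by UniProt; the `ACCESSION:start-end` range syntax, the `Query` type and the `parse_query` function are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Accession pattern as published in the UniProt documentation.
ACCESSION = (r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
             r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})")
ACCESSION_RE = re.compile(rf"^{ACCESSION}$")
# Hypothetical range syntax: ACCESSION:start-end (1-based, inclusive).
RANGE_RE = re.compile(rf"^({ACCESSION}):(\d+)-(\d+)$")
# The 20 standard amino acid one-letter codes.
SEQUENCE_RE = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")


@dataclass
class Query:
    kind: str                       # "sequence", "accession" or "accession_range"
    accession: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    sequence: Optional[str] = None


def parse_query(text: str) -> Query:
    """Classify user input as one of the three supported input types."""
    cleaned = "".join(text.split()).upper()  # drop whitespace and line breaks
    match = RANGE_RE.match(cleaned)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start < 1 or end < start:
            raise ValueError(f"invalid sequence range: {start}-{end}")
        return Query("accession_range", accession=match.group(1),
                     start=start, end=end)
    if ACCESSION_RE.match(cleaned):
        return Query("accession", accession=cleaned)
    if SEQUENCE_RE.match(cleaned):
        return Query("sequence", sequence=cleaned)
    raise ValueError("input is neither a sequence nor a UniProt accession")
```

An accession always contains digits, so it can never be mistaken for a raw sequence; the range form is tested first because it is the most specific.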

Considerations
• Is there a known structure?
• Are there classified structural domains (CATH, SCOP)?
• Are there known sequence domains (Pfam)?
• Are there predicted structural domains (Gene3D, Superfamily)?
• Do domain boundaries conform to secondary structure restrictions?
• Which species has a representative domain that is the most compactly folded?
• The core implementation needs to be extensible and easily maintainable.
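The secondary structure check can be sketched as follows. The slides do not define the restriction, so this example assumes one plausible reading: a construct boundary should not fall inside a helix or strand. The function names and the `(start, end)` interval representation (1-based, inclusive, non-overlapping) are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def cut_elements(construct: Interval, elements: List[Interval]) -> List[Interval]:
    """Return the secondary structure elements that the construct only
    partially covers, i.e. those severed by one of its boundaries."""
    start, end = construct
    cut = []
    for elem_start, elem_end in elements:
        overlaps = elem_start <= end and elem_end >= start
        contained = elem_start >= start and elem_end <= end
        if overlaps and not contained:
            cut.append((elem_start, elem_end))
    return cut


def extend_to_loops(construct: Interval, elements: List[Interval]) -> Interval:
    """Widen the construct so that every severed element is included whole.
    Assumes the elements do not overlap one another."""
    start, end = construct
    for elem_start, elem_end in cut_elements(construct, elements):
        start = min(start, elem_start)
        end = max(end, elem_end)
    return (start, end)


# Example: two helices and a strand, with a construct that clips both ends.
elements = [(5, 15), (20, 30), (40, 55)]
print(cut_elements((10, 45), elements))     # elements severed by the boundaries
print(extend_to_loops((10, 45), elements))  # widened construct
```

Widening rather than trimming is a design choice made here for simplicity; trimming back to the nearest loop would be an equally valid policy.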

Taverna
The software is to be implemented using the Taverna workbench. This is a tool that can be used to formulate the workflow and implement each of the processes as distributed web services.
• Advantages:
  • Distributed computing reduces local resource requirements
  • Easily extensible system
  • Maintenance issues shifted to external providers
• Disadvantages:
  • Learning curve
  • Convincing service providers to adopt a standard format
  • Maintenance issues shifted to external providers
Tom Oinn - http://taverna.sourceforge.net/

Taverna
The prototype workflow is quite complex when expanded to show all of its incorporated sub-workflows. Fortunately, Taverna can also provide a top-level view.

Taverna

Dealing With DAS

Taverna

Process Data
• Secondary Structure Elements: (method not yet chosen)
• Sequence Domains: Pfam, Gene3D, Superfamily, etc.
• Protein Folding: RONN, FoldIndex, DisEMBL
• Rank Target Selection: based on loop lengths, folding predictions, etc.
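The ranking step could look something like the sketch below. The slides only say that ranking is based on loop lengths and folding predictions, so the scoring formula, the weights, the loop cap and all names here are assumptions for illustration, not the project's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    disordered_fraction: float  # 0.0 to 1.0, e.g. from a disorder predictor
    longest_loop: int           # residues in the longest loop of the construct


def score(candidate: Candidate, max_loop: int = 30,
          disorder_weight: float = 0.7, loop_weight: float = 0.3) -> float:
    """Higher is better: penalise predicted disorder and long loops.
    Weights and the loop cap are arbitrary illustrative values."""
    loop_penalty = min(candidate.longest_loop, max_loop) / max_loop
    penalty = (disorder_weight * candidate.disordered_fraction
               + loop_weight * loop_penalty)
    return 1.0 - penalty


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates ordered from most to least promising."""
    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("construct_B", disordered_fraction=0.40, longest_loop=25),
    Candidate("construct_A", disordered_fraction=0.05, longest_loop=8),
    Candidate("construct_C", disordered_fraction=0.10, longest_loop=12),
]
for c in rank(candidates):
    print(c.name, round(score(c), 3))
```

Keeping the score a simple weighted sum makes it easy to add further terms as more prediction services are plugged into the workflow.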

Starting the Process

Monitoring Progress

Assess Data

Review Results

Extensibility
• Java Services
  • Straightforward to provide as a web service using Tomcat and Axis
  • WSDL (describing the service) can be generated automatically
• Legacy Software
  • Any command-line tool can be wrapped into a web service using Soaplab
  • For example, the EMBOSS tools are already available
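The idea of wrapping a command-line tool as a service can be illustrated in a few lines. This sketch does not use Soaplab, Axis or SOAP; it only shows the general pattern (request body in, tool's standard output back) with Python's standard library. `TOOL_COMMAND` is a hypothetical stand-in for a real legacy tool, and `run_tool` and `application` are illustrative names.

```python
import subprocess
import sys

# Hypothetical stand-in for a legacy tool: upper-cases whatever it reads
# from standard input. A real deployment would name an actual executable.
TOOL_COMMAND = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_tool(command, stdin_text, timeout=30):
    """Run a command-line tool, feeding it text on standard input."""
    completed = subprocess.run(command, input=stdin_text,
                               capture_output=True, text=True,
                               timeout=timeout, check=False)
    return completed.returncode, completed.stdout, completed.stderr


def application(environ, start_response):
    """WSGI application: the request body goes to the tool's stdin and the
    tool's stdout (or stderr on failure) comes back as plain text."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    code, out, err = run_tool(TOOL_COMMAND, body)
    status = "200 OK" if code == 0 else "500 Internal Server Error"
    payload = (out if code == 0 else err).encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

For a local trial the application can be served with `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`. What Soaplab adds on top of this pattern is a described service interface that a workflow tool can discover.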

Extensibility
Output Format: To ensure generic service compatibility it helps to define a common results format. For this reason we are using the e-Family service schema (http://www.efamily.org.uk/).
Current collaborators include:
• The Weizmann Institute - FoldIndex
• University of Oxford - RONN

Results Viewers http://www.efamily.org.uk/software/dasclients/spice/

Conclusions
• Taverna and Web Services:
  • Taverna facilitates the provision of complex distributed systems that utilise web services
  • This reduces maintenance overheads and keeps technology requirements at a reasonable level
  • It is also easily extensible to accommodate new services
• Availability:
  • Hopefully the core system will be ready by the end of the year
  • This will provide the basic workflow for users to customise according to their needs

Acknowledgments
Thanks to:
• Tom Oinn
• Andreas Prlic
• The RONN and FoldIndex teams
• The MSD Group
