
Motif Space Database Design



Presentation Transcript


  1. Motif Space Database Design • Kiranjit Sidhu

  2. Outline • Schema Design • Content of Database • Functionality • Future Plans

  3. Sample PDB File • Each PDB file is represented as a text file (~60K lines) • Inefficient for pattern matching • A relational database is required for the most efficient solution
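The sketch below shows why the flat text file is awkward for pattern matching and why rows are easier to index. It is a minimal Python example that flattens the fixed-column `ATOM`/`HETATM` coordinate records of a PDB file into table-ready rows. It is not the project's Perl extraction code. The `AtomRow` type and the function names are hypothetical, and the sketch keeps only a subset of the PDB columns.

```python
from typing import List, NamedTuple, Optional


class AtomRow(NamedTuple):
    """One ATOM/HETATM record flattened into a table-ready row."""
    record: str
    serial: int
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[AtomRow]:
    """Parse one fixed-column PDB coordinate record.

    Returns None for every other record type (HEADER, REMARK, TER, END, ...).
    The slices follow the PDB format's fixed column positions (0-based here).
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return AtomRow(
        record=record,
        serial=int(line[6:11]),
        atom_name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_pdb_text(text: str) -> List[AtomRow]:
    """Turn a whole PDB file into a list of rows, skipping non-coordinate lines."""
    rows = []
    for line in text.splitlines():
        row = parse_atom_line(line)
        if row is not None:
            rows.append(row)
    return rows
```

For example, `parse_atom_line("ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 46.10")` returns a row with `res_name == "MET"`, `chain_id == "A"` and `x == 38.198`. Once atoms are rows, a database can index them by residue or chain instead of rescanning the whole file.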

  4. Structure of Database • The DB is divided into two major components: Protein Data and Motif (Occurrence) Data • Protein Data: obtained from PDB (Protein Data Bank) files, plus derived data • Motif Data: obtained from Luke’s FFSM technique, plus derived data
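This slide does not show the actual tables, and the schema diagrams on the following slides did not survive. The following is therefore only a hypothetical sketch of how the two components could be laid out, with an occurrence table linking motif data to protein chains. SQLite stands in for SQL Server, and every table and column name is an assumption.

```python
import sqlite3

# Hypothetical two-component layout; names are illustrative, not the real schema.
SCHEMA = """
-- Protein data (loaded from PDB files)
CREATE TABLE protein_chain (
    chain_key INTEGER PRIMARY KEY,
    pdb_id    TEXT NOT NULL,
    chain_id  TEXT NOT NULL,
    UNIQUE (pdb_id, chain_id)
);
CREATE TABLE residue (
    chain_key INTEGER NOT NULL REFERENCES protein_chain (chain_key),
    res_seq   INTEGER NOT NULL,
    res_name  TEXT NOT NULL,
    PRIMARY KEY (chain_key, res_seq)
);
-- Motif data (loaded from the motif-mining output)
CREATE TABLE motif (
    motif_id  INTEGER PRIMARY KEY,
    size      INTEGER NOT NULL  -- number of residues in the motif
);
-- Occurrence data links a motif to the protein chain it was found in
CREATE TABLE motif_occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    motif_id      INTEGER NOT NULL REFERENCES motif (motif_id),
    chain_key     INTEGER NOT NULL REFERENCES protein_chain (chain_key)
);
CREATE INDEX idx_occurrence_motif ON motif_occurrence (motif_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables and indexes in one script."""
    conn.executescript(SCHEMA)


# Usage: one chain, one motif, one occurrence, then a join across both components.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO protein_chain (pdb_id, chain_id) VALUES ('1ABC', 'A')")
conn.execute("INSERT INTO motif (motif_id, size) VALUES (1, 4)")
conn.execute("INSERT INTO motif_occurrence (motif_id, chain_key) VALUES (1, 1)")
print(conn.execute(
    "SELECT p.pdb_id, p.chain_id FROM motif_occurrence o "
    "JOIN protein_chain p ON p.chain_key = o.chain_key WHERE o.motif_id = 1"
).fetchall())  # [('1ABC', 'A')]
```

Derived tables for either component would sit alongside these base tables.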

  5. Schema Design

  6. Schema Design - Protein

  7. Schema Design - Motif

  8. Tools Used • Obtaining data: Perl scripts • Database: SQL Server 2000 and SQL Server 2005 • T-SQL (bulk data import)

  9. Obtaining Data • PDB File → Extract → CSV File → Import → Temp Tables (T-SQL) → Convert and Derive (T-SQL procedures) → Final DB
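The flow on this slide can be sketched end to end as shown here. This is a hedged illustration, not the project's code. Python stands in for the Perl extraction scripts, SQLite stands in for SQL Server, and a batched `executemany()` stands in for the T-SQL bulk import. The staging table `staging_atom`, the final `residue` table, and the derived `atom_count` column are all hypothetical.

```python
import csv
import io
import sqlite3

# Extract -> CSV -> Import -> Temp table -> Convert/Derive -> Final table.
# All table and column names are hypothetical.
CSV_COLUMNS = ["pdb_id", "chain_id", "res_seq", "res_name", "atom_name", "x", "y", "z"]


def extract_to_csv(pdb_id: str, pdb_text: str) -> str:
    """Extract step: write the ATOM records of one PDB file as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(CSV_COLUMNS)
    for line in pdb_text.splitlines():
        if line[0:6].strip() != "ATOM":
            continue  # skip headers, remarks, TER/END, etc.
        writer.writerow([
            pdb_id,
            line[21],                # chain identifier
            int(line[22:26]),        # residue sequence number
            line[17:20].strip(),     # residue name
            line[12:16].strip(),     # atom name
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        ])
    return out.getvalue()


def import_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    """Import step: load all CSV rows into a temporary staging table in one batch."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging_atom ("
        "pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT, "
        "atom_name TEXT, x REAL, y REAL, z REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(r[c] for c in CSV_COLUMNS) for r in reader]
    conn.executemany("INSERT INTO staging_atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def convert_and_derive(conn: sqlite3.Connection) -> None:
    """Convert/derive step: build the final table set-wise, then clear staging."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS residue ("
        "pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT, "
        "atom_count INTEGER, PRIMARY KEY (pdb_id, chain_id, res_seq))"
    )
    # Derived column: number of atoms per residue, computed in one query.
    conn.execute(
        "INSERT INTO residue "
        "SELECT pdb_id, chain_id, res_seq, MIN(res_name), COUNT(*) "
        "FROM staging_atom GROUP BY pdb_id, chain_id, res_seq"
    )
    conn.execute("DELETE FROM staging_atom")
    conn.commit()
```

In this sketch, per-line parsing happens outside the database and the derivation runs as one set-based `INSERT ... SELECT`. That loosely mirrors the slide's split between extraction scripts and T-SQL procedures, though the real procedures may work differently.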

  10. Uploading Protein Data • Input dataset: ~70,000 PDB/chain combinations • Table sizes: e.g., approx. 800 million rows in the proteinchaindistance table • Initial version imported 10 PDB files in 1 day • Current version: under 3 minutes
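The slide does not say what each `proteinchaindistance` row holds. Suppose, hypothetically, that there is one row per residue pair within a chain, storing the distance between representative atoms such as C-alpha. Then the row count grows quadratically: a chain of n residues yields n(n-1)/2 pairs, so a single 300-residue chain gives 300 × 299 / 2 = 44,850 rows. The sketch below computes such rows and loads them in fixed-size batches inside one transaction, a common way to speed up large loads. The slide does not say which optimization produced the drop from 1 day to under 3 minutes, and the `chain_distance` table and function names are made up.

```python
import itertools
import math
import sqlite3
from typing import Iterator, Sequence, Tuple

# Hypothetical layout: one row per residue pair in a chain. Names are illustrative.
Residue = Tuple[int, Tuple[float, float, float]]  # (res_seq, (x, y, z))


def pairwise_rows(pdb_id: str, chain_id: str,
                  residues: Sequence[Residue]) -> Iterator[tuple]:
    """Yield (pdb_id, chain_id, res_a, res_b, distance) for every unordered pair.

    A chain of n residues yields n * (n - 1) // 2 rows.
    """
    for (res_a, pos_a), (res_b, pos_b) in itertools.combinations(residues, 2):
        yield (pdb_id, chain_id, res_a, res_b, math.dist(pos_a, pos_b))


def load_chain_distances(conn: sqlite3.Connection, pdb_id: str, chain_id: str,
                         residues: Sequence[Residue], batch_size: int = 10_000) -> None:
    """Insert all pair rows in fixed-size batches within a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chain_distance ("
        "pdb_id TEXT, chain_id TEXT, res_a INTEGER, res_b INTEGER, distance REAL)"
    )
    rows = pairwise_rows(pdb_id, chain_id, residues)
    with conn:  # commit once on success, roll back on error
        while True:
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                break
            conn.executemany(
                "INSERT INTO chain_distance VALUES (?, ?, ?, ?, ?)", batch)
```

Rows are generated lazily and inserted a batch at a time, so memory use stays bounded even for long chains.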

  11. Current Functionality • Protein (PDB) data has been fully uploaded into both the production database (MotifSpace) and the development database (MotifSpaceDev) • Protein structures can be visualized using data from the database (data available) • Data can be obtained from the server using SOAP or web services • Basic queries, such as which distinct PDB entries a specific motif occurs in • Histograms to compute statistics
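The two basic queries listed on this slide could look roughly like the following. This sketch assumes a hypothetical `motif_occurrence` table with one row per (motif, PDB entry, chain) match. It uses SQLite with made-up sample rows, whereas the project's actual queries ran against SQL Server and may be structured differently.

```python
import sqlite3

# Hypothetical occurrence table: one row per (motif, PDB entry, chain) match.


def pdbs_for_motif(conn: sqlite3.Connection, motif_id: int) -> list:
    """Which distinct PDB entries does a given motif occur in?"""
    cur = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    )
    return [row[0] for row in cur]


def motif_spread_histogram(conn: sqlite3.Connection) -> dict:
    """Histogram: number of motifs that occur in exactly k distinct PDB entries."""
    cur = conn.execute(
        "SELECT n_pdbs, COUNT(*) FROM ("
        "  SELECT motif_id, COUNT(DISTINCT pdb_id) AS n_pdbs"
        "  FROM motif_occurrence GROUP BY motif_id"
        ") GROUP BY n_pdbs ORDER BY n_pdbs"
    )
    return dict(cur.fetchall())


# Usage with made-up sample rows (motif_id, pdb_id, chain_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)")
conn.executemany("INSERT INTO motif_occurrence VALUES (?, ?, ?)", [
    (1, "1ABC", "A"), (1, "1ABC", "B"), (1, "2XYZ", "A"),
    (2, "1ABC", "A"),
    (3, "3DEF", "A"), (3, "2XYZ", "A"),
])
print(pdbs_for_motif(conn, 1))       # ['1ABC', '2XYZ']
print(motif_spread_histogram(conn))  # {1: 1, 2: 2}
```

`COUNT(DISTINCT pdb_id)` counts an entry once even when the motif occurs in several of its chains. In the sample, motif 1 therefore counts as occurring in two PDB entries, not three.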

  12. Demo
