
Informatics perspectives in Bio-Informatics

Atul P Agarwal, Apt Software Avenues Pvt Ltd



  1. Informatics perspectives in Bio-Informatics • Atul P Agarwal • Apt Software Avenues Pvt Ltd, Unit G302 Block DC, City Centre, Salt Lake, Kolkata 700064

  2. Two aspects of Informatics • Computational Biology • All the plumbing needed to put a Bio-informatics application together

  3. Application architecture • Standalone • Local computation • Needs to be installed on individual machines • Can connect to a web service • Updates are difficult to manage • Web based • Runs in a browser • Needs no install • Updates are easy • Can connect to other web services

  4. Web application architecture (layer diagram) • Clients • Application: proprietary, SOAP::Lite • Browser: HTML, XHTML, DHTML, JavaScript, AJAX • Client to server: SOAP, XML over HTTP, MIME • Web server: Apache, JBoss, IIS • Server to application: CGI / ASP.NET / JSP • Application logic: Perl, Python, PHP, C/C++, C# • Application to database: database driver, SQL • Database: MySQL, PostgreSQL, SQL Server, Oracle

  5. Platforms - Two camps • Open source • LAMP • Linux • Apache, JBoss • MySQL • Perl, Python, PHP, Java • Microsoft • .NET • SQL Server • ASP.NET (C, C++, C#, VB.NET)

  6. World Wide Web • The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI).

  7. Browsers – the display • Responsible for user input and result display • No algorithmic computation • Displays HTML • Some programmability through Javascript

  8. Browser Operation • The browser recognizes that what a user has typed is a URI. • The browser performs an information retrieval action in accordance with its configured behavior for resources identified via the "http" URI scheme. • The authority responsible for handling the URI provides information in a response to the retrieval request. • The browser interprets the response, identified as HTML by the server, and performs additional retrieval actions for inline graphics and other content as necessary. • The browser displays the retrieved information, which includes hypertext links to other information. The user can follow these hypertext links to retrieve additional information.
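
The first step above, recognizing a URI and deciding how to retrieve it, can be sketched in Python with the standard `urllib.parse` module. The function name `describe_uri` is an illustrative assumption; the URL is the one used in the example request slide.

```python
from urllib.parse import urlsplit

def describe_uri(uri: str) -> dict:
    """Split a URI into the parts a browser needs before it can retrieve anything."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # selects the retrieval protocol, e.g. "http"
        "authority": parts.netloc,   # the host responsible for the resource
        "path": parts.path or "/",   # what to ask that host for ("/" if omitted)
        "query": parts.query,        # optional parameters after "?"
    }

info = describe_uri("http://www.somehost.com/path/file.html")
print(info)
```

The scheme tells the client which protocol to speak, the authority tells it where to connect, and the path is what goes into the request it sends.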

  9. Portability across Browsers • There are many browsers out there • IE • Firefox • Safari • Opera • They have their own idiosyncrasies • Application needs lots of testing

  10. Web Server • Handle multiple incoming requests • Process the HTTP requests • Serve the requests • Multiple possibilities • static pages • cgi-bin • jsp • servlets • Form the HTTP responses • Send back the responses • Maintain sessions

  11. HTTP (Hypertext Transfer Protocol) • RFC 2616 (the official specification) • A request/response protocol. • A client sends a request to the server over a connection, in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. • The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.

  12. HTTP Message format • The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of: • an initial line, • zero or more header lines, • a blank line (i.e. a CRLF by itself), and • an optional message body (e.g. a file, or query data, or query output).

  13. Example request • To retrieve the file at the URL http://www.somehost.com/path/file.html • open a connection to the host www.somehost.com • send something like the following through the connection:

        GET /path/file.html HTTP/1.0
        From: someuser@jmarshall.com
        User-Agent: HTTPTool/1.0
        [blank line here]

  14. Example response • The server will respond with something like

        HTTP/1.0 200 OK
        Date: Fri, 31 Dec 1999 23:59:59 GMT
        Content-Type: text/html
        Content-Length: 1354

        <html>
        <body>
        <h1>Happy New Millennium!</h1>
        </body>
        </html>

      • After sending the response, the server closes the network connection.
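
The message format (initial line, header lines, blank line, optional body) can be checked against the example request and response with a few lines of Python. This is a minimal sketch, not a complete HTTP parser: `parse_http_message` is an illustrative name, and the body is abbreviated, so the `Content-Length` value from the slide does not match it.

```python
def parse_http_message(raw: str):
    """Split an HTTP/1.0-style message into initial line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    lines = head.split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

request = (
    "GET /path/file.html HTTP/1.0\r\n"
    "From: someuser@jmarshall.com\r\n"
    "User-Agent: HTTPTool/1.0\r\n"
    "\r\n"
)
response = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Fri, 31 Dec 1999 23:59:59 GMT\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1354\r\n"
    "\r\n"
    "<html><body><h1>Happy New Millennium!</h1></body></html>"
)

request_line, request_headers, request_body = parse_http_message(request)
status_line, response_headers, response_body = parse_http_message(response)
version, status_code, reason = status_line.split(" ", 2)
print(request_line)
print(version, status_code, reason)
```

Requests and responses go through the same function because only the initial line differs: a method, URI, and version for a request, and a version, status code, and reason phrase for a response.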

  15. HTML (Hypertext Markup Language) • A markup language which consists of tags embedded in the text of a document. • The browser reading the document interprets these markup tags to help format the document for subsequent display to a reader. • However, many of the decisions about layout are made by the browser.

  16. Basic HTML tags

  17. Evolution of HTML • Emergence of new platforms • Mobiles, TVs, Digital phones • Dynamic HTML • Interactive web pages • Combines HTML, Javascript, DOM, CSS • XHTML • Stricter and cleaner version of HTML

  18. Evolution of the Web technologies • Static content • Cgi-bin • Servlets • JSP • ASP • Struts • JSF • AJAX

  19. AJAX • Asynchronous JavaScript and XML • Improve the User experience • The browser can continue to communicate with the web server while the user interacts with the page • The User can do something during long running computationally intensive jobs • The User can manipulate complex data in a more friendly manner • Aggregate data from multiple sources into a single view

  20. Enhancing the User experience • iPhone has set a new standard • More demands from the Browser • Rich Internet Applications (RIA) • Silverlight – Microsoft • Flex – Adobe • GWT – Google • Web 2.0 • Communities and sharing

  21. Building your application • Choice of programming language • Lightweight • Perl, Ruby, Python • Heavyweight • C#, Java, C++ • Specialized • R, Matlab, Mathematica • Choice of architecture/framework • Costs

  22. Perl – The language • An interpreted language • Easy and fast • Very good for prototyping • Powerful text manipulation features • Has been used a lot for “plumbing”

  23. Disadvantages of Perl • Interpreted, hence slower than compiled languages • Poor GUI support; user interaction is screen based or command line only • Novices can easily be caught out • Variables can be used without initialization • No type checking of variables

  24. BioPerl • A collection of Perl modules • Specifically for Bio-Informatics • Object oriented • Can be a little difficult to get started with

  25. Objects in BioPerl • Sequences • Databases • Alignments • Features and genes on sequences

  26. Parallel Computing • Advent of cheap multi-core CPUs • Availability of libraries to help parallel processing • STAPL • Standard Template Adaptive Parallel Library • Protein folding problem using STAPL • http://www.hicomb.org/papers/HICOMB2004-03.pdf • Intel TBB • Intel Threading Building Blocks • Google MapReduce • Parallelized version of the Smith-Waterman algorithm • http://cmgm.stanford.edu/~brutlag/Papers/brutlag93.pdf • Specialized hardware • FPGA implementation of BLAST • Parallel algorithms are very hard to program
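
One common way to parallelize sequence searching is to score the query against each database sequence independently and then combine the results. The sketch below shows that map-then-combine shape in Python. It is not the method of either linked paper and uses none of the libraries listed above. The scoring parameters and all names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b, using a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def score_database(query: str, database: list, workers: int = 4) -> list:
    """Map step: score the query against every sequence independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda seq: smith_waterman_score(query, seq), database))
    return list(zip(database, scores))

results = score_database("ACGT", ["TTACGTTT", "AAAA", "GGGG"])
best_hit = max(results, key=lambda pair: pair[1])  # combine step: keep the top score
print(results, best_hit)
```

Threads are used here only to show the structure. In CPython, CPU-bound work like this needs processes (for example `concurrent.futures.ProcessPoolExecutor` with a module-level scoring function) to actually run on several cores.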

  27. CGI (Common Gateway Interface) • A standard way for a web server to invoke a script, passing certain environment variables and user input data to the script, and allowing the script to return a result. • One of the oldest ways of providing dynamic web content. • Supported on innumerable low cost web hosting services. • Included out of the box with many Apache installations, such as that provided on Red Hat Linux.
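
A minimal sketch of the script side of CGI, in Python: the server places request data in environment variables such as `QUERY_STRING`, and the script writes a header block, a blank line, and the body to standard output. The function name `handle_cgi_request` and the `id` parameter are illustrative assumptions. The environment is passed in as a dictionary so the example runs without a web server.

```python
from html import escape
from urllib.parse import parse_qs

def handle_cgi_request(environ: dict) -> str:
    """Build the text a CGI script would write to standard output."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    seq_id = params.get("id", ["(none)"])[0]       # hypothetical form field
    body = ("<html><body><p>Requested sequence: "
            + escape(seq_id)                       # never echo raw user input
            + "</p></body></html>")
    # Headers first, then a blank line, then the document itself.
    return "Content-Type: text/html\r\n\r\n" + body

# A real script would pass dict(os.environ) and write the result to sys.stdout.
output = handle_cgi_request({"REQUEST_METHOD": "GET", "QUERY_STRING": "id=P12345"})
print(output)
```

The web server starts one such script per request, which keeps the model simple. That per-request start-up cost is one reason later technologies such as servlets keep the application resident in memory.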

  28. CGI in operation

  29. XML (eXtensible Markup Language) • XML is a data format that represents data in a structured form • XML is a simple, standard way for interchange of structured textual data between multi-vendor platforms • XML can be used to store data

  30. XML is used to create new languages • XHTML the latest version of HTML  • WSDL for describing available web services • WAP and WML as markup languages for handheld devices • RSS languages for news feeds • RDF and OWL for describing resources and ontology • SMIL for describing multimedia for the web 

  31. Domain Specific XML • WITSML • Oil drilling • JDF • Printing • Gen2Phen • http://www.pageom.org

  32. XML documents • Well formed • Conform to the XML syntax rules • Valid • Well formed, and also conform to the structure declared in a DTD or XML Schema
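
The difference between the two levels can be sketched in Python. Any XML parser checks well-formedness. Checking validity needs a declared set of rules, and since the Python standard library does not validate against a DTD or schema, the hand-written rule in `is_valid_sequence` stands in for one. The `sequence` document type and both function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text obeys XML syntax (tags nested and closed correctly)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def is_valid_sequence(text: str) -> bool:
    """Stand-in for schema validation: a <sequence> needs an id and <residues>."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    return (root.tag == "sequence"
            and "id" in root.attrib
            and root.find("residues") is not None)

good = "<sequence id='s1'><residues>ACGT</residues></sequence>"
broken = "<sequence id='s1'><residues>ACGT</sequence>"       # unclosed tag
incomplete = "<sequence><residues>ACGT</residues></sequence>"  # no id attribute

for doc in (good, broken, incomplete):
    print(is_well_formed(doc), is_valid_sequence(doc))
```

The third document is well formed but breaks the declared rules, which is the case that separates the two levels.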

  33. Data Models in BioInformatics • Not much standardization so far • Laboratory specific modeling • New initiative for genome data modeling • http://www.pageom.org • Based on XML

  34. Databases • Open source databases • MySQL, PostgreSQL • Commercial databases • Oracle, SQL Server • SQL is the language • The heart and soul of BioInformatics applications • Commercial deployments are expensive!

  35. RDBMS (Relational Database Management System) • Based on the “Relational” model proposed by Codd • A “Relation” is a formal mathematical concept: a set of tuples • The operations on Relations are based on “Relational Algebra” • Relations are implemented as tables • Each row is one tuple of the relation

  36. Relational Algebra • 3 primitive operations • Projection • Select a subset of columns • Selection • Select a subset of rows • Join • Cross product of two tables, usually restricted to the row pairs that match on a condition • Set Operations • Union • Intersection • Difference
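
The three primitive operations can be sketched in Python over tables held as lists of dictionaries. This is only an illustration of the algebra, not how a database engine implements it. The table contents and function names are assumptions made for the example.

```python
def project(rows, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def select(rows, predicate):
    """Selection: keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

def join(left, right, on):
    """Join: pair every left row with every right row, keep pairs equal on `on`."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

genes = [
    {"gene": "BRCA1", "organism_id": 1},
    {"gene": "TP53", "organism_id": 1},
    {"gene": "lacZ", "organism_id": 2},
]
organisms = [
    {"organism_id": 1, "name": "Homo sapiens"},
    {"organism_id": 2, "name": "Escherichia coli"},
]

joined = join(genes, organisms, "organism_id")
human = select(joined, lambda row: row["name"] == "Homo sapiens")
human_genes = project(human, ["gene"])
print(human_genes)
```

In SQL the same query would be a single SELECT statement with a join and a WHERE clause. The algebra is what the database uses to reason about it.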

  37. SQL (Structured Query Language) • For manipulating an RDBMS • Data Definition Language (DDL) statements • To build and modify the structure of tables • Data Manipulation Language (DML) statements • To work with the data in the tables • 4 basic statements • SELECT • INSERT • UPDATE • DELETE
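
One DDL statement and the four basic DML statements can be tried out with Python's built-in `sqlite3` module. SQLite is used here only because it needs no server; it is an assumption of this sketch, not one of the databases named in the presentation. The `sequence` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# DDL: define the structure of a table.
cur.execute(
    "CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT NOT NULL, length INTEGER)"
)

# DML: work with the data in the table.
cur.execute("INSERT INTO sequence (name, length) VALUES (?, ?)", ("seqA", 120))
cur.execute("INSERT INTO sequence (name, length) VALUES (?, ?)", ("seqB", 300))
cur.execute("UPDATE sequence SET length = ? WHERE name = ?", (125, "seqA"))
cur.execute("DELETE FROM sequence WHERE name = ?", ("seqB",))
conn.commit()

rows = cur.execute("SELECT name, length FROM sequence ORDER BY name").fetchall()
print(rows)
conn.close()
```

The `?` placeholders keep data separate from the SQL text, which avoids quoting mistakes and SQL injection.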

  38. Transaction • RDBMS are multi-user systems • Different programs may be updating the database at the same time • A DML operation that changes the database is “effected” only when a COMMIT is issued • To undo a DML change, you can use the ROLLBACK command instead
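
A small sketch of COMMIT and ROLLBACK, again using Python's built-in `sqlite3` module as a stand-in (an assumption of this example; the table is hypothetical). The first insert is committed and survives, the second is rolled back and disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT)")
conn.commit()

conn.execute("INSERT INTO sample VALUES ('kept')")
conn.commit()        # the change becomes permanent

conn.execute("INSERT INTO sample VALUES ('discarded')")
conn.rollback()      # the uncommitted change is undone

names = [row[0] for row in conn.execute("SELECT name FROM sample")]
print(names)
conn.close()
```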

  39. Datatype • An RDBMS has its own type system • The service provider “maps” from the programming language types to the database types

  40. MySQL – the database • The ‘M’ in LAMP architecture • Free (GPL License) • Many enterprise features • Distributed databases • Triggers and stored procedures • Poor XML support

  41. Some MySQL DataTypes • INT: integer • FLOAT: small floating-point number • DOUBLE: double-precision floating-point number • CHAR(N): text N characters long (N=1..255) • VARCHAR(N): variable length text up to N characters long • TEXT: text up to 65535 characters long • LONGTEXT: text up to 4294967295 characters long

  42. DBI (Database Interface) Perl • to access databases from different vendors transparently • e.g., MySQL, Oracle, Sybase (even plain text files) • relies on proper DBD (DataBase Driver) modules to talk to the real databases • there is one DBD module for every different type of database • to connect to different databases (of different types) at the same time and easily move data between them • single generalized API for all types of databases • program at a "higher level" than the API provided by the database system

  43. DBD (Database Driver) Perl • convert the general DBI API into the database system-specific API. • also provide mechanism to access database specific functionality directly (won’t be used)
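
The same split between a generic interface and database-specific drivers exists in Python as the DB-API 2.0 specification (PEP 249). The sketch below is an analogy, not Perl DBI code. `list_sequence_names` uses only generic calls, so it should work unchanged with any compliant driver, while the setup lines are specific to the built-in `sqlite3` driver. The table and function names are illustrative assumptions.

```python
import sqlite3

def list_sequence_names(conn):
    """Uses only generic DB-API calls: cursor(), execute(), fetchall()."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sequence ORDER BY name")
    return [row[0] for row in cur.fetchall()]

# Driver-specific part: choosing the module and opening the connection.
conn = sqlite3.connect(":memory:")
# conn.execute / conn.executemany are sqlite3 conveniences, used for setup only.
conn.execute("CREATE TABLE sequence (name TEXT)")
conn.executemany("INSERT INTO sequence VALUES (?)", [("seqB",), ("seqA",)])

names = list_sequence_names(conn)
print(names)
conn.close()
```

The query in the generic function takes no parameters on purpose: placeholder syntax is one of the details that still differs between drivers.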

  44. Future Databases in Bioinformatics • Parallel database architectures • Data mining • Data warehousing • Improved query techniques • Object oriented databases ?

  45. Web Services • Simulates a remote function invocation • A calling program wants to use a function hosted on another machine • Inputs are passed to the remote function • The remote function is executed • The output is returned to the calling program • WSDL to define services • SOAP/XML to invoke services
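
How a function call turns into a SOAP message can be sketched in Python with the standard XML library. The envelope namespace is the SOAP 1.1 one. The service namespace, the `getSequence` method, and its `id` argument are hypothetical, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:sequence-service"             # hypothetical service namespace

def build_soap_request(method: str, params: dict) -> str:
    """Client side: wrap a function name and its inputs in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def read_soap_request(xml_text: str):
    """Server side: recover the function name and inputs from the envelope."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]   # first element inside Body
    method = call.tag.split("}", 1)[1]               # strip the namespace prefix
    return method, {child.tag: child.text for child in call}

message = build_soap_request("getSequence", {"id": "P12345"})
decoded = read_soap_request(message)
print(message)
print(decoded)
```

In practice the message would be sent as the body of an HTTP POST, and a toolkit would generate this plumbing from the service's WSDL description.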

  46. SOAP::Lite • a collection of Perl modules • provides a simple and lightweight interface to the Simple Object Access Protocol (SOAP) • on client and server side • the programmer doesn’t have to worry about the details of the SOAP protocol • http://www.soaplite.com/

  47. Service Oriented Architecture • Structuring large applications as an ad hoc collection of smaller modules called "services" • encapsulation • Many web services are consolidated to be used under the SOA • loose coupling • Services maintain a relationship that minimizes dependencies and only requires that they maintain an awareness of each other • contract • Services adhere to a communications agreement, as defined collectively by one or more service description documents • abstraction • Beyond what is described in the service contract, services hide logic from the outside world • reusability • Logic is divided into services with the intention of promoting reuse • composability • Collections of services can be coordinated and assembled to form composite services • autonomy • Services have control over the logic they encapsulate • discoverability • Services are designed to be outwardly descriptive so that they can be found and assessed via available discovery mechanisms

  48. Cloud Computing • Thin clients • Software as a service • Pay per use ? • Data stored on servers

  49. Web 3.0 (wiki) • transformation of the Web from a network of separately siloed applications and content repositories to a more seamless and interoperable whole • ubiquitous connectivity, broadband adoption, mobile Internet access and mobile devices • network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing • open technologies, open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons) • open identity, OpenID, open reputation, roaming portable identity and personal data • the intelligent web, Semantic Web technologies such as RDF, OWL, semantic application platforms, and statement-based datastores • distributed databases, the "World Wide Database" (enabled by Semantic Web technologies) • intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents

  50. Example Bio-workflow • Quickly integrate different web services • PDB • EBI • KEGG • AJAX and Microsoft Atlas technologies • All data exchanged as XML • http://203.197.120.150:82/aptbiocom/
