
Domain-specific Languages for Cellular Interactions

Bill Harrison, Department of Computer Science, University of Missouri at Columbia. This work was partially supported by NIH1 R01 GM62920-04A1, NIH1 P20 GM065762-01A1, the Georgia Research Alliance, and the Georgia Cancer Coalition.


Presentation Transcript


  1. Domain-specific Languages for Cellular Interactions • Bill Harrison, Department of Computer Science, University of Missouri at Columbia • This work partially supported by: • NIH1 R01 GM62920-04A1, • NIH1 P20 GM065762-01A1, • the Georgia Research Alliance and • the Georgia Cancer Coalition.


  3. Ph.D. 2001, UIUC • Thesis: Modular Compilers and Their Correctness Proofs • Thesis Advisor: Sam Kamin • Post-doc, Oregon Graduate Inst. (OGI) • Three years on the Programatica Project • using the Haskell programming language as a basis for formal methods • Assistant Professor, University of Missouri-Columbia since Fall 2003

  4. Systems Biology asks… • Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, & rigor? • Can biological systems be viewed as the “sum of their parts”? • Can component-level models be integrated into precise system-level models of biological behavior? • What techniques from Mathematics and Computer Science apply to this composition problem?

  5. Rhodobacter Sphaeroides • Photosynthetic bacterium • seeks out regions of greater light • Roughly the size of a wavelength of light • cannot sense local light differences directly • ⇒ applies a random walk

  6. Simulations of Biological Systems • Simulations provide qualitative feedback, but are not models per se • how accurate/faithful is a simulation? • what does the feedback mean? • can one reason about the biological phenomenon based on the simulation? • can you identify the biology by inspecting the text of the simulation program?

  7. contains 1000 LOC to understand requires expertise in C++ …and biological model …and critical system details e.g., how is concurrency implemented? R. Sphaeroides in C++ bool global_state::register_state(void *apointer) { if( number_of_states == mother_of_all_states.size()) mother_of_all_states.resize(number_of_states + 1000); mother_of_all_states[number_of_states++] = apointer; return true; }

  8. R. Sphaeroides in C++ (shown with the same register_state excerpt as the previous slide) • Program structure does not reflect the biological model • can you look at the source code and recognize the underlying biology? • ⇒ difficult to comprehend …and write correctly …and modify …and maintain …and re-use

  9. Systems Biology as Programming Language Design • The Problem: • General-purpose programming languages do not have the “right vocabulary” • Biological model: Concurrent Markov chains • C++: classes, pointers, etc. • …nor are they mathematics • Our Solution: Design small, special-purpose languages with exactly the right vocabulary • called a Domain-specific Language (DSL) [Sheard99,Thiemann01,Leijen01] • Mathematical semantics of DSLs gives a formal model of the biology

  10. Language Model of R. Sphaeroides • Executing: cell1 || … || celln • Produces animation: [animation not included in transcript]

  11. Outline • Language Design and Domain-specific Languages • design, definition, and implementation • Systems Biology as Language Design • Case Study for Rhodobacter Sphaeroides • Design: what are the appropriate abstractions for R. Sphaeroides? • Definition: how do we specify exactly what R. Sphaeroides programs mean? • Implementation: how do we run R. Sphaeroides programs? • Conclusions

  12. Cardinal Rule of Language Design • Application programmers should choose languages with abstractions most suited to their task; • Language designers must provide languages with those abstractions…
      Domain                   Central Activities      Reasonable Language
      System Programming       “bit-fiddling”          C
      Artificial Intelligence  List processing         LISP
      System Admin.            Text processing, etc.   PERL

  13. DSLs are small languages w/ “domain abstractions” • Ex: “Parsec” Parser DSL • BNF for language: <Stmt> → <ident> := <Expr> • translates directly into Parsec code:
      assignStmt :: Parser Stmt
      assignStmt = do { id <- ident ; symbol ":=" ; s <- expr ; return (Assign id s) }
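The Parsec fragment on this slide relies on definitions the slide does not show. A minimal runnable sketch follows, assuming the `parsec` package is installed; the `Expr` type, the `Lit` and `Var` constructors, and the helper parsers `lexeme`, `ident`, `symbol`, and `expr` are hypothetical stand-ins, not the author's definitions.

```haskell
import Text.Parsec (parse, many1, letter, digit, spaces, string, eof, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical AST; the slide shows only the Assign constructor.
data Expr = Var String | Lit Integer deriving (Show, Eq)
data Stmt = Assign String Expr deriving (Show, Eq)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- An identifier is one or more letters (an assumption for this sketch).
ident :: Parser String
ident = lexeme (many1 letter)

-- Match a fixed token such as ":=".
symbol :: String -> Parser String
symbol = lexeme . string

-- An expression is an integer literal or a variable.
expr :: Parser Expr
expr = (Lit . read <$> lexeme (many1 digit)) <|> (Var <$> ident)

-- Mirrors the BNF rule <Stmt> -> <ident> := <Expr> line by line.
assignStmt :: Parser Stmt
assignStmt = do
  name <- ident
  _ <- symbol ":="
  e <- expr
  return (Assign name e)

main :: IO ()
main = print (parse (spaces *> assignStmt <* eof) "<input>" "x := 42")
-- prints: Right (Assign "x" (Lit 42))
```

The body of `assignStmt` reads in the same order as the grammar rule, which is the point the slide makes about domain abstractions.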

  14. “Why a language and not a library?” • The Slogan: “What is excluded from a DSL is as important as what is included in it” • libraries in a general-purpose language still require • considerable expertise & self-discipline on the part of the programmer • Lack of generality in a DSL ⇒ fewer things to “go wrong” • A DSL may have desirable properties that a general-purpose language will not • e.g., implementation techniques specialized to the DSL that do not apply to general-purpose languages • small size makes rigorous specification tractable

  15. DSL Design • DSL design for R. Sphaeroides: • what are our domain abstractions? • How does this organism behave? • What modeling techniques are used by biologists to describe this behavior?

  16. Bacterial Commands • laze • die • adjust speed • grow* • divide • tumble • *Probability of growth varies with light concentration

  17. Chapman-Kolmogorov Equation* • [equation not preserved in transcript] • Annotations: probability of being in state m; Pi,j = probability of transition from i to j • *Commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]

  18. Chapman-Kolmogorov Equation A row in the above matrix encodes the transition function from state i of a Markov chain
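The equation and matrix on these two slides did not survive in the transcript. For orientation only, a standard discrete-time formulation is sketched below; the symbols \pi, s, and t are assumptions and may not match the notation on the original slides. Row i of the transition matrix P lists the probabilities P_{i,j} of moving from state i to each state j, so each row sums to 1.

```latex
% Distribution update: \pi_j(t) is the probability of being in state j at step t.
\[
  \pi_j(t+1) = \sum_{i=0}^{m} \pi_i(t)\, P_{i,j},
  \qquad
  \sum_{j=0}^{m} P_{i,j} = 1 \quad \text{for every state } i .
\]

% Multi-step (Chapman-Kolmogorov) form: an (s+t)-step transition
% factors through every possible intermediate state k.
\[
  P^{(s+t)}_{i,j} = \sum_{k=0}^{m} P^{(s)}_{i,k}\, P^{(t)}_{k,j} .
\]
```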

  19. Bacteria as Markov Chains • [diagram: State 0, State i, …, State m] • non-deterministic state machines with probabilistic transitions • induced by the Chapman-Kolmogorov equation • Pi,j given in terms of environmental factors, organism state, etc. • executing concurrently

  20. Domain Abstractions for R. Sphaeroides • Individual cells: Markov-chain abstraction: choose P1 Action1 … Pn Actionn • Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc. • Concurrency: cell1 || cell2 • Environmental Factors: light, size

  21. Abstract syntax for CellSys • [grammar not preserved in transcript] • choose is our principal domain abstraction • behaves like the Markov chain transition function • Cell-level environment variables: light, size
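Since the grammar itself is missing from the transcript, the following is only a guess at how the abstractions named on slides 20 and 21 might be written down in Haskell. The types `Action`, `Env`, `Cell`, and `CellSys`, the function `step`, and the probabilities in `example` are all hypothetical. The sketch treats `choose` as one row of a transition matrix whose entries may depend on the environment, and resolves it from a uniform sample supplied by the caller.

```haskell
-- Hypothetical abstract syntax; the slide's actual grammar is not in the transcript.
data Action = Tumble | Divide | AdjSpeed | Laze | Grow | Die
  deriving (Show, Eq)

-- Cell-level environment variables named on the slides.
data Env = Env { light :: Double, size :: Double }

-- choose P1 Action1 ... Pn Actionn, where each Pi may depend on the environment.
newtype Cell = Choose [(Env -> Double, Action)]

-- A system is a parallel composition cell1 || ... || celln.
type CellSys = [Cell]

-- Resolve one choose step from a uniform sample u in [0,1).
-- Walks the cumulative distribution; Nothing means the weights did not cover u.
step :: Env -> Double -> Cell -> Maybe Action
step env u (Choose alts) = go u alts
  where
    go _ [] = Nothing
    go r ((p, a) : rest)
      | r < p env = Just a
      | otherwise = go (r - p env) rest

-- Example cell: growth becomes more likely as light increases (invented weights).
example :: Cell
example = Choose
  [ (\e -> 0.5 * light e, Grow)
  , (const 0.2, Tumble)
  , (\e -> 0.8 - 0.5 * light e, Laze)
  ]

main :: IO ()
main = do
  let env = Env { light = 0.6, size = 1.0 }
  -- With light = 0.6 the weights are 0.3 (Grow), 0.2 (Tumble), 0.5 (Laze).
  print (map (\u -> step env u example) [0.1, 0.4, 0.9])
  -- prints: [Just Grow,Just Tumble,Just Laze]
```

Passing the sample `u` in explicitly keeps the sketch deterministic; a real implementation would draw it from a probability effect, which slide 24 names as ProbT.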

  22. DSL Definition • Background: Programming languages are “collections of effects” • Java = OO + Threads + State +… • LISP = Higher-order Functions + … • Prolog = Backtracking + … • Corresponding to each such effect is an algebraic construction called a monad • used for the development of modular semantic theories of programming languages [Moggi89] • monads may be constructed using “monad transformers”

  23. Periodic Table of Effects
      Transformer  Effect          Operations
      StateT       imperative      :=
      BackT        backtracking    cut
      ResT         threads         step, pause
      EnvT         binding         λ, @, v
      ErrorT       exceptions      raise/catch
      ContT        continuations   callcc
      NondetT      non-determ.     choose
      ProbT        probability     random
      DebugT       debugging       rollback
      ReactT       reactivity      send, recv, …
      • Prog. languages are collections of effects captured as monads [Moggi] • Monads assembled from constructors (monad transformers) • Our view: Systems are collections of effects captured as monads • “Systems” broadly construed: • Compilers [Harrison00,98,01,02], • Secure system software [Harrison05,03], and • Biology [Harrison04]

  24. Periodic Table of Effects (same table of transformers: StateT, BackT, ResT, EnvT, ErrorT, ContT, NondetT, ProbT, DebugT, ReactT) • Mathematical definitions for any language created by combining monad transformers (MTs) • CellSys = StateT + ResT + ProbT + ReactT • Such definitions are flexible • modular, extensible, and easily refactored
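To make "combining MTs" concrete, here is a minimal sketch that layers two of the four effects, state and threads, using only the Haskell Prelude. It is not the CellSys definition: `StateT` is re-derived inline in its usual form, `ResT` is a simplified resumption transformer, `roundRobin`, `thread`, and the log contents are invented for illustration, and ProbT and ReactT are left out.

```haskell
-- State transformer: threads a state s through computations in m.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance Monad m => Functor (StateT s m) where
  fmap f (StateT g) = StateT $ \s -> do
    (a, s') <- g s
    return (f a, s')

instance Monad m => Applicative (StateT s m) where
  pure a = StateT $ \s -> return (a, s)
  StateT f <*> StateT g = StateT $ \s -> do
    (h, s')  <- f s
    (a, s'') <- g s'
    return (h a, s'')

instance Monad m => Monad (StateT s m) where
  StateT g >>= k = StateT $ \s -> do
    (a, s') <- g s
    runStateT (k a) s'

get :: Monad m => StateT s m s
get = StateT $ \s -> return (s, s)

put :: Monad m => s -> StateT s m ()
put s = StateT $ \_ -> return ((), s)

-- Simplified resumption transformer: a computation is either finished
-- or paused after one atomic step in the underlying monad m.
data ResT m a = Done a | Pause (m (ResT m a))

instance Monad m => Functor (ResT m) where
  fmap f (Done a)  = Done (f a)
  fmap f (Pause r) = Pause (fmap (fmap f) r)

instance Monad m => Applicative (ResT m) where
  pure = Done
  rf <*> ra = rf >>= \f -> fmap f ra

instance Monad m => Monad (ResT m) where
  Done a  >>= k = k a
  Pause r >>= k = Pause (fmap (>>= k) r)

-- Lift one atomic action of m into a single pausable step.
step :: Monad m => m a -> ResT m a
step x = Pause (fmap Done x)

-- Round-robin scheduler: run one step of the first thread, then rotate.
roundRobin :: Monad m => [ResT m ()] -> m ()
roundRobin []             = return ()
roundRobin (Done _ : ts)  = roundRobin ts
roundRobin (Pause r : ts) = r >>= \t -> roundRobin (ts ++ [t])

-- The combined stack: threads (ResT) over shared state (StateT) over IO.
type M = StateT [String] IO

-- A thread that appends n numbered messages to the shared log, one per step.
thread :: String -> Int -> ResT M ()
thread name n = mapM_ (\i -> step (logMsg (name ++ show i))) [1 .. n]
  where logMsg msg = get >>= \l -> put (l ++ [msg])

main :: IO ()
main = do
  (_, l) <- runStateT (roundRobin [thread "a" 2, thread "b" 2]) []
  print l
  -- prints: ["a1","b1","a2","b2"]
```

Neither transformer knows about the other, yet stacking them yields interleaved threads over shared state, which is the modularity the slide claims for such definitions.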

  25. DSL definition similar to traditional RTS (run-time system) • In a traditional RTS, threads request services like “send a message”, “output on device”, “consume resource” • The RTS mediates, ensuring that • the threads do not interfere • global system state remains consistent • and it schedules threads • [diagram: Run-time System … threads]

  26. High-level view of definition • In CellSys, cells are threads with physical components as well: size, velocity, … • cells request services like “consume nutrients”, “move me here”, “want to divide” • The Global Environment (GE) mediates like an RTS, and also: • preserves physical integrity • updates global world view • performs scheduling • [diagram: Global Environment … cells]
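The request-and-mediate pattern on this slide can be sketched with a small reactive data type, using only the Haskell Prelude. Everything here is an illustrative assumption rather than the CellSys definition: the request and reply types `Req` and `Rsp`, the `World` record, the policy in `serve`, and the example cell `hungry` are all invented.

```haskell
-- Hypothetical service requests, named after the slide's examples.
data Req = ConsumeNutrients Double | MoveTo (Double, Double) | WantToDivide
  deriving Show

-- Hypothetical replies from the global environment.
data Rsp = Granted | Denied
  deriving (Show, Eq)

-- A cell either finishes or issues a request and waits for the reply.
data Cell = Finished | Request Req (Rsp -> Cell)

-- The shared world: here just a nutrient pool and an event log.
data World = World { nutrients :: Double, events :: [String] }

-- Append one entry to the event log.
logEvent :: String -> World -> World
logEvent e w = w { events = events w ++ [e] }

-- The GE decides each request and keeps the world consistent:
-- nutrients can never go negative.
serve :: Req -> World -> (Rsp, World)
serve (ConsumeNutrients x) w
  | x <= nutrients w =
      (Granted, logEvent ("ate " ++ show x) w { nutrients = nutrients w - x })
  | otherwise = (Denied, logEvent "starved" w)
serve req w = (Granted, logEvent (show req) w)

-- Round-robin scheduling: serve one request per cell per turn.
run :: [Cell] -> World -> World
run [] w = w
run (Finished : cs) w = run cs w
run (Request q k : cs) w =
  let (rsp, w') = serve q w
  in run (cs ++ [k rsp]) w'

-- A cell that eats twice and asks to divide only if the second meal was granted.
hungry :: Cell
hungry =
  Request (ConsumeNutrients 2) $ \_ ->
  Request (ConsumeNutrients 2) $ \r ->
    if r == Granted then Request WantToDivide (const Finished) else Finished

main :: IO ()
main = mapM_ putStrLn (events (run [hungry, hungry] (World 6 [])))
```

With six units of nutrients and two such cells, the pool runs out on the fourth meal, so only the first cell goes on to request division. The cells never touch the `World` directly; every change passes through `serve`, which is where integrity checks and scheduling policy would live.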

  27. DSL Implementation • Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program • I.e., the monadic language definition may be transcribed “symbol for symbol” into Haskell • The Haskell implementation is easily instrumented to output system “snapshots”: • snapshots are printed in POV (Persistence of Vision) format and converted into MPEG

  28. Q: What are appropriate languages for modeling? • Integrate techniques from programming languages • models of concurrency • language semantics • i.e., precise, mathematical language definitions • efficient language implementation • …into a special-purpose language called a “Domain-Specific Language” • abstractions taken directly from biology • ⇒ comprehensible by biologists • DSLs and DSL programs • hide technical details irrelevant/uninteresting to biologists • are “tunable” by a computer scientist to reflect discovery/refinement • execute to provide a “reality check” by biologists

  29. Bioinformatics = Computer Science + Biology • Computer Science: models of concurrency, efficient implementation, mathematical models of programs, reasoning about programs • Biology: organism structure & behavior, modeling techniques (cellular automata, systems of PDEs, numerical techniques) • Hard Problem: How do you effect a technology transfer from CS to Biology?

  30. Interdisciplinary Process • CellSys (version 1.0) • Biologist evaluates DSL model for accuracy, expressiveness, etc. • feedback/discussion • Language expert refactors language as needed • CellSys (version 2.0)

  31. Summary • Ingredients: systems biology, domain-specific languages, modular monadic semantics • Large body of work providing domain abstractions & models • Comprehensibility, Reusability, & Ease of Use • Precise description of biological phenomena through DSL semantics • * Harrison & Harrison, “Domain Specific Languages for Cellular Interactions” in Proceedings of the International Conference IEEE Engineering in Medicine and Biology, 2004.
