
Building Blocks for a Simple TeraGrid Science Gateway: Issues to Consider in Development (40min)

Anurag Shankar, TeraGrid Science Gateways Team, Indiana University. TeraGrid 2007, Madison, WI.


Presentation Transcript


  1. Building Blocks for a Simple TeraGrid Science Gateway: Issues to Consider in Development (40min) Anurag Shankar TeraGrid Science Gateways Team Indiana University TeraGrid 2007 Madison, WI

  2. This unit will try to answer the following questions: • What is a science gateway? • What questions should be asked before building one? • What problems do scientists face when computing? • What can gateways do to help? • What technologies can be used? • How can I ensure that the gateway will be used? • When is using the TeraGrid appropriate? • What resources do I need to build one?

  3. What do you really mean by a science gateway? • A (web-based) GUI that allows a scientist to do some sort of computation by clicking buttons. • The computation requires resource(s) at the back end to carry it out, e.g. storage, CPU cycles, databases, etc. • These resources could be modest - perhaps just a PC, or significant - a compute cluster or a grid. • Pardon the subsequent, implicit CPU-cycle-centricity. The gateway could as well be a data repository, etc.

  4. What is a TeraGrid science gateway? • A web interface • with science users in the front and TeraGrid services in back (a traditional TG SGW). • that bridges an existing non-TeraGrid science grid and the TeraGrid (a grid-bridging SGW). • that allows applications running on a user’s desktop to access TeraGrid services (a personal TG SGW). Will TeraGrid build a science gateway for me? • Nope. But we will gladly help you build one.

  5. Why call them gateways and not portals? • We could, but we distinguish between the two here for the sake of clarity. • We will use the word “portal” generally, to refer to an entry point, a URL on the web. A portal could be an aggregation point for information, services, or tools, a means to allow ubiquitous access or the ability to customize, etc. • We define a “science gateway” as a portal designed specifically for (or by) a specific science community.

  6. Ok, so I think I want to build a science gateway. Before we even start, what are the crucial questions to ask? What is it that we are trying to do here? • Is it to lessen the pain? For whom? • Is it to build something because it uses cool technology users will love? • Is it to get the damn thing done so we can write that quarterly report? • etc. …

  7. Questions to ask … 1. Will the gateway add value for the user? • For example, a command line user can perform every task that a gateway can, often with far more control and without the obfuscation layer a gateway adds. • You will wrest the command line only from these users’ dead fingers. • A gateway must add serious value to be successful here.

  8. Questions to ask … 1.1. Precisely how will the gateway add value? (aka why would the user want to use my gateway?) • Will it solve an existing problem? • Will it add new functionality? • Will it save user time? • etc. 1.2. How am I going to find out?

  9. Questions to ask … 2. If yes to (1), what technologies can and should be used? • 2.1. How? 3. Cost? 4. Validation (that I accomplished what I set out to do)? How?

  10. 1. What problems can gateways solve? • What are the common problems facing scientific users? • Increasing complexity of information technologies (IT). • No time to master increasingly complex IT. • Repetitive tasks waste a lot of time. • No simple workflow tools. • No easy-to-use “toolboxes” for frequent tasks. • An alien HPC culture for many new entrants. • Command line interface too distasteful (for many). • No native clients to do useful things.

  11. 1. Problems … • No GUI for tasks that are done a lot more easily graphically. • Frequent reinvention of the wheel (redundancy of effort). • (Insert your favorite here). All problems can be reduced to one: not being able to have the data I need delivered here, now.

  12. 1. Problems … • Ok, so which of these problems can gateways solve/address? What can they add? • Save users from back-end complexity, especially those who do not speak HPC. • Provide a simple interface to many tasks. • Save user time by providing tools for repetitive tasks. • Provide standard tools for a discipline/group of users. • Provide a GUI when/where appropriate. • Provide statefulness, persistence, historical data, etc. • Allow ubiquitous access.

  13. 1. Problems … • Is a science gateway always the right approach? • No. • For example, a PI with a small research group, all involved in extensive code modification, development, and/or testing, is unlikely to benefit from a gateway. • Gateways are best used when a large group of users (a community) makes use of the same computational tools. • Fields using common data formats (astronomy, climate modeling, etc.) also lend themselves to gateway-ing.

  14. 2. What technologies can I use? • Common off the shelf (COTS) • Usually PHP/MySQL, Perl, Ruby, or Python based. • Very popular, open source portal building toolkits such as Mambo, Joomla, Drupal, e107, PHP-Nuke, etc. • Also new “web operating systems (WebOS)” like eyeOS. • Standards based • Portlets (JSR 168). • Globus (Globus Toolkit 4, COG kit). • Web services (WSDL, WSRF, WSRP, etc.). • Globus web services (WS-MDS, WS-GRAM). • Grid services (OGSA).

  15. 2. COTS technologies …

  16. 2. COTS technologies … • When are COTS technologies appropriate? • When resources are limited (time, people, expertise) and/or when the project has modest needs. • When the portal needs to be built yesterday. • When the portal needs to be built yesterday and there is exactly one undergrad to do it. • When the undergrad has just taken his first programming class.

  17. 2. Standards based technologies • What are all these terms and acronyms? Portlets? JSR 168/286? WSRP? WSRF? COG? OGSA? • COG kit = Commodity Grid kit • JSR = Java Specification Request • OGSA = Open Grid Services Architecture • WS-GRAM = Web Services - Grid Resource Allocation and Management • WS-MDS = Web Services - Monitoring and Discovery System • WSRF = Web Services Resource Framework • WSRP = Web Services for Remote Portlets

  18. 2. Standards based … • The acronym maze alone will give you a headache, even on a good day. • Let’s try an evolutionary approach to see if it helps. • For good bedtime reading, check out my “Portals 101” document, created in desperation: http://www.gridsphere.org/gridsphere/gridsphere/html/docsTab/r/

  19. 2. Evolution of portal technologies … [Figure: timeline, starting from prehistory, that groups portal technologies as static, dynamic, and services based. Items shown: HTML, CGI, PHP, Javascript, Java applets, servlets, portlets, WSRP, web services, and stateful web services. Dates shown range from the late 1980s to 2003; the future is marked “?”.]

  20. 2. Evolution of grid technologies … [Figure: timeline, starting from prehistory (distributed computing). Items shown: Globus (grid middleware), GT 1.0, GT 2.0, GT 3.0, Java COG kit (API for Globus), Global Grid Forum, web services, grid services, Open Grid Services Infrastructure, and Open Grid Services Architecture. Dates shown range from 1997 to 2005; the future is marked “?”.]

  21. 2. Evolution of standards … • Web: HTML → CSS → XHTML → XML (W3C) • Modular web: Servlets → Portlets (JCP/Sun) • SOA: WS → WSDL → WS-x, WSRF (OASIS) • Portlets: JSR 168 → JSR 286 (JCP/Sun) • Grid: Globus → OGSI → OGSA (GGF/OGF) • JCP = Java Community Process (creates Java Specification Requests or JSRs) • W3C = World Wide Web Consortium • SOA = Service-Oriented Architecture • WS-x = Various web services standards, or specifications in the process of becoming standards (maybe), such as WS-Notification, WS-Security, etc.

  22. 2. Problem with evolution … Evolution according to creationists

  23. 2. Evolution …

  24. 2. Evolution … Man’s Evolution from the Prehistoric to Post Fast Food. Is it or is it not evolution? Depends on who you ask.

  25. 2. Portlets • Standardized Java components (special servlets) that can be put together quickly to create a complete portal page. • Plug and play. Transportable. • Generate fragments of markup. • Follow the JSR 168 standard. • JSR 168 defines • How to bundle portlets • How the portlet lifecycle is managed

  26. 2. Portlets … • Run inside a “portlet container”. Two popular JSR 168 compliant containers are • Gridsphere • Apache Pluto • The portlet container runs inside a “servlet container”. The most popular container is • Apache Tomcat • The servlet container may work with a webserver such as Apache httpd.
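
To make the container idea from the last two slides concrete, here is a toy model. It is written in Python purely for brevity and is not the JSR 168 API; real portlets are Java classes implementing the `javax.portlet.Portlet` interface, which also includes an action-processing step omitted here. The class names (`Portlet`, `ToyContainer`, `JobStatusPortlet`, `NewsPortlet`) and the markup they emit are invented for illustration. The sketch only shows the two points made above: each portlet produces a fragment of markup, and the container owns the lifecycle and assembles the page.

```python
from abc import ABC, abstractmethod
from html import escape


class Portlet(ABC):
    """Toy stand-in for a portlet: a component with a managed lifecycle."""

    def init(self) -> None:
        """Called once by the container before the first render."""

    @abstractmethod
    def render(self, user: str) -> str:
        """Return a fragment of markup, never a complete page."""

    def destroy(self) -> None:
        """Called once by the container at shutdown."""


class JobStatusPortlet(Portlet):
    # Hypothetical "grid portlet": a real one would query a job service.
    def render(self, user: str) -> str:
        return f"<div class='portlet'>Jobs for {escape(user)}: none queued</div>"


class NewsPortlet(Portlet):
    # Hypothetical generic portlet with no grid connection at all.
    def render(self, user: str) -> str:
        return "<div class='portlet'>Maintenance window: Sunday</div>"


class ToyContainer:
    """Owns the portlet lifecycle and assembles fragments into one page."""

    def __init__(self, portlets):
        self.portlets = list(portlets)
        for portlet in self.portlets:
            portlet.init()

    def render_page(self, user: str) -> str:
        fragments = "\n".join(p.render(user) for p in self.portlets)
        return f"<html><body>\n{fragments}\n</body></html>"

    def shutdown(self) -> None:
        for portlet in self.portlets:
            portlet.destroy()


if __name__ == "__main__":
    container = ToyContainer([JobStatusPortlet(), NewsPortlet()])
    print(container.render_page("alice"))
    container.shutdown()
```

Swapping a portlet in or out changes only the list handed to the container, which is the “plug and play” property in miniature.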


  28. 2. Practical (standards-based) tools • Enough! I have a headache already. Tell me something I can actually use with TeraGrid. • COG kits • Open Grid Computing Environment (OGCE) • (Gridsphere) GridPortlets • Clarens is a web services approach to the grid • IN-VIGO virtualizes the grid • Application Hosting Environment (AHE) runs unmodified apps on the grid

  29. 2. Globus API • Java Commodity Grid toolkit (COG kit) • An abstraction layer (via a Java API) that hides the underlying middleware (Globus toolkit/different toolkit versions - GT2/GT4). • Provides command line tools as well. • Also Python COG kit. http://wiki.cogkit.org/
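
The value of an abstraction layer like this is easier to see in a small sketch. The following is not the COG kit API; it is a hypothetical Python illustration in which `JobSpec`, `Gt2Style`, `Gt4Style`, and `submit` are all invented names. The two output formats are loosely modeled on an RSL-style string and an XML job description and are simplified for illustration, so they should not be treated as exact middleware syntax. The point is only that gateway code calls one function while the backend-specific details stay hidden.

```python
from dataclasses import dataclass, field
from typing import List, Protocol
from xml.sax.saxutils import escape


@dataclass
class JobSpec:
    """Middleware-neutral description of a job (hypothetical)."""
    executable: str
    arguments: List[str] = field(default_factory=list)
    count: int = 1


class Middleware(Protocol):
    def describe(self, job: JobSpec) -> str: ...


class Gt2Style:
    """Renders the job as a simplified RSL-like string."""

    def describe(self, job: JobSpec) -> str:
        parts = [f"(executable={job.executable})"]
        if job.arguments:
            quoted = " ".join(f'"{a}"' for a in job.arguments)
            parts.append(f"(arguments={quoted})")
        parts.append(f"(count={job.count})")
        return "&" + "".join(parts)


class Gt4Style:
    """Renders the same job as a simplified XML job description."""

    def describe(self, job: JobSpec) -> str:
        args = "".join(f"<argument>{escape(a)}</argument>" for a in job.arguments)
        return (
            f"<job><executable>{escape(job.executable)}</executable>"
            f"{args}<count>{job.count}</count></job>"
        )


def submit(job: JobSpec, backend: Middleware) -> str:
    """Gateway code calls this one function regardless of the backend.

    A real layer would hand the description to the middleware; this
    sketch just returns it so the difference is visible.
    """
    return backend.describe(job)


if __name__ == "__main__":
    job = JobSpec("/bin/echo", ["hello"], count=2)
    print(submit(job, Gt2Style()))
    print(submit(job, Gt4Style()))
```

Moving the gateway from one middleware version to another then means changing the backend object, not every call site.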

  30. 2. Portal Creation Environments • Open Grid Computing Environment (OGCE) • A complete Java environment that allows you to develop JSR 168 portlets, Gridsphere included. • Uses the COG kit. • Provides a number of bundled portlets • Job submission and monitoring • File transfer • Collaboration tools, etc. • Current version: 2.0.4. http://www.collab-ogce.org/

  31. 2. Portal creation … • GridPortlets • GridPortlets is the name of the package. The package includes grid portlets, but note the difference. • A specific, JSR 168 compliant Java implementation. • Runs under Gridsphere (not included). • Uses the COG kit but provides an abstraction layer (API) on top of the COG kit. • Uses (depends on) Gridsphere’s simple API for creating a GUI. • Provides an “action” model for creating portlets. • Current version: 1.4. http://www.gridsphere.org/gridsphere/gridsphere/guest/download/r/

  32. 3. Tips for building a usable gateway • How can I make sure that my gateway will actually be used? • Keep in mind the three most important factors: a) users, b) users, and c) users. • Let users dictate; don’t assume. • If users can’t, spend time with them; observe what they do and how they do it. • Test, test some more, then test until you drop. The assumption that an IT person/developer, removed from the user/discipline, can “build it and they will come” is doomed from the get go.

  33. 3. Usability tip #1: Determine what users want/need • Some users know and come seeking help. • Others have no idea; they don’t know what’s possible. How do I help them? • Try this: • “Can I come over and see your lab (or how you do X)?” X might be • process data/run simulation/handle results • submit/run/monitor jobs, etc. • “Ah, that’s how you do it. What if I can Y?” Y might be • make it 100x faster • make it a lot easier, etc.

  34. 3. Usability tip #2: Design/build a good user interface • Otherwise why would Microsoft spend zillions of dollars on developing and testing its user interfaces? • The UI can be a make or break factor. • How do I ensure that I have a usable UI? • Formal usability testing in a usability lab • Scour the web to learn about usability/testing • Read the “Usability 101” document http://dhruv.uits.indiana.edu/portals/usability-101.doc • Perform poor man’s usability testing

  35. 3. Developer/user UI disconnect … * From “DON’T MAKE ME THINK: A Common Sense Approach to Web Usability” by Steve Krug.

  36. 3. Usability tip #3: Follow best practices • Refer to the TeraGrid Science Gateways Primer http://www.teragridforum.org/mediawiki/index.php?title=TeraGrid_Science_Gateways_Primer

  37. 4. Scaling up • Do I need to scale up? • Not necessarily. • Many scientific applications provided in a gateway may require only local resources (compute cluster, storage, databases, etc.). • Many existing science gateways use quite modest back ends for compute resources. • Some even have nothing to do with CPU cycles or grid at all.

  38. 4. Scaling up … • Ok, so when do I need more powerful resources (such as the TeraGrid)? • Reactively: • too many new users, analyses, etc. • processing is too slow to be useful • local resources no longer sufficient • users yelling at you? • Proactively: • possible future growth designed in from the get go • close monitoring of trends • etc. …
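
“Close monitoring of trends” can be as simple as fitting a straight line to monthly usage and asking when it will hit local capacity. The sketch below does exactly that with an ordinary least-squares fit. The function name `months_until_full`, the sample numbers, and the assumption that growth is roughly linear are all illustrative; real usage may be bursty or exponential, in which case this estimate would be too optimistic.

```python
from typing import Optional, Sequence


def months_until_full(usage: Sequence[float], capacity: float) -> Optional[float]:
    """Project when a linear usage trend reaches local capacity.

    usage holds one sample per month (e.g. CPU-hours), oldest first.
    Returns months counted from the last sample, 0.0 if usage is
    already at or over capacity, or None if usage is flat or falling.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    if usage[-1] >= capacity:
        return 0.0

    # Ordinary least-squares fit of usage against month index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    crossing_month = (capacity - intercept) / slope
    return max(0.0, crossing_month - (n - 1))


if __name__ == "__main__":
    # Hypothetical CPU-hours per month against a 1000 CPU-hour local limit.
    print(months_until_full([100, 200, 300, 400], capacity=1000))
```

If the projected date is inside your planning horizon, that is the proactive signal to start the conversation about larger resources.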

  39. 4. Scaling up … • Why should I use the TeraGrid? • Virtually unlimited resources (CPU cycles, storage, databases, etc.) • Many services available. • Easy to get access. • TG support staff ready to help. • A production, national grid infrastructure (looks good on a grant too)

  40. 4. Scaling up … • Ok, I am convinced that I need to scale up. What do I do next? • Nancy Wilkins-Diehr will be addressing this later today.

  41. Still awake? Had enough?

  42. 5. Local resources needed • So what will it take locally for me to build one of these gateways? • People • Expertise • Time • Hardware • Software

  43. 5. Local resources needed … • How many people? • Depends. For a complex, grid-based gateway, 1-2 FTEs. Much less for a modest effort (an undergrad). • What level of expertise? • For need-it-now projects, interpreted language (PHP, Perl, Ruby, etc.) programming skills + some DB (MySQL, etc.) knowledge. • For a well designed, high-end gateway, Java programming skills are a must. Also some database and UI experience. • How much time? • Anywhere from 3-6 undergrad months for a simple gateway to roughly 2 FTE-years for one that is fairly complex; this includes the learning curve (modest to high). • What hardware? • Anywhere from a Unix/Linux box to an entire Linux cluster depending on development needs.

  44. 5. Local resources needed … • What software? • Programming language(s): Perl, Python, Ruby, PHP, Java, Javascript, etc. • Development environment (compilers, editors, debuggers, etc.). • Databases: MySQL, PostgreSQL, etc. • Server environment: Apache httpd, Apache Tomcat, etc. • Grid middleware: Globus toolkit, COG kit, etc. • Portlet container: Gridsphere, Pluto. • Portlet building toolkit: OGCE, GridPortlets. • Web services: WSRF, WSRP, etc. • Popular portal building toolkit: Joomla/Drupal/Mambo/e107, etc.
