Java-based Technologies to Support Parallel and Distributed Applications

Prof Mark Baker, ACET, University of Reading. Tel: +44 118 378 8615. E-mail: Mark.Baker@computer.org. Web: http://acet.rdg.ac.uk/~mab

Outline • The pros and cons of Java. • Problems developing Java applications. • Java and the Grid: • WS, REST and Web 2.0. • Some Java application development. • GridRM, • Tycho, • MPJ Express. • Some lessons learnt! • Conclusions.

The pros and cons of Java • Object Oriented: • Java is a modern object-oriented language: • There are many popular object-oriented languages available today: • Some, like Smalltalk and Java, were designed from the beginning to be object-oriented, • Others, like C++, are partially object-oriented and partially procedural. • In Java, almost every variable is an object of some type or another - even strings (the primitive types such as int and double are the exception). • In C++, you can still overwrite the contents of data structures and objects, causing the application to crash, • Java prohibits direct access to memory contents, leading to a more robust system.

The pros and cons of Java • Portable: • Most programming languages are compiled to machine code that executes on only one type of machine - this typically produces fast native code. • Interpreted code often does not need to be compiled - it is translated as it is run, which can be quite slow, but is often portable across different OSs and processor architectures. • Java uses both techniques - it is compiled into a platform-neutral machine code (bytecode), which is then executed by a Java Virtual Machine (JVM): • The JVM may use an interpreter, a JIT (Just-In-Time) compiler, or an adaptive optimising engine such as HotSpot. • A couple of advantages: • The source code is protected from view and modification - only the bytecode needs to be made available to users. • Security mechanisms can scan bytecode for signs of modification or harmful code. • This means that Java code can be compiled once, and run on any machine and OS combination that supports a JVM - wonderful delegation of responsibility!

The pros and cons of Java • Multi-threaded: • Some languages, like C/C++, have support for threads through Pthreads (POSIX 1003.1c standard), which is a separate library that you need to include and use - an afterthought! • Many languages do not have thread support, so you potentially need to create multiple processes… • Java has support for multiple threads of execution built right into the language: • Thread support was designed into the language from the outset, is simple, and is commonly used in applications and applets.
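
As a rough illustration of threads being part of the language, the sketch below starts several threads that update a shared counter guarded by the `synchronized` keyword. The class and method names (`ThreadSketch`, `runWorkers`) are made up for this example and do not come from the talk.

```java
// Minimal sketch: language-level threads and locking (hypothetical names).
class ThreadSketch {
    private int count = 0;

    // 'synchronized' is a language keyword: only one thread at a time may run these methods
    synchronized void increment() { count++; }
    synchronized int get() { return count; }

    // Starts 'workers' threads, each incrementing the shared counter 'incrementsEach' times
    static int runWorkers(int workers, int incrementsEach) {
        ThreadSketch shared = new ThreadSketch();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsEach; j++) {
                    shared.increment();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000)); // prints 4000
    }
}
```

Without `synchronized` on `increment()`, updates from different threads could be lost and the total would no longer be guaranteed.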

The pros and cons of Java • Concurrency Control: • Java 1.5 has comprehensive support for general-purpose concurrent programming, • The support is partitioned into three packages: • java.util.concurrent - this provides various classes to support common concurrent programming paradigms, e.g., support for various queuing policies such as bounded buffers, sets and maps, thread pools, etc., • java.util.concurrent.atomic - this provides support for lock-free thread-safe programming on simple variables such as atomic integers, atomic booleans, etc., • java.util.concurrent.locks - this provides a framework for various locking algorithms that augment the Java language mechanisms, e.g., read-write locks and condition variables.
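
The following sketch touches one class from each of the three packages: a thread pool from java.util.concurrent, an `AtomicInteger` from java.util.concurrent.atomic, and a `ReentrantReadWriteLock` from java.util.concurrent.locks. The surrounding class (`ConcurrencySketch`) and its methods are illustrative assumptions only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch using one class from each java.util.concurrent package (hypothetical names).
class ConcurrencySketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long total = 0;

    // Writers take the exclusive write lock
    void add(long value) {
        lock.writeLock().lock();
        try {
            total += value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock
    long read() {
        lock.readLock().lock();
        try {
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Submits tasks 1..tasks to a pool; returns {tasks completed, sum of values}
    static long[] run(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger(); // lock-free counter
        ConcurrencySketch sum = new ConcurrencySketch();
        for (int i = 1; i <= tasks; i++) {
            final int value = i;
            pool.execute(() -> {
                sum.add(value);
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { completed.get(), sum.read() };
    }

    public static void main(String[] args) {
        long[] result = run(100);
        System.out.println(result[0] + " tasks, sum " + result[1]); // prints "100 tasks, sum 5050"
    }
}
```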

The pros and cons of Java • Garbage collection: • Languages like C/C++ force programmers to allocate/deallocate memory for data and objects manually: • Adds complexity, but also causes another problem - memory leaks. • When Java applications create objects, the JVM allocates memory space for their storage: • When the object is no longer needed (no reference to the object exists), the memory space can be reclaimed for later use. • In Java, the programmer is free from such worries, as the JVM will perform automatic garbage collection of objects.

The pros and cons of Java • Memory Management: • There are no destructors in Java, • There is no "scope" of a variable per se to indicate when an object's lifetime has ended - the lifetime of an object is determined instead by the garbage collector, • The finalize() method is called by the garbage collector and is supposed to be responsible only for releasing "resources": • Such as open files, sockets, ports, URLs. • All objects in C++ will be (or rather, should be) destroyed, but not all objects in Java are garbage collected, • The Java garbage collector can be changed, but there is no explicit control over object collection!

The pros and cons of Java • Security: • Security is not built into most traditional programming languages, • Security is a big issue with Java: • At the API level, there are strong security restrictions on file and network access for applets, as well as support for digital signatures to verify the integrity of downloaded code. • At the bytecode level, checks are made for obvious hacks, such as stack manipulation or invalid bytecode. • The weakest link in the chain is the Java Virtual Machine on which it is run - a JVM with known security weaknesses can be prone to attack: • Worth noting that while there have been a few identified weaknesses in JVMs, they are rare, and usually fixed quickly.

The pros and cons of Java • Network and "Internet" aware: • In languages like C/C++, the networking code must be re-written for different operating systems, and is usually more complex, • Java was designed to be "Internet" aware, and to support network programming: • The Java API provides network support, from sockets and IP addresses, to URLs and HTTP, • It is easy to write network applications in Java, and the code is portable between platforms, • The networking support of Java saves a lot of time and effort, • Java also supports more exotic network programming, such as remote-method invocation (RMI), SOAP, CORBA and Jini.

The pros and cons of Java • Arrays in Java: • Although they look similar, arrays have a very different structure and behaviour in Java than they do in C/C++, • There is a read-only length member (size of array) and run-time checking throws an exception if you go out of bounds, • In Java a two-dimensional array is an array of one-dimensional arrays, • Although we might expect that elements of rows are stored contiguously, one cannot depend upon the rows themselves being stored contiguously: • In fact, there is no way to check whether rows have been stored contiguously after they have been allocated. • The possible non-contiguity of rows implies that the effectiveness of block-oriented algorithms may be dependent on the particular implementation of the JVM as well as the current state of the memory manager.
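
A small sketch of the array behaviour described above: each row of a two-dimensional array is a separately allocated one-dimensional array (so rows may even differ in length), and an out-of-bounds access raises an exception rather than corrupting memory. The class and method names are invented for the example.

```java
// Minimal sketch: arrays of arrays and run-time bounds checking (hypothetical names).
class ArraySketch {
    // Builds a "triangular" 2-D array: row i is its own array of length i + 1
    static int[][] triangular(int n) {
        int[][] rows = new int[n][]; // outer array holds references to rows
        for (int i = 0; i < n; i++) {
            rows[i] = new int[i + 1]; // each row is allocated separately
        }
        return rows;
    }

    // Returns true if writing to a[index] is rejected by the run-time bounds check
    static boolean outOfBounds(int[] a, int index) {
        try {
            a[index] = 0;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[][] t = triangular(3);
        System.out.println(t.length + " rows, last row has " + t[2].length + " elements");
        System.out.println(outOfBounds(t[0], 1)); // prints true: row 0 has length 1
    }
}
```

A common workaround in numerical codes, when contiguity matters, is to store the matrix in a single one-dimensional array and compute the index as `row * columns + column`.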

The pros and cons of Java • The Floating-point Issue: • The requirement that Java programs produce bitwise identical floating-point results on every JVM seems not only unrealisable in practice, but also inhibits efficient floating-point processing on some platforms. • For example, it precludes the efficient use of floating-point hardware on Intel processors, which utilise extended precision in registers.

The pros and cons of Java • Exceptions: • Java exception handling is superior to that in C/C++: • There are no exception handling classes in C. • C++ takes the approach of calling a function at run-time when the wrong exception is thrown, whereas in Java exception specifications are checked and enforced at compile-time. • Also, overridden methods must conform to the exception specification of the base-class version of that method: they can throw the specified exceptions or exceptions derived from those.
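
The sketch below illustrates both points under assumed, made-up class names: a checked exception that the caller is forced to handle at compile-time, and an overriding method that narrows the exception specification to a derived exception type.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Minimal sketch: compile-time checked exception specifications (hypothetical names).
class ExceptionSketch {
    static class Base {
        // Declares a checked exception: callers must catch it or declare it themselves
        String load(String name) throws IOException {
            if (name.isEmpty()) {
                throw new IOException("empty name");
            }
            return "base:" + name;
        }
    }

    static class Derived extends Base {
        // Allowed: FileNotFoundException is derived from IOException.
        // Declaring a broader type here (e.g. 'throws Exception') would not compile.
        @Override
        String load(String name) throws FileNotFoundException {
            if (name.isEmpty()) {
                throw new FileNotFoundException("empty name");
            }
            return "derived:" + name;
        }
    }

    // Removing this try/catch (without adding 'throws IOException') is a compile-time error
    static String tryLoad(Base source, String name) {
        try {
            return source.load(name);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad(new Derived(), "data")); // prints "derived:data"
        System.out.println(tryLoad(new Derived(), ""));     // prints "failed: empty name"
    }
}
```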

The pros and cons of Java • Accessing OS Resources: • Java can be too restrictive in some cases: you are prevented from doing important tasks such as directly accessing hardware, • Java solves this with native methods (JNI) that allow you to call a function written in another language (currently only C and C++ are supported), • Thus, you can always solve a platform-specific problem (in a relatively non-portable fashion, but then that code is isolated), • Applets cannot call native methods, only applications can.

The pros and cons of Java • Documentation • Java has built-in support for comment-based documentation: • The source code file can also contain its own documentation, which is stripped out and reformatted into HTML via a separate program (javadoc): • Create API docs… • This is a boon for documentation maintenance and use.
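
As an illustration of comment-based documentation, the sketch below shows documentation comments using the standard `@param`, `@return` and `@throws` tags. The class itself (`StatsSketch`) is a made-up example.

```java
/**
 * Simple statistics helpers (a hypothetical example class for javadoc).
 */
class StatsSketch {
    /**
     * Returns the arithmetic mean of the given values.
     *
     * @param values the samples to average; must not be empty
     * @return the arithmetic mean of {@code values}
     * @throws IllegalArgumentException if {@code values} is empty
     */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[] { 1.0, 2.0, 3.0 })); // prints 2.0
    }
}
```

Assuming the file is saved as StatsSketch.java, running `javadoc -d doc -package StatsSketch.java` should generate HTML pages under a `doc` directory; the `-package` option is needed here only because the example class is not public.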

Some Advantages of Java • Java development tools: • e.g. the Eclipse platform, and a large amount of open-source, free software that has been made available by the community of programmers. • If nothing else, this software can be a starting place to develop new ideas and more sophisticated applications. • Programmer productivity: • At least two times greater with Java. • A lot can be done in a short amount of time with Java because it has such an extensive library of functions already built into the language, integrated development environments, and a wide selection of supporting tools.

Java Summary • Generally, Java is more robust, via: • Object handles initialised to null (a keyword), • Handles are always checked and exceptions are thrown for failures, • All array accesses are checked for bounds violations, • Automatic garbage collection prevents memory leaks, • Clean, relatively fool-proof exception handling, • Simple language support for multi-threading, • Java Security, Tooling…

Problems Developing Java Applications • Memory - both the amount used by an application and the speed at which memory is recovered: • Some evidence that Java applications are becoming increasingly bloated - never been particularly small though! • The default JVM heap is 64 Mbytes; the amount of memory allocated can be increased using the -Xmx command-line switch. • Obviously, applications could have multiple threads, or you could be using multiple JVMs. • Issues with how quickly the GC recovers memory - can cause java.lang.OutOfMemoryError: Java heap space: • Commonly seen problem in messaging systems (e.g. NaradaBrokering) and when storing in-memory data structures (MDS4).
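
A quick way to see the effect of the -Xmx switch is to ask the running JVM for its heap limits through the standard `Runtime` class, as in this sketch (the class name `HeapSketch` is made up):

```java
// Minimal sketch: inspecting the heap limits of the running JVM (hypothetical class name).
class HeapSketch {
    private static final long MEGABYTE = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use; this is what -Xmx controls
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MEGABYTE;
    }

    // Heap currently in use: memory claimed from the OS minus the free part of it
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MEGABYTE;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMegabytes() + " MB");
        System.out.println("used heap: " + usedHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx256m HeapSketch` and again with a different value should show the reported maximum changing accordingly; the exact figure reported can differ slightly from the requested one.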

Problems Developing Java Applications • Accessing system resources and legacy applications: • The Java Native Interface (JNI) was created for the purpose of interacting with resources that cannot be accessed normally via the JVM: • Interacting with OS resources - such as /proc, sending signals… • User applications, or libraries (MPI). • Use of JNI breaks the Java programming model: • Non object-oriented, • No GC - memory leaks possible, • Relatively complicated interface - easy to make mistakes, • OK for C and C++ - need to have a good think about wrapping Fortran, as you call it from C/C++! • Lose portability.

Problems Developing Java Applications • The relative performance of Java-based applications depends on a number of factors: • Coding efficiency, • Version of the JVM, • Operating System, • Memory available, • Others… The authors conclude, "On Intel Pentium hardware, especially with Linux, the performance gap is small enough to be of little or no concern to programmers."

Problems Developing Java Applications • Java is now nearly equal to (or faster than) C++ on low-level and numeric benchmarks: • This should not be surprising as Java is a compiled language (albeit JIT compiled). • Nevertheless, the idea that "Java is slow" is widely believed. • Several possible reasons: • Early Java (circa 1995) was slow: • It did not have a JIT compiler, and hence bytecode was interpreted (like Python). • The situation changed with the emergence of JIT compilers from Microsoft, Symantec, and in Sun's Java 1.2. • Java can be slow still: • For example, programs written with the thread-safe Vector class are necessarily slower than those written with the equivalent thread-unsafe ArrayList class.

Problems Developing Java Applications • Java program start-up is slow - as it starts, it unzips the Java libraries and compiles parts of itself, so an interactive program can be sluggish for the first couple of seconds of use. • There is a similar "garbage collection is slow" myth that persists despite evidence to the contrary. • Together these suggest that it is possible that no amount of data will alter people's beliefs, and that in actuality these "speed beliefs" probably have little to do with Java, garbage collection, or the otherwise stated subject: • The answer probably lies somewhere in sociology or psychology. • Programmers, despite their professed appreciation of logical thought, are not immune to a kind of mythology, though these particular "myths" are arbitrary and relatively harmless.

Java and the Grid • Almost all the core Grid software is now Java-based! • Especially now that it is based on Web Services technologies… • Could ask why this is the case…? • Can probably be explained by the fact that Java is a good language for rapid prototyping, is free, runs anywhere, and has extensive tooling and libraries… • The cores of Globus/Unicore/gLite/OMII/Crown… etc. are all Java-based. • Well defined APIs, so clients can be written in more or less any language. • Interacting with legacy applications is more problematic.

The Grid's Architecture • Moved to a Service Oriented Architecture (SOA), Web Services-based Grid infrastructure back in early 2003 - the Open Grid Services Architecture (OGSA): • Huge effort to standardise everything Grid-related! • Out popped OGSI, quickly dropped, • Then WSRF (Jan 2004+), • More recently WS-Resource Transfer (Oct 2006)! • Also there is a great debate about services and state! • Funnily enough, Globus 2.4 is still very popular and a key part of many Grid projects. • Far too many changes, standards and specifications, and all very confusing and complicated… • Mutterings in the Grid community for several years now, due to the fact that no one really knows what standards and specs to use: • And whether the one chosen will actually still be in use in the future.

Web Services Standards and Specifications Some Standards - list from Geoffrey Fox circa 2004

A List of Web Services I • a) Core Service Architecture • XSD: XML Schema (W3C Recommendation) V1.0 February 1998, V1.1 February 2004 • WSDL 1.1: Web Services Description Language Version 1.1 (W3C Note) March 2001 • WSDL 2.0: Web Services Description Language Version 2.0 (W3C under development) March 2004 • SOAP 1.1: (W3C Note) V1.1 Note May 2000 • SOAP 1.2: (W3C Recommendation) June 24 2003 • b) Service Discovery • UDDI: (Broadly Supported OASIS Standard) V3 August 2003 • WS-Discovery: Web Services Dynamic Discovery (Microsoft, BEA, Intel …) February 2004 • WS-IL: Web Services Inspection Language (IBM, Microsoft) November 2001

A List of Web Services II • c) Security • SAML: Security Assertion Markup Language (OASIS) V1.1 May 2004 • XACML: eXtensible Access Control Markup Language (OASIS) V1.0 February 2003 • WS-Security: Web Services Security: SOAP Message Security (OASIS) Standard March 2004 • WS-SecurityPolicy: Web Services Security Policy (IBM, Microsoft, RSA, Verisign) Draft December 2002 • WS-Trust: Web Services Trust Language (BEA, IBM, Microsoft, RSA, Verisign …) May 2004 • WS-SecureConversation: Web Services Secure Conversation Language (BEA, IBM, Microsoft, RSA, Verisign …) May 2004 • WS-Federation: Web Services Federation Language (BEA, IBM, Microsoft, RSA, Verisign) July 2003

A List of Web Services III • d) Messaging • WS-Addressing: Web Services Addressing (BEA, IBM, Microsoft) March 2004 • WS-MessageDelivery: Web Services Message Delivery (W3C Submission by Oracle, Sun ..) April 2004 • WS-Routing: Web Services Routing Protocol (Microsoft) October 2001 • WS-RM: Web Services Reliable Messaging (BEA, IBM, Microsoft, Tibco) v0.992 March 2004 • WS-Reliability: Web Services Reliable Messaging (OASIS Web Services Reliable Messaging TC) March 2004 • SOAP MTOM: SOAP Message Transmission Optimization Mechanism (W3C) June 2004 • e) Notification • WS-Eventing: Web Services Eventing (BEA, Microsoft, TIBCO) January 2004 • WS-Notification: Framework for Web Services Notification with WS-Topics, WS-BaseNotification, and WS-BrokeredNotification (OASIS) OASIS Web Services Notification TC set up March 2004 • JMS: Java Message Service V1.1 March 2002

A List of Web Services IV • f) Coordination and Workflow, Transactions and Contextualization • WS-CAF: Web Services Composite Application Framework including WS-CTX, WS-CF and WS-TXM below (OASIS Web Services Composite Application Framework TC) July 2003 • WS-CTX: Web Services Context (OASIS Web Services Composite Application Framework TC) V1.0 July 2003 • WS-CF: Web Services Coordination Framework (OASIS Web Services Composite Application Framework TC) V1.0 July 2003 • WS-TXM: Web Services Transaction Management (OASIS Web Services Composite Application Framework TC) V1.0 July 2003 • WS-Coordination: Web Services Coordination (BEA, IBM, Microsoft) September 2003 • WS-AtomicTransaction: Web Services Atomic Transaction (BEA, IBM, Microsoft) September 2003 • WS-BusinessActivity: Web Services Business Activity Framework (BEA, IBM, Microsoft) January 2004 • BTP: Business Transaction Protocol (OASIS) May 2002 with V1.0.9.1 May 2004 • BPEL: Business Process Execution Language for Web Services (OASIS) V1.1 May 2003 • WS-Choreography: (W3C) V1.0 Working Draft April 2004 • WSCI: (W3C) Web Service Choreography Interface V1.0 (W3C Note from BEA, Intalio, SAP, Sun, Yahoo) • WSCL: Web Services Conversation Language (W3C Note) HP March 2002

A List of Web Services V • h) Metadata and State • RDF: Resource Description Framework (W3C) Set of recommendations expanded from original February 1999 standard • DAML+OIL: combining DAML (Darpa Agent Markup Language) and OIL (Ontology Inference Layer) (W3C) Note December 2001 • OWL: Web Ontology Language (W3C) Recommendation February 2004 • WS-DistributedManagement: Web Services Distributed Management Framework with MUWS and MOWS below (OASIS) • WSDM-MUWS: Web Services Distributed Management: Management Using Web Services (OASIS) V0.5 Committee Draft April 2004 • WSDM-MOWS: Web Services Distributed Management: Management of Web Services (OASIS) V0.5 Committee Draft April 2004 • WS-MetadataExchange: Web Services Metadata Exchange (BEA, IBM, Microsoft, SAP) March 2004 • WS-RF: Web Services Resource Framework including WS-ResourceProperties, WS-ResourceLifetime, WS-RenewableReferences, WS-ServiceGroup, and WS-BaseFaults (OASIS) OASIS TC set up April 2004 and V1.1 Framework March 2004 • ASAP: Asynchronous Service Access Protocol (OASIS) with V1.0 working draft G June 2004 • WS-GAF: Web Service Grid Application Framework (Arjuna, Newcastle University) August 2003

A List of Web Services VI • g) General Service Characteristics • WS-Policy: Web Services Policy Framework (BEA, IBM, Microsoft, SAP) May 2003 • WS-PolicyAssertions: Web Services Policy Assertions Language (BEA, IBM, Microsoft, SAP) May 2003 • WS-Agreement: Web Services Agreement Specification (GGF under development) May 2004 • i) User Interfaces • WSRP: Web Services for Remote Portlets (OASIS) OASIS Standard August 2003 • JSR168: JSR-000168 Portlet Specification for Java binding (Java Community Process) October 2003

Thoughts • Should we be using standards IF they: • Are new and just emerging, • Are changing frequently, for example UDDI! • Enhance interoperability, but potentially cripple performance, • Are not widely adopted, • Are not easy to understand and complicated to implement. • What are the alternatives? • REST, • Web 2.0…

Representational State Transfer (REST) • REST is a style of software architecture for distributed systems such as the Web. • The term originated in a 2000 doctoral dissertation written by Roy Fielding, one of the principal authors of the HTTP protocol specification. • The term is also often used to describe any simple interface that transmits domain-specific data over HTTP without an additional messaging layer (e.g. SOAP) or session tracking (e.g. cookies). • The idea behind REST is to take what we all take for granted in the Web (i.e. HTTP and URIs) and use it with other payloads (other than HTML, images, etc.) and with HTTP methods beyond the usual GET and POST. • REST defines identifiable resources (URIs), and methods for accessing and manipulating the state of those resources (HTTP methods).
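
To make the "resources plus HTTP methods" idea concrete, here is a minimal, self-contained sketch. It starts the small HTTP server bundled with the JDK (`com.sun.net.httpserver`) on a local port, then uses `HttpURLConnection` to PUT, GET and DELETE the state of one resource. The resource path `/sensors/cpu`, the payload and all class and method names are assumptions made up for illustration; it needs Java 9 or later because of `readAllBytes()`.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: resources identified by URI, manipulated with HTTP methods (hypothetical names).
class RestSketch {
    // Resource state keyed by URI path
    private static final Map<String, String> RESOURCES = new ConcurrentHashMap<>();

    // Server side: map the HTTP method onto an operation on the addressed resource
    static void handle(HttpExchange ex) throws IOException {
        String path = ex.getRequestURI().getPath();
        String method = ex.getRequestMethod();
        byte[] reply = new byte[0];
        int status;
        if (method.equals("PUT")) {            // create or replace the resource state
            byte[] sent = ex.getRequestBody().readAllBytes();
            RESOURCES.put(path, new String(sent, StandardCharsets.UTF_8));
            status = 204;
        } else if (method.equals("GET")) {     // read the resource state
            String state = RESOURCES.get(path);
            status = (state == null) ? 404 : 200;
            if (state != null) reply = state.getBytes(StandardCharsets.UTF_8);
        } else if (method.equals("DELETE")) {  // remove the resource
            status = (RESOURCES.remove(path) == null) ? 404 : 204;
        } else {
            status = 405;                      // method not allowed
        }
        ex.sendResponseHeaders(status, reply.length == 0 ? -1 : reply.length);
        if (reply.length > 0) {
            try (OutputStream out = ex.getResponseBody()) { out.write(reply); }
        }
        ex.close();
    }

    // Client side: one HTTP request, returned as "status:body"
    static String request(String method, String url, String body) throws IOException {
        HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
        c.setRequestMethod(method);
        if (body != null) {
            c.setDoOutput(true);
            try (OutputStream out = c.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        int status = c.getResponseCode();
        String text = "";
        if (status == 200) {
            try (InputStream in = c.getInputStream()) {
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        c.disconnect();
        return status + ":" + text;
    }

    // Runs PUT, GET, DELETE, GET against a local server and joins the results
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", RestSketch::handle);
            server.start();
            String uri = "http://127.0.0.1:" + server.getAddress().getPort() + "/sensors/cpu";
            try {
                return request("PUT", uri, "load=0.42") + " " + request("GET", uri, null) + " "
                        + request("DELETE", uri, null) + " " + request("GET", uri, null);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that there is no operation name in any message: the URI says which resource is meant and the HTTP method says what to do with it.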

REST • It uses well documented, established, and widely used technologies and methodologies. • It is already here today; been around for a while! • Resource centric rather than method centric. • Given a URI, anyone already knows how to access it. • NOT another protocol on top of another protocol on top of another protocol on top of... • Response payload can be of any format. • Uses the inherent HTTP security model: certain methods to certain URIs can easily be restricted by firewall configuration, unlike other XML over HTTP messaging formats. • REST makes sense, use what we already have; it is the next logical extension of the Web… isn't it!!

What's Web 2.0? • "Web 2.0, a phrase coined by O'Reilly Media in 2004, refers to a supposed second-generation of Internet-based services such as social networking sites, Wikis, communication tools, and folksonomies that let people collaborate and share information online in previously unavailable ways." - Wikipedia

Hard to Define, Know it When You See it… • Emerging Tech: • Web APIs, • "Folksonomies" / Content tagging, • "AJAX", • RSS. • Apps You Know…: • Flickr, • Google Maps, • Blogging & Content Syndication, • LinkedIn, Friendster. • Some Apps You Don't Know: • Del.icio.us, • Upcoming.org, • 43Things.com. • Major Retailers: • Amazon APIs, • Google Adsense API, • Yahoo API, • Ebay API.

Web 2.0: Evolution Towards a Read/Write Platform

Further Thoughts • What is special about the REST model and Web 2.0? • Nothing special from what I can see, but both are gaining popularity! • Why? • The REST architecture is simple and its technologies are mature and stable: • HTTP(S) and ASCII! • Web 2.0 is some mashup of web-based patterns and technologies: • Accessible to users, and Web-based applications are relatively simple to create. • In the end users will vote with their feet and use whatever technologies help them develop applications quickly and effectively.

Oh and BTW! (I)

Oh and BTW! (II)

Some Java Application Development: Incidental REST Development!

Java-based Application Development • Over the last decade my group have been developing Java-based middleware for parallel and distributed applications. • We have gained a lot of experience and knowledge about the development of Java-based applications. • The next part of the talk will outline some of the projects that my group have been developing and some interesting observations about the pitfalls that can be encountered when developing Java applications.

Some History! • Started off with GridRM (gridrm.org), a wide-area Java-based resource monitoring system: • A scalable infrastructure for gathering native information from computer-based resources, • Agents and gateways, linked in a hierarchical configuration, • Started designing it pre-2000, • There were no standards in the monitoring arena (apart from SNMP), so the issue was what to use for our infrastructure? • Chose HTTP and XML, as this seemed to provide us with a core that we could adapt at the end points.

GridRM Architecture • Local Layer: • Hierarchy of gateways: • Each manages a site: • Interacts with agents, • Deals with policies and security. • Embedded database: • Stores resource and historical data. • Agents: • Use what is there! • Global Layer: • A system to bind together gateways: • Producer/consumer paradigm, • Meta-registry, • Messaging layer, • Default security.

Further History • Earlier versions of GridRM were relatively hardwired; we wanted to create a more flexible and maintainable solution. • Needed software to bind together GridRM gateways and provide a messaging infrastructure… • Looked at obvious solutions: MDS, R-GMA, NaradaBrokering… • In 2003 we came up with Tycho, a GMA-based infrastructure that would provide a meta-registry and a message system that could bind together our GridRM gateways: • GMA was a GGF informational recommendation, • Unfortunately in 2003 there were still no Grid standards - transition from GT2 to GT3 at the time! • Consequently, we continued to use HTTP and XML.

Tycho Architecture • Tycho consists of the following components: • Mediators that allow producers and consumers to discover each other and establish remote communications, • Consumers that typically subscribe to receive information or events from producers, • Producers that gather and publish information for consumers. • There is an asynchronous messaging API. • In Tycho, producers and/or consumers (clients) can publish their existence in a Virtual Registry (VR).
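
The roles above can be pictured with a toy, in-memory sketch. This is not Tycho's API: every name here (`MediatorSketch`, `subscribe`, `publish`, the topic strings) is a hypothetical stand-in, delivery is synchronous and local rather than asynchronous and remote, and the map merely plays the part of a registry in which consumers record their interest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Toy sketch of a mediator binding producers to consumers (hypothetical, not Tycho's API).
class MediatorSketch {
    // Stand-in for a registry: topic name -> consumers that registered an interest in it
    private final Map<String, List<Consumer<String>>> registry = new ConcurrentHashMap<>();

    // A consumer publishes its existence (and interest) in the registry
    void subscribe(String topic, Consumer<String> consumer) {
        registry.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(consumer);
    }

    // A producer publishes a message; the mediator delivers it to every matching consumer
    int publish(String topic, String message) {
        List<Consumer<String>> consumers =
                registry.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> consumer : consumers) {
            consumer.accept(message);
        }
        return consumers.size(); // number of consumers reached
    }

    // Two consumers subscribe to one topic; one message reaches both, another reaches nobody
    static List<String> demo() {
        MediatorSketch mediator = new MediatorSketch();
        List<String> received = new ArrayList<>();
        mediator.subscribe("cpu-load", msg -> received.add("consumer-1 got " + msg));
        mediator.subscribe("cpu-load", msg -> received.add("consumer-2 got " + msg));
        mediator.publish("cpu-load", "0.42");
        mediator.publish("disk-usage", "ignored"); // no subscribers for this topic
        return received;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

The point of the pattern is that producers and consumers never refer to each other directly; they only share a topic name known to the mediator.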

Tycho Design • Tycho is based on a publish, subscribe and bind paradigm. • Design Philosophy: • We believed that the system should have an architecture similar to the Internet, where every node provides reliable core services, and the complexity is kept, as far as possible, to the edges: • The core services can be kept to the minimum, and endpoints can provide higher-level and more sophisticated services, that may fail, but will not cause the overall system to crash. • We have kept Tycho's core small, simple and efficient, so that it has a minimal memory footprint, is easy to install, and is capable of providing robust and reliable services. • More sophisticated services can then be built on this core and are provided via libraries and tools to applications. • This allows Tycho to be flexible and extensible, so that it will be possible to incorporate additional features and functionality.

Tycho Messaging Tests Performance Tests of Tycho against NaradaBrokering

Performance Tests (Messaging) • NaradaBrokering: • The NaradaBrokering framework is a distributed messaging infrastructure, developed by the Community Grids Lab at Indiana University. • NaradaBrokering is an asynchronous messaging infrastructure with a publish and subscribe based architecture. • NaradaBrokering is Sun JMS compliant. • This messaging standard allows application components to exchange unified messages in an asynchronous system.

Summary - Tycho vs NB • Latency and throughput: • When looking at end-to-end performance, on a LAN for messages less than 2 Kbytes, Tycho and NaradaBrokering have comparable performance, • Tycho achieves 95% bandwidth, whereas NB achieves only 65.3%, • Tycho's performance is inhibited by the fact that it creates a new socket for each message sent: • NB reuses socket instances once they have been created. • Scalability Summary: • The scalability tests have shown Tycho and NaradaBrokering producers and consumers to be stable under heavy load: • Performance is weaker when there is a large ratio of consumers to producers. • Heap Size: • The heap size for NB becomes a limiting factor when a broker is receiving messages faster than it can send them, as the internal message buffer fills until the heap is consumed and the system fails, • Tycho tests were performed without modifying the heap size; throttling is used as there is limited buffering.
