
Telegraph Java Experiences

Presentation Transcript


  1. Telegraph Java Experiences Sam Madden UC Berkeley madden@cs.berkeley.edu RightOrder : Telegraph & Java

  2. Telegraph Overview • 100% Java • In-memory database • Query engine for alternative sources • Web • Sensors • Testbed for adaptive query processing

  3. Telegraph & WWW: FFF • Federated Facts and Figures • Collect data on the election • Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies • Route tuples dynamically based on source loads and selectivities
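The eddy routing idea above can be sketched in a few lines of Java. This is a hypothetical, greatly simplified illustration, not Telegraph's code: `SimpleEddy`, `addFilter`, `route`, and `seen` are invented names, the operators are plain integer filters, and routing is greedy on observed pass rates only (the slide's eddy also reacts to source load).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical, greatly simplified eddy: every tuple must pass all filters,
 * and the eddy tries next whichever unvisited filter has so far passed the
 * smallest fraction of the tuples it has seen (the most selective one).
 */
final class SimpleEddy {
    private final List<IntPredicate> filters = new ArrayList<>();
    private final List<long[]> stats = new ArrayList<>(); // per filter: {seen, passed}

    void addFilter(IntPredicate filter) {
        filters.add(filter);
        stats.add(new long[] {0L, 0L});
    }

    /** Observed pass rate; a filter with no history reports 0, so it is tried early. */
    private double passRate(int i) {
        long[] s = stats.get(i);
        return s[0] == 0 ? 0.0 : (double) s[1] / s[0];
    }

    /** Routes one tuple; returns true if it satisfied every filter. */
    boolean route(int tuple) {
        boolean[] visited = new boolean[filters.size()];
        for (int step = 0; step < filters.size(); step++) {
            int best = -1;
            for (int i = 0; i < filters.size(); i++) {
                if (!visited[i] && (best < 0 || passRate(i) < passRate(best))) {
                    best = i;
                }
            }
            visited[best] = true;
            long[] s = stats.get(best);
            s[0]++;
            if (!filters.get(best).test(tuple)) {
                return false; // dropped early: remaining filters are skipped
            }
            s[1]++;
        }
        return true;
    }

    /** Number of tuples the given filter has examined. */
    long seen(int filterIndex) {
        return stats.get(filterIndex)[0];
    }
}
```

Registering `x % 2 == 0` and then `x < 10` and routing the integers 0 to 999 leaves 5 surviving tuples. After a short warm-up the eddy observes that `x < 10` rejects more tuples and starts sending tuples there first, so the parity filter examines far fewer tuples than the range filter.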

  4. fff.cs.berkeley.edu

  5. Architecture Overview • Query Parser • JLex & CUP • Preoptimizer • Chooses access paths • Eddy • Routes tuples to modules

  6. Modules • Doubly-Pipelined Hash Joins • Index Joins • For probing into web pages • Aggregates & Group Bys • Scans • Telegraph Screen Scraper: view web pages as relations
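A doubly-pipelined (symmetric) hash join keeps a hash table on each input: every arriving tuple is inserted into its own side's table and immediately probes the other side's, so output can stream before either input is exhausted, which is useful when sources such as web pages deliver tuples slowly. The sketch below is a minimal in-memory version under assumed conventions: tuples are `String[]`, each side joins on one column index, and `SymmetricHashJoin`, `pushLeft`, and `pushRight` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical doubly-pipelined (symmetric) hash join on one key column.
 * Each arriving tuple is inserted into its own side's table and immediately
 * probes the other side's table, so results are produced incrementally.
 */
final class SymmetricHashJoin {
    private final Map<String, List<String[]>> leftTable = new HashMap<>();
    private final Map<String, List<String[]>> rightTable = new HashMap<>();
    private final int leftKey;
    private final int rightKey;

    SymmetricHashJoin(int leftKey, int rightKey) {
        this.leftKey = leftKey;
        this.rightKey = rightKey;
    }

    /** Accepts a tuple from the left input; returns the join results it produces now. */
    List<String[]> pushLeft(String[] tuple) {
        leftTable.computeIfAbsent(tuple[leftKey], k -> new ArrayList<>()).add(tuple);
        List<String[]> out = new ArrayList<>();
        for (String[] match : rightTable.getOrDefault(tuple[leftKey], Collections.emptyList())) {
            out.add(concat(tuple, match));
        }
        return out;
    }

    /** Accepts a tuple from the right input; returns the join results it produces now. */
    List<String[]> pushRight(String[] tuple) {
        rightTable.computeIfAbsent(tuple[rightKey], k -> new ArrayList<>()).add(tuple);
        List<String[]> out = new ArrayList<>();
        for (String[] match : leftTable.getOrDefault(tuple[rightKey], Collections.emptyList())) {
            out.add(concat(match, tuple));
        }
        return out;
    }

    /** Output tuples are fresh copies: left fields, then right fields. */
    private static String[] concat(String[] left, String[] right) {
        String[] joined = new String[left.length + right.length];
        System.arraycopy(left, 0, joined, 0, left.length);
        System.arraycopy(right, 0, joined, left.length, right.length);
        return joined;
    }
}
```

Each result is a newly allocated array, which mirrors slide 8's note that tuples are copied on joins.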

  7. Execution Framework • One Thread Per Query • Iterator Model for Queries • Experimented with Thread Per Module • Linux threads are expensive • Two Memory Management Models • Java Objects • Home-Rolled Byte Arrays
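The iterator model on slide 7 means every operator exposes `open`/`next`/`close`, and a single thread per query pulls tuples up from the root of the plan. A minimal sketch follows; `Operator`, `ListScan`, `Select`, and `QueryRunner` are hypothetical names, and `null` is assumed to signal end of input.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical pull-based operator interface (the classic open/next/close iterator model). */
interface Operator<T> {
    void open();
    T next();   // returns null when the input is exhausted
    void close();
}

/** Leaf operator: scans an in-memory list. */
final class ListScan<T> implements Operator<T> {
    private final List<T> rows;
    private Iterator<T> it;

    ListScan(List<T> rows) { this.rows = rows; }
    public void open() { it = rows.iterator(); }
    public T next() { return it.hasNext() ? it.next() : null; }
    public void close() { it = null; }
}

/** Selection: pulls from its child until a row satisfies the predicate. */
final class Select<T> implements Operator<T> {
    private final Operator<T> child;
    private final Predicate<T> predicate;

    Select(Operator<T> child, Predicate<T> predicate) {
        this.child = child;
        this.predicate = predicate;
    }
    public void open() { child.open(); }
    public T next() {
        for (T row = child.next(); row != null; row = child.next()) {
            if (predicate.test(row)) return row;
        }
        return null;
    }
    public void close() { child.close(); }
}

/** Drives a plan to completion on the calling thread: one thread per query. */
final class QueryRunner {
    static <T> int run(Operator<T> root, Consumer<T> sink) {
        int count = 0;
        root.open();
        try {
            for (T row = root.next(); row != null; row = root.next()) {
                sink.accept(row);
                count++;
            }
        } finally {
            root.close();
        }
        return count;
    }
}
```

Because control flows only through `next()` calls, the whole plan runs on one thread with no hand-offs between operators; a thread-per-module design would instead need queues and context switches between operators.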

  8. Tuples as Java Objects • Tuple data stored as a Java object • Each in a separate byte array • Tuples copied on joins, aggregates • Issues • Memory management between modules and queries; garbage collector control • Allocation overhead • Performance: 30,000 200-byte tuples/sec -> 5.9 MB/sec

  9. Tuples As Byte Array [Figure: one shared byte array, a directory of (offset, size) entries, and surrogate objects] • All tuples stored in the same byte array per query • Surrogate Java objects

  10. Byte Array (cont.) • Allows explicit control over memory per query (or module) • Compaction eliminates garbage collection randomness • Lower throughput: 15,000 tuples/sec • No surrogate object reuse • Synchronization costs
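Slides 9 and 10 describe storing all of a query's tuples in one byte array, located through a directory of (offset, size) entries and exposed through surrogate objects. The sketch below is one possible shape for that layout, not Telegraph's implementation: `TupleArena` and `Surrogate` are hypothetical names, the array's fixed capacity stands in for a per-query memory budget, and compaction and synchronization are omitted (it is single-threaded).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical per-query tuple store: all tuple bytes live in one byte array,
 * located through a directory of (offset, size) entries. The array's fixed
 * capacity acts as an explicit memory budget for the query.
 */
final class TupleArena {
    private final byte[] data;
    private int[] offsets = new int[16];   // directory: start of each tuple in data
    private int[] sizes = new int[16];     // directory: length of each tuple
    private int used = 0;                  // bytes of data consumed so far
    private int count = 0;                 // number of tuples stored

    TupleArena(int capacityBytes) {
        data = new byte[capacityBytes];
    }

    /** Copies a tuple into the arena; returns its directory slot, or -1 if the budget is exhausted. */
    int add(byte[] tuple) {
        if (tuple.length > data.length - used) {
            return -1;
        }
        if (count == offsets.length) { // grow the directory (kept outside the byte budget here)
            offsets = Arrays.copyOf(offsets, count * 2);
            sizes = Arrays.copyOf(sizes, count * 2);
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        offsets[count] = used;
        sizes[count] = tuple.length;
        used += tuple.length;
        return count++;
    }

    /** Surrogate: a small handle that points into the shared array rather than owning a copy. */
    final class Surrogate {
        private final int slot;

        private Surrogate(int slot) { this.slot = slot; }
        int size() { return sizes[slot]; }
        String asString() {
            return new String(data, offsets[slot], sizes[slot], StandardCharsets.UTF_8);
        }
    }

    /** Each call allocates a new surrogate; this sketch does not reuse them. */
    Surrogate get(int slot) {
        if (slot < 0 || slot >= count) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return new Surrogate(slot);
    }

    int bytesUsed() { return used; }
}
```

Because `get` allocates a fresh `Surrogate` on every call, the sketch shares the "no surrogate object reuse" cost listed on slide 10.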

  11. Other System Pieces • XML-Based Catalog • Java Introspection Helps • Applet-Based Front End • JDBC Interface • Fault Tolerance / Multiple Servers • Via simple UNIX tools
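Slide 11 pairs an XML-based catalog with Java introspection. One way that combination can pay off is sketched here under assumptions: a catalog entry is a single XML element whose attribute names match the fields of a Java class, and reflection copies each attribute into its field. The element layout and the `SourceEntry` and `CatalogLoader` names are hypothetical; only the `javax.xml.parsers` and `java.lang.reflect` calls are standard.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical catalog entry; its field names match the XML attribute names. */
final class SourceEntry {
    String name;
    String url;
    int port;
}

final class CatalogLoader {
    /** Parses a single XML element and copies each attribute into the same-named field by reflection. */
    static <T> T load(String xml, Class<T> type) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                Field field = type.getDeclaredField(attr.getNodeName()); // fails on unknown attributes
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(obj, Integer.parseInt(attr.getNodeValue()));
                } else {
                    field.set(obj, attr.getNodeValue());
                }
            }
            return obj;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot load catalog entry", e);
        }
    }
}
```

With this design, supporting a new catalog attribute only requires adding a same-named field; an attribute with no matching field makes `load` throw.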

  12. RightOrder Questions • Performance vs. C • JNI Issues • Garbage Collection Issues • Serialization Costs • Lots of Java Objects • JDBC vs. ODI

  13. Performance vs. C • JVM + JIT performance encouraging: IBM JIT reaches 60% of Intel C compiler performance and beats MSC on low-level benchmarks • IBM JIT 2x faster than HotSpot for Telegraph scans • Stability issues • www.javalobby.org/features/jpr

  14. JIT Performance vs. C [Chart comparing Optimized Intel, Optimized MS, and IBM JIT] Source: www.javalobby.org/features/jpr

  15. Performance Gotchas • Synchronization • ~2x function call overhead in HotSpot • Used in libraries: Vector, StringBuffer • String allocation is the single most intensive operation in Telegraph • Mercator: 20% of initial CPU cost • Garbage Collection • Java is dumb about reuse • Mercator: 15% cost • OceanStore: 30 ms avg. latency, 1 s peak
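Slide 15 singles out synchronized library classes (`Vector`, `StringBuffer`) and `String` allocation. A common mitigation, sketched here, is to use the unsynchronized counterparts (`ArrayList`, and `StringBuilder`, which was added in Java 5, after this talk) and to reuse one builder instead of concatenating fresh strings. `TupleFormatter` is a hypothetical name, and no particular speedup is claimed.

```java
import java.util.List;

/** Hypothetical formatter that reuses one unsynchronized buffer across calls. */
final class TupleFormatter {
    // StringBuilder is the unsynchronized counterpart of StringBuffer; one instance
    // per formatter, so a formatter must not be shared between threads.
    private final StringBuilder buf = new StringBuilder(256);

    /** Formats fields as a comma-separated line, reusing the same builder each time. */
    String format(List<String> fields) {
        buf.setLength(0); // reset contents without allocating a new buffer
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                buf.append(',');
            }
            buf.append(fields.get(i));
        }
        return buf.toString(); // the only String allocated per call is the result
    }
}
```

Callers would pass an `ArrayList` rather than a `Vector`, keeping the whole path free of library-level locking.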

  16. More Gotchas • Final methods • Declaring methods final allows inlining • Serialization • RMI, JNI use serialization • Philippsen & Haumacher show it is slow

  17. Performance Tools • Tools to address some issues • JAX, Jopt: make bytecode smaller, faster • www.alphaworks.ibm.com/tech/JAX • www.condensity.com • Bytecode optimizer • www.optimizeit.com • Good profiler, memory allocation and garbage collection monitor

  18. JNI Issues • Not a part of Telegraph • JNI overhead quite large (JDK 1.1.8, PII 300 MHz) Source: Matt Welsh. A System Support High Performance Communication and IO In Java. Master’s Thesis, UC Berkeley, 1999.

  19. More JNI • But this is being worked on • IBM JDK: 100,000-byte copy in 5 ms, vs. 23 ms for JDK 1.1.8 (500 MHz PIII) • JNI allows synchronization (pin/unpin), thread management • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html • GCJ + CNI: access Java objects via C++ classes • http://gcc.gnu.org/java/

  20. Garbage Collection • Performance • Big problem: 1 s or longer to GC lots of objects • Most Java GCs are blocking (not concurrent or multi-threaded) • Unexpected latencies • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC • In high-concurrency apps, such delays are disastrous

  21. Garbage Collection (cont.) • Limited Control • Runtime.gc() only a hint • Runtime.freeMemory() unreliable • No way to disable • No object reuse • Lots of unnecessary memory allocations
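Because the JVM offers no object reuse (slide 21), systems like this often pool their own buffers. A hypothetical free-list pool is sketched below; `BufferPool` and its methods are invented for illustration, the buffer size and pool limit are arbitrary parameters, and `approxUsedHeap` uses only the standard `Runtime.totalMemory()` and `Runtime.freeMemory()` calls, whose result is an estimate.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical free-list pool of fixed-size tuple buffers. Reusing buffers cuts
 * allocation and GC work, at the cost of manual lifetime management: callers must
 * not touch a buffer after releasing it, and reused buffers are not zeroed.
 */
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;
    private int created = 0;

    BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    byte[] acquire() {
        byte[] b = free.pollFirst();
        if (b != null) {
            return b;          // reuse a previously released buffer
        }
        created++;
        return new byte[bufferSize];
    }

    void release(byte[] b) {
        if (b.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(b);  // buffers beyond the limit are left to the garbage collector
        }
    }

    int createdCount() { return created; }

    /** Rough heap-in-use estimate; like Runtime.gc(), treat it as a hint rather than an exact figure. */
    static long approxUsedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

The trade-off is the one the slides describe from the other side: a pool gives back control over allocation, but reintroduces the use-after-release bugs that garbage collection normally prevents.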

  22. Serialization • Not in Telegraph • Philippsen and Haumacher, “More Efficient Object Serialization,” International Workshop on Java for Parallel and Distributed Computing, San Juan, April 1999 • Serialization costs for RMI are 50% of total RMI time • Discard longevity for a 7x speedup • Sun serialization provides versioning • Complete class description stored with each serialized object • Most standard classes forward compatible (JDK docs note special cases) • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
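The versioning trade-off on slide 22 can be made concrete by comparing default serialization with a hand-written encoding of the same record. In this sketch (`Reading` and `SerializationSizes` are hypothetical names), `ObjectOutputStream` writes a stream header and class description before the two field values, while `DataOutputStream` writes only the 12 bytes of data, giving up the class and version metadata that lets serialized objects survive class changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical two-field record used to compare the two encodings. */
final class Reading implements Serializable {
    private static final long serialVersionUID = 1L;
    final int sensorId;
    final long timestamp;

    Reading(int sensorId, long timestamp) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
    }
}

final class SerializationSizes {
    /** Default serialization: stream header and class description, then the field values. */
    static int javaSerializedSize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Hand-written encoding: only the field values (4 + 8 bytes), no version information. */
    static byte[] handWritten(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(r.sensorId);
            out.writeLong(r.timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reader must know the exact field order; nothing in the bytes describes it. */
    static Reading readHandWritten(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return new Reading(in.readInt(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```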

  23. Lots of Objects • GC issues serious • Memory management • GC makes programmers allocate willy-nilly • Hard to partition memory space • Telegraph byte-array ugliness due to inability to limit memory usage of concurrent modules, queries

  24. Storage Overheads • Java objects are big: • Integer requires 23 bytes in JDK 1.3 • int requires 4.3 bytes • No way to circumvent object fields • Use primitives or hand-written serialization whenever possible
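Slide 24's advice to prefer primitives also applies to collections: an `ArrayList<Integer>` holds a reference to a separate `Integer` object per element (apart from small values the JVM caches), whereas a growable `int[]` stores the values directly. `IntList` is a hypothetical minimal version of the latter.

```java
import java.util.Arrays;

/** Hypothetical growable list of primitive ints: one int[] instead of one Integer object per element. */
final class IntList {
    private int[] values = new int[8];
    private int size = 0;

    void add(int v) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // amortized doubling, like ArrayList
        }
        values[size++] = v;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return values[i];
    }

    int size() { return size; }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i]; // no unboxing: values are read straight from the array
        }
        return total;
    }
}
```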

  25. JDBC vs. ODI • No experience with Oracle • JDBC overheads are high, but we don't have specific performance numbers

  26. Bottom Line • Java great for many reasons • GC, standard libraries, type safety, introspection, etc. • Significant reductions in development and debugging time • Java performance isn't bad • Especially with some tuning • Memory management an issue • Lack of control over JVMs is bad • When to garbage collect, how to serialize, etc.
