
The Future of MOCHA

Nick Roussopoulos, October 5, 2001


Presentation Transcript


  1. The Future of MOCHA Nick Roussopoulos October 5, 2001

  2. The Problem: Distributed and heterogeneous data sources
     Data sources for an enterprise are:
     • Distributed
       • Internet, intranets, extranets
     • Heterogeneous
       • Web servers, relational databases, file systems
     • Mission-critical
       • Weather service, ocean temperature, stock status, …
     • Costly to replace or upgrade
       • Risk of breaking it and loss of investment

  3. The Problem: High volume access from everywhere
     [Diagram labels: many Clients, Internet, Oracle 8i, Informix, XML Data, Text Data]

  4. Client-Server: 2-tier architecture, complex FAT clients. Bad idea.
     [Diagram labels: Clients, Internet, Oracle 8i, Informix, XML Data, Text Data]

  5. Middleware: 3-tier architecture, thin & fit clients
     [Diagram labels: Clients, Integration Server, Catalog, four Translators, Internet, Oracle 8i, Informix, XML Data, Text Data]

  6. Nice but…
     • Most middleware solutions are static
       • Not flexible for dynamic environments
       • Not scalable to hundreds of client and server sites
     • Development cost is high
       • One-site-at-a-time at a fixed cost
     • Maintenance cost is high
       • Upgrades are practically redevelopments

  7. A dynamic world needs: Code extensibility & auto-deployment
     • Need for user-defined types and functions
       • Polygon
       • Composite() – image aggregation
     • Porting and manual installation of code (C/C++)
       • Operating System
       • Hardware Platform
     • High cost of code maintenance
       • Updates on all platforms
       • Version management
     • Security in hostile platforms

  8. Code Deployment Problem: Not scalable
     [Diagram labels: Clients, Integration Server, Catalog, four Translators, Internet, Oracle 8i, Informix, XML Data, Text Data]

  9. Query Processing
     • Query execution options
       • Limited by site-dependent software
       • Composite() – must be ported before use
     • Most processing done at the Integration Server
       • Powerful Data Servers are under-utilized
       • I/O Nodes
     • Excessive data movement over the network
       • Network bottleneck
       • Slow internet access

  10. Query Processing Problem: Inefficient & not scalable
      [Diagram labels: Clients, Integration Server, Catalog, four Translators, Internet, Oracle 8i, Informix, XML Data, Text Data; data flows labeled 100MB and 200MB]

  11. Solution: MOCHA (Middleware Based On a Code SHipping Architecture)

  12. MOCHA Solution: Ship Java Code
      No code porting & no maintenance
      Query:
        Select location, Composite(image)
        From Rasters
        Where week BETWEEN t1 and t2
        Group By location
      [Diagram labels: Client, QPC, Code Repository, Catalog, Mochlets, two DAPs, Informix, Oracle, Internet; sites Maryland, Texas, Virginia; Q markers along the connections]

  13. MOCHA Solution: Filter Data @ Source
      No bandwidth waste
      Query:
        Select location, Composite(image)
        From Rasters
        Where week BETWEEN t1 and t2
        Group By location
      [Diagram labels: Client, QPC, Code Repository, Catalog, two DAPs, Informix, Oracle, Internet; sites Maryland, Texas, Virginia; tuple volumes labeled 100MB and 200MB; result flows labeled 150KB, 200KB, and 350KB]
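The volumes labeled on this slide can be turned into a rough comparison of bytes moved over the network. The sketch below is only back-of-the-envelope arithmetic: the names `source_a` and `source_b` are hypothetical, the pairing of tuple volumes with result sizes is an assumption, and the size of the shipped code itself is ignored.

```python
MB = 1024 * 1024
KB = 1024

# Volumes read off the slide. Which source holds which volume is an assumption.
tuples_at_source = {"source_a": 100 * MB, "source_b": 200 * MB}
results_from_source = {"source_a": 150 * KB, "source_b": 200 * KB}


def bytes_shipped(per_source):
    """Total bytes that cross the network for one strategy."""
    return sum(per_source.values())


# Data shipping: every raw tuple travels to the integration server.
data_shipping = bytes_shipped(tuples_at_source)

# Code shipping: the operator runs at the source, only results travel.
code_shipping = bytes_shipped(results_from_source)

reduction = data_shipping / code_shipping

print(f"data shipping: {data_shipping // MB} MB")
print(f"code shipping: {code_shipping // KB} KB")
print(f"reduction factor: about {reduction:.0f}x")
```

Under these assumptions, filtering at the source moves 350KB instead of 300MB.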

  14. Software architecture
      [Diagram labels: Client, QPC, Code Repository, Catalog, two DAPs, OS File, DBMS]

  15. QPC: The Query Processing Coordinator
      QPC controls and coordinates query execution
      [Diagram labels: Client API, Query Parser, Code Repository, XML Catalog, Query Optimizer, Catalog Manager, Execution Engine, SQL & XML Proc. Interface, Code Loader, DAP Access API, DAP]

  16. DAP: The Data Access Provider
      DAP provides QPC with remote access to the data
      [Diagram labels: DAP Access API, Control Module, Execution Engine, SQL & XML Proc. Interface, Code Loader, Data Source Access Layer, Data Source, JDBC, I/O API, DOM, JNI]

  17. Data Server: Storage System
      • Stores and manages the data sets
        • database, web server, file system, XML repository

  18. Processing a Query in MOCHA
      Table Rasters: location, image, week, band
      Query:
        Select location, Composite(image)
        From Rasters
        Where week BETWEEN t1 and t2
        Group By location
      • Query Parsing
      • Resource Discovery
      • Query Optimization
      • Metadata and Control Exchange
      • Code Deployment Phase
      • Query Execution
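To make the example query concrete, here is a minimal sketch of what it computes over rows of the Rasters table. The slides describe Composite() only as image aggregation, so the pixel-wise maximum used here is a stand-in assumption, and representing images as flat lists of numbers is likewise hypothetical.

```python
from collections import defaultdict


def composite(images):
    """Stand-in aggregate (assumption): pixel-wise maximum of equally sized images."""
    return [max(pixels) for pixels in zip(*images)]


def run_query(rasters, t1, t2):
    """Select location, Composite(image) From Rasters
    Where week BETWEEN t1 and t2 Group By location."""
    groups = defaultdict(list)
    for row in rasters:
        if t1 <= row["week"] <= t2:  # BETWEEN is inclusive on both ends
            groups[row["location"]].append(row["image"])
    return {location: composite(images) for location, images in groups.items()}


# Hypothetical rows using the column names from the slide.
rasters = [
    {"location": "A", "image": [1, 5, 2], "week": 1, "band": 1},
    {"location": "A", "image": [4, 0, 3], "week": 2, "band": 1},
    {"location": "B", "image": [9, 9, 9], "week": 5, "band": 1},
]

result = run_query(rasters, t1=1, t2=3)
print(result)
```

Only the rows for location A fall inside the week range, so the result holds a single composite image for A.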

  19. Plan Generation
      Query: Select location, Composite(image) From Rasters Where week BETWEEN t1 and t2 Group By location
      [Diagram labels: Clients, QPC, Coordination Thread, Execution Threads, Code Repository, Catalog, two DAPs, Informix, Oracle]

  20. Automatic Code Deployment
      Query: Select location, Composite(image) From Rasters Where week BETWEEN t1 and t2 Group By location
      [Diagram labels: Clients, QPC, Coordination Thread, Execution Threads, Code Repository, Catalog, two DAPs, Informix, Oracle]

  21. Data Processing
      Query: Select location, Composite(image) From Rasters Where week BETWEEN t1 and t2 Group By location
      [Diagram labels: Clients, QPC, Coordination Thread, Execution Threads, Code Repository, Catalog, two DAPs, Informix, Oracle]

  22. Features of MOCHA
      • Automatic code deployment
        • “Plug-N-Play”
        • no system-wide installations
      • Metadata and Schema Mapping framework
        • XML, RDF
        • easy to exchange and map schemas
        • semi-automatic mapping
      • Query optimization based on code shipping
        • reduce data movement overhead
        • filters at the source
        • expands at the client
        • metrics for code (operator) placement
        • optimization for selection, union and join plans

  23. MOCHA Demo: Global Land Cover Facility
      • Integrates the following DAP sites
        • University of New Hampshire (Webster), NASA GSFC, UMD-CS, UMD-Geography, UMD-UMIACS SP-2 HPSS
      • GLCF hosts the QPC
      • Operations supported:
        • Coverage queries
        • Visualization of preview images for
          • Data sets MODIS, TM, AVHRR
          • GIS Features
        • Dynamic Sub-setting of TM scenes
        • Composites of GIS Features and AVHRR images

  24. Multi-Sensor Analysis of the Los Alamos Fire Event Using MOCHA
      • Data Synergy and Multi-Resolution Instrument Analysis using MOCHA
        • Access data residing at various data sources
        • Utilize image processing tools
      • Fire Analysis required a multi-resolution approach
        • MOCHA is independent of instrument or resolution specifics
        • High Resolution: IKONOS and TM data
        • Moderate Resolution: 250m MODIS
        • Coarse Resolution: AVHRR and DMSP

  25. MOCHA Search Utility

  26. MOCHA Search Utility (cont’d)

  27. MOCHA Search Utility (cont’d)

  28. MOCHA Query Results

  29. MOCHA ETM+ Subsetting Utility

  30. May 9, 2000 Los Alamos (Bands 1,2,3)

  31. May 9, 2000 Los Alamos (Bands 7,5,4)

  32. Multi-Sensor Query

  33. Tabular Query Results

  34. MODIS: May 11, 2000: During Fire

  35. MODIS: May 24, 2000: After Fire

  36. DMSP: Night Visibility of Fire

  37. IKONOS 4m resolution

  38. IKONOS 4m Subset

  39. IKONOS 1m resolution

  40. IKONOS 1m Subset

  41. MOCHA Metadata Publishing Framework
      • Provides information about system resources
        • Data sources
        • schemas and mappings
        • user-defined types and functions
      • Automates operation of MOCHA
      • Incremental system growth
        • neither fixed nor hardwired parameters
        • no extension by re-compilation
      • Share metadata with others (Internet)
        • machine readable form

  42. MOCHA Catalog Organization
      • Metadata about “resources”
        • Local and global tables
        • UDF data types and operators
        • Schema mapping rules
        • DAPs
      • Each one has Uniform Resource Identifier (URI)
        • global namespace
        • e.g.: mocha://cs1.umd.edu/EarthSci/Polygon
      • Modeled with RDF, serialized with XML
        • easy to understand, use and exchange
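A catalog URI of this shape can be split with a standard URL parser. The sketch below assumes the host plus leading path segments name the repository (for example cs1.umd.edu/EarthSci) and the last segment names the resource. `ResourceId` and `parse_mocha_uri` are hypothetical helper names, not part of MOCHA.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ResourceId(NamedTuple):
    repository: str  # host plus leading path, e.g. "cs1.umd.edu/EarthSci"
    name: str        # final path segment, e.g. "Polygon"


def parse_mocha_uri(uri: str) -> ResourceId:
    """Split a mocha:// URI into repository and resource name (assumed layout)."""
    parts = urlsplit(uri)
    if parts.scheme != "mocha" or not parts.netloc:
        raise ValueError(f"not a mocha URI: {uri!r}")
    path = parts.path.strip("/")
    if not path:
        raise ValueError(f"URI names no resource: {uri!r}")
    namespace, _, name = path.rpartition("/")
    repository = f"{parts.netloc}/{namespace}" if namespace else parts.netloc
    return ResourceId(repository, name)


polygon = parse_mocha_uri("mocha://cs1.umd.edu/EarthSci/Polygon")
print(polygon.repository, polygon.name)
```

Keeping the repository and the name separate would let a catalog look up where a type's code lives without string slicing at each call site.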

  43. RDF Model: Data Types
      [Graph: resource mocha://cs1.umd.edu/EarthSci/Raster with properties mocha:Type = Raster, mocha:Class = Raster.class, mocha:Repository = cs1.umd.edu/EarthSci, mocha:Size = 1 megabyte, mocha:Creator = user1@cs.umd.edu]

  44. XML Serialization: Data Types
      <rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
        <mocha:Type>Raster</mocha:Type>
        <mocha:Class>Raster.class</mocha:Class>
        <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
        <mocha:Size>1 MB</mocha:Size>
        <mocha:Creator>user1@cs1.umd.edu</mocha:Creator>
      </rdf:Description>
      • W3C Standards
      • Easy to specify using GUI tools
      • Easy to exchange
        • Crawlers can harvest it
      • Stored in
        • DB
        • File System
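A description like the one on this slide can be read with a standard XML parser once it sits inside a root element that declares its namespaces. In this sketch the `rdf:RDF` wrapper and the `mocha` namespace URI are assumptions added to make the fragment parseable. Only the `rdf` namespace URI is the real W3C one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MOCHA_NS = "http://example.org/mocha#"  # hypothetical; the slide does not give one

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:mocha="{MOCHA_NS}">
  <rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
    <mocha:Type>Raster</mocha:Type>
    <mocha:Class>Raster.class</mocha:Class>
    <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
    <mocha:Size>1 MB</mocha:Size>
    <mocha:Creator>user1@cs1.umd.edu</mocha:Creator>
  </rdf:Description>
</rdf:RDF>"""


def read_descriptions(xml_text):
    """Map each described URI to a dict of its mocha:* properties."""
    root = ET.fromstring(xml_text)
    catalog = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        props = {}
        for child in desc:
            # ElementTree spells qualified tags as "{namespace}local".
            ns, _, local = child.tag.rpartition("}")
            if ns == "{" + MOCHA_NS:
                props[local] = (child.text or "").strip()
        catalog[desc.get("about")] = props
    return catalog


catalog = read_descriptions(DOC)
raster = catalog["mocha://cs1.umd.edu/EarthSci/Raster"]
print(raster["Class"], raster["Repository"], raster["Size"])
```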

  45. Other Resources in MOCHA
      • Local and Global tables
        • data sources + columns + types
      • UDF Functions
        • argument types + return type
        • code repository
      • Schema mapping rules
      • DAPs
        • URL
        • login information

  46. Schema Mapping in MOCHA
      • Direct column mappings
      • Complex Expressions
      [Diagram: table Rasters (location, image, week, band) and table RastersMD (point1, point2, photo, date, band), related through rect() and week()]

  47. MOCHA Schema Mapping Rules
      • Use XML to encode mapping rules
      • Schema mapping sub-plans
        • leaf nodes
      <MapList>
        <mi mapped="direct">
          <mocha:Column>image</mocha:Column>
          <mocha:Expr>photo</mocha:Expr>
        </mi>
        <mi mapped="expression">
          <mocha:Column>location</mocha:Column>
          <mocha:Expr>rect(point1, point2)</mocha:Expr>
        </mi>
        …
      [Diagram: Plan Tree with SMP nodes]
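One way to read these rules is as a lookup from a global column to the source-side expression that produces it. The sketch below loads the two rules visible on the slide and rewrites a select list against the source table. The closing `</MapList>` tag, the `mocha` namespace URI, and both helper functions are assumptions made for illustration, and the elided rules are left out.

```python
import xml.etree.ElementTree as ET

MOCHA_NS = "http://example.org/mocha#"  # hypothetical; the slide does not give one

# Only the two rules shown on the slide. The closing tag is an assumption.
RULES = f"""<MapList xmlns:mocha="{MOCHA_NS}">
  <mi mapped="direct">
    <mocha:Column>image</mocha:Column>
    <mocha:Expr>photo</mocha:Expr>
  </mi>
  <mi mapped="expression">
    <mocha:Column>location</mocha:Column>
    <mocha:Expr>rect(point1, point2)</mocha:Expr>
  </mi>
</MapList>"""


def load_mapping(xml_text):
    """Return {global column: (mapping kind, source expression)}."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for mi in root.findall("mi"):
        column = mi.findtext(f"{{{MOCHA_NS}}}Column").strip()
        expr = mi.findtext(f"{{{MOCHA_NS}}}Expr").strip()
        mapping[column] = (mi.get("mapped"), expr)
    return mapping


def rewrite_select(columns, mapping, source_table):
    """Express global columns in terms of the source table's columns."""
    exprs = [f"{mapping[column][1]} AS {column}" for column in columns]
    return f"SELECT {', '.join(exprs)} FROM {source_table}"


mapping = load_mapping(RULES)
sql = rewrite_select(["location", "image"], mapping, "RastersMD")
print(sql)
```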

  48. MOCHA Optimization Framework
      • Query optimization based on heuristics
        • cost = network + CPU + I/O
      • Network is the dominant factor (WAN)
        • optimize for it first
      • CPU and I/O are cheaper
        • optimize for them later
      • Operator placement: Enhanced Hybrid Shipping
        • Code
        • Data
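The "network first, CPU and I/O later" heuristic can be sketched as a two-level comparison of candidate plans. Reading it as a strict lexicographic ordering is an assumption, and the plan names and cost figures (arbitrary units) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    name: str
    network: float
    cpu: float
    io: float

    @property
    def total(self) -> float:
        # cost = network + CPU + I/O, as on the slide
        return self.network + self.cpu + self.io


def pick_plan(plans):
    """Prefer the lowest network cost, then break ties on CPU + I/O."""
    return min(plans, key=lambda p: (p.network, p.cpu + p.io))


# Hypothetical candidates in arbitrary cost units.
plans = [
    PlanCost("ship_data", network=300.0, cpu=5.0, io=10.0),
    PlanCost("ship_code_slow_cpu", network=0.35, cpu=12.0, io=10.0),
    PlanCost("ship_code", network=0.35, cpu=8.0, io=10.0),
]

best = pick_plan(plans)
print(best.name, best.total)
```

A real optimizer would likely weigh the terms rather than order them strictly. The strict ordering only shows the priority the slide gives to network cost.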

  49. Operator Placement in MOCHA
      • Data-Reducing Operators
        • “Filter” the data
        • aggregates, predicates, projections, semi-joins
        • Composite(), Overlaps(), AvgEnergy()
      • Push to the DAPs
        • Return distilled results
        • Less data movement
      [Diagram label: Composite()]

  50. Operator Placement in MOCHA
      • Data-Inflating Operators
        • “Expand” the data
        • projections, image processing, some joins …
        • DoubleResolution(), RotateSolid()
      • Pull to the QPC
        • Data Shipping policy [FJK96]
        • Only send back raw arguments
        • Less data movement
      [Diagram label: DoubleRes()]
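The two placement rules (push data-reducing operators to the DAPs, pull data-inflating operators to the QPC) can be sketched as a comparison of estimated input and output volumes. The byte estimates below are hypothetical, and sending ties to the QPC is an assumption. The slides do not say how MOCHA measures the reduction.

```python
from dataclasses import dataclass

MB = 1024 * 1024
KB = 1024


@dataclass(frozen=True)
class OperatorStats:
    name: str
    bytes_in: int   # estimated volume of the operator's arguments
    bytes_out: int  # estimated volume of the operator's results


def place(op: OperatorStats) -> str:
    """Run the operator on whichever side moves less data over the network."""
    # Data-reducing: evaluate at the DAP and ship only the results.
    # Data-inflating (or neutral): ship raw arguments and evaluate at the QPC.
    return "DAP" if op.bytes_out < op.bytes_in else "QPC"


# Hypothetical estimates; operator names are taken from the slides.
operators = [
    OperatorStats("Composite", bytes_in=100 * MB, bytes_out=150 * KB),
    OperatorStats("DoubleResolution", bytes_in=1 * MB, bytes_out=4 * MB),
]

placement = {op.name: place(op) for op in operators}
print(placement)
```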
