1 / 64

Database Extensibility

Database Extensibility. Jacques Roy jacquesr@us.ibm.com. Agenda. What is Database Extensibility? Database and Extensibility background IDS features Extensibility Examples DataBlade Modules Bladelets Considerations for building your own. What is Database Extensibility?.

asova
Download Presentation

Database Extensibility

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Extensibility Jacques Roy jacquesr@us.ibm.com

  2. Agenda • What is Database Extensibility? • Database and Extensibility background • IDS features • Extensibility Examples • DataBlade Modules • Bladelets • Considerations for building your own

  3. What is Database Extensibility? • Ability to add business components in the database • Tailor the database to the business environment • Reduce application complexity • Put the processing where it makes the most sense • Set processing provided by the database • Higher performance • Less data movement, better data representation, better indexing • Faster development, lower maintenance cost

  4. Extensibility Background • Historic Approach • user exits, device drivers • Relational Databases • triggers, constraints, stored procedures • New development platforms • Web Server: CGI, NSAPI/ISAPI • App Servers: J2EE • Other • Eclipse and MS Visual Studio plugins • Object-Relational Capabilities • Relational Framework

  5. User-defined types Distinct, opaque Table/Type inheritance Functional index R-Tree index Smart Large Objects More. . .Extensions can be written in:SPL, C, Java Complex types Row, set, multiset, list Polymorphism User-Defined Function Also known as user-defined routine User-Defined Aggregate Primary/Secondary Access Methods IDS Extensibility Features

  6. When to Use UDFs/UDRs • Eliminate large data transfer • Transfer time is much larger than processing time • Simplify or improve processing • Eliminate table scan • Eliminate application set processing • Define new sorting order • Replace store procedures with aggregates • Provide business processing to applications • Consistency of result • Eliminate the need for custom interfaces

  7. Better Grouping • Quarter() function:SELECT Quarter(date), SUM(income)FROM ordersWHERE Quarter(date) LIKE '2008Q%‘GROUP BY 1ORDER BY 1; • AgeGroup() function:SELECT AgeGroup(birthdate, 20, 10) AgeGrouping, SUM(total_price) TotalFROM customer c, orders o, items IWHERE c.customer_num = o.customer_numAND o.order_num = i.order_numGROUP BY 1ORDER BY 1 DESC; AgeGrouping Total 40-49 $12646.97 50-59 $ 5507.80

  8. Quarter Function in “C” #include <mi.h> mi_lvarchar *quarter(mi_date date, MI_FPARAM *fparam) { mi_lvarchar *retVal; /* The return value. */ short mdy[3]; mi_integer ret, qt; char buffer[10]; /* Extract month, day, and year from the date */ ret = rjulmdy(date, mdy); qt = 1 + ((mdy[0] - 1) / 3); /* calculate the quarter */ sprintf(buffer, "%4dQ%d", mdy[2], qt); retVal = mi_string_to_lvarchar(buffer); return retVal; /* Return the value. */ }

  9. Compiling and Linking # Compilation COMPILE=-I$(INFORMIXDIR)/incl/public -O -c cc -DMI_SERVBUILD $(COMPILE) quarter.c # Creating the shared library ld -G -o qlib.bld quarter.o # Change permission on file to 775 # read and execute to other chmod a+x qlib.bld

  10. Creating the function CREATE FUNCTION quarter(date) RETURNS varchar(10) WITH (not variant, parallelizable) external name "$INFORMIXDIR/extend/class/qlib.bld(quarter)" LANGUAGE C; GRANT EXECUTE ON FUNCTION quarter(date) TO public;

  11. Stored Procedures SQL group UDR order order scan scan scan scan UDR Processing vs. Stored Procedures

  12. Better date manipulation: Day of the year Week of the year Week of the month Quarter Unit conversion Feet  Meters Gallons  Liters Fahrenheit  Celsius Functional indexes CREATE FUNCTIONquarter(dt date) RETURNS integer WITH (NOT VARIANT) RETURN (YEAR(dt) * 100) + 1 + (MONTH(dt) - 1) / 3; END FUNCTION; Example: EXECUTE FUNCTION quarter('9/2/2008'); (expression) 200803 Using SPL for Extensions See Developer Works Informix zone:http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0510roy/

  13. 1.0 1.1 1.2 1.3 1.2.3 1.2.2 1.2.1 1.2.3.2 1.2.3.3 1.2.3.4 1.2.3.4.5 The Node Type • "Hierarchically" aware type: Node • Pre-processed the hierarchical relationships • ex: Chapter 11, section 7, paragraph 3: 11.7.3 • Add new way to relate objects to one another • IsAncestor(), IsChild(), IsDescendant(), IsParent() • Can change processing from exponential to linear • examples: • policies, product classification, bill-of-material, LDAP, XML, etc.

  14. Node Application Example • Geo Hierarchy: • country > state > metro > city GeoNode Resource Policy Q: What policy apply to the Hyatt in Denver? A: A Colorado Policy

  15. component Parts Bill-of-Material Example CREATE TABLE Components ( ComponentId Node, Sequence integer, Quantity integer, Name varchar(30), PartNumber integer); CREATE TABLE Parts ( PartNumber integer, Name varchar(30), ProviderId integer, Price Money(10,2) ); • A component can be made up of multiple components • A component is made up of multiple parts

  16. Fine-Grained Auditing • Use triggers and user-define routines • Register event processing • Commit or rollback • Send events to file or outside process • Use a generic function for any table See the developerworks article:Event-driven fined-grained auditing with Informix Dynamic Serverhttp://www-106.ibm.com/developerworks/db2/library/techarticle/dm-0410roy/

  17. New Trigger Use CREATE TRIGGER tab1instrig INSERT ON tab1 FOR EACH ROW ( EXECUTE PROCEDURE do_auditing2() ); • The API includes functions to find the context of the trigger.

  18. Commit/Rollback MonitorProgram Callback Statement Register 8 4 7 3 2 1 6 5 Trigger Table EventTable Event-Driven Architecture

  19. loans Retail Financial Services Healthcare Telco Manufacturing Clothing Food Novelties Inheritance and Polymorphism

  20. Inheritance and Polymorphism (cont.) • SELECT branch_id, AVG(risk(loans)) FROM loans GROUP BY 1HAVING AVG(risk(loans)) > 1ORDER BY 2 DESC; • SELECT branch_id, AVGRISK(loans) FROM loans GROUP BY 1HAVING AVGRISK(loans) > 1ORDER BY 2 DESC;

  21. Replacing Store Procedures with UDA • Business problem: Merger multi-polygon types (ESRI) • Original Solution: • SPL stored procedure (82 lines, 3 SELECT statements, 1 insert statement, 2 embedded FOREACH) • New Solution: User-Defined Aggregate • 23 lines of SPL, no SQL, no FOREACH SELECT a.user_id, a.case_id, a.case_type, a.serial_nr_full,a.spt_disp_theme_cd, do_union(a.shp)FROM case_shp aWHERE user_id = "user0001" GROUP BY 1, 2, 3, 4, 5INTO TEMP my_temp_table WITH NO LOG;

  22. Original Code CREATE PROCEDURE union_caseid() DEFINE GLOBAL theUserId CHAR(8) DEFAULT USER; . . . BEGIN SELECT MAX(a.se_row_id) INTO max_serowid FROM case_shp a WHERE user_id = theUserId; FOREACH -- case_id in current townshipSELECT unique case_id INTO p_caseid FROM case_shp WHERE user_id = theUserId LET init = 'true'; LET newSeRowId = max_serowid + 999999; FOREACH -- shape and its display atributesSELECT a.case_type, a.serial_nr_full, a.spt_disp_theme_cd, a.shp, a.se_row_id INTO p_casetype, p_serialnrfull, p_sptdispthemecd, p_shp, p_serowid FROM case_shp a WHERE user_id = theUserId AND case_id = p_caseid IF ( p_shp IS NOT NULL ) THEN IF ( init = 'true' ) THEN LET init = 'false'; LET unionResult = p_shp; ELSE LET unionResult = (union(p_shp::MULTIPOLYGON, unionResult::MULTIPOLYGON)::MULTIPOLYGON); END IF -- init = 'true‘ END IF -- p_shp IS NOT NULL AND END FOREACH -- shape and its display atributes INSERT INTO case_shp VALUES (theUserId, p_caseid, p_casetype, p_serialnrfull, p_sptdispthemecd, unionResult, newSeRowId ); END FOREACH -- case_id in current township END -- BEGIN END PROCEDURE;

  23. User-Defined Aggregate Code CREATE FUNCTION do_union_iter(state lvarchar, arg lvarchar) RETURNING lvarchar DEFINE retval lvarchar; IF (state IS NULL ) THEN RETURN arg ; END IF IF (arg IS NULL) THEN RETURN state ; END IF LET retval = state || arg ; RETURN retval ; END FUNCTION ; CREATE AGGREGATE do_union WITH (ITER=do_union_iter, COMBINE=do_union_iter);

  24. User-Defined Aggregates • A lot more flexible than standard aggregate functions • Can take any type as input • ex.: row type • Can take an initialization parameter of any type • ex: row type • Can return complex types • ex: list

  25. Fabric Classification • Business Problem:Provide an efficient way to select fabrics • Indexing Colors:Cielab coding (3 dimensions) and other attributes(Total of 5 dimensions) • Fabric Type HierarchyRequires a hierarchy-aware typeex: We want a “natural” fabric • Fabric Style and Patternsex: What does “Victorian” mean?

  26. Fabric Classification Queries • Find the fabrics that have “blue” in them • “Blue” is not a specific value but a range of values in 3 dimensions • Find “natural” fabrics that have “blue” in them • What does “natural” mean? • Find “silk” fabrics that has “blue” in them • What does “silk” mean? There could be many types of silk An application wants the answer to the questions, not a list of potential candidates with “false positives”

  27. Problem with Indexing Colors • B-tree index indexes only one dimension • Selects too many rows • Rule of thumb:Index selected if it returns less than 20% of the rows • Result: likely to do a table scan (read all rows) • Solution: Use an R-tree index • Multi-dimensional index • Requires the creation of a 5-dimension type (UDT) • R-tree index • Preferred indexing method for spatial data(spatial, geodetic) • Data types: point, line, polygon, etc.

  28. Fabric Natural Synthetic Cotton . . . Wood . . . Silk . . . . . . Hard Soft Pine Handling the Fabric Type Hierarchy • Fabric types can be seen as a hierarchy • Hierarchies are difficult to deal with in a RDBMS • The hierarchical processing is only a part of the total required processing • We can use the Node type to solve our problem

  29. Handling Fabric Types and Patterns • This was not addressed in the PoC • Create a pattern hierarchy • Define “Victorian” in terms of colors, fabric types, and patterns.

  30. Solution Benefits • Scalability due to indexing • Colors • Hierarchy • Application simplification • No manipulation of extra data • No special algorithms to handle color matching, etc. • Performance • Less data movement • Less data through the network

  31. Other examples • Other multi-dimensional problems: • 3D CAD drawings • Support for Globally Unique Identifier (GUID)http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0401roy/index.html • Soundex/Phonex

  32. Advanced Feature: Named Memory • Usage:Allocate memory of a chosen duration that can be retrieved by name • Durations from function to server • Example:BeCentric uses names memory to keep formatted information in memory for quick sanctioning testing (financial entity validation)

  33. Advanced Feature: Virtual Table/Index Interface • Think of those as similar to device drivers • Purpose functions:am_create, am_drop, am_open, am_close, am_insert, am_delete, am_update, am_stats, am_scancost, am_beginscan, am_getnext, am_rescan, am_endscan, am_getbyid, am_check • Usage:Used to make “something” look like a table or an index • Examples: • Excalibur Text (virtual index interface) • TimeSeries: make a TimeSeries look like a relational table (Virtual Table Interface)

  34. Key to Successful Extensibility Project • Start small, develop your expertise • Remember the first OO projects • Use pre-built extensions • Study examples • Approaches: • Use DataBlade Modules • Use Bladelets • Build your own

  35. What are DataBlade Modules? • Business Components • A set of functionality that solves a specific business problem • Building Blocks • Can include: • User-defined types • User-defined Functions • Tables, views • Client component • A DataBlade can come from IBM, a third party or be built in-house

  36. Large Object Locator (LLD) Binary data type Node Basic Text Search (BTS) MQ Series Spatial XML and XSLT Web feature services Web Geodetic TimeSeries Real-Time Loader C-ISAM Image Foundation Video Foundation Excalibur Text Excalibur Image IDS DataBlade Modules

  37. Spatial is Everywhere • Where are my stores located related to my distributors? • How can I efficiently route my delivery trucks? • How can I micro-market to customers fitting a particular profile near my worst performing store? • How can I set insurance rates near to flood plain? • Where are the parcels in the city that are impacted by a zoning change? • Which bank branches do I keep after the merger based on my customers locations (among other things)?

  38. Complex Spatial Example CREATE TABLE e_Yellow_Pages ( Name VARCHAR(128) NOT NULL, Business Business_Type NOT NULL, Description Document NOT NULL, LocationGeoPoint NOT NULL, Occupied SET( Period NOT NULL ) ); -- -- “Show me available service stations -- specializing in Porsche brakes -- within 30 miles of where I am?” -- SELECT Y.Name FROM e_Yellow_Pages Y WHEREContains( Y.Location, Circle( :GPS_Loc, ‘30 Miles’ ) ) AND Y.Business MATCH “Automotive Service” ANDDocContains(Y.Description, “Porsche AND brakes”) AND NOTBooked ( Y.Occupied, Period (TODAY,TODAY+5)); • Raise the level of abstraction at the database level. • Buy components to solve common problems, and build components to achieve a competitive edge

  39. +90(90° N) R latitude longitude 0 -90(90° W) Geodetic DataBlade • GeoSpatial objects consist of data that is referenced to a precise location on the surface of the earth • Includes altitude and time range dimensions • Applications: • Global, Polar, Trans-Pacific, High Accuracy • Basic computations: Complex, Expensive • Example: • Find all the atmospheric elements that were present in the Denver area during the last thunderstorm

  40. Spatial vs. Geodetic: Distance Issue What is the distance from Anchorage to Tokyo? The shortest path is the shorter of thetwo possible geodesic paths:the thick part of the great circle

  41. Spatial vs. Geodetic: straddling the 180th Meridian split flat-plane representation into 2 or more pieces MULTIPOLYGON( (-180 30,-165 30,-165 40, -180 40,-180 30), (180 30,180 40,165 40, 165 30, 180 30) ) POLYGON( (165 30, -165 30, -165 40, 165 40) )

  42. Flat Map vs. Round Earth

  43. XML Capabilities • Self-describing data • Useful for information exchange • Useful for flexible data definition • IDS provides: • Functions to generate XML (UDA). • Functions to extract part of an XML document. • Functions to test for the existence of an element in an XML document. • XSLT transformation • Extract, exist and validate functions use Xerces-C. • Open source provided by IBM. • <customer> • <row> • <customer_num>101</customer_name> • <fname>Ludwig </fname> • <lname>Pauli </lname> • <company>All Sports Supplies</company> • <address1>213 Earstwild Court</address1> • . . . • </row> • . . . • </customer>

  44. Basic Text Search (BTS) DataBlade • Text search engine built into IDS. • Provides a variety of word and phrase searching on an unstructured document repository. • Search engine is provided by the CLucene text search package (open source). • Examples: SELECT pid FROM t WHERE bts_contains(text_data, 'foo AND bar'); SELECT id FROM product WHERE bts_contains(brands, 'standard', score # real) AND score > 70.0;

  45. Basic Text Search (BTS) DataBlade • Supports Wildcards and Fuzzy Search: • “?” single character wildcard • “*” multiple character wildcard • “~” fuzzy search SELECT cat_advert, score FROM catalog WHERE bts_contains(cat_advert, 'soarness~ and classic', score # real); • Proximity, range searches • Use stop words • Can index XML documents • Index all or selected XML elements • Index all or selected attributes

  46. TimeSeries DataBlade Module • A time series is a set of data as it varies over time • TimeSeries DataBlade optimizes storage usage • 50% Savings not uncommon • Optimized Access Time • 10 times performance improvement typical • Calendars • Defines period of time that data may or may not be collected • SQL, Java, and C interfaces • VTI Interface • Makes a time series look like a regular table

  47. Who’s Interested in TimeSeries • Capital Markets • Arbitrage opportunities, breakout signals, risk/return optimization, portfolio management, VaR calculations, simulations, backtesting... • Telecommunications: • Network monitoring, load prediction, blocked calls (lost revenue) from load, phone usage, fraud detection and analysis... • Manufacturing: • Machinery going out of spec; process sampling and analysis • Logistics: • Location of a fleet (e.g. GPS); route analysis • Scientific research: • Temperature over time... • Other • Live Sports statistics

  48. TimeSeries Performance Example • Financial brokerage firm: • Online stock collection • < 1 sec lag requirement • > 100,000 messages/sec • >5 TB of history kept online • Runs on a 4 CPU Linux box! Extreme Scalability!

  49. MQ DataBlade • Message queues are useful for exchanging information asynchronously • Communication mechanism in an SOA environment • Used as an Enterprise Service Bus • Access WebSphere Message Queue (WMQ): • Function interface. • Table interface. • Transactional • must be used within a transaction • Message is sent only if the transaction commits

  50. IDS Bladelets • Located at:http://www.ibm.com/developerworks/db2/zones/informix/library/samples/db_downloads.htmlandhttp://www.iiug.org/ver1/software/index_ORDBMS.html • Bladelets: • mrLvarchar, Node, regexp, shape, exec, period, etc. • For detailed information, see: "Open-Source Components for Informix Dynamic Server 9.x“Jacques Roy, William W. White, Jean T. Anderson, Paul G. BrownISBN 0-13-042827-2 Includes: Node, Period, ffvti, exec, shape, sqllib/IUtil, regexp, mrLvarchar, JPGImage

More Related