About Tables and Datatypes

dstorey

Presentation Transcript


  1. 5 About Tables and Datatypes

  2. Introduction • This section is probably the most important in terms of performance of an IQ system • We discuss • Tables • Datatypes • The next section discusses the other vital part of IQ • Indexes

  3. Tables • Actually in IQ tables do not really exist • Tables are implicit in the IQ Catalog Store meta-data • The concept of a table only comes to the fore in SQL; at all other times IQ is a simple (hah!) Column Store • However, the CREATE TABLE statement does have some interesting “features”

  4. CREATE TABLE • Of all the clauses of the CREATE TABLE command, the following are of interest: [ GLOBAL | TEMPORARY ] [ { IN | ON } ] [ AT location ] { UNIQUE | PRIMARY KEY | REFERENCES … }

  5. GLOBAL TEMPORARY • In IQ a temporary table can be either TEMPORARY or GLOBAL TEMPORARY • A temporary table only exists for the duration of the transaction that creates it • In a Global Temporary table the schema lasts forever; only the data is destroyed at transaction commit/rollback • All temporary data lives in IQ TEMP STORE

  6. Temp Tables • You may no longer specify the table owner when creating temp tables • If you specify the owner it will create a permanent (base) table in the IQ Store • Create Table dbo.#my_temp_table (…) Creates a Permanent Table in the IQ Store • Declare Local Temporary Table dbo.my_temp_table(…) Results in a Syntax Error

  7. IN | ON • In IQ you cannot position objects (tables or indexes) • The reason for the IN (or ON) clause is to allow you to create an ASA table (base or temporary) • An ASA table is created using • ON SYSTEM • System being the IQ Catalog Store • This table will obey all the rules of an ASA (not an IQ) table

  8. AT • The AT clause is used to define a proxy table that maps to a table at a remote location • The remote server name must be defined to IQ • This is not a fast way of accessing data • CREATE TABLE fred • AT 'anotherserver;adatabase;;fred'

  9. Constraints • Check Constraints Added • Includes Check Constraints, Unique and Referential Integrity Constraints • Permits constraint modification without recreating a table • Constraints may be named for reuse UNIQUE, PK, FK, CHECK, IQ UNIQUE • No DEFAULT (expected in IQ 12.7/IQ 15) • New Stored procedures for maintenance • sp_iqprintconstraints • sp_iqdropconstraints

  10. Identity Columns • IDENTITY/DEFAULT AUTOINCREMENT Property • Column may be defined as IDENTITY or DEFAULT AUTOINCREMENT • Must be enabled using IDENTITY_INSERT Option • Only one column per table may be defined with this property • New Global Variable @@identity retains last value inserted • New Database Option to Auto-Index Identity Columns • The option IDENTITY_ENFORCE_UNIQUENESS is 'OFF' by default • If ON creates a Unique HG index on Identity Columns • Alter Table supports modifying/adding column as IDENTITY or DEFAULT AUTOINCREMENT

  11. New LOB Datatypes • Char() data type may be defined to 32K(-1) • Same as Sybase IQ varchar() • If defined > 255 bytes only FP, WD and CMP indexes are permitted • Varchar() and char() may now be the same • Certainly they behave identically, except that varchar() is one byte longer (per row) • "Select Into" a Permanent Table now permitted (select into temporary table supported since 12.4.3)

  12. DDL Locks • Concurrent DDL Lock Reduced to Table Level • This was a Database Lock in previous versions • You may perform multiple DDL operations in a database as long as they operate on different tables • "Begin Parallel IQ" to create multiple indexes on one table remains available • (Multi-column) Key length increased from 1024 bytes to 1530 bytes (can still only be composed of 255 columns)

  13. Primary Key/Foreign Key • IQ-M does not enforce Primary Key/Foreign Key relationships – but it will in 12.5 (see following slides) • The optimiser does use the PK/FK relationship for query planning • Only specify this relationship if the relationship does exist • Incorrect specification can result in query plan errors (performance degradation) and possibly errors • ASA does modify a join that is defined as PK/FK to an ANSI NATURAL join – this can cause problems with orphan rows

  14. Key Specification • In a Data Warehouse the production key is not, generally, used as the warehouse key • It is more accepted practice to use a generated key • Make this key an Unsigned INT or BIGINT • This is the most efficient key datatype in IQ-M

  15. Primary Keys • In IQ-M a Primary Key is an ANSI standard Primary Key • It is UNIQUE • It must not be null • If specified as a table or column constraint then a specialised form of the HG index is created

  16. Foreign Keys • Always generate an HG index on a Foreign Key • If the relationship is 1:1 then generate the Foreign Key column as UNIQUE • This will force auto generation of a unique HG index • Again, try to specify join columns as Unsigned INT or BIGINT

  17. Referential Integrity – 12.5 (1) • 12.5 supports Primary Key/Foreign Key referential Integrity on loads. • The overhead on loads is minimal. The maximum reduction in load performance that has been seen is under 8% of the total load time. • For RI to work there must be an HG index on both the Primary and Foreign keys – and both the Primary and Foreign keys must be defined at the table level. • This is the requirement (as above) for the Non-Unique Multi-Column Index.

  18. Referential Integrity – 12.5 (2) • The RI checking is accomplished after the sort phase for the foreign key index. • At this point the keys are all in sorted sequence, so we read the Primary Key (PK) HG index (or rather we read the Leaf Nodes of the PK HG index – which is a Unique Index – hence has no G-Array), and we walk the PK index Leaf Nodes. • Because all the data is sorted we only have to walk the Leaf Nodes once for the entire load. • Hence the low overhead for Referential Integrity.
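
The single pass described on slide 18 can be sketched in Python. This is only an illustration of a merge-style walk over two sorted key sequences, not IQ's actual implementation. The function name `find_orphans` and the use of plain lists to stand in for the sorted foreign keys and the PK index leaf nodes are assumptions made for the example.

```python
def find_orphans(sorted_fk_values, sorted_pk_values):
    """Return foreign-key values that have no matching primary key.

    Both inputs must already be sorted ascending, and the primary keys
    are assumed to be unique. Each list is walked once, front to back.
    """
    orphans = []
    pk_pos = 0  # cursor into the primary keys; it never moves backwards
    for fk in sorted_fk_values:
        # Skip primary keys that are smaller than the current foreign key.
        while pk_pos < len(sorted_pk_values) and sorted_pk_values[pk_pos] < fk:
            pk_pos += 1
        # Either the primary keys are exhausted or the next one is >= fk.
        if pk_pos == len(sorted_pk_values) or sorted_pk_values[pk_pos] != fk:
            orphans.append(fk)
    return orphans


if __name__ == "__main__":
    # 4 has no parent row, so it is reported as an orphan.
    print(find_orphans([1, 1, 3, 4, 7], [1, 2, 3, 5, 7]))
```

Because the cursor only ever advances, the cost is proportional to the number of foreign keys plus the number of primary keys, which is consistent with the low overhead the slide reports.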

  19. A digression on Datatypes • There are some very important issues concerning datatypes • We have discussed the actions of the indexes – there are areas where an index can be forced to run slowly if the datatype is specified wrongly • Always consider the requirements for the datatype • Incorrect datatype specification is as bad as incorrect index selection

  20. Signed vs. Unsigned - 1 • If you don’t need signed data in an int or bigint – use UNSIGNED • This will speed up access to the HNG index, sometimes doubling the performance • HNG stores negative data as 1s complement • This means SUM() AVG() etc. run quickly • But range checks require another set of scans • If we stored as 2s complement then • Range checks would run with 1 scan • But SUM() AVG() would be slower!!

  21. Signed vs. Unsigned - 2 • Use Unsigned for surrogate keys and join columns • Unsigned data comparisons are quicker (=, !=) • The caveat to this is that Open Client may misinterpret the value if it is too large as it does not understand large unsigned data • Can convert to signed integer, numeric, or decimal if returning data to an Open Client application • This caveat applies to moving data between IQ servers with INSERT FROM LOCATION

  22. Other Datatype Issues • Signed vs. Unsigned does not affect the other indexes to any great degree • But… • The selection of datatypes does • We have already discussed keys but some other areas are worth commenting on…

  23. Long Varchar() - 1 • A long varchar() is defined as a varchar() with a length greater than 255. • If you can avoid this, please do • Only FP and WD indexes are allowed • No enumerated indexes or HNG • We have seen a number of customers who use varchar(1024) as Primary Keys • please DO NOT DO THIS!!

  24. Long Varchar() - 2 • Long varchar() are stored as 256 byte chunks, so using 4 bytes in a varchar(32000) only uses 256 bytes • By default these 256 byte chunks are memset (set to zeros to improve compression) • There is an upgrade option to memset existing 12.4.0 varchar() – this is worth doing, if you have the time!
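
The chunk arithmetic on slide 24 can be illustrated with a short Python sketch. The 256-byte chunk size comes from the slide. The function name `chunked_storage_bytes` is made up for the example, and rounding every partial chunk up to a whole chunk (and treating an empty value as zero chunks) is an assumption, not documented IQ behaviour.

```python
CHUNK_BYTES = 256  # chunk size stated on the slide


def chunked_storage_bytes(value_length):
    """Bytes occupied when a value is stored in whole 256-byte chunks.

    Assumption: a partly filled chunk still occupies a full chunk.
    """
    if value_length < 0:
        raise ValueError("value_length must not be negative")
    chunks = -(-value_length // CHUNK_BYTES)  # ceiling division
    return chunks * CHUNK_BYTES


if __name__ == "__main__":
    for length in (4, 256, 257, 32000):
        print(length, "->", chunked_storage_bytes(length))
```

Under these assumptions a 4-byte value in a varchar(32000) column occupies one chunk, matching the slide's figure of 256 bytes.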

  25. Char() vs. Varchar() • Always, if you can, use char() • Generally this will improve performance, at the modest cost of storing some small number of extra bytes • Query performance on retrieval of char() vs. Varchar() indicates that there can be a 2-3% performance hit per column, and we have seen 10% degradation on single columns

  26. Float, Real and Double • Unless you really need them – please do not use • FLOAT • REAL • DOUBLE • They can only have Flat FP indexes – no others • They do not store “exact” values – only approximate • Please try to use • NUMERIC • DECIMAL

  27. NUMERIC and DECIMAL • Numeric and Decimal are aliases of each other • Any numeric or decimal with a precision of less than 12 will be stored as an INT (with conversions) • Any numeric or decimal with a precision of between 12 and 18 will be stored as a BIGINT (with conversions)

  28. Join Columns • You must generate the database schema with the table join columns having the same datatype. • INT, UINT and BIGINT are best, but the column datatypes for each join must be the same • Conversion cost is horrendous

  29. Case and Collation Sequences • In terms of RAW performance the fastest IQ database is one where CASE is set to RESPECT and the collation sequence is BINARY (ISO_bineng) • This is probably not suitable for the general application of the database or warehouse server • CASE set to IGNORE is the next fastest, then changes in the collation sequence • The performance hits can be quite high (around 10-20% - we think!)

  30. String Searches • String searches such as substr(col_name,1,3) are really very slow because they rely on FP searches • With low cardinality (1 and 2 byte FP) data the search is faster, but this can still be a restriction • Create a new column which is the first 3 characters of the col_name column, then search on this • This way there is no function call, so no projection, so the optimiser can use a fast index: LF or HG (or, if it is a range query, an HNG)

  31. Telephone Numbers • A classic example of the above is the telephone number • +1-301-896-1733 • +1 -> Country Code • 301 -> Area Code • 896 -> Sub Area Code • 1733 -> Local Number • Make this 4 columns (actually 5 - the whole number), then searches use fast indexes
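
As a rough illustration of slide 31, the split could be done in a pre-load step like the Python sketch below. The function name `split_phone_number` and the column names are hypothetical, and the code assumes every number arrives in exactly the `+1-301-896-1733` layout shown on the slide.

```python
def split_phone_number(number):
    """Split '+1-301-896-1733' into the five columns suggested on the slide.

    Assumes four dash-separated parts, the first starting with '+'.
    """
    parts = number.split("-")
    if len(parts) != 4 or not parts[0].startswith("+"):
        raise ValueError(f"unexpected phone number layout: {number!r}")
    country, area, sub_area, local = parts
    return {
        "full_number": number,      # the whole number, kept as the fifth column
        "country_code": country,    # '+1'
        "area_code": area,          # '301'
        "sub_area_code": sub_area,  # '896'
        "local_number": local,      # '1733'
    }


if __name__ == "__main__":
    print(split_phone_number("+1-301-896-1733"))
```

A search on the area code then becomes a plain equality test on its own column rather than a substring function applied to the full number.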

  32. Date time • As with telephone numbers, try storing a datetime as a series of columns (or a dimension table) • Try creating columns DD MM YY HH MM SS DoWeek DoYear Quarter etc. • This changes in 12.5 with the DATE, TIME and DTTM indexes

  33. Date vs. Datetime • A slightly better solution to the above can be considered in the light of the 1 and 2 byte FP indexes • Try storing the date part of a datetime as a date and the time part as hh mm ss • So: Datetime -> date_col, hh_col, mm_col, ss_col
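
The mapping on slide 33 can be sketched in Python using the standard `datetime` module. The column names follow the slide (`date_col`, `hh_col`, `mm_col`, `ss_col`); the function name `split_datetime` is an assumption made for the example.

```python
from datetime import datetime


def split_datetime(value):
    """Break a datetime into a date column plus hour, minute and second columns."""
    return {
        "date_col": value.date(),  # calendar date only
        "hh_col": value.hour,      # 0-23, so at most 24 distinct values
        "mm_col": value.minute,    # 0-59, so at most 60 distinct values
        "ss_col": value.second,    # 0-59, so at most 60 distinct values
    }


if __name__ == "__main__":
    print(split_datetime(datetime(2004, 3, 15, 13, 45, 7)))
```

The three time columns have very low cardinality, which is what makes them candidates for the 1 and 2 byte FP indexes the slide mentions.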

  34. Loading Dates • There is NO default date or datetime format for loads into IQ • The format must be explicitly set for the load/insert to get the best performance • However some formats are conversion enhanced

  35. Enhanced Conversion formats • DD/MM/YYYY DD.MM.YYYY DD-MM-YYYY HH:NN:SS HHNNSS HH:NN:SS.S HH:NN:SS.SS HH:NN:SS.SSS HH:NN:SS.SSSS HH:NN:SS.SSSSS YYYY-MM-DD HH:NN:SS YYYYMMDD HHNNSS YYYY-MM-DD HH:NN:SS YYYY-MM-DD HH:NN:SS.S YYYY-MM-DD HH:NN:SS.SS YYYY-MM-DD HH:NN:SS.SSS YYYY-MM-DD HH:NN:SS.SSSS YYYY-MM-DD HH:NN:SS.SSSSS

  36. Date Load • So it is better to use Col1 DATE('YYYY-MM-DD') • than Col1 ASCII(10) • The performance enhancement can be as much as a 100-fold speed-up in loads (for small tables)

  37. UNION • In IQ-M 12.4.3 the UNION clause has very few disadvantages • Generally UNIONs are all processed in parallel • so if you have a low user count they work well • Also the delete question now can be solved • Do not use DISTINCT in the UNION clause, or in the SELECT statement

  38. UNION and Delete • If you are storing a fixed (in time) amount of data e.g. 6 months • Then every month you delete 1/6th of the data in the table • This is expensive • It is better to split the fact table into 6 x one month tables • At the end of the month you truncate the oldest table • And possibly rename the table sets • Remember that for Multiplex, table rename is DDL and hence can only be done in simplex mode!
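
The monthly rotation on slide 38 can be modelled with a fixed-size window. The Python sketch below is only an analogy for the bookkeeping: `make_window` and `start_month` are hypothetical names, and letting a bounded deque discard its oldest entry stands in for truncating the oldest one-month table.

```python
from collections import deque


def make_window(months=6):
    """Create an empty window that holds at most `months` monthly tables."""
    return deque(maxlen=months)


def start_month(window, month_name):
    """Open a new monthly table; the oldest is discarded once the window is full.

    Returns the month names currently in the window, oldest first.
    """
    window.append({"month": month_name, "rows": []})
    return [table["month"] for table in window]


if __name__ == "__main__":
    window = make_window()
    for name in ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]:
        start_month(window, name)
    # Starting July pushes January out of the six-month window.
    print(start_month(window, "Jul"))
```

The point of the design is that ageing out a month touches one whole table instead of deleting one sixth of the rows from a single large fact table.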

  39. Cartesian Joins • These are expensive – they involve the join of every row in one table to every row in a second table. • Table A 1,000,000 rows • Table B 100,000 rows • Worktable 100,000,000,000 rows • Select * from T, R where T.a = 10 -> Cartesian • Select * from T, R where T.a between R.b and T.b -> Cartesian • Select * from T, R where ABS(T.a * R.b) = T.b -> Cartesian • But • Select * from T, R where ABS(T.a * T.b) = R.b -> Not Cartesian
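
The worktable size on slide 39 is simply the product of the two row counts. The Python sketch below checks that arithmetic and shows the pairing on a tiny example; the function names `cartesian_row_count` and `cross_join` are made up for the illustration.

```python
from itertools import product


def cartesian_row_count(rows_a, rows_b):
    """Number of rows produced when every row of A is paired with every row of B."""
    return rows_a * rows_b


def cross_join(table_a, table_b):
    """Materialise the Cartesian product of two small in-memory tables."""
    return list(product(table_a, table_b))


if __name__ == "__main__":
    # The slide's figures: 1,000,000 rows joined to 100,000 rows.
    print(f"{cartesian_row_count(1_000_000, 100_000):,}")
    # A 3-row table crossed with a 2-row table gives 6 pairs.
    print(cross_join([1, 2, 3], ["x", "y"]))
```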

  40. Cursors • Avoid using cursors • Generally means row based processing • IQ was designed for set based processing • Sometimes they cannot be avoided • If used, make sure to use NO SCROLL cursors • Open With Hold • Allows the cursor to remain open across transactions • If not used, the cursor may be closed when a commit is issued (depends on connectivity type)

  41. Watcom SQL vs. T-SQL • IQ (ASA) is not 100% T-SQL Compatible, but very close • Recommend using Watcom SQL • All system procedures written with it • Many more code examples and more IQ people versed in it • Watcom SQL has some extensions that T-SQL does not: • Dynamic SQL • Better Loop control • Full cursor movement rather than just read next • Batches and procedures must be written in the same dialect • Cannot mix T-SQL with Watcom SQL

  42. Watcom SQL vs. T-SQL • Behavior differences include: • DECLARE CURSOR • GOTO • IF • PRINT • RAISERROR • SET • WHILE (T-SQL) vs. LOOP • Global variables • Variable Names • CALL • FOR • ASA requires variables to be declared immediately after a BEGIN

  43. Commit and Rollback • Use transaction control around logical units of work, even read only queries • Should commit before a read/write batch is started to ensure latest version of data is available • Should issue commit and rollback after batch completion to release all query resources • Rollback will free memory resources in use by previous operations • For systems with high number of connected users, freeing memory resources can aid in query performance

  44. Custom Functions • Custom functions can be written in either SQL or Java • Great way to encapsulate business logic for transforming data • Can have a significant performance impact on queries • Functions are executed in the catalog portion of the engine • All result rows may need to be moved to ASA • Can be time consuming for large result sets • Turn on query plans to see what impact the functions have on effective query plans

  45. About things - End
