Native Multidimensional Indexing in Relational Databases
190 likes | 362 Views
Native Multidimensional Indexing in Relational Databases. David Hoksza , Tom áš Skopal Charles University in Prague Department of Software Engineering Czech Republic. Presentation Outline. Multidimensional querying attribute number growth indexing methods contemporary DB systems
Native Multidimensional Indexing in Relational Databases
E N D
Presentation Transcript
Native Multidimensional Indexingin Relational Databases David Hoksza, Tomáš Skopal Charles University in PragueDepartment of Software Engineering Czech Republic
Presentation Outline • Multidimensional querying • attribute number growth • indexing methods • contemporary DB systems • Indexing in PostgreSQL • user-defined access methods • external indexing • framework • Experiments COMAD 2008
Indexing • Single-attribute based indexing • to avoid sequential scan • B-tree • Multi-attribute based indexing • window query • straightforward solution – multiple B-trees • SELECT * FROM Products • WHERE • BrandID BETWEEN (3 AND 11) • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) COMAD 2008
Multi-attribute Based Indexing • Multiple B-trees • attribute number growth • → exponential growth of partial result-sets • → sequential scan • dimensionality curse COMAD 2008
B+-tree with Compound Keys most often employed solution multiple keys – single chained value key components compared in lexicograhpical order assymetry in the order of the keys UB-tree transformation of n-dim points into 1-dim Z-address→ Z-curve Z-curves divided into Z-regions indexed by a B+-tree Multi-dimensional access methods R-tree COMAD 2008
Native Multi-dimensional Indexing • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) • Table rows • points in n-dimensional space • R-tree • Queries • cubes in n-dimensional space (n-dimensionalwindows) COMAD 2008
Contemporary DB systems COMAD 2008
PostgreSQL • Object-relational DBMS • Open-source • Since 2005 (v. 8.0) runs on Windows • Emphasis on extensibility • data types • operators • procedural languages • access methods • … COMAD 2008
Relation Types in PostgreSQL • Heap relations (HRs) • user relations • system catalog • undefined order • Index relations (IRs) • <key,value> pairs • external • fast access to heap relations • internal • access methods’ structures COMAD 2008
User-defined Access Methods (AM) in PostgreSQL • Implement a set of functions communicating with PostgreSQL’s core (AM implementation). • Register the functions (located in libraries). • Define an AM (index type) by connecting the functions with a newly created AM. • Establish a class of operators and types for the AM. • Use the index. COMAD 2008
required functions index_build creating a structure index_insert inserts a record index_beginscan starts a new scan index_gettuple gets a record fulfilling search conditions index_getmulti gets a set of records index_endscan finishes a search index_markpos marks actual position in a scan index_restrpos returns to a marked position index_rescan repeats scan with the same structure of search keys index_bulkdelete removes a set of records index_costestimate estimates cost of a search User-defined Access Methods in PostgreSQL – cont. COMAD 2008
External Indexing • AM in PostgreSQL store data in IRs returning IRs’ TIDs to the core • External indexing framework • Storing IRs’ TIDs in external index storage • PostgreSQL’s interface to access methods still requires high level of mastering of PostgreSQL’s inner mechanisms • Framework for external indexing COMAD 2008
Framework interface • void FW_CreateStructure(Relation index_relation); • void*FW_PrepareInsert(Relation index_relation); • void FW_InsertTuple(void *fw_data, Relation index_relation, IndexTuple index_tuple, BlockNumberblock_number,OffsetNumber offset); • void FW_FinishInsert(void *fw_data); • void FW_InitSearch(IndexScanDesc scan, ScanDirectiondir); • bool FW_GetNextTID(IndexScanDesc scan, ScanDirectiondir, BlockNumber *block_number, OffsetNumber *offset); • void FW_DeleteTuple(BlockNumber block number,OffsetNumber offset); COMAD 2008
Experimental evaluation • The testbed • Uniform • clusters of uniformly distributed (up to 15-dimensional) objects • Gauss • (up to 3-dimensional) objects following Gaussian distribution • DBLP • 435,373 DBLP database records • author, type of publication, year of publication, number of pages • Studied costs • index access count • real-time COMAD 2008
Experiments – Database Growth COMAD 2008
Experiments – Query Selectivity COMAD 2008
Experiments – Dimension Growth COMAD 2008
Experiments – Real-time COMAD 2008
Conclusion • We have proposed and implemented • Native multidimensional indexing by R-tree • Indexing framework for PostgreSQL • Implementation of native external R-tree index • Results show • big speed-up on the real-world data according to index-access metric • poor physical implementation of access methods in PostgreSQL in comparison to Oracle and MSSQL Server COMAD 2008