
File Processing : Index and Hash

Spring 2004, Pusan National University, Ki-Joune Li.


Presentation Transcript


  1. File Processing: Index and Hash. Ki-Joune Li, Spatiotemporal Database Laboratory, Pusan National University. Spring 2004.

  2. What is an index?
     • Index in a book
       • Index: Keyword → Pages
     • Without an index
       • Exhaustive search: too expensive
     • Index for a file or database
       • A function or mechanism
       • Index: Predicate → Blocks (block numbers on hard disk)
       • e.g. find student records where student.GPA > 4.0

  3. Data Retrieval Time
     • Data retrieval on disk: two phases
       • 1st phase: search with a condition (predicate)
       • 2nd phase: data access
     • Data access time depends on file structure, disk placement, clustering, etc.
     [Figure labels: Search Condition, 1st Phase, Search, { Block# } (block numbers), 2nd Phase, Database on Disk]

  4. Blocking Factor Bf
     • Blocking factor
       • Number of records in a block
     • Blocking factor and number of disk accesses
       • ND = Nrecord / Bf
     • By maximizing the blocking factor, we reduce the number of disk accesses
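The relation ND = Nrecord / Bf can be checked with a few lines of Python. This is only a sketch: the function names, the 4096-byte block, and the 100-byte record size are assumptions for illustration, not values from the slides. The division is rounded up because a partly filled last block still costs one access.

```python
def records_per_block(block_size: int, record_size: int) -> int:
    """Blocking factor Bf: how many whole records fit in one block."""
    if record_size <= 0 or block_size < record_size:
        raise ValueError("record must be positive in size and fit in a block")
    return block_size // record_size


def blocks_needed(n_records: int, bf: int) -> int:
    """ND = Nrecord / Bf, rounded up to whole blocks."""
    if bf <= 0:
        raise ValueError("blocking factor must be positive")
    return (n_records + bf - 1) // bf


# Assumed example: 4096-byte blocks, 10,000 records.
bf_small_records = records_per_block(4096, 100)   # 40 records per block
bf_large_records = records_per_block(4096, 200)   # 20 records per block
print(blocks_needed(10_000, bf_small_records))    # 250 block accesses
print(blocks_needed(10_000, bf_large_records))    # 500 block accesses
```

Halving the blocking factor doubles the number of block accesses for a full scan, which is the point of the slide.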

  5. How to Accelerate Phase 1?
     • We can accelerate phase 1
       • by an index or by a hash
     • Index vs. hash
       • Index: a type of data structure
         • Needs additional data structures
       • Hash: a type of mechanism
         • May not need any additional data structure (not exactly true)

  6. A Simple Idea on Index
     • Mapping table from keywords to block numbers
       • Inverted file
     • Why is an inverted file better than nothing?
     • If the table is too large (to fit in main memory)
       • It has to be stored on disk
       • Index access then requires disk accesses

     Keyword   Block#
     Juliet
     Romeo     B26
     Hamlet    B22
     …         …
     Carmen    B212
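A minimal sketch of such a mapping table, assuming the file is given as a dictionary from block number to the keywords stored in that block. The function names and the block layout are hypothetical; only the three keyword/block pairs come from the slide.

```python
from collections import defaultdict


def build_inverted_file(blocks):
    """Build keyword -> set of block numbers from block -> keywords."""
    index = defaultdict(set)
    for block_no, keywords in blocks.items():
        for keyword in keywords:
            index[keyword].add(block_no)
    return index


def lookup(index, keyword):
    """Phase 1: return the block numbers to read for a keyword."""
    return sorted(index.get(keyword, ()))


# Hypothetical block contents, using the pairs shown on the slide.
blocks = {"B26": ["Romeo"], "B22": ["Hamlet"], "B212": ["Carmen"]}
index = build_inverted_file(blocks)
print(lookup(index, "Romeo"))    # ['B26']
print(lookup(index, "Othello"))  # [] : keyword not indexed
```

Without the table, finding "Romeo" means reading every block. With it, one lookup yields the block to read, as long as the table itself fits in main memory.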

  7. Searching Algorithms and Index
     • A good way to accelerate searching
       • Tree: O(log n)
     • Reorganize the inverted file into a tree
       • Binary search tree: branching factor = 2
     • Tree in memory space vs. in disk space
       • Memory space: number of comparisons
       • Disk space: number of block accesses
     [Figure: binary search tree whose nodes hold (key, block#) pairs: (30, b27), (14, b17), (40, b26), (34, b17), (55, b26)]
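The following sketch reorganizes the (key, block#) pairs from the slide's figure into a binary search tree and counts how many nodes a search visits. The insertion order (30, 14, 40, 34, 55) and all names are assumptions. In memory the count corresponds to comparisons; if each node lived in its own disk block, it would correspond to block accesses.

```python
class Node:
    def __init__(self, key, block):
        self.key = key
        self.block = block
        self.left = None
        self.right = None


def insert(root, key, block):
    """Insert (key, block) into an unbalanced binary search tree."""
    if root is None:
        return Node(key, block)
    if key < root.key:
        root.left = insert(root.left, key, block)
    elif key > root.key:
        root.right = insert(root.right, key, block)
    else:
        root.block = block  # existing key: overwrite its block number
    return root


def search(root, key):
    """Return (block number or None, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited


root = None
for key, block in [(30, "b27"), (14, "b17"), (40, "b26"), (34, "b17"), (55, "b26")]:
    root = insert(root, key, block)

print(search(root, 34))  # ('b17', 3): visits 30, 40, 34
print(search(root, 99))  # (None, 3): visits 30, 40, 55
```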

  8. Paged Tree: m-way search tree
     • How to determine m?
       • One node: one disk page
       • e.g. when 1 disk page is 4 Kbytes
         • 4 + 4m + 8(m-1) ≤ 4096 ⇒ m = 341 (largest integer m)
       • Very fat tree
     [Figure: three example nodes, with labels "Number of delimiters", "Delimiter", "Block number": 34 | 57, b27 | 103, b28 | … | 343, b14; 44 | 1, b29 | … | 54, b21; 32 | 58, b17 | … | 96, b127]
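The page-size arithmetic can be reproduced as follows. The reading of the formula is an assumption: 4 bytes for the delimiter count, 4 bytes for each of the m child pointers, and 8 bytes for each of the m-1 (delimiter, block number) entries. The helper names and the one-million-key example are also illustrative.

```python
def max_fanout(page_size, count_size=4, pointer_size=4, entry_size=8):
    """Largest m with count + m*pointer + (m-1)*entry <= page_size."""
    return (page_size - count_size + entry_size) // (pointer_size + entry_size)


def min_height(n_keys, m):
    """Smallest height h such that a full m-way tree (m**h - 1 keys) holds n_keys."""
    height = 0
    capacity = 0
    while capacity < n_keys:
        height += 1
        capacity = m ** height - 1
    return height


m = max_fanout(4096)
print(m)                          # 341
print(min_height(1_000_000, m))   # 3 levels for the fat tree
print(min_height(1_000_000, 2))   # 20 levels for a binary tree
```

With one node per page, the height is the number of page reads per search, so the fat tree needs at best 3 reads where a binary tree needs 20.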

  9. Problem of the m-way search tree
     • m-way search tree
       • Search performance: determined by the height
       • Not balanced
         • Average: O(log n)
         • Worst case: n / Bf → O(n)
       • Height: determined by insertion order
         • e.g. insertion in ascending order
     • How to make it balanced?
       • Balanced m-way search tree: B-tree
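The effect of insertion order can be observed with a naive, unbalanced m-way search tree. This is a sketch under assumptions, not the structure from the slides: each node holds up to m-1 keys and a key descends into a child only when the node is full, so nodes never split or rebalance. The names, m = 5, and the 200 keys are chosen for illustration.

```python
import bisect
import random


class MNode:
    def __init__(self, m):
        self.keys = []               # sorted, at most m - 1 keys
        self.children = [None] * m   # children[i] covers keys below keys[i]


def insert(root, key, m):
    """Insert key without any rebalancing; returns the root."""
    if root is None:
        root = MNode(m)
    node = root
    while True:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return root                      # duplicate: nothing to do
        if len(node.keys) < m - 1:
            node.keys.insert(i, key)         # non-full nodes have no children yet
            return root
        if node.children[i] is None:
            node.children[i] = MNode(m)
        node = node.children[i]


def height(node):
    if node is None:
        return 0
    return 1 + max(height(child) for child in node.children)


def build(keys, m):
    root = None
    for key in keys:
        root = insert(root, key, m)
    return root


m = 5
ascending = list(range(200))
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)

print(height(build(ascending, m)))  # 50: one new level per m - 1 keys
print(height(build(shuffled, m)))   # much smaller with a random order
```

With ascending keys every node fills up and pushes all later keys into its last child, so the height grows linearly with the number of keys. A B-tree avoids this by splitting full nodes and keeping all leaves at the same depth.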
