
Lecture 11: Indexed Files



Presentation Transcript


  1. CSC 213 – Large Scale Programming, Lecture 11: Indexed Files

  2. Dictionaries in Real World
     • Often need a large database spread over many machines
     • Split search terms across machines
     • Updating & searching work is split between machines
     • Database is way too large for any single machine
     • If you think about it, this is incredibly common
     • Where?

  3. Split Dictionaries

  4. Splitting Keys From Values
     • In real world, we often have many indices
     • Simple units measure where we can find values
     • Values could be searched for in multiple ways

  5. Splitting Keys From Values
     • In real world, we often have many indices
     • Simple units measure where we can find values
     • Values could be searched for in multiple ways

  6. Index & Data Files
     • Split information into two (or more) files
     • Data file uses fixed-size records to store data
     • Index files contain search terms & data locations
     • Fixed-size records usually used in data file
     • Each record will use exactly that much space
     • Extra space is wasted if the value is smaller
     • But this limits data size: a record cannot get more space
     • Makes it far easier to reuse space & rebuild index
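The fixed-size idea can be sketched in Java as below. This is only an illustration: the layout (a 16-byte name followed by a 4-byte `int`) and the class name `FixedRecord` are assumptions, not something the slides specify.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedRecord {
    // Hypothetical layout: 16 bytes of name, then a 4-byte int value.
    static final int NAME_BYTES = 16;
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES;

    // Encode a record into exactly RECORD_SIZE bytes.
    static byte[] encode(String name, int value) {
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > NAME_BYTES) {
            // The size limit from the slide: a record cannot get more space.
            throw new IllegalArgumentException("name exceeds " + NAME_BYTES + " bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.put(Arrays.copyOf(raw, NAME_BYTES)); // short names are padded with zero bytes
        buf.putInt(value);
        return buf.array();
    }

    // Recover the name, dropping the zero padding.
    static String decodeName(byte[] record) {
        String padded = new String(record, 0, NAME_BYTES, StandardCharsets.US_ASCII);
        int end = padded.indexOf('\0');
        return end < 0 ? padded : padded.substring(0, end);
    }

    static int decodeValue(byte[] record) {
        return ByteBuffer.wrap(record).getInt(NAME_BYTES);
    }

    // Because every record has the same size, its position is simple arithmetic.
    static long offsetOf(int recordNumber) {
        return (long) recordNumber * RECORD_SIZE;
    }
}
```

Since each record occupies the same number of bytes, a freed slot can hold any later record, which is why reuse and index rebuilding become easier.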

  7. Index File Format
     • No standard format: depends on type of data
     • Often variable sized, but this is not a requirement
     • Each entry in index file begins with exact search term
     • Followed by position containing matching data
     • As a result, often find index entries packed together
     • Can read indexes at start of program execution
     • Reasonably assumes index file is smaller than data file
     • Changes written immediately, however
     • When program starts, do NOT read data file
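One possible index format, sketched under assumptions: each entry is the search term written with `writeUTF`, followed by the record position written with `writeLong`. The slides say there is no standard format, so this encoding and the class name `IndexFile` are illustrative choices only.

```java
import java.io.*;
import java.util.Map;
import java.util.TreeMap;

class IndexFile {
    // Write each entry as: search term (UTF), then the record's position (long).
    static void save(Map<String, Long> index, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    // Read the whole index into memory at startup; the data file is never opened here.
    static TreeMap<String, Long> load(File file) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }
}
```

Entries are variable sized here because terms differ in length, which matches the "packed together" layout the slide describes.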

  8. Never Read Data File

  9. Indexed Files
     • Enables splitting search terms across computers
     • Alphabetical split makes searches faster across many servers
     [Figure: servers labeled A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
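A rough sketch of how a search term might be routed to the server holding its alphabetical range. The letter ranges come from the slide's figure; the server names and the class `AlphabeticalRouter` are hypothetical.

```java
import java.util.TreeMap;

class AlphabeticalRouter {
    // First letter of each range mapped to a hypothetical server name.
    private final TreeMap<Character, String> servers = new TreeMap<>();

    AlphabeticalRouter() {
        servers.put('A', "server-A-C");
        servers.put('D', "server-D-E");
        servers.put('F', "server-F-H");
        servers.put('I', "server-I-P");
        servers.put('Q', "server-Q-R");
        servers.put('S', "server-S-T");
        servers.put('U', "server-U-X");
        servers.put('Y', "server-Y-Z");
    }

    // Pick the server whose range starts at or before the term's first letter.
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z");
        }
        return servers.floorEntry(first).getValue();
    }
}
```

Each server then only needs the index entries for its own range, so both the index size and the search work are divided.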

  10. Indexed Files
     • Enables splitting search terms across computers
     • Create indexes for different types of searching
     [Figure: two indexes, labeled Song name and Song length]
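As an illustration of several indexes over the same data, the sketch below keeps one in-memory index by song name and another by song length, both storing positions in a shared data file. The class `SongIndexes`, its method names, and the use of seconds for length are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SongIndexes {
    // Exact-match index: song name -> position of its record in the data file.
    private final Map<String, Long> byName = new HashMap<>();
    // Ordered index: length in seconds -> positions of all songs with that length.
    private final TreeMap<Integer, List<Long>> byLength = new TreeMap<>();

    // Register one record in both indexes; the record itself is stored only once.
    void add(String name, int lengthSeconds, long position) {
        byName.put(name, position);
        byLength.computeIfAbsent(lengthSeconds, k -> new ArrayList<>()).add(position);
    }

    // Returns null when no song has this name.
    Long positionOf(String name) {
        return byName.get(name);
    }

    // Positions of songs whose length lies in [minSeconds, maxSeconds], shortest first.
    List<Long> positionsBetween(int minSeconds, int maxSeconds) {
        List<Long> result = new ArrayList<>();
        for (List<Long> positions
                : byLength.subMap(minSeconds, true, maxSeconds, true).values()) {
            result.addAll(positions);
        }
        return result;
    }
}
```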

  11. How Does This Work?
     • Using index files is simplified by using positions
     • Look in index structure to find position of data in file
     • With this position, can then seek to specific record
     • Create instance & initialize by reading data from file
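The lookup steps above might look like this with `java.io.RandomAccessFile`. The record layout (16-byte name, then an `int` price), the `Stock` class, and the reading of the slides' example as stock data keyed by ticker symbol are all assumptions made for the sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class IndexedLookup {
    static final int NAME_BYTES = 16; // hypothetical fixed width of the name field

    // Instance created and initialized from one record of the data file.
    static final class Stock {
        final String name;
        final int price;

        Stock(String name, int price) {
            this.name = name;
            this.price = price;
        }
    }

    // Returns the record for the search term, or null if the index has no entry.
    static Stock find(String term, Map<String, Long> index, RandomAccessFile data)
            throws IOException {
        Long position = index.get(term);   // 1. look in the index structure
        if (position == null) {
            return null;
        }
        data.seek(position);               // 2. seek to the specific record
        byte[] nameBytes = new byte[NAME_BYTES];
        data.readFully(nameBytes);         // 3. read the fields from the file
        int price = data.readInt();
        // trim() also strips the zero-byte padding at the end of short names
        String name = new String(nameBytes, StandardCharsets.US_ASCII).trim();
        return new Stock(name, price);
    }
}
```

Only the one record that matches is ever read, so the cost of a search does not grow with the size of the data file.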

  12. Starting with Indexed Files
     [Figure: file contents with three entries: (IBM, 106, IBM), (AT & T, 23, T), (Ford, 2, F)]

  13. How Does This Work?
     • Adding new records takes only a few steps
     • Add space for record with setLength on data file
     • Update index structure(s) to include new record
     • Records in data file updated at each change
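A minimal sketch of the add steps, assuming the data file is a `java.io.RandomAccessFile` (whose `setLength` method matches the slide) and the same hypothetical 16-byte-name-plus-`int` record layout. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

class RecordAppender {
    static final int NAME_BYTES = 16;                          // hypothetical layout
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES;

    // Appends one record and returns the position where it was stored.
    static long append(String term, String name, int value,
                       Map<String, Long> index, RandomAccessFile data)
            throws IOException {
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > NAME_BYTES) {
            throw new IllegalArgumentException("name exceeds " + NAME_BYTES + " bytes");
        }
        long position = data.length();
        data.setLength(position + RECORD_SIZE);      // 1. add space for the record
        data.seek(position);
        data.write(Arrays.copyOf(raw, NAME_BYTES));  // 2. write the record right away
        data.writeInt(value);
        index.put(term, position);                   // 3. update the in-memory index
        return position;
    }
}
```

With more than one index, step 3 would repeat once per index structure, each storing the same position.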

  14. Adding New Data To The Files
     [Figure: file contents: (IBM, 106, IBM), (AT & T, 23, T), (Ford, 2, F), plus a new entry holding 0]

  15. Adding New Data To The Files
     [Figure: file contents: (IBM, 106, IBM), (AT & T, 23, T), (Ford, 2, F), (Citibank, -2, C)]

  16. How Does This Work?
     • Removing records is even easier
     • To prevent using record, remove items from indexes
     • Do NOT update index file(s) until program completes
     • Use impossible magic numbers for record in data file
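Removal could be sketched as follows. The marker `Integer.MIN_VALUE` stands in for the "impossible magic number" and is an assumption, as are the record layout and the class name; any value that can never occur in real data would serve.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class RecordRemover {
    static final int NAME_BYTES = 16;              // hypothetical layout: name, then int value
    static final int DELETED = Integer.MIN_VALUE;  // hypothetical magic number for a dead record

    // Removes the term from the in-memory index and marks its record as deleted.
    // The index file on disk is left alone until the program finishes.
    static boolean remove(String term, Map<String, Long> index, RandomAccessFile data)
            throws IOException {
        Long position = index.remove(term);
        if (position == null) {
            return false;                          // nothing indexed under this term
        }
        data.seek(position + NAME_BYTES);          // jump to the value field
        data.writeInt(DELETED);                    // overwrite it with the marker
        return true;
    }

    // True if the record at this position carries the deletion marker.
    static boolean isDeleted(long position, RandomAccessFile data) throws IOException {
        data.seek(position + NAME_BYTES);
        return data.readInt() == DELETED;
    }
}
```

A marked slot keeps its place in the file, so it can be found by scanning for the marker and reused when a new record arrives.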

  17. Removing Data As We Go
     [Figure: file contents: (IBM, 106, IBM), (AT & T, 23, T), (Ford, 2, F), (Citibank, -2, C)]

  18. Removing Data As We Go
     [Figure: file contents: (IBM, 106, IBM), (AT & T, 23, T), (Ford, 0, Ø), (Citibank, -2, C)]

  19. For Next Lecture
     • Weekly assignment still available online
     • Continues to be due Wednesday at 5PM
     • Ask me questions if you have trouble on a problem
     • Reading: Section 9.1 in textbook about Map ADT
     • How do we look up data?
     • What other ADTs are out there?
     • How could they relate to today's lecture?
