
Cosequential Processing


kylar

Presentation Transcript


  1. Cosequential Processing Chapter 8

  2. Cosequential Processing • Coordinated processing of two or more sequential lists • Goals • To merge lists into a single sorted list (union) • Make a single sorted list from many • To match records with the same keys (intersection) • Apply transactions to a master file • Find entries which exist in multiple lists
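
A minimal sketch of the matching (intersection) goal, assuming both lists are already sorted and contain no duplicate keys. The function name `match` and the sample names are illustrative only.

```python
def match(list1, list2):
    """Return the keys that appear in both sorted lists."""
    i = j = 0
    common = []
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            i += 1            # list1 is behind, advance it
        elif list1[i] > list2[j]:
            j += 1            # list2 is behind, advance it
        else:
            common.append(list1[i])   # same key in both lists
            i += 1
            j += 1
    return common


print(match(["Bill", "Gray", "Mary"], ["Cathy", "Gray", "Mary", "Zeke"]))
```

Each list is read once from front to back, which is what makes the processing cosequential.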

  3. Cosequential Processing • Keys • Matching/merging may be by a single key or several. • Number of keys only affects compare operator, not sort strategy
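
As an illustration of the point about keys, the sketch below uses one compare routine for both a single key and a composite key. The field names `store` and `item` are hypothetical; only the key-extraction function changes, not the matching or merging strategy.

```python
def compare(a, b, key):
    """Three-way compare of two records: -1, 0 or 1."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def single_key(record):
    return record["item"]


def composite_key(record):
    # Tuples compare field by field, left to right.
    return (record["store"], record["item"])


a = {"store": 1, "item": 20231}
b = {"store": 2, "item": 20177}
print(compare(a, b, single_key), compare(a, b, composite_key))
```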

  4. Master Transaction File Processing • Common processing strategy on sequential files. • Common since historically sequential processing was the rule (tapes, cards) • Companies stored data in sequential files • Lists of “transactions” were posted against these records periodically.

  5. Master Transaction File Processing • Consider a grocery store • Record of inventory for each type of item stored in a large sequential file (master file) • As items are sold, the item number and quantity sold are posted (written) as records to a transaction file • As trucks deliver new items, item numbers and quantities are entered into the transaction file. • As new types of items are added to inventory, or old items are discontinued, entries about this are placed in the transaction file.

  6. Master Transaction File Processing • grocery store example:

       Master File
       Item #   Item Name         Type   Quan
       20231    Shoe Shine (br)   6      4
       20231    Shoe Shine (bl)   6      1
       20177    Cottage Cheese    5      392
       20179    Chicken Soup      6      32
       20231    T-bone            2      43
       ....

       Transaction File
       Item #   Trans   Quan   Item Name
       20231    U       -2
       20231    U       50
       20379    U       -5
       20443    U       -4
       20445    A       40     Corn Chips
       20532    A       300    Butter
       20534    D
       20558    U       200
       ....

       Trans codes: U - Update, A - Add, D - Delete

  7. Master Transaction File Processing • Periodically update master from transaction • [Diagram: the Old Master File and the Transaction File feed the Update Operation, which produces the New Master File and Update Messages]

  8. Master Transaction File Processing • Transactions are applied against master. • New master is created • Invalid Transactions result in Message • Important changes in Messages - audit trail • Transaction and master must be in sorted order.

  9. Master Transaction File Processing • Processing Scheme
       Read record Mast from old master and Trans from transaction file
       While more records in both files
         If Trans.ID < Mast.ID then
           If ADD then write Trans as a new record to new master
           else transaction error (no master record with this ID)
           Read next Trans
         else if Trans.ID = Mast.ID then
           If UPDATE then apply the update to Mast
           else if DELETE then mark Mast as deleted (it will not be written)
           else transaction error (ADD of an existing ID)
           Read next Trans
         else (Trans.ID > Mast.ID)
           Write Mast to new master unless deleted
           Read next Mast
       If more records in old master, write them to new master
       If more records in transaction, write ADDs to new master and give errors for the rest
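
The sketch below is one possible way to code a master/transaction update, not the slide author's implementation. It assumes master records are `(id, quantity)` tuples with unique IDs, transactions are `(id, code, quantity)` tuples with the U/A/D codes from the grocery example, and both lists are sorted by ID. The function name and the sample data are illustrative.

```python
def update_master(master, transactions):
    """Apply sorted transactions to a sorted master; return (new_master, messages)."""
    new_master, messages = [], []
    m = t = 0
    while m < len(master) or t < len(transactions):
        # Smallest ID still waiting in either file.
        waiting = []
        if m < len(master):
            waiting.append(master[m][0])
        if t < len(transactions):
            waiting.append(transactions[t][0])
        key = min(waiting)

        # Load the master record for this ID, if there is one.
        record = None
        if m < len(master) and master[m][0] == key:
            record = list(master[m])
            m += 1

        # Apply every transaction carrying this ID, in file order.
        while t < len(transactions) and transactions[t][0] == key:
            _, code, qty = transactions[t]
            t += 1
            if code == "A" and record is None:
                record = [key, qty]
            elif code == "U" and record is not None:
                record[1] += qty
            elif code == "D" and record is not None:
                record = None
            else:
                messages.append(f"invalid {code} for item {key}")

        if record is not None:
            new_master.append(tuple(record))
    return new_master, messages


old = [(20177, 392), (20179, 32), (20231, 4)]
trans = [(20231, "U", -2), (20231, "U", 50), (20379, "U", -5),
         (20445, "A", 40), (20534, "D", 0)]
new, msgs = update_master(old, trans)
print(new)
print(msgs)
```

Holding the current master record until all transactions for its ID have been applied lets several transactions hit the same item, as with the two updates to 20231.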

  10. Merging • Merge two (or more) sorted lists into a single sorted list • May remove duplicates (union) or keep them
        List 1: Bill Gray Hillery Jenny Linda Mary Randy
        List 2: Cathy Fran Kenny Pete Sally Zeke
        Merged: Bill Cathy Fran Gray Hillery Jenny Kenny Linda Mary Pete Randy Sally Zeke

  11. Merging
        Merge(List1, Max1, List2, Max2, Result)    // Max1, Max2: index of the last element
          int next1 := 0; next2 := 0; out := 0;
          while next1 <= Max1 and next2 <= Max2
            if (List1[next1] > List2[next2]) Result[out++] := List2[next2++];
            else Result[out++] := List1[next1++];
          // one list is exhausted; copy the rest of the other
          for (; next1 <= Max1; Result[out++] := List1[next1++]);
          for (; next2 <= Max2; Result[out++] := List2[next2++]);
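
A runnable Python sketch of the same two-list merge follows. The `unique` flag is an addition for illustration: it drops duplicate keys to give the union rather than keeping every record.

```python
def merge(list1, list2, unique=False):
    """Merge two sorted lists; with unique=True, drop repeated keys."""
    result = []
    i = j = 0

    def emit(item):
        if not (unique and result and result[-1] == item):
            result.append(item)

    while i < len(list1) and j < len(list2):
        if list1[i] > list2[j]:
            emit(list2[j])
            j += 1
        else:
            emit(list1[i])
            i += 1
    # At most one of these slices is non-empty.
    for item in list1[i:] + list2[j:]:
        emit(item)
    return result


names1 = ["Bill", "Gray", "Hillery", "Jenny", "Linda", "Mary", "Randy"]
names2 = ["Cathy", "Fran", "Kenny", "Pete", "Sally", "Zeke"]
print(merge(names1, names2))
```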

  12. Sorting • Small files • sort completely in memory • Called internal sorting.

  13. Sorting • Larger files • may be too large to fit in memory simultaneously • require "external sorting" • Sorting using secondary devices

  14. External Sorting • Criteria for evaluating external sorting algorithms • Different from internal sorts • Internal sort comparison criteria • Number of comparisons required • Number of swaps made • Memory needs • External sort comparison criteria • Dominated by I/O time • Minimize transfers between secondary storage and main memory

  15. External Sorting • Two major external sorting methods • in situ - sort the file in place • use additional storage space

  16. External Sorting • Characteristics of in situ sorting • uses less file space, thus larger files may be sorted. • if crash occurs during sort, file may be left in corrupt state • in situ sorts may be done on direct-access files using standard internal type sorts. • direct-access required (may not be available) • performance of such algorithms tends to be data sensitive

  17. External Sorting • Consider a file with 1000 records, 120 bytes each • We have 25,000 bytes available for a buffer. • Solution? • read in 200 records at a time, sort internally • This results in 5 sorted files • merge the resulting sorted files into 1 sorted file
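
The arithmetic behind this example can be checked with a short sketch. The variable names are illustrative; the choice of 200 records per read comes from the slide, presumably as a round number below the buffer's capacity.

```python
import math

record_size = 120        # bytes per record
num_records = 1000
buffer_bytes = 25_000

# How many whole records fit in the buffer at once.
max_records_in_buffer = buffer_bytes // record_size

records_per_run = 200    # the slide's choice, under the capacity above
run_count = math.ceil(num_records / records_per_run)

print(max_records_in_buffer, run_count)
```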

  18. Sort/Merge • A common non-in situ method is an algorithm called "sort-merge" • "safe" sorting technique • performance is guaranteed • requires only serial file access

  19. Sort/Merge • [Diagram: the file is partitioned, each partition is sorted, and the sorted partitions are merged]

  20. Sort/Merge • Sort/Merge techniques have two stages: • sort stage - sorted partitions are generated • Size depends on available memory • merge stage - sorted partitions are merged (repetitively if necessary) • Why might more than one merge phase be needed?

  21. Basic Sort/Merge • initial partition size is 1 • Merge begins immediately (no sort) • Smallest main memory use • requires only 2 buffers in memory. • File starts with N "sorted" files of size 1 • Similar to internal merge/sort

  22. Improving Sort/Merge • Increase buffer size • Partitions sorted (in memory) with little I/O • Larger partitions mean fewer (I/O intensive) merges needed • Take advantage of already sorted runs of data • Consider the "unsortedness" of the data

  23. Sort/Merge • Producing sorted partitions • internal sorting • natural selection - (use already sorted runs) • replacement selection

  24. Internal sorting • read M records (M determined by available memory) • sort them using internal sorting techniques • write back out, creating a partition of size M
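
A minimal sketch of this sort stage, assuming the records arrive as any Python iterable and that partitions are kept as in-memory lists rather than written to files. The name `make_runs` and the sample keys are illustrative.

```python
from itertools import islice


def make_runs(records, m):
    """Read m records at a time, sort each batch, and return the sorted partitions."""
    source = iter(records)
    runs = []
    while True:
        batch = list(islice(source, m))   # "read M records"
        if not batch:
            break
        batch.sort()                      # internal sort
        runs.append(batch)                # "write back out" as one partition
    return runs


print(make_runs([5, 2, 9, 1, 7, 3, 8], 3))
```

Every partition except possibly the last has exactly M records.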

  25. Sort/Merge • Replacement selection (snowshovel) • files usually not totally out of order • take advantage of partial ordering in file • partition size varies with already existing ordering

  26. Replacement selection (snowshovel) • Start with primary buffer of size N (snowshovel)
        1. Read N records into the buffer
        2. Output the unfrozen record with the smallest key
        3. Replace it with the next record in the file
        4. If this new record is smaller than the last record written, "freeze" it (it must wait for the next partition)
        5. If unfrozen records remain, go to 2
        6. If all records are frozen, unfreeze them all, start a new partition, go to 2

  27. Replacement selection (snowshovel) • if file is sorted or almost sorted, one pass may suffice for complete sort! • average partition length is 2N for randomly ordered input • Consider this file, with N = 4: • 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 [EOF]

  28. Natural Selection • Frozen records in the replacement scheme take up space and search time. • Natural selection, rather than freezing, writes these unused records to a fixed length secondary file (called the reservoir) • partition creation terminates when the reservoir is full. • Next, the buffer is refilled first with records from the reservoir, then with records from the file (if more are needed) • expected partition length is 2.718N if reservoir and buffer are the same size - (about 30)
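
The following sketch runs replacement selection on the example file. It is one common way to code the technique, not necessarily the slide author's: a heap holds `(partition number, key)` pairs, so a "frozen" record is simply one tagged with the next partition number. The function name is illustrative.

```python
import heapq
from itertools import islice


def replacement_selection(records, n):
    """Split records into sorted partitions using a buffer of n records."""
    source = iter(records)
    end = object()                       # sentinel for end of file
    heap = [(0, key) for key in islice(source, n)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run, key = heapq.heappop(heap)   # smallest unfrozen key
        if run == len(runs):
            runs.append([])              # first record of a new partition
        runs[run].append(key)
        nxt = next(source, end)
        if nxt is not end:
            # Smaller than the key just written: freeze it for the next partition.
            heapq.heappush(heap, (run + 1 if nxt < key else run, nxt))
    return runs


data = [29, 42, 3, 7, 9, 101, 99, 87, 89, 100, 16, 8, 12, 2, 15]
for partition in replacement_selection(data, 4):
    print(partition)
```

With N = 4 this input yields two partitions, the first holding 10 of the 15 records, which shows how existing order in the file stretches a partition well past the buffer size.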

  29. Natural Selection • Redo example with reservoir size 4 • 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 [EOF]
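
A hedged sketch of natural selection on the same input follows. It assumes that once the reservoir fills, the records still in the buffer are written out to finish the current partition before the buffer is refilled; the slide does not spell this detail out. The reservoir is modelled as an in-memory list, and all names are illustrative.

```python
import heapq


def natural_selection(records, n, reservoir_size):
    """Split records into sorted partitions using a buffer of n and a reservoir."""
    source = iter(records)
    end = object()                        # sentinel for end of file
    reservoir, runs = [], []

    def fill(buffer):
        # Top the buffer up to n records from the input file.
        while len(buffer) < n:
            rec = next(source, end)
            if rec is end:
                break
            buffer.append(rec)
        heapq.heapify(buffer)
        return buffer

    buffer = fill([])
    while buffer:
        run = []
        while buffer:
            key = heapq.heappop(buffer)
            run.append(key)
            # Refill the freed slot; records that are too small go to the reservoir.
            while len(reservoir) < reservoir_size:
                rec = next(source, end)
                if rec is end:
                    break
                if rec >= key:
                    heapq.heappush(buffer, rec)
                    break
                reservoir.append(rec)
        runs.append(run)
        # Next partition: reservoir records first, then the file.
        buffer, reservoir = fill(reservoir[:n]), reservoir[n:]
    return runs


data = [29, 42, 3, 7, 9, 101, 99, 87, 89, 100, 16, 8, 12, 2, 15]
for partition in natural_selection(data, 4, 4):
    print(partition)
```

On this short file the partitions happen to match those from replacement selection; the benefit of the reservoir shows up on longer inputs, where the buffer stays full of usable records for longer.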

  30. Distribution and Merging • Merging • required to bring the sorted partitions together into a sorted whole • may require a series of merge “phases”, where shorter partitions are merged into larger partitions • More than one partition per file • Not all partitions can be opened at once

  31. Merging: Single phase

  32. Merging: Multiple phase

  33. Merging: Multiple Partitions / File
        P1 P3 P5 P7 P9 P11    |  P2 P4 P6 P8 P10 P12
        P1-2 P5-6 P9-10       |  P3-4 P7-8 P11-12
        P1-4 P9-12            |  P5-8
        P1-8                  |  P9-12
        P1-12

  34. Merging • Major issues - minimizing overall I/O • Different length partitions • Spend time simply reading and writing from one file • Left over partitions • Spend time simply copying partitions

  35. Distribution and Merging • Distribution • In order to merge, partitions must be “distributed” to files in a manner facilitating the merge process. • If 1 partition per file, distribution is trivial • If >1 partition per file, distribution should minimize I/O • Several partitions may be placed in each file

  36. Balanced N-way merge • use as many files (or tapes) as the system can open at once • Distribute the partitions evenly among F/2 files • repetitively merge back and forth between one set of F/2 files and the other • Distribute the generated partitions evenly among the F/2 output files

  37. Balanced 2-way merge
        File 1: P1 P3 P5 P7 P9 P11    File 2: P2 P4 P6 P8 P10 P12
        File 3: P1-2 P5-6 P9-10       File 4: P3-4 P7-8 P11-12
        File 1: P1-4 P9-12            File 2: P5-8
        File 3: P1-8                  File 4: P9-12
        File 1: P1-12

  38. Balanced 2-way merge • Example: 4 files, 700 records, 100 records at a time can be sorted in primary memory
        Input:       1-700
        Sort stage   File 1: 1-100 201-300 401-500 601-700    File 2: 101-200 301-400 501-600
        Merge 1      File 3: 1-200 401-600                    File 4: 201-400 601-700
        Merge 2      File 1: 1-400                            File 2: 401-700
        Merge 3      File 3: 1-700

  39. Balanced N-way merge • advantage • simple • disadvantage • wastes time if partition sizes differ • spends time reading and writing records without actually merging
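
The merge passes in this example can be traced with a small simulation. It tracks partition lengths only, not the records themselves, and assumes partitions are dealt round-robin onto the input files, so each merge step combines one partition from each of the F/2 input files. The function name is illustrative.

```python
def balanced_merge_history(run_lengths, files):
    """Return the list of partition lengths after the sort stage and each merge pass."""
    way = files // 2                     # F/2 input files, F/2 output files
    runs = list(run_lengths)
    history = [runs]
    while len(runs) > 1:
        # Merge one partition from each input file into a single longer partition.
        runs = [sum(runs[i:i + way]) for i in range(0, len(runs), way)]
        history.append(runs)
    return history


history = balanced_merge_history([100] * 7, files=4)
for step in history:
    print(step)
print("merge passes:", len(history) - 1)
```

The leftover 100-record partition in the first pass is copied without being merged with anything, which is the wasted I/O that balanced merging is criticized for.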

  40. Polyphase merging • Strategically distribute the partitions onto F files based on the Fibonacci Sequence • Algorithm • During each phase merge partitions from the F input files into an empty output file until the end of one input file is reached. • After each phase at least one file will be empty - this file becomes the new place to merge into • Continue to merge until only one file exists

  41. Polyphase merging • Consider: Initially generate three files: • 24 partitions, 20 partitions, and 13 partitions

  42. Polyphase merging • advantages • No overhead from merging partitions of different sizes • disadvantages • complex management of files • must know partition sizes • still not completely optimal - partition sizes not always maximal.
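
The sketch below traces this example under two assumptions: there is one additional empty file to merge into, and each phase merges one partition from every input file until the smallest input file is used up. It also shows a generalized Fibonacci recurrence for three input files that reproduces the 24, 20, 13 split. Both function names are illustrative.

```python
def perfect_distribution(level):
    """Partition counts for three input files after `level` steps of the recurrence."""
    a, b, c = 1, 0, 0
    for _ in range(level):
        a, b, c = a + b, a + c, a
    return a, b, c


def polyphase_phases(counts):
    """Return the partition count on each file after every phase."""
    files = list(counts) + [0]           # the extra file starts empty
    history = [tuple(files)]
    while sum(files) > 1:
        out = files.index(0)             # the empty file receives the merged output
        inputs = [i for i in range(len(files)) if i != out]
        k = min(files[i] for i in inputs)
        if k == 0:
            raise ValueError("not a perfect polyphase distribution")
        for i in inputs:
            files[i] -= k                # k partitions consumed from each input
        files[out] = k                   # k longer partitions produced
        history.append(tuple(files))
    return history


print(perfect_distribution(6))
for state in polyphase_phases((24, 20, 13)):
    print(state)
```

Each phase leaves exactly one file empty, and no partition is ever copied without being merged.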
