
Algorithm Engineering „Parallele Suche“



Presentation Transcript


  1. Algorithm Engineering „Parallele Suche“ (Stefan Edelkamp)

  2. Overview • Motivation • PRAM • Termination • Depth-Slicing • Hash-based Partitioning & Transposition Table Scheduling • Stack Splitting & Parallel Window Search • Parallel Search with Treaps

  3. Parallel Shared Memory Graph Search (figure: single-core CPU vs. multi-core CPU) • Parallelization is important for multi-core CPUs • But parallelizing graph-search algorithms such as breadth-first search, Dijkstra’s algorithm, and A* is challenging… • Issues: load balancing, locking, …

  4. Parallel Shared Memory Graph Search (figure: single-core CPU vs. multi-core GPU) • Parallelization is even more important for GPUs • But parallelizing graph-search algorithms such as breadth-first search, Dijkstra’s algorithm, and A* is challenging… • Issues: kernel function design, load balancing, locking, …

  5. Parallel External Memory Graph Search (figure: single-core CPU + HDD vs. multi-core CPU/GPU + HDD) • …

  6. Motivation • Parallel and external-memory graph search have synergies: • They need partitioned access to large sets of data • This data needs to be processed individually • There is only limited information transfer between two partitions • Streaming in external-memory programs relates to communication queues in distributed programs (as communication is often realized on files) • Good external implementations often lead to good parallel implementations

  7. Experiments

  8. Further Experiments

  9. Parallel Random Access Machine: Concurrent Read/Exclusive Write (CREW PRAM)

  10. Parallel Addition

  11. In Pseudo-Code
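
The pseudo-code on this slide is not reproduced in the transcript. As a stand-in, here is a minimal sequential simulation of the CREW PRAM addition scheme (function name and details are ours): in round d, each "processor" adds the value 2^d positions to its right, so the sum is available at index 0 after ⌈log₂ n⌉ parallel steps.

```python
def pram_sum(a):
    """Simulate CREW PRAM parallel addition: in round d all additions
    with stride 2^d are independent and would run as one parallel step,
    giving O(log n) parallel time with O(n) processors."""
    a = list(a)
    n = len(a)
    d = 1
    while d < n:
        # every addition in this loop is independent -> one PRAM step
        for i in range(0, n - d, 2 * d):
            a[i] += a[i + d]
        d *= 2
    return a[0]
```

Note that the stride-doubling loop also handles input sizes that are not powers of two.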

  12. Definitions • Problem size n • Parallel running time T(n, p) • Work W(n, p) = p · T(n, p) • Sequential time: T(n, 1) • Efficiency: E(n, p) = T(n, 1) / (p · T(n, p)) • Speedup: S(n, p) = T(n, 1) / T(n, p); in the example (parallel addition): S = Θ(n / log n) • Linear speedup: S(n, p) = Θ(p) • Efficient parallelization: E(n, p) = Θ(1) • In the example: efficient for p = O(n / log n)
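
A small worked example may make the definitions concrete; the numbers are our own, assuming the parallel-addition setting with T_seq = n − 1 additions and T_par = ⌈log₂ n⌉ rounds.

```python
import math

# Worked example (our own numbers) for the PRAM addition scheme
n = 1024                             # problem size
p = n // 2                           # processors used by the naive scheme
T_seq = n - 1                        # sequential time: n - 1 additions
T_par = math.ceil(math.log2(n))      # parallel time: ceil(log2 n) rounds
speedup = T_seq / T_par              # S = T_seq / T_par
efficiency = speedup / p             # E = S / p
work = p * T_par                     # W = p * T_par
```

With p = n/2 processors the efficiency is only Θ(1/log n); reducing the processor count to p = n/log n (Brent's principle) restores E = Θ(1) without changing the asymptotic parallel time.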

  13. Prefix Sum
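
The prefix-sum slide only carries a figure in this transcript. One common PRAM-style realization is the Hillis–Steele scan, sketched sequentially below; the lecture may use a different variant, and the function name is ours.

```python
def parallel_prefix_sum(a):
    """Hillis/Steele scan: after round d, position i holds the sum of
    the up to 2^(d+1) elements ending at i; O(log n) parallel rounds."""
    a = list(a)
    n = len(a)
    d = 1
    while d < n:
        # on a CREW PRAM all reads of a round precede all writes;
        # the snapshot simulates that read-before-write semantics
        prev = a[:]
        for i in range(d, n):
            a[i] = prev[i] + prev[i - d]
        d *= 2
    return a
```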

  14. Termination
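
The termination slide is figure-only in this transcript. As an illustration of the underlying principle (not the algorithm from the slides): a distributed search has terminated once every process is passive and no messages are still in transit, which per-process message counters can express as follows.

```python
def terminated(active, sent, received):
    """Naive global termination test: no process is still active, and
    every message that was sent has also been received.  Real detectors
    (e.g. Dijkstra's token-ring algorithm) must additionally cope with
    the counters being observed at different moments in time."""
    return not any(active) and sum(sent) == sum(received)
```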

  15. Depth-Slicing

  16. In Source Code

  17. Hash-based Partitioning
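
A minimal sketch of the idea, with our own function names: a hash of the state picks the single process responsible for it, so duplicate detection stays local to one process and needs no global lock.

```python
import hashlib
from collections import defaultdict

def owner(state, num_procs):
    """Hash-based partitioning: the hash of the state's representation
    deterministically selects its home process."""
    digest = hashlib.sha1(repr(state).encode()).hexdigest()
    return int(digest, 16) % num_procs

def partition(states, num_procs):
    """Route each generated state to the queue of its home process."""
    queues = defaultdict(list)
    for s in states:
        queues[owner(s, num_procs)].append(s)
    return queues
```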

  18. Transposition-Driven Scheduling

  19. In Source Code
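
The source code for transposition-driven scheduling is not reproduced in the transcript. Below is a sequential simulation of the idea under our own naming: every generated state is *sent* to its home process (chosen by a hash function), and only that process consults its local transposition table, so no remote lookups or locks are required.

```python
from collections import deque

def tds_search(start, successors, num_procs, hash_fn=hash):
    """Transposition-driven scheduling, simulated sequentially:
    states are routed to their home process by hash; the home process
    alone checks its local transposition table for duplicates."""
    inbox = [deque() for _ in range(num_procs)]
    table = [set() for _ in range(num_procs)]  # one local table per process
    inbox[hash_fn(start) % num_procs].append(start)
    expanded = 0
    while any(inbox):
        for p in range(num_procs):
            while inbox[p]:
                s = inbox[p].popleft()
                if s in table[p]:              # duplicate: dropped locally
                    continue
                table[p].add(s)
                expanded += 1
                for t in successors(s):        # route children to their owners
                    inbox[hash_fn(t) % num_procs].append(t)
    return expanded
```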

  20. Parallel Depth-First Search (Parallel Branch-and-Bound)

  21. In Source Code

  22. Load-Balancing via Stack Splitting
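
A minimal sketch of the splitting step (names and the split point are our own choices): an idle worker steals about half of a busy worker's unexplored stack entries. Taking them from the bottom of the stack, closest to the root, is a common heuristic, since those nodes tend to carry the largest remaining subtrees.

```python
def split_stack(stack):
    """Stack splitting for load balancing: donate the bottom half of
    the open nodes to an idle worker and keep the rest."""
    half = len(stack) // 2
    donated, kept = stack[:half], stack[half:]
    return donated, kept
```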

  23. Parallel Window Search (Iterative-Deepening Search)
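
A sketch of the scheme in our own code: instead of executing the iterations of iterative-deepening search one after another, each process searches the whole tree with its *own* cost bound, so a process whose bound already suffices need not wait for the shallower iterations. The "processes" are simulated by a loop here.

```python
def bounded_dfs(state, g, bound, successors, cost, is_goal):
    """One iterative-deepening iteration: depth-first search that
    prunes every path whose accumulated cost exceeds `bound`."""
    if is_goal(state):
        return g
    for t in successors(state):
        ng = g + cost(state, t)
        if ng <= bound:
            found = bounded_dfs(t, ng, bound, successors, cost, is_goal)
            if found is not None:
                return found
    return None

def parallel_window_search(start, successors, cost, is_goal, bounds):
    """Parallel window search: each bound in `bounds` would run on its
    own processor; we run them in turn and report the cheapest
    solution found in any window."""
    results = [bounded_dfs(start, 0, b, successors, cost, is_goal)
               for b in bounds]
    found = [r for r in results if r is not None]
    return min(found) if found else None
```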

  24. Treaps: A Mix of Heaps and Search Trees

  25. Application • Using a treap, the need for exclusive locks can be alleviated to some extent. • Each operation on the treap manipulates the data structure in the same top-down direction and can be decomposed into successive elementary operations. • Tree partial locking protocol: every process holds exclusive access to a sliding window of nodes in the tree. It can move this window down a path in the tree, which allows other processes to access different, non-overlapping windows at the same time. • Parallel search using a treap with partial locking has been tested for the Fifteen-Puzzle on different architectures, with a speedup between 2 and 5 for 8 processors.
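
As a companion to the treap slides, here is a minimal sequential treap with insertion; the code is our own sketch, not the lecture's. Search order follows the keys, heap order follows the random priorities, which keeps the tree balanced in expectation. This recursive version rotates on the way back up; a production top-down variant would rotate while descending, which is the form the partial-locking protocol assumes.

```python
import random

class Node:
    def __init__(self, key, prio=None):
        self.key = key
        self.prio = random.random() if prio is None else prio  # heap priority
        self.left = self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key, prio=None):
    """Treap insert: descend by BST order on `key`, then restore the
    max-heap order on the priorities with single rotations."""
    if root is None:
        return Node(key, prio)
    if key < root.key:
        root.left = insert(root.left, key, prio)
        if root.left.prio > root.prio:   # heap violation -> rotate right
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, prio)
        if root.right.prio > root.prio:  # heap violation -> rotate left
            root = rotate_left(root)
    return root

def inorder(root):
    """In-order traversal yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```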

  26. Self-Adjusting Trees via the Splay Operation • See extra slides

  27. Parallel External-Memory Graph Search • Motivation: Shared and Distributed Environments • Parallel Delayed Duplicate Detection • Parallel Expansion • Distributed Sorting • Parallel Structured Duplicate Detection • Finding Disjoint Duplicate Detection Scopes • Locking

  28. Distributed Search over the Network • Distributed setting provides more space. • Experiments show that internal time dominates I/O.

  29. Exploiting Independence • Since each state in a bucket is independent of the others, they can be expanded in parallel. • Duplicate removal can be distributed over different processors. • Bulk (streamed) transfers are much better than single ones.

  30. Parallel Breadth-First Frontier Search: Enumerating the 15-Puzzle • The hash function partitions both layers into files. • Once a layer is done, child files are renamed into parent files. • For parallel processing, a work queue contains parent files waiting to be expanded and child files waiting to be merged.
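
The file-based scheme above can be simulated in a few lines; lists and sets stand in for the files, and all names are our own. Each parent file can be expanded by a different worker, children go to the child file selected by the hash function, and merging the child files removes the duplicates within a layer (a full frontier search would also subtract the previous layers).

```python
from collections import defaultdict

def frontier_bfs_layer(parent_files, successors, num_files, hash_fn=hash):
    """Expand one BFS layer: every parent file is an independent unit
    of work; children land in hash-selected child files, which become
    the parent files of the next layer after duplicate merging."""
    child_files = defaultdict(set)
    for f in parent_files.values():          # expansions are independent
        for s in f:
            for t in successors(s):
                child_files[hash_fn(t) % num_files].add(t)  # set merge dedups
    return dict(child_files)
```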

  31. Beware of the Mutual Exclusion Problem! (figure: distributed queue for parallel best-first search; processes P0, P1, P2 hold queue entries of the form <g, h, start byte, size>, e.g. <15, 34, 0, 100>, <15, 34, 20, 100>, <15, 34, 40, 100>, <15, 34, 60, 100>, with the TOP entry contended by all processes)

  32. Distributed Delayed Duplicate Detection • Each state can appear several times in a bucket. • A bucket has to be searched completely for duplicates. (figure: processes P0–P3 flush sorted buffers into single files on the way to the GOAL bucket; problem: concurrent writes!)

  33. Multiple Processors, Multiple Disks Variant (figure: processors P1–P4 keep their buffers sorted w.r.t. the hash value and flush them to sorted files; the hash range h0 … hl-1 is divided among the processors, and the sorted buffers from every processor are merged into one sorted file per hash range)
