DryadLINQ : A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Negru Raluca , SCPD. Overview. DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing .
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
(1) instantiating a job’s dataflow graph;
(2) scheduling processes on cluster computers;
(3) providing fault tolerance by re-executing failed or slow processes;
(4)monitoring the job and collecting statistics;
(5) transforming the job graph dynamically according to user supplied policies.
separate from Dryad, which manages a batch queue of jobs and executes a few at a time subject to cluster policy
-static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
-dynamic: run-time query plan optimizations automatically adapt the plan taking into account the statistics of the data set processed.