
Literature Review: Parallel Computing for the Semantic Web

Presentation Transcript


    1. Literature Review: Parallel Computing for the Semantic Web Jesse Weaver Tetherless World Constellation Rensselaer Polytechnic Institute

    2. Outline
       Overview
         Area of Interest
         Related/Adjacent Fields
         Reviewed Fields
       Literature Review
       Synthesis

    6. [1,11,15,18] – Very applicable parallelization approaches for the semantic web.
       [10,14,16] – Query for the semantic web, but only a little parallelization.
       [7] – Reasoning on distributed data, but only a little parallelization.
       [4,17] – Loosely linked to the semantic web, but applicable.
       [2,3,5,9,12] – Potentially applicable parallelization approaches from AI.
       [6,19] – Potentially applicable parallelization approaches from distributed data.
       [8,13,20] – Potentially applicable parallelization approaches from parallel computing.

    7. Outline
       Overview
       Literature Review
         Existing Parallelization Efforts
           MaRVIN [1,15]
           Parallel OWL Inferencing [18]
         Approaches Using some Parallelization
         Proposed Parallelization Approaches
         Suggestions from AI Field
         Suggestions from Distributed Data Field
         Suggestions from Parallel Computing Field
       Synthesis
       (1:54)

    8. MaRVIN [1,15]
       Provides sound, anytime, and eventually complete reasoning (with respect to the reasoners used) using a divide-conquer-swap strategy. Every process uses a reasoner and processes a fraction of the data to produce inferences. Each triple receives a score based on how much information has been inferred due to the triple. This (evolving) score determines how many times the triple is included in reasoning, which nodes the triple is routed to, and when to consider the triple sufficiently used. Built on top of Ibis grid technology.
       Department of Computer Science, Vrije Universiteit Amsterdam, the Netherlands
       [1] George Anadiotis, Spyros Kotoulas, Eyal Oren, Ronny Siebes, Frank van Harmelen, Niels Drost, Roelof Kemp, Jason Maassen, Frank J. Seinstra, and Henri E. Bal. MaRVIN: A distributed platform for massive RDF inference. http://www.larkc.eu/marvin/btc2008.pdf, 2008.
       [15] Eyal Oren, Spyros Kotoulas, George Anadiotis, Ronald Siebes, Annette ten Teije, and Frank van Harmelen. MaRVIN: A platform for large-scale analysis of semantic web data. In Proceedings of WebSci'09: Society On-Line, March 2009.
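
    A minimal sketch of the divide-conquer-swap loop described above, from one peer's point of view. This is not the authors' implementation; reason, score, and the random-routing choice are stand-ins for MaRVIN's reasoner, per-triple usefulness score, and routing policy, and local_triples is assumed to be a set.

        import random

        def peer_step(local_triples, reason, score, peers, keep_fraction=0.5):
            # Divide/conquer: run the reasoner on the local partition only.
            inferred = reason(local_triples)
            pool = local_triples | inferred

            # Rank triples by how much they have contributed to inferences.
            ranked = sorted(pool, key=score, reverse=True)

            # Swap: keep the most promising triples locally, forward the rest to
            # randomly chosen peers, and drop triples considered sufficiently used.
            cut = int(len(ranked) * keep_fraction)
            keep = set(ranked[:cut])
            outbox = {p: set() for p in peers}
            for t in ranked[cut:]:
                if score(t) > 0:                          # still potentially useful
                    outbox[random.choice(peers)].add(t)   # random routing variant
            return keep, outbox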

    9. MaRVIN [1,15]
       Every node reads in a partition of the data, performs reasoning on it, and, based on the results, keeps some triples, forwards some triples, and removes some triples.

    10. MaRVIN [1,15]
        Run on the Distributed ASCI Supercomputer 3 (DAS-3):
          --a five-cluster grid system
          --271 machines total, 4 GB RAM each
          --791 cores at 2.4 GHz
        Sesame in-memory store with a forward-chaining RDFS reasoner.
        Random routing used:
          --Random routing: better load balancing, less efficient inferencing (because fewer joins are possible on a node)
          --Hash routing: more efficient inferencing, poor load balancing
        SwetoDBLP dataset: 14.9M triples, 145 distinct predicates, 11 distinct classes.
        Not a very complex ontology if only 11 distinct classes. Would be interested to see results for OWL datasets using OWL Horst semantics. Why only up to 64 nodes when they have 271? Does communication cost become too high? Does data get spread too thin? Refer to self-deemed knees ("Aha" moments).

    11. Parallel OWL Inferencing [18]
        Two approaches to inferencing with "OWL Horst" semantics:
          --Data partitioning: every process gets a fraction of the data and all the rules.
          --Rule partitioning: every process gets all the data and a fraction of the rules.
        In both cases, ontologies must be compiled into rules, and dependencies must be determined to perform partitioning. Then each process gets its partition and applies its rules to its data, routing inferences to the appropriate processes.
        University of Southern California
        [18] Ramakrishna Soma and V. K. Prasanna. Parallel inferencing for OWL knowledge bases. In ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing, pages 75–82, Washington, DC, USA, 2008. IEEE Computer Society.
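
    A minimal sketch contrasting the two schemes, assuming triples and forward rules are plain Python collections; the worker and apply_rule functions are illustrative stand-ins, not the paper's API.

        def data_partitioning(triples, rules, n):
            # Each of the n processes gets a slice of the data and ALL the rules.
            triples = list(triples)
            return [(triples[i::n], list(rules)) for i in range(n)]

        def rule_partitioning(triples, rules, n):
            # Each of the n processes gets ALL the data and a slice of the rules.
            rules = list(rules)
            return [(list(triples), rules[i::n]) for i in range(n)]

        def worker(partition, apply_rule):
            # Apply the local rules to the local data; a real system would route
            # each new inference to whichever process owns it.
            data, rules = partition
            inferred = set()
            for rule in rules:
                inferred |= apply_rule(rule, data)
            return inferred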

    12. Parallel OWL Inferencing [18]
        Data partitioning policies: data graph partitioning, hash-based partitioning, domain-specific partitioning.
        Rule partitioning policies: rule-dependency graph partitioning.
        Data graph partitioning – for n processes, partition the RDF graph into n partitions such that the minimum number of edges is cut.
        Rule-dependency graph partitioning – in a graph G, each node is a rule, and an edge indicates that the head of one rule appears in a clause of another rule.

    13. Parallel OWL Inferencing [18]
        Why does UOBM scale worse than LUBM? Aren't they very similar?
        MDC is an unknown dataset created by the authors.
        Implemented using shared files! Synchronization overhead when accessing files? How does this affect results? What about "shared-nothing"?

    14. Parallel OWL Inferencing [18]
        MDC is an unknown dataset created by the authors.
        Implemented using shared memory (rather than files) because of high communication cost! Synchronization overhead when accessing shared memory? How does this affect results? What about distributed memory?

    15. Parallel OWL Inferencing [18]
        It's curious that domain-specific partitioning performs almost as well as data graph partitioning. Could this be because of the structured nature of LUBM data? What about the cost of the sequential partitioning algorithm compared to the inferencing speedup?

    16. Outline
        Overview
        Literature Review
          Existing Parallelization Efforts
          Approaches Using some Parallelization
            RDFS on DHTs [11]
            DORS [7]
            Clustered TDB [16]
            YARS2 [10]
          Proposed Parallelization Approaches
          Suggestions from AI Field
          Suggestions from Distributed Data Field
          Suggestions from Parallel Computing Field
        Synthesis
        (9:00) Results and evaluation are not discussed as much in this section because these approaches use fine-grained parallelism.

    17. RDFS on DHTs [11]
        Forward-chaining and backward-chaining to perform RDFS inferencing and querying on DHTs. Each triple is stored in three places, based on hashing of its subject, predicate, and object.
        Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
        [11] Zoi Kaoudi, Iris Miliaraki, and Manolis Koubarakis. RDFS reasoning and query answering on top of DHTs. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 499–516. Springer Berlin / Heidelberg, 2008.
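
    A minimal sketch of the triple-indexing idea: each triple is inserted three times, keyed on its subject, predicate, and object, so a lookup on any single bound term finds candidate triples. The lookup_node function stands in for the DHT's routing and is not part of the paper.

        import hashlib

        def dht_key(term, ring_size=2**32):
            # Hash an RDF term to a position on the DHT identifier ring.
            digest = hashlib.sha1(term.encode("utf-8")).digest()
            return int.from_bytes(digest[:4], "big") % ring_size

        def store_triple(triple, lookup_node):
            # Index the triple under its subject, predicate, and object.
            for term in triple:
                node = lookup_node(dht_key(term))   # hypothetical routing call
                node.put(dht_key(term), triple)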

    18. RDFS on DHTs [11]
        Forward chaining: essentially materialization at load time. Only have to look in DB id because of the common join variable.
        Boxes highlight opportunities for parallelism. The box emphasizes that forward chaining on a triple spawns three further, potentially parallel, forward-chaining operations.

    19. RDFS on DHTs [11]
        Backward chaining: essentially deriving inferences at query time. Get all assertions that match tp, but then also perform special queries (with "adorned predicates") to derive inferences.
        The box indicates where the authors claim potential parallelism. Essentially, when a rule has two (rule) predicates in the body, one must be selected to evaluate first. One of them can always be evaluated locally because of the shared variable Z between the head and body predicate; that is the one to evaluate first, locally. The sendto's can happen in parallel and asynchronously.

    20. DORS [7]
        Seems to use a forward-chaining approach. Each node in the DHT uses a TBox reasoner and an ABox reasoner. When data is added, the TBox reasoner materializes all TBox triples and the ABox reasoner reasons from there.
        The description of parallelism is brief. It is referred to as "task parallelism," which suggests it is probably similar to the forward-chaining approach in [11]. Similar to [11], using COTS.
        Tsinghua National Laboratory for Information Science and Technology; Department of Computer Science and Technology, Tsinghua University, Beijing, China (ISWC/ASWC 2007)
        [7] Qiming Fang, Ying Zhao, Guangwen Yang, and Weimin Zheng. Scalable distributed ontology reasoning using DHT-based partitioning. In The Semantic Web, volume 5367/2008 of Lecture Notes in Computer Science, pages 91–105. Springer Berlin / Heidelberg, 2008.

    21. Clustered TDB [16]
        Clustered triple store based on Jena TDB.
        Three forms of parallelism in querying:
          --Inter-query: the ability to run more than one query simultaneously.
          --Intra-query: the ability to run different subqueries in parallel and to pipeline operators.
          --Intra-operation: distributing single operations over more than one node for concurrent execution.
        Parallel operations (see the sketch below):
          --Merge: gathering together the results of independent parallel operations.
          --Split: splitting the output stream of a relational operator to allow parallel computation on the results.
        In the paper's evaluation, the prototype showed scaling loading time but very inconsistent and usually limited scaling in query time.
        IAM Group, Electronics and Computer Science, University of Southampton, UK; HP Labs Bristol, Stoke Gifford, Bristol, UK (Andy only) (unpublished)
        [16] Alisdair Owens, Andy Seaborne, Nick Gibbins, and mc schraefel. Clustered TDB: A clustered triple store for Jena. http://eprints.ecs.soton.ac.uk/16974/1/www2009?xedref.pdf, 2008.
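
    A minimal sketch of merge and split as described above (illustrative only, not Clustered TDB's code): merge combines sorted result streams from parallel workers so downstream operators can stay streaming, and split hash-partitions an operator's output so equal keys reach the same consumer.

        import heapq

        def merge(sorted_streams):
            # Gather the results of independent parallel operations, preserving
            # sort order (useful when the consumer is, e.g., a merge join).
            return heapq.merge(*sorted_streams)

        def split(stream, n):
            # Partition an operator's output across n parallel consumers by
            # hashing each row, so rows with equal keys land on the same consumer.
            buckets = [[] for _ in range(n)]
            for row in stream:
                buckets[hash(row) % n].append(row)
            return buckets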

    22. YARS2 [10]
        Peer-to-peer triple storage and query. Claims parallel index creation, but no detail is given. Parallel queries are performed, but details are uncertain: at one point considered as (mostly) a directed lookup, later discussed as multithreaded querying of all index managers.
        National University of Ireland, Galway; Digital Enterprise Research Institute (ISWC/ASWC 2007)
        [10] Andreas Harth, Jurgen Umbrich, Aidan Hogan, and Stefan Decker. YARS2: A federated repository for querying graph structured data from the web. In The Semantic Web, volume 4825/2008 of Lecture Notes in Computer Science, pages 211–224. Springer Berlin / Heidelberg, November 2007.

    23. YARS2 [10]

    24. YARS2 [10]
        M is the querying thread. S floods the first quad pattern to all index managers via threads S1, S2, …, Sn. J essentially does the same for the second quad pattern, using the bindings from the first quad pattern.
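
    A minimal sketch of that threaded flooding, assuming a list of index-manager stubs with a hypothetical lookup(pattern) method (not YARS2's actual interface).

        from concurrent.futures import ThreadPoolExecutor

        def flood(index_managers, pattern):
            # S: send the quad pattern to every index manager in parallel
            # (threads S1..Sn) and concatenate the bindings that come back.
            with ThreadPoolExecutor(max_workers=len(index_managers)) as pool:
                parts = pool.map(lambda im: im.lookup(pattern), index_managers)
                return [b for part in parts for b in part]

        def join(index_managers, pattern1, pattern2, substitute):
            # J: instantiate the second pattern with each binding of the first
            # and flood it the same way.
            results = []
            for binding in flood(index_managers, pattern1):
                results.extend(flood(index_managers, substitute(pattern2, binding)))
            return results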

    25. Outline
        Overview
        Literature Review
          Existing Parallelization Efforts
          Approaches Using some Parallelization
          Proposed Parallelization Approaches
            Evolutionary Query Answering [14]
            Approximate Reasoning [17]
            Parallel Techniques for Ontology Reasoning [4]
          Suggestions from AI Field
          Suggestions from Distributed Data Field
          Suggestions from Parallel Computing Field
        Synthesis
        (16:06)

    26. Evolutionary Query Answering [14]
        Approximate anytime querying. Graphs are stored as Bloom filters; the soundness of the lookups increases with the size of the Bloom filter. Start with a set of candidate solutions and determine their fitness. Based on fitness, select parents, breed, and add mutation.
        The implementation does not use parallelism (as it is a proof of concept), but parallelism is straightforward in that the evolution of candidate solutions can occur in parallel. (Some communication scheme may be needed to ensure appropriate breeding.)
        Vrije Universiteit Amsterdam, the Netherlands
        [14] Eyal Oren, Christophe Gueret, and Stefan Schlobach. Anytime query answering in RDF through evolutionary algorithms. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 98–113. Springer Berlin / Heidelberg, 2008.
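
    A minimal sketch of the evolutionary loop with candidate evaluation parallelized, as the last point suggests. The fitness, crossover, and mutate functions are placeholders, not the paper's operators, and must be module-level (picklable) for the process pool.

        import random
        from concurrent.futures import ProcessPoolExecutor

        def evolve(population, fitness, crossover, mutate, generations=100):
            with ProcessPoolExecutor() as pool:
                for _ in range(generations):
                    # Evaluate candidate solutions in parallel: the step that
                    # parallelizes naturally across processes.
                    scores = list(pool.map(fitness, population))
                    ranked = [c for _, c in sorted(zip(scores, population),
                                                   key=lambda sc: sc[0],
                                                   reverse=True)]
                    # Select parents, breed, and mutate to form the next generation.
                    parents = ranked[: max(2, len(ranked) // 2)]
                    population = [mutate(crossover(random.choice(parents),
                                                   random.choice(parents)))
                                  for _ in range(len(population))]
            return population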

    27. Evolutionary Query Answering [14]

    28. Evolutionary Query Answering [14]
        Fitness really seems to level off at 70 for (b). Does this indicate an inherent limitation of this approach? Maybe it is bounded by the soundness of the fixed-size Bloom filters?

    29. Approximate Reasoning [17]
        Proposes a mathematical framework for approximate reasoning, providing inferences that are not necessarily sound or complete. An algorithm is an anytime algorithm if it provides sound and complete results as time approaches infinity. If the results get better and better as time continues, then the algorithm is monotonic anytime.
        AIFB, University of Karlsruhe, Germany; FZI Karlsruhe, Germany (Tuvshintur only)
        [17] Sebastian Rudolph, Tuvshintur Tserendorj, and Pascal Hitzler. What is approximate reasoning? In Web Reasoning and Rule Systems, volume 5341/2008 of Lecture Notes in Computer Science, pages 150–164. Springer Berlin / Heidelberg, 2008.

    30. Approximate Reasoning [17]
        Different algorithms providing different anytime behaviors can be combined (run in parallel) to provide results that are at least as sound and complete as the best algorithm provided. An evaluation was performed using KAON2, Screech-all, and Screech-none; the defect measure in this case is the F1-measure.
          --KAON2 – sound and complete
          --Screech-all – complete
          --Screech-none – sound
        The F1-measure indicates how complete and sound a solution is overall, but gives no indication of completeness alone or soundness alone.

    31. Approximate Reasoning [17]
        0–5 sec: use Screech-none (sound).
        5–6 sec: use Screech-all (complete).
        6 sec onward: use KAON2 (sound and complete).
        Interesting behavior: at 4.5–5 sec, almost all of the right answers; at 5–6 sec, all the right answers and a few wrong ones; from 6 sec onward, all of the right answers.
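
    A minimal sketch of combining anytime reasoners run in parallel, returning the best answer available when the time budget expires. The reasoner callables and the quality measure are placeholders; this illustrates the combination idea, not the paper's framework. (The cancel_futures argument needs Python 3.9+.)

        from concurrent.futures import ThreadPoolExecutor, wait

        def combined_anytime(reasoners, query, budget_seconds, quality):
            # Run all reasoners in parallel; after the time budget, keep the best
            # answer among those that finished and abandon the stragglers.
            pool = ThreadPoolExecutor(max_workers=len(reasoners))
            futures = [pool.submit(r, query) for r in reasoners]
            done, _ = wait(futures, timeout=budget_seconds)
            pool.shutdown(wait=False, cancel_futures=True)  # do not block on slow reasoners
            best = None
            for f in done:
                answer = f.result()
                if best is None or quality(answer) > quality(best):
                    best = answer
            return best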

    32. Parallel Techniques for Ontology Reasoning [4]
        "An observation is, that available state-of-the-art reasoners do not exploit the benefits of parallel computation techniques, as these are not straightforwardly applied for reasoning calculi."
        "In fact, it remains an open question whether algorithms currently used in reasoning adapt well to the paradigm shift in computer architecture [toward parallel computing]. In particular, it is unclear whether well established tableau algorithms, as widely used in state-of-the-art reasoners, can be parallelised."
        FZI Research Center for Information Technologies, Karlsruhe, Germany
        [4] Jurgen Bock. Parallel computation techniques for ontology reasoning. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 901–906. Springer Berlin / Heidelberg, 2008.

    33. Parallel Techniques for Ontology Reasoning [4]
        Hypothesis 1: Independent Ontology Modules
          Generally, ontologies are interrelated. The challenge is to identify parts that can actually be reasoned on independently.
          (Some existing work, but not mentioned here because (1) it is not that parallel and (2) presentation time is limited.)
        Hypothesis 2: A Parallel Reasoning Algorithm
          Parallel tableau algorithms exploit parallel processing of nondeterministic branches, which results in uneven workload and limits scaling.
          Extensively studied parallelization of logic programming is unsatisfactory for TBox reasoning.

    34. Outline
        Overview
        Literature Review
          Existing Parallelization Efforts
          Approaches Using some Parallelization
          Proposed Parallelization Approaches
          Suggestions from AI Field
            Parallel Tableaux [12]
            Taxonomy of Parallel Strategies for Deduction [5]
            Parallel Prolog [9]
            Parallel Non-monotonic Reasoning [3]
            Lana-Match [2]
          Suggestions from Distributed Data Field
          Suggestions from Parallel Computing Field
        Synthesis
        (23:12)

    35. Nondeterministic Search Tree
        A search plan (certificate, witness) determines the choices to make.

    36. Parallel Tableaux [12]
        Exploits non-determinism in tableaux. This work is concerned with two rules:
          --Disjunction rule (a : C ⊔ D)
          --Number restriction merge rule (a : ≤ n r)
        A fixed number of threads are spawned, and each thread draws an ABox from the pool of ABoxes. (Originally, there is just one ABox.) After processing the ABox, the resulting ABox(es) is/are placed back in the pool of ABoxes.
        Disjunction: (a : C ⊔ D) in A implies (a : C) in A or (a : D) in A.
        Number restriction merge: if (a : ≤ n r) is in A and a has more than n r-successors, then one of (every combination making exactly n r-successors) is in A.
        The evaluation was on very simple datasets, with limited opportunity for parallelism.
        Inst. of AI, University of Ulm, Ulm, Germany
        [12] Thorsten Liebig and Felix Muller. Parallelizing tableaux-based description logic reasoning. In On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, volume 4806/2007 of Lecture Notes in Computer Science, pages 1135–1144. Springer Berlin / Heidelberg, 2007.
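
    A minimal sketch of the thread-pool scheme described above: a fixed number of worker threads repeatedly draw an ABox from a shared pool, expand it, and put the resulting alternative ABoxes back. The expand function is a placeholder for the tableau rules (disjunction, number restriction merge), and the idle-timeout termination is deliberately crude.

        import queue
        import threading

        def parallel_tableau(initial_abox, expand, num_threads=4):
            pool = queue.Queue()
            pool.put(initial_abox)
            finished = []
            lock = threading.Lock()

            def worker():
                while True:
                    try:
                        abox = pool.get(timeout=1)   # crude termination: give up when idle
                    except queue.Empty:
                        return
                    successors = expand(abox)        # nondeterministic rule applications
                    if not successors:               # fully expanded (or clashed) branch
                        with lock:
                            finished.append(abox)
                    for s in successors:
                        pool.put(s)

            threads = [threading.Thread(target=worker) for _ in range(num_threads)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            return finished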

    37. Taxonomy of Parallel Strategies for Deduction [5]
        Department of Computer Science, The University of Iowa, Iowa City, IA, USA
        [5] Maria Paola Bonacina. A taxonomy of parallel strategies for deduction. Annals of Mathematics and Artificial Intelligence, 29(1):223–257, February 2000.

    38. Taxonomy of Parallel Strategies for Deduction [5]
        AND-parallelism, OR-parallelism, and parallel unification are discussed later.
        Distributed search means dividing up the search space among processes by decomposing the problem in some way (like data partitioning and rule partitioning in [18]).
        Multi-search means giving the entire problem to each process but with different search plans (parallel execution of nondeterministic branches).
        Both distributed search and multi-search require inter-process communication to ensure completeness and speedup.
        Heterogeneous systems allow different inferencers for different processes, whereas homogeneous systems require all processes to use the same inferencer.
        Most methods for term-level and clause-level parallelism use a shared-memory paradigm, while most methods for search-level parallelism use a distributed-memory paradigm.
        Mention the Team-Work method ("best ball")?

    39. Parallel Prolog [9]
        And-parallelism consists of selecting and solving for multiple literals (from the query/resolvent) at the same time.
        Or-parallelism consists of selecting and solving for multiple clauses (from the program) at the same time.
        Unification parallelism consists of unifying in parallel.
        B (the predicate of the query) is replaced with the Body from unification, and the process repeats.
        University of Texas at Dallas; New Mexico State University; Swedish Institute of Computer Science (2x); Technical University of Madrid (UPM)
        [9] Gopal Gupta, Enrico Pontelli, Khayri A. M. Ali, Mats Carlsson, and Manuel V. Hermenegildo. Parallel execution of Prolog programs: a survey. ACM Transactions on Programming Languages and Systems, 23(4):472–602, 2001.

    40. Parallel Prolog [9]
        AND-parallelism:
          Query: p1(x) ^ p2(x,y) ^ p3(z,y,w) ^ p4(w)
          Try to solve all literals at the same time.
        Difficulties:
          --Data dependencies between subgoals.
          --Determining where the dependencies exist.
          --Determining how processes work together to handle dependencies.
        Two variations: independent and dependent.

    41. Parallel Prolog [9]
        OR-parallelism:
          p(x) :- h(x)
          h(x) :- q(x)
          h(x) :- r(x,y)
          When a subgoal can unify with the heads of more than one clause.
        Difficulties:
          --Managing multiple "environments" (versions of the decision tree).
        One process will unify with [h(x) :- q(x)] and one with [h(x) :- r(x,y)]. However, they now BOTH need a copy of the "environment" for independent modifications. (Particularly true in a distributed paradigm.)
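
    A minimal sketch of OR-parallelism over matching clauses, assuming a hypothetical solve(goal, clause, env) resolution procedure that returns a list of solutions (not a real Prolog engine). Each branch works on its own copy of the environment, which is exactly the cost discussed above; a real system would use processes or a parallel Prolog runtime rather than Python threads.

        import copy
        from concurrent.futures import ThreadPoolExecutor

        def or_parallel(goal, matching_clauses, env, solve):
            # e.g. h(x) :- q(x) and h(x) :- r(x,y) are explored simultaneously.
            def try_clause(clause):
                # Each OR-branch gets its OWN copy of the environment so bindings
                # made in one branch cannot clobber another branch.
                return solve(goal, clause, copy.deepcopy(env))

            with ThreadPoolExecutor(max_workers=len(matching_clauses)) as pool:
                branches = pool.map(try_clause, matching_clauses)
                return [solution for branch in branches for solution in branch]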

    42. Parallel Prolog [9]
        Unification parallelism (very fine-grained):
          Unify(f(t1,…,tn), g(s1,…,sm)):
            if (f != g || n != m) return false;
            P1: r1 = unify(t1, s1);
            …
            Pn: rn = unify(tn, sn);
            return (r1 and … and rn);
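
    A runnable, if toy, Python rendering of the same idea: terms are tuples like ('f', t1, …, tn), and the per-pair unify_pair is a placeholder for real unification. (A real unifier must also reconcile the variable bindings produced by the pairs, which is what makes this parallelism so fine-grained.)

        from concurrent.futures import ThreadPoolExecutor

        def parallel_unify(term_a, term_b, unify_pair):
            # Mirror the slide's pseudocode: if the functors and arities match,
            # unify the argument pairs in parallel and AND the results.
            f, *args_a = term_a
            g, *args_b = term_b
            if f != g or len(args_a) != len(args_b):
                return False
            with ThreadPoolExecutor() as pool:
                results = pool.map(lambda pair: unify_pair(*pair),
                                   zip(args_a, args_b))
                return all(results)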

    43. Parallel Non-Monotonic Reasoning [3]
        ASP:
          Vertical parallelism from vertical/"don't know" non-determinism.
          Horizontal parallelism from horizontal/"don't care" non-determinism.
        Department of Computer Science, Texas Tech University, Lubbock, TX, USA; Department of Computer Science, New Mexico State University, Las Cruces, NM, USA (3x)
        [3] Marcello Balduccini, Enrico Pontelli, Omar Elkhatib, and Hung Le. Issues in parallel execution of non-monotonic reasoning systems. Parallel Computing, 31(6):608–647, June 2005.

    44. Parallel Non-Monotonic Reasoning [3]
        Horizontal parallelism: different agents cooperate in the construction of one model, working together in the expand function.
        Vertical parallelism: different agents work towards different models by selecting different literals. Typically very unbalanced, but can be balanced with an appropriate dynamic scheduling scheme.
        Parallel lookahead: if adding literal x to B makes it consistent and adding -x makes it inconsistent, then x can be added to B. (See the sketch below.)
        Sharing models: model copying and model recomputation.
        In a distributed environment, vertical parallelism performed extremely poorly due to the overhead of copying large models. Static horizontal parallelism is like rule partitioning in the parallel OWL inferencing paper [18]. The implementation is master-slave.
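
    A minimal sketch of parallel lookahead: every undecided literal is tested concurrently, and a literal is committed when it keeps the partial model consistent while its complement does not. Literals are plain strings, undecided is a list, partial_model is a set, and is_consistent is a placeholder for the solver's consistency check.

        from concurrent.futures import ThreadPoolExecutor

        def negate(literal):
            # "-p" is the complement of "p".
            return literal[1:] if literal.startswith("-") else "-" + literal

        def parallel_lookahead(partial_model, undecided, is_consistent):
            # Test each undecided literal concurrently.
            def test(x):
                return (is_consistent(partial_model | {x})
                        and not is_consistent(partial_model | {negate(x)}))

            with ThreadPoolExecutor() as pool:
                verdicts = list(pool.map(test, undecided))
            return {x for x, ok in zip(undecided, verdicts) if ok}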

    45. Lana-Match [2]
        A parallel, distributed Rete match using a master-slave paradigm. Everyone has a copy of the Rete network. When a rule is activated on the master, it sends all the "facts" that are activating the rule to a slave. The rule is then activated on the slave's node, which performs the computation and returns the results. The master applies the results in timestamp order.
        Without getting into the details, the Rete algorithm forms a match/join network for efficiently matching productions and firing rules. No evaluation is given!
        Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
        [2] Mostafa M. Aref and Mohammed A. Tayyib. Lana-Match algorithm: a parallel version of the Rete-Match algorithm. Parallel Computing, 24(5-6):763–775, 1998.
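
    A minimal sketch of the master-slave dispatch described above (illustrative only; the Rete matching itself is omitted): the master ships each activated rule and its facts to a slave and applies the returned results in timestamp order. slave.fire is a hypothetical remote call.

        from concurrent.futures import ThreadPoolExecutor

        def master_loop(activations, slaves, apply_result):
            # activations: list of (timestamp, rule, facts) from the master's
            # Rete match phase.
            with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
                futures = [(ts, pool.submit(slaves[i % len(slaves)].fire, rule, facts))
                           for i, (ts, rule, facts) in enumerate(activations)]
                # Apply the slaves' results strictly in timestamp order.
                for ts, future in sorted(futures, key=lambda tf: tf[0]):
                    apply_result(future.result())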

    46. Outline
        Overview
        Literature Review
          Existing Parallelization Efforts
          Approaches Using some Parallelization
          Proposed Parallelization Approaches
          Suggestions from AI Field
          Suggestions from Distributed Data Field
            Map-Reduce-Merge [19]
          Suggestions from Parallel Computing Field
            Challenges in Parallel Graph Processing [13]
            Distributed Breadth-First Search for BlueGene/L [20]
        Synthesis
        (32:42)

    47. Map-Reduce-Merge [19]
        "Apply the principles of databases rather than the artifacts."
        Adds a merge phase to the Map and Reduce operations; [6] applies MapReduce to data-intensive scientific analyses.
        Adding merge enables implementation of the following relational operators: projection, aggregation, generalized selection, joins (sort-merge join, hash join, block nested-loop join), set union, set intersection, set difference, Cartesian product, and rename.
        Merge allows merging the results from different Map-Reduce operations (see the sketch below).
        Yahoo!, Sunnyvale, CA, USA (2x); Computer Science Department, UCLA, Los Angeles, CA, USA (2x)
        [19] Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. Map-reduce-merge: simplified relational data processing on large clusters. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1029–1040, New York, NY, USA, 2007. ACM.
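
    A minimal sketch of a merge step realized as a sort-merge join over the key-sorted outputs of two independent map-reduce jobs; this illustrates the idea, not the paper's API.

        def merge_join(left, right):
            # left, right: lists of (key, value) pairs, each sorted by key,
            # e.g. the reducer outputs of two separate MapReduce jobs.
            i = j = 0
            while i < len(left) and j < len(right):
                lk, lv = left[i]
                rk, _ = right[j]
                if lk < rk:
                    i += 1
                elif lk > rk:
                    j += 1
                else:
                    # Emit every pairing of equal-keyed values from both sides.
                    j2 = j
                    while j2 < len(right) and right[j2][0] == lk:
                        yield lk, (lv, right[j2][1])
                        j2 += 1
                    i += 1
            # Non-matching keys are dropped, i.e. this sketches an inner join.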

    48. Challenges in Parallel Graph Processing [13]
        Challenges:
          --Data-driven computations: partitioning the problem execution is difficult when the execution is mainly determined by the data.
          --Unstructured problems: graphs are unstructured and irregular, making it difficult to do data partitioning.
          --Poor locality: because of the interdependencies in graph data, locality cannot be exploited.
          --High data access to computation ratio: tied in with poor locality, more time is spent accessing data than in scientific computing applications.
        [8] is mentioned as providing a library for developing distributed parallel graph algorithms.
        A possible resolution of the challenges is to reduce graph algorithms to numerical computations on matrices, thus providing more structure; a great example is [20], BFS for BlueGene/L (see the sketch below).
        Indiana University, Bloomington, Indiana, USA (2x); Sandia National Laboratories, Albuquerque, New Mexico, USA (2x)
        [13] Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(1):5–20, 2007.
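
    A minimal sketch of the "graph algorithms as matrix computations" idea: a BFS in which each level expansion corresponds to a sparse matrix-vector product over a boolean semiring. The adjacency structure is a plain dict of sets here; a scalable version would use a distributed sparse matrix.

        def bfs_levels(adjacency, source):
            # adjacency: dict mapping each vertex to the set of its out-neighbours;
            # conceptually a sparse boolean adjacency matrix. Each iteration below
            # is y = A^T x over the (OR, AND) semiring, restricted to unvisited rows.
            frontier = {source}
            visited = {source: 0}
            level = 0
            while frontier:
                level += 1
                next_frontier = {v for u in frontier
                                   for v in adjacency.get(u, ())
                                   if v not in visited}
                for v in next_frontier:
                    visited[v] = level
                frontier = next_frontier
            return visited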

    49. Outline
        Overview
        Literature Review
        Synthesis
          Fine-Grained Parallelism
          Exploiting Non-Determinism
          Cooperating on Determinism
          Helping Parallelism with Approximation (and vice versa)
          Exploiting Distributed Graph Algorithms
        (35:06)

    50. Fine-Grained Parallelism
        Well exploited in database/storage systems. From distributed data: parallel matching (YARS2, Clustered TDB, DHTs). Can scale well for large datasets.
        Advantages: lots of previous work; easy to implement (compared to algorithmic changes).
        Issues: limited performance improvements; generally only a constant-factor speedup. Usually not algorithmic parallelization.
        Traditionally, this could be things like loop unrolling. For this presentation, it refers more to performing lookups in multiple places at the same time.
        Examples: YARS2, Clustered TDB, RDFS on DHTs, DORS.

    51. Exploiting Non-Determinism
        From AI: vertical parallelism, OR-parallelism, multi-search.
        Advantages: usually very parallelizable.
        Issues:
          --Potentially poor load balancing; can be relieved by a good scheduling scheme.
          --Potentially high data-sharing cost.
          --Often uses a master-slave approach in distributed computing, which tends not to scale as well as "peer-to-peer" paradigms.
          --Problems do not always have a high degree of non-determinism, thus limiting opportunities for parallelization.
        Examples: Parallel Tableaux, ASP, Prolog; TeamWork is a good example of multi-search; MaRVIN (in a sense; it must decide who gets what data). Most implementations in AI are shared-memory.

    52. Cooperating on Determinism
        From AI: horizontal parallelism, AND-parallelism, distributed search.
        Advantages: generally, if it is possible to exploit well, this approach scales better than exploiting non-determinism.
        Issues:
          --Difficult to know how to get processes to cooperate or how to partition the problem.
          --Communication overhead is incurred, which can mask performance gains.
        Examples: Parallel OWL Inferencing (an example of distributed search); MaRVIN (in a sense; each node works deterministically). Again, most implementations in AI seem to be shared-memory.

    53. Helping Parallelism with Approximation (and vice versa)
        Approximation allows the problem to be partitioned more easily, but soundness and/or completeness must be sacrificed.
        In MaRVIN, it reduces the need for cooperation and communication. (The evaluated implementation is homogeneous.)
        Approximate Reasoning allows high-level parallelism to get better results. (The evaluated implementation is heterogeneous.)
        How well can evolutionary approaches be applied to the problem?
        Examples: MaRVIN (homogeneous), Approximate Reasoning (heterogeneous), the Evolutionary Approach.

    54. Exploiting Distributed Graph Algorithms
        It turns out that work in this area is as new as parallelism for the semantic web, so there is currently not a lot of previous work to exploit. The best outlook seems to be reduction to numerical computations on matrices, for which there is much previous work.
        Difficulty: RDF graphs are not run-of-the-mill graphs. They are labeled and directed, contain loops, and predicates can be subjects/objects (i.e., edges can be nodes)!
        Possibly learn from the BitMat work.

    55. Questions? 39:54 (~40:00)

    56. Works Cited
        [1] George Anadiotis, Spyros Kotoulas, Eyal Oren, Ronny Siebes, Frank van Harmelen, Niels Drost, Roelof Kemp, Jason Maassen, Frank J. Seinstra, and Henri E. Bal. MaRVIN: A distributed platform for massive RDF inference. http://www.larkc.eu/marvin/btc2008.pdf, 2008.
        [2] Mostafa M. Aref and Mohammed A. Tayyib. Lana-Match algorithm: a parallel version of the Rete-Match algorithm. Parallel Computing, 24(5-6):763–775, 1998.
        [3] Marcello Balduccini, Enrico Pontelli, Omar Elkhatib, and Hung Le. Issues in parallel execution of non-monotonic reasoning systems. Parallel Computing, 31(6):608–647, June 2005.
        [4] Jurgen Bock. Parallel computation techniques for ontology reasoning. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 901–906. Springer Berlin / Heidelberg, 2008.
        [5] Maria Paola Bonacina. A taxonomy of parallel strategies for deduction. Annals of Mathematics and Artificial Intelligence, 29(1):223–257, February 2000.
        [6] Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox. MapReduce for data intensive scientific analyses. In 4th IEEE International Conference on e-Science, 2008.
        [7] Qiming Fang, Ying Zhao, Guangwen Yang, and Weimin Zheng. Scalable distributed ontology reasoning using DHT-based partitioning. In The Semantic Web, volume 5367/2008 of Lecture Notes in Computer Science, pages 91–105. Springer Berlin / Heidelberg, 2008.
        [8] Douglas Gregor and Andrew Lumsdaine. Lifting sequential graph algorithms for distributed-memory parallel computation. SIGPLAN Notices, 40(10):423–437, 2005.

    57. Works Cited
        [9] Gopal Gupta, Enrico Pontelli, Khayri A. M. Ali, Mats Carlsson, and Manuel V. Hermenegildo. Parallel execution of Prolog programs: a survey. ACM Transactions on Programming Languages and Systems, 23(4):472–602, 2001.
        [10] Andreas Harth, Jurgen Umbrich, Aidan Hogan, and Stefan Decker. YARS2: A federated repository for querying graph structured data from the web. In The Semantic Web, volume 4825/2008 of Lecture Notes in Computer Science, pages 211–224. Springer Berlin / Heidelberg, November 2007.
        [11] Zoi Kaoudi, Iris Miliaraki, and Manolis Koubarakis. RDFS reasoning and query answering on top of DHTs. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 499–516. Springer Berlin / Heidelberg, 2008.
        [12] Thorsten Liebig and Felix Muller. Parallelizing tableaux-based description logic reasoning. In On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, volume 4806/2007 of Lecture Notes in Computer Science, pages 1135–1144. Springer Berlin / Heidelberg, 2007.
        [13] Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan Berry. Challenges in parallel graph processing. Parallel Processing Letters, 17(1):5–20, 2007.
        [14] Eyal Oren, Christophe Gueret, and Stefan Schlobach. Anytime query answering in RDF through evolutionary algorithms. In The Semantic Web - ISWC 2008, volume 5318/2008 of Lecture Notes in Computer Science, pages 98–113. Springer Berlin / Heidelberg, 2008.

    58. Works Cited
        [15] Eyal Oren, Spyros Kotoulas, George Anadiotis, Ronald Siebes, Annette ten Teije, and Frank van Harmelen. MaRVIN: A platform for large-scale analysis of semantic web data. In Proceedings of WebSci'09: Society On-Line, March 2009.
        [16] Alisdair Owens, Andy Seaborne, Nick Gibbins, and mc schraefel. Clustered TDB: A clustered triple store for Jena. http://eprints.ecs.soton.ac.uk/16974/1/www2009?xedref.pdf, 2008.
        [17] Sebastian Rudolph, Tuvshintur Tserendorj, and Pascal Hitzler. What is approximate reasoning? In Web Reasoning and Rule Systems, volume 5341/2008 of Lecture Notes in Computer Science, pages 150–164. Springer Berlin / Heidelberg, 2008.
        [18] Ramakrishna Soma and V. K. Prasanna. Parallel inferencing for OWL knowledge bases. In ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing, pages 75–82, Washington, DC, USA, 2008. IEEE Computer Society.
        [19] Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. Map-reduce-merge: simplified relational data processing on large clusters. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1029–1040, New York, NY, USA, 2007. ACM.
        [20] Andy Yoo, Edmond Chow, Keith Henderson, William McLendon, Bruce Hendrickson, and Umit Catalyurek. A scalable distributed parallel breadth-first search algorithm on BlueGene/L. In SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, Washington, DC, USA, 2005. IEEE Computer Society.
