
Sketch-Based Distance Estimates for Web Scale Graphs

Sketch-Based Distance Estimates for Web Scale Graphs. Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy (Microsoft). Distance Computation Algorithm. Online Distance Computation on Massive Graphs. Distance/path computation on Social Networks.


Presentation Transcript


  1. Sketch-Based Distance Estimates for Web Scale Graphs
     Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy (Microsoft)

     Distance Computation Algorithm
     • Online Distance Computation on Massive Graphs
     • Distance/path computation on Social Networks
     • Distance between search and ad results
     • Building block for other online algorithms
     • pre-computation: all sketches
     • query time: nodes u and v
     • at runtime, retrieve Obama
     • Road Networks
     • Already solved very efficiently – specific to 2D
     • Set Sketch Based Distances

     Effectiveness of our Algorithm
     For all nodes x, precompute small information Sketch(x).
     At query time, combine Sketch(u) and Sketch(v) to estimate distance.
     [Figure labels: You, undirected]

     Real Data
     • 65M web pages, 420M URLs, 2.3B edges
     • C = 60M (directed), C = 128M (undirected)
     • Undirected distance: [1, 15]
     • Directed distance: [1, 100] (∞ otherwise)
     • Sketch size: (s+8)·k·|log C| bits
     • k = 3: number of copies of seed sets
     • s = 12: size of seed id; 8 to store distance
     • ~200 bytes (undirected), ~400 bytes (directed)

     Sketch computation
     Repeatedly (k times), sample random sets of nodes (S) of sizes 2^0, 2^1, 2^2, …, 2^|log C| from the candidate set C and, for every node in the graph, store the nearest sampled node and the distance to it.
     [Figure label: directed]
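
The precomputation and query steps on this slide can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes an undirected, unweighted graph given as an adjacency dict, uses every node as the candidate set C, and combines two sketches by taking the smallest d(u, w) + d(w, v) over seeds w that both sketches share (one plausible reading of "combine Sketch(u) and Sketch(v)"). The function names `nearest_seed`, `build_sketches`, `estimate_distance`, and `sketch_bits` are hypothetical, and `sketch_bits` assumes that |log C| means log base 2 rounded up and that s and 8 are bit counts.

```python
import math
import random
from collections import deque


def nearest_seed(adj, seeds):
    """Multi-source BFS: map each reachable node to (closest seed, hop distance)."""
    best = {s: (s, 0) for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        seed, dist = best[node]
        for nbr in adj[node]:
            if nbr not in best:
                best[nbr] = (seed, dist + 1)
                queue.append(nbr)
    return best


def build_sketches(adj, k=3, rng=None):
    """Precompute Sketch(x) for all x as a dict {seed: distance from x to seed}.

    Assumption: the candidate set C is the full node set of `adj`.
    """
    rng = rng or random.Random(0)
    nodes = sorted(adj)
    sketches = {x: {} for x in nodes}
    for _ in range(k):                      # k independent copies of the seed sets
        size = 1
        while size <= len(nodes):           # set sizes 2^0, 2^1, 2^2, ...
            seeds = rng.sample(nodes, size)
            for x, (seed, dist) in nearest_seed(adj, seeds).items():
                sketches[x][seed] = dist    # exact distance from x to that seed
            size *= 2
    return sketches


def estimate_distance(sketches, u, v):
    """Upper-bound d(u, v) via the best seed shared by both sketches."""
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return math.inf
    return min(sketches[u][w] + sketches[v][w] for w in common)


def sketch_bits(s, k, c):
    """(s + 8) * k * |log C| bits, assuming |log C| = ceil(log2(C))."""
    return (s + 8) * k * math.ceil(math.log2(c))
```

Because each stored distance is an exact shortest-path distance to a seed, the estimate never falls below the true distance in an undirected graph (triangle inequality); how close it gets depends on which seeds happen to be sampled. Under the stated assumptions, `sketch_bits(12, 3, 60_000_000)` evaluates to 1560 bits, or 195 bytes, in line with the "~200 bytes" figure on the slide.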
