Approximately counting triangles in sublinear time

Talya Eden, Tel Aviv University; Amit Levi, University of Waterloo; Dana Ron, Tel Aviv University; C. Seshadhri, UC Santa Cruz

Counting Triangles
• Basic graph-theoretic algorithmic question that arises in various applications (e.g., bioinformatics and social networks).
• Has been studied quite extensively in the past:
  - Algorithms for exact counting: $O(m^{3/2})$ [Itai&Rodeh], [Chiba&Nishizeki] (m is the number of edges); $O(m^{1.41})$ [Alon,Yuster&Zwick] (based on matrix multiplication).
  - Algorithms for approximate counting: many algorithms in a variety of models, including streaming (e.g., [Schank&Wagner], [Tsourakakis], [Avron], [Kolountzakis,Miller,Peng,Tsourakakis], [Chu&Cheng], [Suri&Vassilvitskii], [Arifuzzaman,Khan,Marathe], [Seshadhri,Kolda,Pinar], [Tangwongsan,Pavan,Tirthapura], …).
• All previous algorithms (exact or approximate) read the entire graph.
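For reference, exact counting in $O(m^{3/2})$ time can be sketched with the standard degree-ordering technique: orient each edge toward its higher-degree endpoint and intersect out-neighbor sets. This is a minimal sketch assuming the graph is an adjacency dict of sets with comparable vertex ids; it is not claimed to be line-for-line the algorithm of [Itai&Rodeh] or [Chiba&Nishizeki].

```python
def count_triangles_exact(adj):
    """Count triangles exactly in O(m^{3/2}) time using a degree ordering.

    adj: dict mapping each vertex to a set of neighbours (undirected graph,
    no self-loops, mutually comparable vertex ids).
    """
    def rank(x):
        # Order vertices by (degree, id); the id breaks ties consistently.
        return (len(adj[x]), x)

    # Orient each edge towards its higher-ranked endpoint. Every out-set has
    # size O(sqrt(m)), since out-neighbours of v have degree >= d(v).
    out = {v: {w for w in adj[v] if rank(w) > rank(v)} for v in adj}

    count = 0
    for v in adj:
        for w in out[v]:
            # A common out-neighbour closes a triangle; each triangle is
            # counted once, from its two lowest-ranked vertices v and w.
            count += len(out[v] & out[w])
    return count
```

On the complete graph $K_4$ this returns 4, and on a path it returns 0.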

Counting Triangles in Sublinear Time
• Problem considered by [Gonen,R,Shavit], whose main focus was on counting the number of s-stars.
• They considered algorithms with access to degree queries (what is d(v) for vertex v?) and neighbor queries (what is the i-th neighbor of vertex v?).
• Showed that, in general, there is no sublinear algorithm for approximately counting the number of triangles (in contrast to s-stars).
  Simple lower-bound construction: distinguish a graph whose number of triangles is linear in n (and m) from a graph with no triangles.
• Natural question: is there a sublinear algorithm if we also allow vertex-pair queries (is there an edge between u and v)?
• We answer this question affirmatively.
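The three query types can be modeled as a small wrapper that also counts queries, which is how query complexity is measured. This is a minimal sketch; the class name `GraphOracle` and its method names are hypothetical, and the fixed order of each stored neighbor list is an assumption that defines which vertex is the "i-th neighbor".

```python
class GraphOracle:
    """Hypothetical query-access wrapper for an undirected graph.

    Supports degree, neighbour and vertex-pair queries and counts every query,
    so the query complexity of an algorithm can be measured.
    """

    def __init__(self, adj):
        # adj: dict vertex -> iterable of neighbours; the stored order fixes
        # which vertex is the "i-th neighbour".
        self._nbr_list = {v: list(nbrs) for v, nbrs in adj.items()}
        self._nbr_set = {v: set(nbrs) for v, nbrs in adj.items()}
        self.queries = 0

    def degree(self, v):
        """Degree query: return d(v)."""
        self.queries += 1
        return len(self._nbr_list[v])

    def neighbor(self, v, i):
        """Neighbour query: return the i-th neighbour of v (1-based), or None if i > d(v)."""
        self.queries += 1
        nbrs = self._nbr_list[v]
        return nbrs[i - 1] if 1 <= i <= len(nbrs) else None

    def pair(self, u, v):
        """Vertex-pair query: return True iff (u, v) is an edge."""
        self.queries += 1
        return v in self._nbr_set[u]
```

Degree and neighbor queries alone are the [Gonen,R,Shavit] model; the `pair` method is the additional vertex-pair query.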

Our Results
Given query access (degree, neighbor, and vertex-pair queries) to a graph G with n vertices, m edges, and t triangles, and a parameter $\epsilon \in (0,1]$, our algorithm returns $\hat{t}$ such that, with high constant probability, $(1-\epsilon)t \le \hat{t} \le (1+\epsilon)t$.
• Expected query complexity: $O\left(n/t^{1/3} + m^{3/2}/t\right)\cdot\mathrm{poly}(\log n, 1/\epsilon)$.
• More precisely: $O\left(n/t^{1/3} + \min\{m, m^{3/2}/t\}\right)\cdot\mathrm{poly}(\log n, 1/\epsilon)$.
• We also give a matching lower bound (up to polylog(n) factors and for constant $\epsilon$).
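To get a feel for the bound, its leading term can be evaluated for concrete sizes. The helper name `query_bound` is hypothetical, and constants and poly(log n, 1/ε) factors are dropped, so the numbers are only indicative.

```python
def query_bound(n, m, t):
    """Leading term n / t^(1/3) + min(m, m^(3/2) / t) of the query complexity.

    Constants and poly(log n, 1/eps) factors are ignored; requires t >= 1.
    """
    return n / t ** (1 / 3) + min(m, m ** 1.5 / t)
```

For example, with $n = m = t = 10^6$ the two terms are about $10^4$ and $10^3$, for a total of roughly $1.1\cdot 10^4$ queries. With $t = 1$ the bound becomes $n + m$, i.e., essentially reading the whole graph.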

Related Works (Sublinear Algorithms)
• Approximating the average degree (number of edges): [Feige], [Goldreich,R]
• Approximating the number of stars: [Gonen,R,Shavit]
• Other sublinear algorithms for approximating graph parameters:
  - MST: [Chazelle,Rubinfeld,Trevisan], [Czumaj&Sohler], [Czumaj,Ergun,Fortnow,Magen,Newman,Rubinfeld,Sohler]
  - Min VC: [Parnas&R], [Nguyen&Onak], [Marko&R], [Yoshida,Yamamoto,Ito], [Onak,R,Rosen,Rubinfeld]
  - Max Matching: [Nguyen&Onak], [Yoshida,Yamamoto,Ito]
• Testing triangle-freeness: [Alon,Fischer,Krivelevich,Szegedy], [Alon], [Alon,Kaufman,Krivelevich,R]

Towards an algorithm I
Start with the following assumptions (removed later):
• Can sample a uniform edge.
• Can query t(e): the number of triangles edge e participates in.
• Know m (an estimate suffices; use [Feige]) and a constant-factor estimate of t (can be removed by search).
Given these assumptions, we can get a $(1\pm\epsilon)$-estimate of t:
• Select q edges uniformly at random; denote the sample by Y.
• Query t(e) for each e in Y.
• Return $\frac{m}{3q}\sum_{e\in Y} t(e)$.
Analysis:
• Since $\sum_e t(e) = 3t$, we have $\mathrm{Exp}_e[t(e)] = 3t/m$,
• so $\mathrm{Exp}\left[\frac{1}{3q}\sum_{e\in Y} t(e)\right] = t/m$.
• To get high constant probability, it suffices to take $q = O\left(\frac{m}{t}\cdot\max_e\{t(e)\}\right)$ (for constant $\epsilon$), since the variance of t(e) is at most $\max_e\{t(e)\}\cdot\mathrm{Exp}_e[t(e)]$.
• Difficulty: $\max_e\{t(e)\}$ may be large (an edge can lie in up to n-2 triangles).
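The idealized estimator can be simulated directly. This is a minimal sketch under the slide's assumptions: the t(e) "oracle" is simulated by reading the whole graph (an adjacency dict of sets with comparable ids), and edges are sampled uniformly with replacement; the function names are hypothetical.

```python
import random

def edge_triangle_counts(adj):
    """Simulated oracle: t(e) = number of triangles containing edge e = (u, v), u < v.

    adj: dict vertex -> set of neighbours. (Computed by reading the whole graph;
    it only stands in for the assumed oracle.)
    """
    return {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}

def estimate_triangles(adj, q, rng=random):
    """Return (m / 3q) * sum of t(e) over q uniformly sampled edges.

    Assumes uniform edge sampling (with replacement) and oracle access to t(e);
    each triangle is counted by all three of its edges, hence the factor 3.
    """
    t_of = edge_triangle_counts(adj)
    edges = list(t_of)
    m = len(edges)
    total = sum(t_of[rng.choice(edges)] for _ in range(q))
    return m / (3 * q) * total
```

On $K_5$ every edge lies in exactly 3 triangles, so the estimate equals 10 for any sample; the variance problem from the slide only shows up when t(e) varies a lot across edges.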

Towards an algorithm II: Bounding t(e)
• Modify t(e) so that edge e = (u,v) is assigned only the triangles (u,v,w) such that d(w) > d(u), d(v) (ties broken by id).
• Observe: each triangle is assigned to a single edge, so $\sum_e t(e) = t$.
• Claim: $t(e) = O(m^{1/2})$.
  Proof: If $d(u) \le m^{1/2}$, the claim is immediate ($t(e) \le d(u)$). Otherwise ($d(u) > m^{1/2}$), every w assigned to e has $d(w) > d(u) > m^{1/2}$, and the number of neighbors w of u with degree at least $m^{1/2}$ is $O(m^{1/2})$ (or else there would be more than m edges).
• If we have oracle access to (the modified definition of) t(e) and can sample edges uniformly, we get an algorithm with query complexity $O\left(\frac{m}{t}\cdot\max_e\{t(e)\}\right) = O(m^{3/2}/t)$.
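The effect of the modified assignment can be checked on a small example. This is a minimal sketch assuming an adjacency dict of sets with comparable ids; the "book graph" (one spine edge shared by k triangles) is a hypothetical example chosen because its original $\max_e t(e)$ is large.

```python
def modified_edge_counts(adj):
    """Modified t(e): edge (u, v) is assigned triangle (u, v, w) only if w outranks u and v.

    Rank is (degree, id), so degree ties are broken by id; each triangle is
    assigned to exactly one edge, the one opposite its highest-ranked vertex.
    """
    def rank(x):
        return (len(adj[x]), x)

    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                counts[(u, v)] = sum(
                    1 for w in adj[u] & adj[v]
                    if rank(w) > rank(u) and rank(w) > rank(v)
                )
    return counts

def book_graph(k):
    """Hypothetical example: spine edge (0, 1) plus k pages w = 2..k+1 adjacent to both."""
    adj = {0: {1}, 1: {0}}
    for w in range(2, k + 2):
        adj[w] = {0, 1}
        adj[0].add(w)
        adj[1].add(w)
    return adj
```

In `book_graph(10)` the spine edge lies in all 10 triangles under the original definition, but under the modified one every edge is assigned at most 1 triangle while the total is still t = 10.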

Towards an algorithm III: Removing the oracle assumption (for t(e))
Procedure replacing the oracle for t(e), given edge e = (u,v):
• Consider the lower-degree endpoint of e = (u,v); wlog, it is u.
• Select a neighbor w of u uniformly at random.
• Query the pair (w,v).
• If $(w,v) \in E$ and $d(w) > d(u), d(v)$, set $X(e) = d(u)$; otherwise, set $X(e) = 0$.
Analysis (for fixed e):
• $\mathrm{Exp}[X(e)] = \Pr[\text{hit a triangle assigned to } e]\cdot d(u) = \frac{t(e)}{d(u)}\cdot d(u) = t(e)$.
• If $d(u) \le m^{1/2}$, then $X(e) \le m^{1/2}$.
• Otherwise, to reduce the variance "internal to the procedure", let X(e) be the average value over $\lceil d(u)/m^{1/2}\rceil$ repetitions of the above.
Resulting algorithm for estimating t:
• Select $q = O(m^{3/2}/t)$ edges uniformly at random; denote the sample by Y.
• Run the procedure on each e in Y to get X(e).
• Return $\frac{m}{q}\sum_{e\in Y} X(e)$.
Expected query complexity: $O(m^{3/2}/t)$.
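The procedure and the resulting estimator can be sketched as follows, with the random variable written X(e) here. This is a minimal sketch assuming an adjacency dict of sets with comparable ids; uniform edge sampling is still simulated by listing all edges, and the vertex-pair query is a set lookup. Function names are hypothetical.

```python
import math
import random

def rank(adj, x):
    # Order vertices by (degree, id); ties in degree are broken by id.
    return (len(adj[x]), x)

def procedure_X(adj, u, v, m, rng):
    """Return X(e) for e = (u, v), with Exp[X(e)] = modified t(e).

    Modified t(e) counts triangles (u, v, w) where w outranks both u and v.
    """
    if rank(adj, v) < rank(adj, u):
        u, v = v, u                      # wlog u is the lower-degree endpoint
    nbrs = list(adj[u])
    d_u = len(nbrs)
    # d(u) <= sqrt(m): one run; otherwise average ceil(d(u)/sqrt(m)) runs.
    reps = math.ceil(d_u / math.sqrt(m))
    total = 0
    for _ in range(reps):
        w = rng.choice(nbrs)             # uniform random neighbour of u
        if (w in adj[v]                  # vertex-pair query (w, v)
                and rank(adj, w) > rank(adj, u)
                and rank(adj, w) > rank(adj, v)):
            total += d_u
    return total / reps

def estimate_with_procedure(adj, q, rng):
    """(m / q) * sum of X(e) over q edges; uniform edge sampling is still assumed."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    m = len(edges)
    total = 0.0
    for _ in range(q):
        u, v = rng.choice(edges)
        total += procedure_X(adj, u, v, m, rng)
    return m / q * total
```

On $K_6$ (t = 20), `estimate_with_procedure(k6, 20000, random.Random(1))` should land close to 20; individual X(e) values are 0 or multiples of d(u), but their mean is t(e).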

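Towards an algorithm IV: Removing the assumption on uniform edge selection
Idea: select a subset S of vertices uniformly at random and consider the set of incident ("ordered") edges $E(S) = \{(u,v) : u \in S,\ v \in \Gamma(u)\}$, where $\Gamma(u)$ is the set of neighbors of u. If we query the degrees of all vertices in S, we can sample an edge uniformly in E(S) (almost).
Algorithm:
• Select $s = O(n/t^{1/3})$ vertices uniformly at random; denote the sample by S.
• Select $q = O(m^{3/2}/t)$ edges uniformly at random in E(S); denote the sample by Y.
• Run the procedure on each e in Y to get X(e).
• Return $\frac{n}{2s}\cdot\frac{|E(S)|}{q}\sum_{e\in Y} X(e)$.
Intuitively, with $|E(S)| \approx s\cdot d_{avg}$ (where $d_{avg} = 2m/n$) and $\mathrm{Exp}[X(e)] = t/m$ for a uniform edge:
$\mathrm{Exp}\left[\frac{n}{2s}\cdot\frac{|E(S)|}{q}\sum_{e\in Y} X(e)\right] = \frac{n}{2s}\cdot\frac{s\cdot d_{avg}}{q}\cdot q\cdot\frac{t}{m} = t$.
We can show that by modifying t(e) and the procedure that computes X(e), we get an algorithm that computes a $(1\pm\epsilon)$-estimate of t by performing $O\left(n/t^{1/3} + \min\{m^{3/2}/t, m\}\right)$ queries in expectation.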
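The "almost" algorithm can be sketched end to end; X(e) denotes the procedure's output. This is a minimal sketch assuming an adjacency dict of sets with comparable ids, sampling S with replacement, and knowing m exactly (the slides assume an estimate). It omits the further modifications of t(e) and of the procedure that the slides say are needed for the stated guarantee. Function names are hypothetical.

```python
import math
import random

def rank(adj, x):
    # Order vertices by (degree, id); ties in degree are broken by id.
    return (len(adj[x]), x)

def procedure_X(adj, u, v, m, rng):
    """X(e) with Exp[X(e)] = t(e) (triangles (u, v, w) where w outranks u and v)."""
    if rank(adj, v) < rank(adj, u):
        u, v = v, u                                    # u = lower-degree endpoint
    nbrs = list(adj[u])
    reps = math.ceil(len(nbrs) / math.sqrt(m))
    hits = sum(
        1 for w in (rng.choice(nbrs) for _ in range(reps))
        if w in adj[v] and rank(adj, w) > rank(adj, u) and rank(adj, w) > rank(adj, v)
    )
    return len(nbrs) * hits / reps

def estimate_via_vertex_sample(adj, s, q, rng):
    """Sample vertices, then ordered edges in E(S), then average X(e)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # m assumed known
    vertices = list(adj)
    S = [rng.choice(vertices) for _ in range(s)]       # uniform, with replacement
    degs = [len(adj[u]) for u in S]                    # degree queries on S
    size_E_S = sum(degs)                               # |E(S)|, ordered edges
    if size_E_S == 0:
        return 0.0
    # Uniform ordered edge (u, v) in E(S): u with prob. d(u)/|E(S)|, then a uniform neighbour.
    starts = rng.choices(S, weights=degs, k=q)
    total = sum(procedure_X(adj, u, rng.choice(list(adj[u])), m, rng) for u in starts)
    return (n / (2 * s)) * (size_E_S / q) * total
```

With S sampled with replacement the returned value is an unbiased estimate of t; how well it concentrates depends on how t(v) is spread over vertices, which is the issue raised on the next slide.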

Towards an algorithm IV (continued): Removing the assumption on uniform edge selection
What's missing in the (almost) algorithm above?
• By slightly generalizing what we have already shown, whp $\frac{|E(S)|}{q}\sum_{e\in Y} X(e)$, where X(e) is the value returned by the procedure, is a good approximation of $\sum_{e\in E(S)} t(e)$.
• Write $\sum_{e\in E(S)} t(e)$ as $\sum_{v\in S} t(v)$, where $t(v) = \sum_{e\in E(v)} t(e)$ and E(v) is the set of edges incident to v.
• We would like to show that $\frac{n}{s}\sum_{v\in S} t(v)$ is close to $\sum_{v\in V} t(v) = 2t$.
• We show this for a variant of t(v) (that is, of t(e)), which requires modifying the procedure for X(e).
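The quantities t(v) and the identity $\sum_v t(v) = 2t$ can be checked on a small graph, which also shows why vertex sampling can be fragile when triangles are concentrated on few vertices. This is a minimal sketch using the modified t(e) with (degree, id) ranking; the example graph ($K_6$ plus isolated vertices) and the function names are hypothetical.

```python
import random

def vertex_counts(adj):
    """t(v) = sum of the modified t(e) over the edges e incident to v."""
    def rank(x):
        return (len(adj[x]), x)

    tv = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:
                t_e = sum(1 for w in adj[u] & adj[v]
                          if rank(w) > rank(u) and rank(w) > rank(v))
                tv[u] += t_e
                tv[v] += t_e
    return tv

def vertex_sample_estimate(tv, s, rng):
    """(n / s) * sum of t(v) over s uniform vertices (with replacement); estimates 2t."""
    vertices = list(tv)
    return len(vertices) / s * sum(tv[rng.choice(vertices)] for _ in range(s))

# Hypothetical example: K_6 (t = 20) plus 994 isolated vertices.
g = {i: {j for j in range(6) if j != i} for i in range(6)}
g.update({i: set() for i in range(6, 1000)})
tv = vertex_counts(g)
```

Here all of $\sum_v t(v) = 40$ sits on 6 of the 1000 vertices, so with s = 10 the estimate is 0 with probability $(994/1000)^{10} \approx 0.94$; this illustrates why S cannot be too small.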

Lower bound idea(s)
Recall: $\Omega\left(n/t^{1/3} + \min\{m^{3/2}/t, m\}\right)$.
The lower bound of $\Omega(n/t^{1/3})$ is a simple "hitting" lower bound: with fewer than $n/t^{1/3}$ queries, one cannot distinguish between
• an empty graph (no triangles), and
• a graph containing a clique over $t^{1/3}$ vertices plus an independent set of $n - t^{1/3}$ vertices ($\Theta(t)$ triangles).
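A back-of-the-envelope version of the hitting argument, under the simplifying assumption that each query touches one independent, uniformly random vertex (the actual lower bound must also handle adaptive queries, which this does not model). The names and numbers are hypothetical.

```python
def miss_probability(n, k, q):
    """Probability that q independent uniform vertices all avoid a fixed set of k vertices."""
    return (1 - k / n) ** q

# Hypothetical numbers: n = 10**6 vertices, a clique on k = 100 vertices
# (Theta(k^3) triangles: C(100, 3) = 161,700), and q = n / (10 k) = 1000 queries.
n, k = 10**6, 100
q = n // (10 * k)
p_miss = miss_probability(n, k, q)   # about exp(-0.1), i.e. roughly 0.905
```

By the union bound the miss probability is at least $1 - qk/n$, so with $q \ll n/k$ queries the clique is missed, and the two graphs look identical, with high probability.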

Lower bound idea(s), continued
Lower bound of $\Omega(m^{3/2}/t)$ (for $t \ge m^{1/2}$):
• Basic structure: a complete bipartite graph with both sides of size $m^{1/2}$ (the remaining vertices form an independent set). No triangles.
• Consider adding edges between vertices on the left-hand side (lhs) of the bipartite graph. Each added edge gives $m^{1/2}$ triangles. (For example, for $t = \Theta(m)$, add a (random) perfect matching.)
• Small difficulty: the degrees of lhs vertices "give it away". We take care of this by removing bipartite edges and adding matching edges on the right-hand side (rhs).
• Intuition for the lower bound: let k be the number of added edges, so that $k = t/m^{1/2}$. The probability of "hitting" an added (or removed) edge is $k/m = t/m^{3/2}$.
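The basic structure can be built and its triangles counted to confirm that each added lhs edge creates one triangle per rhs vertex. This is a minimal sketch with each side of size a (playing the role of $m^{1/2}$) and a fixed, not random, perfect matching on the lhs; it omits the degree-hiding modification and the remaining isolated vertices. Function names are hypothetical.

```python
from itertools import combinations

def bipartite_with_matching(a, add_matching=True):
    """K_{a,a} with sides L = 0..a-1 and R = a..2a-1, plus an optional perfect matching on L."""
    if a % 2:
        raise ValueError("a must be even for a perfect matching on L")
    adj = {v: set() for v in range(2 * a)}
    for x in range(a):
        for y in range(a, 2 * a):
            adj[x].add(y)
            adj[y].add(x)
    if add_matching:
        for x in range(0, a, 2):        # matching edges (0, 1), (2, 3), ...
            adj[x].add(x + 1)
            adj[x + 1].add(x)
    return adj

def count_triangles(adj):
    """Brute-force triangle count; fine for the small examples here."""
    return sum(1 for u, v, w in combinations(sorted(adj), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])
```

With a = 6 the bipartite graph alone has no triangles, while the 3 matching edges give $3 \cdot 6 = 18$ triangles out of $m = 36 + 3 = 39$ edges, matching $t = \Theta(m)$ for a perfect matching.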

Summary
We present an algorithm computing $\hat{t}$ such that, with high constant probability, $(1-\epsilon)t \le \hat{t} \le (1+\epsilon)t$.
Expected query complexity: $O\left(n/t^{1/3} + \min\{m, m^{3/2}/t\}\right)\cdot\mathrm{poly}(\log n, 1/\epsilon)$.
Main ideas:
• Assign triangles to edges so that each edge e is assigned $t(e) = O(m^{1/2})$ triangles (if we had an oracle for t(e) and could sample edges uniformly, we would be done).
• Give a simple procedure for computing a random variable X(e) such that $\mathrm{Exp}[X(e)] = t(e)$ (if we could sample edges uniformly, we would be done).
• Replace uniform sampling of edges from the entire graph by uniform sampling of edges incident to a uniformly sampled subset of vertices.
Matching lower bound (up to polylog(n) factors and for constant $\epsilon$).

Thanks
