Download
e 3 143 n.
Skip this Video
Loading SlideShow in 5 Seconds..
EE1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 PowerPoint Presentation
Download Presentation
EE1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0

EE1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0

5 Views Download Presentation
Download Presentation

EE1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. E3 143 E41342 Graph Path Analytics(using pTrees) 1 0 0 0 Two-Level Stride=4, Edge pTrees Two-Level Str=4, Unique Edge pTrees 0 0 0 0 E3 1 L13 14 L13 41 L13 31 L13 43 E3 133 L23 4 E3 142 L23 3 E3 132 L23 2 E3 13 E3 121 L23 1 E3 141 E3 124 E3 11 E3 123 L13 24 E3 134 L13 13 E3 142 E3 131 E3 243 E3 134 E3 431 L13 34 E3 241 E3 241 E3 143 E3 243 E3 114 E3 144 E3 341 E3 314 E3 122 E3 14 E3 143 E3 413 E3 134 E3 113 E3 12 E3 341 E3 342 E3 314 E3 413 E3 431 E3 112 E3 111 E3 2 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 h=1 j=4 k=2 E3142=E2&M’4 E3 E3 E3 M’4 1 1 1 0 E3key E2 0 0 0 1 1 E3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1111 1112 1113 1114 1121 1122 1123 1124 1131 1132 1133 1134 1141 1142 1143 1144 1211 1212 1213 1214 1221 1222 1223 1224 1231 1232 1233 1234 1241 1242 1243 1244 1311 1312 1313 1314 1321 1322 1323 1324 1331 1332 1333 1334 1341 1342 1343 1344 1411 1412 1413 1414 1421 1422 1423 1424 1431 1432 1433 1434 1441 1442 1443 1444 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 pure0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2111 2112 2113 2114 2121 2122 2123 2124 2131 2132 2133 2134 2141 2142 2143 2144 2211 2212 2213 2214 2221 2222 2223 2224 2231 2232 2233 2234 2241 2242 2243 2244 2311 2312 2313 2314 2321 2322 2323 2324 2331 2332 2333 2334 2341 2342 2343 2344 2411 2412 2413 2414 2421 2422 2423 2424 2431 2432 2433 2434 2441 2442 2443 2444 3111 3112 3113 3114 3121 3122 3123 3124 3131 3132 3133 3134 3141 3142 3143 3144 3211 3212 3213 3214 3221 3222 3223 3224 3231 3232 3233 3234 3241 3242 3243 3244 3311 3312 3313 3314 3321 3322 3323 3324 3331 3332 3333 3334 3341 3342 3343 3344 3411 3412 3413 3414 3421 3422 3423 3424 3431 3432 3433 3434 3441 3442 3443 3444 4111 4112 4113 4114 4121 4122 4123 4124 4131 4132 4133 4134 4141 4142 4143 4144 4211 4212 4213 4214 4221 4222 4223 4224 4231 4232 4233 4234 4241 4242 4243 4244 4311 4312 4313 4314 4321 4322 4323 4324 4331 4332 4333 4334 4341 4342 4343 4344 4411 4412 4413 4414 4421 4422 4423 4424 4431 4432 4433 4434 4441 4442 4443 4444 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 L1 1 1 1 0 L1 E 1 1 1 1 h=2 j=4 ListE224={1,3} k=1 E3241=E1&M’4 V1 L0 Vertex Masks 1 2 3 4 Edge Mask pTree Unique Edge Mask V2 L0 M’4 1 1 1 0 E1 0 0 1 1 Edges V1 V2 3 4 1 E1 0 0 1 1 U 0 0 1 1_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 U1 0 0 1 1 M1 1 0 0 0 E 0 0 1 1_ 0 0 0 1_ 1 0 0 1_ 1 1 1 0 1,1 1,2 1,3 1,4_ 2,1 2,2 2,3 2,4_ 3,1 3,2 3,3 3,4_ 4,1 4,2 4,3 4,4 Graph Path: sequence of edges connecting a sequence of vertices (usually) distinct from each other except for the endpoints. 2 1 3 E2 0 0 0 1 U2 0 0 0 1 M2 0 1 0 0 h=2 j=4 k=3 E3243=E3&M’4 4 1 1 1 M’4 1 1 1 0 E3 1 0 0 1 EE 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 EE1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 EE11 0 0 0 0 111 112 113 114 121 122 123 124 131 132 133 134 141 142 143 144 211 212 213 214 221 222 223 224 231 232 233 234 241 242 243 244 311 312 313 314 321 322 323 324 331 332 333 334 341 342 343 344 411 412 413 414 421 422 423 424 431 432 433 434 441 442 443 444 E3 1 0 0 1 U3 0 0 0 1 M3 0 0 1 0 1paths (1 edge) are the Edges EE12 0 0 0 0 We use pTrees to find and exhibit the 2paths (EE or E2) and the 3paths (E3), etc. E4 1 1 1 0 U4 0 0 0 0 M4 0 0 0 1 E3& 1 0 0 1 M’1= 0 1 1 1 EE13 0 0 0 1 EE13 0 0 0 1 E4 1 1 1 0 M’1 0 1 1 1 h=3 j=1 k=4 E3314=E4&M’1 E2key v1v2v3 kListEh, E2hk = Ek&M’h. (other k, E2hk=0) EE14 0 1 1 0 E4 1 1 1 0 M’1= 0 1 1 1 EE14 0 1 1 0 For h=1, ListE1={3,4} EE2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 EE21 0 0 0 0 E1 0 0 1 1 M’4 1 1 1 0 h=3 j=4 k=1 E3341=E1&M’4 For h=1 k=3: EE13=E3&M’1 EE22 0 0 0 0 EE23 0 0 0 0 For h=1 k=4: EE14=E4&M’1 E2 0 0 0 1 M’4 1 1 1 0 h=3 j=4 k=2 E3342=E2&M’4 EE24 1 0 1 0 M’2 1 0 1 1 EE24 1 0 1 0 E4 1 1 1 0 EE31 0 0 0 1 EE3 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 For h=2, ListE2={4} E1& 0 0 1 1 M’3= 1 1 0 1 EE31 0 0 0 1 EE32 0 0 0 0 E3 1 0 0 1 M’1 0 1 1 1 h=4 j=1 k=3 E3413=E3&M’1 E4& 1 1 1 0 M’3= 1 1 0 1 EE34 1 1 0 0 EE33 0 0 0 0 For h=2 k=4: EE24=E4&M’2 For h=3, ListE3={1,4} EE34 1 1 0 0 E1 0 0 1 1 M’3 1 1 0 1 h=4 j=3 k=1 E3431=E1&M’3 For h=3 k=1: EE31=E1&M’3 E1& 0 0 1 1 M’4= 1 1 1 0 EE41 0 0 1 0 EE41 0 0 1 0 EE4 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 kListE2hj, E3hjk=Ek&M’j. E4 1 1 1 0 M’3 1 1 0 1 h=1 j=3 ListE213={4} k=4 E3134=E4&M’3 EE42 0 0 0 0 E2& 0 0 0 1 M’4= 1 1 1 0 h=1 j=4 k=3 E3143=E3&M’4 EE42 0 0 0 0 pure0 M’4 1 1 1 0 E3 1 0 0 1 For h=3 k=4: EE34=E4&M’3 EE43 1 0 0 0 kListE3hij, E4hijk = Ek & M’j & M’i ListE3143={1} For h=4, ListE4={1,2,3} EE44 0 0 0 0 E3& 1 0 0 1 M’4= 1 1 1 0 EE43 1 0 0 0 For h=4 k=1: EE41=E1&M’4 Level=2 (These are exactly the Level=1 of E) For h=4 k=2: EE42=E2&M’4 E43142 E42413 E42431 M’1 0 1 1 1 M’1 0 1 1 1 M’3 1 1 0 1 E1 0 0 1 1 E2 0 0 0 1 E3 0 0 0 1 M’4 1 1 1 0 M’4 1 1 1 0 M’4 1 1 1 0 M’3 1 1 0 1 E2 0 0 0 1 M’4 1 1 1 0 ListE3134={1,2} h=1 i=3 j=4 k=2 Level=3(So E2 is the upper 3 levels of E3) 0 0 0 0 0 0 0 0 0 0 0 0 ListE3241={3} h=2 i=4 j=1 k=3 ListE3314={2,3} h=3 i=1 j=4 k=2 ListE3243={1} h=2 i=4 j=3 k=1 Level=2 For h=4 k=3: EE43=E3&M’4 1 1 1 1 ListE3341={3} 1 1 1 1 ListE3413={4} Level=1 (These are exactly the Level=0’s of E2) ListE3431={4} Level=1=just E1,E2,E3,E4 with pure0 bits turned off. E1 0 0 1 1 E2 0 0 0 1 E3 1 0 0 1 3Level, Stride=4 pTrees for paths of len=2 (2 edges and 3 vertices (unique except for endpts) ) E4 1 0 bit turned off 1 0 No 5 vertex (4 edge) paths. Creation stops. The Stride=|V|, Levels=Diam Path Mask is: E E2 E3 : Edlongest_path Level=0 (We just computed these) Level=0 EE13 0 0 0 1 EE14 0 1 1 0 EE24 1 0 1 0 EE34 1 1 0 0 EE31 0 0 0 1 EE41 0 0 1 0 EE43 1 0 0 0

  2. Graph Path pTreestride=|V|=7. L is the longest path length, Path pTree is a L+1 level pTree (Levels 0-L) Level L-1 NPZ key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 2,1 2,2 2,3 2,4 2,5 2,6 2,7 3,1 3,2 3,3 3,4 3,5 3,6 3,7 4,1 4,2 4,3 4,4 4,5 4,6 4,7 5,1 5,2 5,3 5,4 5,5 5,6 5,7 6,1 6,2 6,3 6,4 6,5 6,6 6,7 7,1 7,2 7,3 7,4 7,5 7,6 7,7 E 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 E1 E2 E3 E4 E5 E6 E7 kListEh, E2hk = Ek & M’h M’h forces off bit=h, lest we repeat it.(Note Ek already has k bit turned off.) 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 Even tho these are undirected graphs, we must be careful doing path analytics. If we only extend in one direction, we need to list paths and their reverse, e.g., If we have 1paths, 12 and 14 and not 21 and 41, then as we look for 2paths by extending on the right, we would look at E2 to extend 12 and E4 to extend 14, likely missing 2 path, 214. So either we record all reverses (as we do with E) or we look for extensions on both sides. We’ll record (redundantly) all paths together with their reverse paths in Ek. E212 E213 E214 E216 E221 E223 E224 E231 E232 E234 E256 E267 E241 E242 E243 E261 E265 E276 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 165 E3 143 214 216 241 341 342 412 413 416 421 423 431 432 561 567 612 613 614 761 765 142 231 234 243 312 314 316 321 324 132 134 167 123 124 213 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 6 E4 1243 1324 1341 1342 1423 1432 2132 2134 2142 2143 2165 2167 2314 2316 2341 2342 2413 2416 2432 3123 3124 3142 3143 3165 3167 3124 3126 3241 3243 3412 3416 3421 1231 1234 1241 5 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 7 2 1 E3(Last 14 copied) 412 413 416 421 423 431 432 561 567 612 613 614 761 765 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 E4 4132 4134 4123 4124 4165 4167 4213 4231 4234 4312 4316 4321 5612 5613 5614 6123 6124 6132 6134 6142 6143 7612 7613 7614 4216 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 4 3

  3. Path pTree Continued stride=|V|=7. L is the longest path length, Path pTree is a L+1 level pTree (Levels 0-L) kListEh, E2hk = Ek & M’h M’h forces off bit=h, lest we repeat it.(Note Ek already has k bit turned off.) E4 1243 1324 1341 1342 1423 1432 2132 2134 2142 2143 2165 2167 2314 2316 2341 2342 2413 2416 2432 3123 3124 3142 3143 3165 3167 3124 3126 3241 3243 3412 3416 3421 1231 1234 1241 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 E5 12341 14231 14321 23142 23167 23416 31243 31267 34123 34167 34213 34216 12341 13241 13421 21432 23165 23412 24132 24167 31243 31265 34165 14321 24165 32413 31423 32416 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 234165 234167 342165 342167 324165 324167 E6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 E4 4132 4134 4123 4124 4165 4167 4213 4231 4234 4312 4316 4321 5612 5613 5614 6123 6124 6132 6134 6142 6143 7612 7613 7614 4216 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 6 41234 5 41324 7 42134 31267 31265 42314 42316 43167 56132 56142 43165 43214 56123 56134 56143 61234 43124 43216 56124 76123 61243 61324 61342 61423 61432 76124 76132 76134 E5 76142 76143 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 1 432165 432167 561234 561243 561423 561432 761234 761243 761423 761432 561324 561342 761324 761342 423165 423167 E6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 234165 234167 342165 342167 423165 423167 432165 432167 561234 561243 561423 561432 761234 761243 761423 761432 561324 561342 761324 761342 324165 324167 23167 56134 31267 34167 43167 23165 23416 24167 31265 32416 34165 31267 43165 43216 56124 24165 34216 31265 56132 56142 56123 56143 61243 61234 61324 61342 61423 61432 76123 76124 76142 76132 76134 76143 3124 3142 4132 4165 4167 4213 4231 4312 4316 4321 5612 5613 5614 6123 6124 6132 6134 6142 6143 4216 7612 7613 7614 1342 1423 1432 2134 2143 2165 2167 2314 2316 2341 2413 2416 3165 3167 3124 3126 3241 3412 3416 3421 4123 165 143 142 132 134 167 412 413 416 421 423 431 432 561 567 612 613 614 761 765 214 216 241 341 342 412 413 416 421 423 431 432 561 567 612 613 614 761 765 123 124 213 231 234 243 312 314 316 321 324 12 13 14 16 23 24 34 56 67 4 3 The COMBO(7,2)=21 endpoint pairs are: 12 13 14 15 16 17 23 24 25 26 27 34 35 36 37 45 46 47 56 57 67 The shortest paths are of lengths: 1 1 1 2 1 2 1 1 3 2 3 1 3 2 3 3 2 3 1 2 1 So the diameter of the graph is 3. For very big graphs, how can we determine the diameter using pTrees only?

  4. The Path pTree E1 0 0 1 1 E2 0 0 0 1 E3 1 0 0 1 E4 1 1 1 0 E1 2  1bit? no 1 E2 13 0 0 0 1 14 0 1 1 0 24 1 0 1 0 31 0 0 0 1 34 1 1 0 0 41 0 0 1 0 43 1 0 0 0 42 0 0 0 0 • 1bit? Yes  ShortestPath1,2 =132 G1 3 4 E3 143 1 0 0 0 342 0 0 0 0 314 0 1 1 0 134 1 1 0 0 142 0 0 0 0 241 0 0 1 0 243 1 0 0 0 341 0 0 1 0 413 0 0 0 1 431 0 0 0 1 DiamG1 = maxkV(Diamk) = 2 How can we determine graph diameter? The Ek-diameter, diamk is the maximum of the minimum path lengths from k to the other vertices. For each k, proceed down from Ek a level at a time and record the first occurrence of kh , hk. Diam4=max{fo41 fo42 fo43}=max{111}=1 Diam2=max{fo21 fo23 fo24}=max{2 2 1}=2 Diam1 = max{fo12 fo13 fo14} = max{2 1 1} = 2 Diam3=max{fo31 fo32 fo34}=max{1 2 1}=2 1 0 1 1 1 0 1 0 2 1 0 1 1 0 0 0 3 1 1 0 1 0 0 0 4 1 1 1 0 0 0 0 5 0 0 0 0 0 1 0 6 1 0 0 0 1 0 1 7 0 0 0 0 0 1 0  1bit? no 1bit? No  1 2 0 0 1 1 0 0 0 6 7 0 0 0 0 0 0 0 1 3 0 1 0 1 0 0 0 1 4 0 1 1 0 0 0 0 1 6 0 0 0 0 1 0 1 2 1 0 0 1 1 0 1 0 2 3 1 0 0 1 0 0 0 2 4 1 0 1 0 0 0 0 3 1 0 1 0 1 0 1 0 3 2 1 0 0 1 0 0 0 3 4 1 1 0 0 0 0 0 4 1 0 1 1 0 0 1 0 4 2 1 0 1 0 0 0 0 4 3 1 1 0 0 0 0 0 5 6 1 0 0 0 0 0 1 6 1 0 1 1 1 0 0 0 6 5 0 0 0 0 0 0 0 7 6 1 0 0 0 1 0 0 G2  1bit? no Yes! SP15 = 165 . 6 5 1 4 3 1 1 0 0 0 0 0 3 1 2 0 0 1 1 0 0 0 3 2 1 0 0 1 1 0 1 0 3 2 4 1 0 1 0 0 0 0 3 4 1 0 1 1 0 0 1 0 3 4 2 1 0 1 0 0 0 0 1 3 2 1 0 0 1 0 0 0 1 3 4 1 1 0 0 0 0 0 1 4 2 1 0 1 0 0 0 0 1 6 5 0 0 0 0 0 0 0 1 2 3 1 0 0 1 0 0 0 1 2 4 1 0 1 0 0 0 0 1 6 7 0 0 0 0 0 0 0 2 1 4 0 1 1 0 0 0 0 2 1 6 0 0 0 0 1 0 1 2 4 1 0 1 1 0 0 1 0 3 1 4 0 1 1 0 0 0 0 4 1 2 0 0 1 1 0 0 0 4 1 3 0 1 0 1 0 0 0 4 1 6 0 0 0 0 1 0 1 4 2 1 0 0 1 1 0 1 0 4 2 3 1 0 0 1 0 0 0 4 3 1 0 1 0 1 0 1 0 4 3 2 1 0 0 1 0 0 0 5 6 1 0 1 1 1 0 0 0 5 6 7 0 0 0 0 0 0 0 6 1 2 0 0 1 1 0 0 0 6 1 3 0 1 0 1 0 0 0 6 1 4 0 1 1 0 0 0 0 7 6 1 0 1 1 1 0 0 0 7 6 5 0 0 0 0 0 0 0 2 1 3 0 1 0 1 0 0 0 2 3 1 0 1 0 1 0 1 0 2 3 4 1 1 0 0 0 0 0 2 4 3 1 1 0 0 0 0 0 3 1 6 0 0 0 0 1 0 1 7 y SP72 =7612 2 4 1 3 2 0 0 0 1 0 0 0 4 1 3 4 0 1 0 0 0 0 0 2 3 1 6 0 0 0 0 1 0 1 1 2 3 4 1 0 0 0 0 0 0 1 2 4 1 0 0 1 0 0 1 0 1 2 4 3 1 0 0 0 0 0 0 1 3 2 4 1 0 0 0 0 0 0 1 3 4 1 0 1 0 0 0 1 0 1 3 4 2 1 0 0 0 0 0 0 1 4 2 3 1 0 0 0 0 0 0 1 4 3 2 1 0 0 0 0 0 0 2 1 3 2 0 0 0 1 0 0 0 2 1 3 4 0 1 0 0 0 0 0 2 1 4 2 0 0 1 0 0 0 0 2 1 4 3 0 1 0 0 0 0 0 2 1 6 5 0 0 0 0 0 0 0 2 1 6 7 0 0 0 0 0 0 0 2 3 1 4 0 1 0 0 0 0 0 2 3 4 1 0 1 0 0 0 1 0 2 3 4 2 1 0 0 0 0 0 0 2 4 1 3 0 1 0 0 0 0 0 2 4 1 6 0 0 0 0 1 0 1 2 4 3 2 1 0 0 0 0 0 0 3 1 2 3 0 0 0 1 0 0 0 3 1 2 4 0 0 1 0 0 0 0 3 1 4 2 0 0 1 0 0 0 0 3 1 4 3 0 1 0 0 0 0 0 3 1 6 5 0 0 0 0 0 0 0 3 1 6 7 0 0 0 0 0 0 0 3 1 2 4 0 0 1 0 0 0 0 3 1 2 6 0 0 0 0 1 0 1 3 4 1 2 0 0 1 0 0 0 0 3 4 1 6 0 0 0 0 1 0 1 3 4 2 1 0 0 1 0 0 1 0 1 2 3 1 0 0 0 1 0 1 0 4 1 2 3 0 0 0 1 0 0 0 4 1 2 4 0 0 1 0 0 0 0 4 1 6 5 0 0 0 0 0 0 0 4 1 6 7 0 0 0 0 0 0 0 4 2 1 3 0 0 0 1 0 0 0 4 2 1 6 0 0 0 0 1 0 1 4 2 3 1 0 0 0 1 0 1 0 4 2 3 4 1 0 0 0 0 0 0 4 3 1 2 0 0 0 1 0 0 0 4 3 1 6 0 0 0 0 1 0 1 4 3 2 1 0 0 0 1 0 1 0 5 6 1 2 0 0 1 1 0 0 0 5 6 1 3 0 1 0 1 0 0 0 5 6 1 4 0 1 1 0 0 0 0 6 1 2 3 0 0 0 1 0 0 0 6 1 2 4 0 0 1 0 0 0 0 6 1 3 2 0 0 0 1 0 0 0 6 1 3 4 0 1 0 0 0 0 0 6 1 4 2 0 0 1 0 0 0 0 6 1 4 3 0 1 0 0 0 0 0 7 6 1 2 0 0 1 1 0 0 0 7 6 1 3 0 1 0 1 0 0 0 7 6 1 4 0 1 1 0 0 0 0 3 2 4 1 0 0 1 0 0 1 0 3 2 4 3 1 0 0 0 0 0 0 1 Diam1=max{fo 12 13 14 15 16 17}=max{111212}=2 Diam2=max{fo 21 23 24 25 26 27}=max{111333}=3 Diam3=max{fo 31 32 34 35 36 37}=max{111323}=3 Diam4=max{fo 41 42 43 45 46 47}=max{111323}=3 Diam5=max{fo 51 52 53 54 56 57}=max{233312}=3 Diam6=max{fo 61 62 63 64 65 67}=max{122211}=2 DiamG2 = maxkV(Diamk) = 3 Diam7=max{fo 71 72 73 74 75 76}=max{233321}=3 For G2, shortest path from 7 to 2? from 1 to 5? Find the shortest path from Vk to Vh: Go down from Ek until you first encounter h. For G1, shortest path from 1 to 2? 4 3

  5. SubGraph Path pTrees E1 0 0 1 1 C1 0 0 1 1 E2 0 0 0 1 E3 1 0 0 1 C3 1 0 0 1 E4 1 1 1 0 C4 1 0 1 0 E1 2 1 0 1 1 1 0 1 1 &1= 0 1 1 1 E2 13 0 0 0 1 13 0 0 0 1 14 0 1 1 0 14 0 0 1 0 24 1 0 1 0 31 0 0 0 1 31 0 0 0 1 34 1 1 0 0 34 1 0 0 0 41 0 0 1 0 41 0 0 1 0 43 1 0 0 0 43 1 0 0 0 42 0 0 0 0 42 0 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 G1 SubGraph C in orange PC= 3 4 E3 143 1 0 0 0 143 1 0 0 0 342 0 0 0 0 342 0 0 0 0 314 0 1 1 0 314 0 0 1 0 134 1 1 0 0 134 1 0 0 0 142 0 0 0 0 142 0 0 0 0 241 0 0 1 0 243 1 0 0 0 341 0 0 1 0 341 0 0 1 0 413 0 0 0 1 413 0 0 0 1 431 0 0 0 1 431 0 0 0 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 To get the C Path pTree, just remove all C’ pTrees (in this case just E2) then AND each G pTree with PC. One can then remove the second bit of every resulting pTree to get a Path pTree for three Cvertices in standard form, but it is not necessary to do so. Instead we will think of vertex2 as being there but with no incident edges. That is tantamount to using the full vertex set for all SubGraphs and just removing all incident edges to vertices not in C. An advantage of this second point of view for pTree analytics is that all pTrees are the same depth and can operate on each oneanother. Diameter of C? The Ck-diameter, Cdiamk is the max of the min path lengths from k to the other Cvertices. For each k, proceed down from Ck a level at a time and record the first occurrence of kh , hk. DiamC= maxkV(Diamk) = 1 CDiam1=max{fo13 fo14}=max{11}=1 Diam3=max{fo31 fo34}=max{11}=1 Diam4=max{fo41 fo43}=max{11}=1 Assuming we always use pop-count to instantaneously compute the 1-count as we AND, then C is a clique iff all C1-counts are |VC|-1. In fact one can mine out all cliques by just analyzing the level=1 counts. Note: If one creates the G Path pTree, lots of tasks become easy! E.g., clique mining, shortest path mining, degree community mining, density community mining! What else? A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order. k-plex existence alg (using the GPpT): C is a k-plex iff vC|Cv|  |VC|2–k2 k-plex inheritance thm: Every induced subgraph of a k-plex is a k-plex. Mine all max k-plexes: Use |Cv| vC Mine all max k-cores: Use |Cv| vC k-core inheritance thm:If  a cover of G by induced k-cores, G is a k-core. k-core existence alg (using the GPpT): C is a k-core iff vC,|VC|  k.

  6. A community is a subgraph with more edges inside than linked to its outside. clique is a community s.t.  edge between each vertex pair. As a V2 Rolodex card Clique Analytics for Graphs 1:2 2:3 3:2 4:3 2:3 Gene-Gene Interactions: # edges = 1B (109) Person-Tweet HomeLand Security: # edges = 7B*10K= 1014 12 Ekey V1 V2 ELabel 1,3 1 | 3 1 1,4 1 | 4 2 2,4 2 | 4 3 3,4 3 | 4 1 V1 E.g., Friend-Friend Social Nets: # edges = 4BB (1018) Stock-price Stock Market Advisor: # edges = 1013 Cust-Item Recommenders: # edges = 1MB (1015) E=Adj matrix 1:2 An Induced SubGraph (ISG) C, is a subgraph that inherits all of G’s edges on its own vertices. A k-ISG (k vertices), C, is a k-clique iff all of its (k-1)-Sub-ISGs are (k-1)-cliques. C 1 2 3 2:3 G=Vertex-Labelled, Edge-Labelled Graph (C=Induced SubGraph with VC={1,3,4}) 1 3:2 PU 0 0 1 1_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 3:2 4:3 Bit offset 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Ekey 1,1 1,2 1,3 1,4_ 2,1 2,2 2,3 2,4_ 3,1 3,2 3,3 3,4_ 4,1 4,2 4,3 4,4 PE 0 0 1 1_ 0 0 0 1_ 1 0 0 1_ 1 1 1 0 EL 0 0 1 2_ 0 0 0 3_ 0 0 0 1_ 2 3 1 0 PEL.,1 0 0 0 1_ 0 0 0 1_ 0 0 0 1_ 1 1 0 0 PEL.,0 0 0 1 0_ 0 0 0 1_ 1 0 0 0_ 0 1 1 0 P1 1 1 1 1_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 P2 0 0 0 0_ 1 1 1 1_ 0 0 0 0_ 0 0 0 0 P4 0 0 0 0_ 0 0 0 0_ 0 0 0 0_ 1 1 1 1 PEC=PE&PC 0 0 1 1_ 0 0 0 0_ 1 0 0 1_ 1 0 1 0 PUC=PU&PC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 1 P3 0 0 0 0_ 0 0 0 0_ 1 1 1 1_ 0 0 0 0 4:3 2 3 1 PUC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 Ct=3 PUD 0 0 1 0_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 Ct=1 PUF 0 0 0 1_ 0 0 0 1_ 0 0 0 0_ 0 0 0 0 Ct=2 PUH 0 0 0 0_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 Ct=2 V (vertex tbl) Vkey VL 1 2 2 3 3 2 4 3 PVL,1 1 1 1 1 PVL,0 0 1 0 1 PC 1 0 1 1 A Clique Existence Alg determines whether an induced subgraph (given by vertices) is a clique. Edge Count clique existence thm (EC): |EC|  |PUC| is COMB(|VC|,2)  |VC|! / ((|VC|-2)!2!) Apply EC to the 4 Induced 3 vertex subgraphs (3-Clique iff |PU|= 3!/(2!1!)=3) C only 3-Clique. VC={1,3,4} VH={2,3,4} VD={1,2,3} VF={1,2,4} SubGraph clique existence theorem (SG): (VC,EC) is a k-clique iff every induced k-1 subgraph, (VD,ED) is a (k-1)-clique. Which is better? Which will extend more easily to quasi-cliques? Which can be extended to an algorithm that mines out all cliques from a graph? A Clique Mining algorithm finds all cliques in a graph. For Clique-Mining we can use an ARM-Apriori-like downward closure property: CSkkCliqueSet, CCSk+1Candidatek+1CliqueSet By the SG clique thm, CCSk+1= all s of CSk pairs having k-1 common vertices. Let CCCSk+1 be a union of two k-cliques with k-1 common vertices. Let v and w be the kth vertices (different) of the two k-cliques, then CCSk+1 iff (PE)(v,w)=1. (We just need to check a single bit in PE.) Form CCSk+1: Union CSk pairs sharing k-1 vertices, check single PE bit. Below, k=2, so we check edge pairs sharing 1 vertex, then check the 1 new edge bit in PE. Already have 134 The only expensive part of this is forming CCSk. And that is expensive only for CCS3 (as in Apriori ARM) CS2=E={13 14 24 34} δintH - δintF - δintD - δintC- δextC=1–1/3=2/3 δextH=2/3-2/3=0 δextF=2/3-2/3=0 δextD=1/3–1=-2/3 kCint PE(3,4) = PE(4*[3-1]+4=12)=1 134CS3 PE(1,2) = PE(4*[1-1]+2=2)=0 Next? List out CS3 = {134} form CCS4 = . Done. Internal degree of v∈C, kvint =# of edges from v to vertices in C=134 6=kCint 2=|PC&PE&Pv1|=kv1int 2=|PC&PE&Pv3| =kv3int 2=|PC&PE&Pv4|=kv4int PE(2,3)=PE(4*[2-1]+3=7)=0 1=kCext External degree of v∈C, kvext =# of edges from v to vertices in C’ 0=|P’C&PE&Pv3|=kv3ext 0=|P’C&PE&Pv1|=kv1ext 1=|P’C&PE&Pv4|=kv4ext kC=7 Total degree of C, kC= +kCext Intra-cluster density δint(C)=|edges(C,C)|/(nc(nc−1)/2)=|PE&PC&PLT|/(3*2/2)=3/3=1 Inter-cluster density δext(C)=|edges(C,C’)| / (nc(n-nc)) =|PE&P’C&PLT|=1/(3*1)=1/3 kvext kvint Internal degree of C, kCint =vC External degree of C, kCext =vC Tradeoff between large δint(C) and small δext(C) is goal of many community mining algorithms. A simple approach is to Maximize differences. Density Difference algorithm for Communities: δint(C)−δext(C) >Threshold? Degree Differencealgorithm: kCint – kCext> Threshold? Easy to compute w pTrees, even for Big Graphs. Graphs are ubiquitous for complex data in all of science. Ignoring Subgraphs of  2 vertices, the four 3-vertex subgraphs are: C={1,3,4}, D={1,2,3}, F={1,2,4}, H={2,3,4} δint(D) =|PE&PD&PLT|/(3*2/2)=1/3 δext(D)=|PE&P’D&PLT|=1/(3*1)=3/3=1 D δext(H)=|PE&P’H&PLT|=1/(3*1)=2/3 δint(F) =|PE&PF&PLT|/(3*2/2)=2/3 δext(F)=|PE&P’F&PLT|=1/(3*1)=2/3 δint(H) =|PE&PH&PLT|/(3*2/2)=2/3 F H One could use label values (weights) instead of the 0/1 existence values.

  7. Clique Mining using the SubGraph Algorithm 12 13 14 16 23 24 34 56 67 k=2: 12 13 14 15 16 17 Turn PU into a positions list = {2 3 4 6 10 11 18 34 42}. 23 24 25 26 27. Find endptsof each edges (Int((n-1)/7)+1, Mod(n-1,7) +1) 34 35 36 37 45 46 47 56 57 67 Using the SubGraph clique theorem to find all k-Cliques. k=3: 123 124 126 127 134 135 136 137 145 146 147 156 157 167 234 235 236 237 245 246 247 256 257 267 345 346 347 356 357 367 456 457 467 567 key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 2,1 2,2 2,3 2,4 2,5 2,6 2,7 3,1 3,2 3,3 3,4 3,5 3,6 3,7 4,1 4,2 4,3 4,4 4,5 4,6 4,7 5,1 5,2 5,3 5,4 5,5 5,6 5,7 6,1 6,2 6,3 6,4 6,5 6,6 6,7 7,1 7,2 7,3 7,4 7,5 7,6 7,7 E 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 1 1 0 EU 0 1 1 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 C 1 1 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CU 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 20 1 2 3 4 5 6 7 8 9 30 1 2 3 4 5 6 7 8 9 40 1 2 3 4 5 6 7 8 9 k=4: 1234 (since 3 3subgraphs are 3cliques, 123 124 234) 123 and 134 give 1234. 123 and 234 give 1234. 124 and 134 give 1234. 124 and 234 give 1234. 134 and 234 give 1234. Therefore, 1234 is a 4-clique and the only 4-clique So there are 5 cliques: 123 124 134 234 1234, 4 3-Cliques and 1 4-Clique. Using the EdgeCountthm: on C={1,2,3,4}, CU=C&EU C is a clique since ct(CU)=comb(4, 2)=4!/2!2!=6 6 6 In this example graph there are five 3Cliques and the one 4Clique. Let’s see if SG can find them (and how efficiently.). 5 5 7 7 Pairs that share 3 Slowest part of this alg is the generation of CCS, the Candidate Clique Set? Evaluating a candidate = 1 bit lookup in PE. The generation of CCS is like Apriori ARM. PE(2,4)=1 1234CS4 key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 2,1 2,2 2,3 2,4 2,5 2,6 2,7 3,1 3,2 3,3 3,4 3,5 3,6 3,7 4,1 4,2 4,3 4,4 4,5 4,6 4,7 5,1 5,2 5,3 5,4 5,5 5,6 5,7 6,1 6,2 6,3 6,4 6,5 6,6 6,7 7,1 7,2 7,3 7,4 7,5 7,6 7,7 PE 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 0 Already have 123CS3 PE(2,6)=0 PE(2,4)=1 124CS3 PE(6,7)=1 567CS3 PE(2,3)=1 So 123CS3 already have 567 Have 124CS3 Have 134 PE(1,5)=0 PE(1,7)=0 EC, requires counting 1’s in mask pTree of each Subgraph (or candidate Clique, if take the time to generate the CCSs – but then clearly the fastest way to finish up is simply to lookup the single bit position in E, i.e., use EC). EdgeCount Algorithm (EC): |PUC| = (k+1)!/(k-1)!2! then CCCS The SG alg only needs Edge Mask pTree, E, and a fast way to find those pairs of subgraphs in CSk that share k-1 vertices (then check E to see if the two different kth vertices are an edge in G. Again this is a standard part of the Apriori ARM algorithm and has therefore been optimized and engineered ad infinitum!) Have 124CS3 PE(1,4)=1 134CS3 PE(2,3)=1 234CS3 2 2 1 1 Have 567 Pairs that share 2 Pairs that share 4 Have 1234 Pairs share 7 k=2: 12 13 14 16 23 2434 56 57 67 = E = CS2. k=3: 123 124 134 234 567= CS3. Pairs that share 1 4 4 3 3 Pairs share 5 Pairs share 6

  8. More Clique Mining using the SubGraph thm (SG) key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 5,1 5,2 5,3 5,4 5,5 5,6 5,7 5,8 6,1 6,2 6,3 6,4 6,5 6,6 6,7 6,8 7,1 7,2 7,3 7,4 7,5 7,6 7,7 7,8 8.1 8,2 8,3 8,4 8,5 8,6 8,7 8.8 E 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 There are 11 3cliques, 4 4cliques and 1 5clique. ( adding 1 vertex, V8, and 4 edges, (1,8) (2,8) (3,8) (4,8) ) Note there are many pTree and other data structures we can employ to aid in performing the CCS creation as well as other “path” based needs. These include the following (but there may be others????): • 2-level, stride=|V|, pTree for E • An ExE relationship matrix showing (using a 1-bit) which edge pairs form a 2 path. Then an ExExE matrix showing which edge triples form a 3 path, etc. 8 k=2: 12 13 14 16 23 2434 56 57 67 18 28 38 48 =E=CS2=edges. PE(3,8)=1 1238CS4 PE(4,8)=1 1248CS4 PE(4,8)=1 2348CS4 PE(3,8)=1 1348CS4 Have 567 6 Have 1348 Have 1248 k=4: 1234 1238 1248 1348 2348 = CS4. 5 PE(2,6)=0 7 PE(1,5)=0 Have 1238 PE(1,7)=0 Already have 123CS3 PE(2,4)=1 124CS3 Have 1334 Have 124CS3 Have 134 PE(2,3)=1 123CS3 PE(2,4)=1 1234CS4 Have 1334 have 567 Have 1348 Have 124CS3 Have 238 Have 138 PE(1,4)=1 134CS3 PE(4,8)=1 348CS3 PE(6,8)=0 Have 348 2 1 have 12348 have 12348 have 12348 have 1234 have 12348 Have 248 PE(2,3)=1 234CS3 PE(4,8)=1 248CS3 k=5: 12348 = CS5. PE(6,7)=1 567CS3 Have 128 PE(4,8)=1 148CS3 PE(4,8)=1 12348CS5 have 12348 have 12348 have 12348 have 12348 PE(2,8)=1 128CS3 Have 1234 PE(3,8)=1 238CS3 PE(3,8)=1 138CS3 Have 348 k=3: 123 124 134 234 567 128 138 148 238 248 348 = CS3. 4 3

  9. Mining for Communities with more relaxed definitions than cliques (from Fortunatos survey) There are many cohesiveness definitions other than a Clique. Another criterion for subgraph cohesion relies idea that a vertex must be adjacent to some min # of other vertices. In social network analysis there are two complementary ways of expressing this. A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order. An LS-set is a subgraph such that the internal degree of each vertex is greater than its external degree. A community is strong if the internal degree of any vertex exceeds the number of edges that the vertex shares with any other community. A community is weak if its total internal degree exceeds the number of edges shared by the community with the other communities. Another def focuses on robustness of clusters to edge removal and uses of edge connectivity. Edge connectivity of a pair of vertices is min # of edges need to be removed to disconnect (no path between). A lambda set is a subgraph s.t. any pair of vertices has larger edge connectivity than any pair formed by one vertex of the subgraph and one outside the subgraph. However, vertices of a lambda-set need not be adjacent and may be quite distant from each other. Communities can also be identified by a fitness measure, expressing to which extent a subgraph satisfies a given property related to its cohesion. The larger the fitness, the more definite is the community. This is the same principle behind quality functions, which give an estimate of the goodness of a graph partition. The simplest fitness measure for a cluster is its intra-cluster density int(C) (see slide 1). One could say subgraph C with k vertices is a cluster if int(C)>threshold. Finding such subgraphs is NP-complete, as it coincides with the NP-complete Clique Problem when the threshold =1. It is better to fix the size of the subgraph because, without this conditions, any clique would be one of the best possible communities, including trivial two-cliques (simple edges). Variants of this problem focus on the number of internal edges of the subgraph. 8 k-plex’s are subgraphs s.t. each vertex is adjacent to all other vertices of the subgraph except at most k of them. k-plex existence algorithms: C is a k-plex iff vVC, |PUC|  COMB(|VC|,2) – k k-plex inheritance theorem: Every induced subgraph of a k-plex is a k-plex. Proof: Let C be an induced subgraph of G. A vertex of C cannot be missing more adjacent C-edges in C than it is missing adjacent C-edges as a vertex in G, because every missing edge in C is also missing in G (If an edge (v,w) is missing in the induced graph, C then since v,w are vertices in G, that edge (v,w) cannot be in EG, lest it would have been induced into C). Edge Count k-plex existence theorem: C is a k-plex iff |PUC|  (|VC|!/((|VC|-2)!2!))-k Mining all maximal k-plexes: Start with G by checking |PUG|. If G is a k-plex, so are all induced subgraphs (Inheritance Thm.) Done. Else check |PUC| induced subgraph C s.t. |VC|=|VG|-1.  such C that is not a k-plex, check |PUD|  induced subgraph, D of C s.t. |VD|=|VC|-1. Continue this until all induced subgraphs that are maximal k-plexes have been identified. A k-coreis a subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. There is a hierarchy of cores of different order. Edge Count k-core existence theorem: C is a k-core iff |PUC|  k k-core inheritance theorem:If  a cover of G by induced k-cores, then G is a k-core. Mining k-cores: If C is s k-core and D is a supergraph s.t. VD -VC={w1,…,wW}, D is s k-core iff degD(wh)k h=1..W Note degD(w)=|PDU&PW| = |PD0n| where w is the nth vertex. So compute all |PD0k| then one can build the hierarchy of k-cores in D by examining the set of vertices where this deg is k=max. Any k-core, would have to be a subset of that set. Then go to k=max-1 6 Springer, May 2015 Charu C. Aggarwal. Comprehensive textbook on data mining (see secret site) Mohammad Zaki’s Data Mining book (See secret site) 5 S. Fortunato, “Community Detection in Graphs.“ (see secret site). Bipartite Communities Matthew P. Yancey April 15, 2015 (see secret site) 7 Degree Calculations using pTrees key 1,1 2,1 3,1 4,1 5,1 6,1 7,1 8,1 1,2 2,2 3,2 4,2 5,2 6,2 7,2 8,2 1,3 2,3 3,3 4,3 5,3 6,3 7,3 8,3 1,4 2,4 3,4 4,4 5,4 6,4 7,4 8,4 1,5 2,5 3,5 4,5 5,5 6,5 7,5 8,5 1,6 2,6 3,6 4,6 5,6 6,6 7,6 8,6 1,7 2,7 3,7 4,7 5,7 6,7 7,7 8,7 1,8 2,8 3,8 4,8 5,8 6,8 7,8 8,8 Ec1 0 1 1 1 1 1 1 1 key 1 2 3 4 5 6 7 8 Er1 1 1 1 1 1 1 1 1 key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 5,1 5,2 5,3 5,4 5,5 5,6 5,7 5,8 6,1 6,2 6,3 6,4 6,5 6,6 6,7 6,8 7,1 7,2 7,3 7,4 7,5 7,6 7,7 7,8 8.1 8,2 8,3 8,4 8,5 8,6 8,7 8.8 Ec 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0 0 0 0 Uc 0 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 V1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Er 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 1 1 1 1 0 0 0 0 Ur 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 2 E1 2 3 4 5 6 7 8 1 0 1 1 1 0 1 0 1 2 1 0 1 1 0 0 0 1 3 1 1 0 1 0 0 0 1 4 1 1 1 0 0 0 0 1 5 0 0 0 0 0 1 1 0 6 1 0 0 0 1 0 1 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 U1 2 3 4 5 6 7 8 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 5 0 0 0 0 0 0 0 0 6 1 0 0 0 1 0 0 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 1 Ec0 Er0 4 1 1 1 0 0 0 0 1 4 1 1 1 0 0 0 0 1 1 0 1 1 1 0 1 0 1 2 1 0 1 1 0 0 0 1 3 1 1 0 1 0 0 0 1 5 0 0 0 0 0 1 1 0 6 1 0 0 0 1 0 1 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 2 1 0 1 1 0 0 0 1 3 1 1 0 1 0 0 0 1 5 0 0 0 0 0 1 1 0 6 1 0 0 0 1 0 1 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 Deg(Vk,C)=|PC&PVk|=|PCrk| Uc1 1 1 1 1 1 1 0 0 Ur1 0 1 1 1 1 0 1 1 V1&Er1-8 = Er0.1 so we don’t need to precompute the 2-level pTrees but it saves 1 AND each time. 4 3 Uc0 Ur0 2 0 0 1 1 0 0 0 1 3 0 0 0 1 0 0 0 1 4 0 0 0 0 0 0 0 1 3 1 1 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 8 1 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 2 1 0 0 0 0 0 0 0 5 0 0 0 0 0 1 1 0 6 0 0 0 0 0 0 1 0 6 1 0 0 0 1 0 0 0 7 0 0 0 0 1 1 0 0

  10. APPENDIX: AC, is confident if a high fraction of the fF which are related to every aA, are also related to every cC F is the Focus Entity and the high fraction is the MinimumConfidence ratio. F 1 1 1 1 4 1 1 0 1 3 1 0 1 1 2 1 0 1 1 1 R(E,F) E 2 2 3 3 4 4 5 5 ct(&flist&eAReSf & PC) / &flist&eAReSf  mncf ct(&f&eAReSf & PC) / &f&eAReSf  mncf 1 1 1 0 0 1 1 0 0 1 0 0 1 0 1 0 ct(PA&f&gCSgRf ) / ct(PA)  mncf • G • C • S(F,G) 4 SuppSetA (set of F’s related to every element of A) = {2,3,5}  F SuppSetC = {2,4,5} 2/3 = ConfAC = |SuppSetAC|/|SuppSetA| = ct(&eACPe) / A hop is a relationship, R, hopping from entity, E, to entity, F. Strong Rule Mining finds all frequent, confidentrules 3 2 1 SRMs are categorized by the number of hops, k, whether transitive or non-transitive and by the focus entity. ARM is 1-hop, non-transitive (A,CE), F-focused SRM (1nF) ct(&eAPe) A C • F 1 0 0 1 1 1 0 1 0 1 0 1 4 0 0 0 1 ct(&eARe)  mnsp 3 1-hop, transitive (AE,CF), F-focused SRM (1tF) ct(&eARe &PC) / ct(&eARe)  mncf 0 0 1 0 2 0 0 0 1 1 antecedent downward closure: If A is frequent, all subsets are frequent (A is infrequent, supersets infreq) Since frequency involves only A, we can mine for all qualifying antecedents using downward closure. Question: Why isn’t ConfAC = SuppC / SuppA? • R(E,F) • A  • E consequent upward closure: If AC is non-confident, then so is AD for all subsets, D, of C. So  frequent antecedent, A, use upward closure to mine for all of its' confident consequents. Transitive (a+c)-hop Apriori strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent, if a/c is odd/even then one can use downward/upward closure on that step in the mining of strong (frequent and confident) rules. In this case A is 1-hop from F (odd, use downward closure). C is 0-hops from F (even, use upward closure). We will be checking more examples to see if the Odddownward Evenupward theorem seems to hold. ct(PA&fCRf) / ct(PA)  mncf |A|=ct(PA)  mnsp 1-hop, transitive, E-focused rule, AC SRM (1tE) antecedent upward closure: If A is infrequent, then so are all of its subsets. 2-hop transitive F-focused ct(&eARe &gCSg) / ct(&eARe)  mncf ct(&eARe)  mnsp and AC strong if: consequent downward closure) If AC is non-confident, then so is AD for all supersets, D, of C. In this case A is 0-hops from E (even, use upward closure). C is 1-hop from E (odd, use downward closure). 1,1 odd so down, down correct. Apriori for 2-hops: Find all freq antecedents, A, using downward closure.  find C1G, the set of g's s.t. A{g} is confident. Find C2G, set of C1G pairs that are confident consequents for antecedent, A. Find C3G, set of triples (from C2G) s.t. all subpairs are in C2G (ala Apriori), etc. 2,0 even so up,up is correct. 0,2 even so up,up is correct. 2-hop trans G-foc ct(&flist&eAReSf)mnsp ct(&fl&eAReSf)mnsp 2-hop trans E-foc ct(PA)mnsp antecedent upward closure: If A is infrequent, so are all subsets. 1. (antecedent upward closure) If A is infrequent, then so for are all subsets. consequent upward closuree: If AC non-conf so is AD for all subsets, D. 2. (consequent upward closure) If AC non-conf, so is AD for all subsets, D.

  11. We can form multi-hop relationships from RoloDex cards. 1 1 1 1 1 1 1 1 1 1 1 1 1 … … … … … … … … … … … … … 3 7 9 9 9 9 9 9 9 3 9 7 9 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 1 1 2 3 T 1 2 … 9 D 1 2 3 D 7 9 7 … … … 2 2 2 1 1 1 Pos Pos Term AC confident if most of the fF related to every aA, are also related to every cC. F is the Focus Entity and “most” means at least a MinimumConfidence ratio. Protein-Protein Interaction RoloDex (different card for each interaction in some pathway) Market Basket RoloDex w different Cust-Item card for each day A confident DThk rule means: A high fraction of the terms, tT in Position=h of every doc A, are also in Position=k of every doc C. Is there a high payoff research area here? • Cust Gene 0 0 0 0 0 0 D P T P T D • I 1 1 C C C C C • B C DP (T=k) DT (P=k) • Buys (Day=2) PD (T=k) TP (D=k) PT (D=k) TD (P=k) 0 0 0 0 1 1 … … 1 1 1 1 1 1 1 0 0 0 0 0 0 3 3 Confident TDhk rule means a high fraction of the Documents, dD having in Position=h, every Term, t A, also have in Position=k, every Term, t C. Again, A,C must be singletons. Hi payoff? It suggests in 1-hop ARM: … … … … … … … Interaction=k • Buys (Day=k) Gene • Item 3 7 9 7 9 3 3 Conf Buy12 rule: Custs who Buy A on Day=1, Buy B on Day=2 w hi prob T P D T P • C D Conf TD rules: hi fraction of Docs, dD having every term t A also have every term t C. Again, A,C must be singletons. Is there a high payoff research area here? DTPe k=1..3 PTCd 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 DTPe k=1..9 PDCd 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 … … … … … … … 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 3 9 3 7 9 3 • EI • Buys (Day=4) DT (P=h) DP (T=h) PT (D=h) TP (D=h) PD (T=h) • Buys (Day=1) TD (P=h) T P P • I T D D A A A • A A A A 1 … A confident DPhk rule means: Hi fraction of Positions, pP which hold Term=h for every doc A, hold Term=k in Pos=p for every doc C Conf PDhk rule: A high fraction of the Documents, dD having Term=h in every Pos, pA, also have Term=k in every Pos. pC. 3 DC Buys (Day=2) • C • I • I 1 0 0 0 • Buys (Day=2) 0 0 1 … 1 0 0 0 0 0 0 3 “Buys” pathways? 0 0 1 … 0 0 0 I 3 Buys day=3 C DTPe k=1..7 TDRolodexCd • Buys (Day=3) • C Conf Buy123 pathway: Most custs who Buy A Day=1 Buy B Day=2. Most of those custs Buy all of D on Day=3 0 0 0 1 0 0 1 … 0 0 0 1 0 0 0 0 0 1 3 … 0 0 0 Buys day=1 3 I A • Buys (Day=1) • I • A A confident PThk rule means: A high fraction of the Terms, tT in Doc=h which occur at every Pos, p A, also occur at every Pos, pC in Doc=k Is this a high payoff research area? Conf Buy1234 pathway: Some customers Buys all of A on Day=1, then most of those customers will Buy all of B on Day=2, then most of those customers will Buy all of D on Day=3 And most of those customers Buy all of E Day=4 Conf TPhk: Hi fraction of pP in Doc=h holding every t A, also hold every t C in Doc=k This only makes sense for A ,C singleton Terms. Also it seems like P would have to be singleton?

  12. RoloDex Model: 2 Entitiesmany relationships DataCube Model for 3 entities, items, people and terms. Item 4 3 2 1 Author People  2 1 2 2 3 3 4 4 3 4 5 5 5 6 7  Customer 1 1 1 1 1 1 1 1 1 1 1 Enrollments 2 1 1 1 1 1 1 1 3 Doc 1 4 movie 2 Course 3 term  G 3 0 0 0 5 0 4 0 5 0 0 0 1 0 1 2 3 4 5 6 7 Doc 0 0 3 0 0 customer rates movie card 0 2 2 0 3 4 0 0 0 0 1 0 0 1 0 0 4 0 0 5 0 t 3 2 1 1 2 3 PI PI termterm card (share stem?) Gene 4 5 3 6 4 7 Relational Model: 5 6 1 People: p1 p2 p3 p4 |0 100|A|M| |1 001|T|M| |2 010|S|F| |3 011|B|F| |4 100|C|M| Items: i1 i2 i3 i4 i5 |0 001|0 |0 11| |1 001|0 |1 01| |2 010|1 |0 10| Terms: t1 t2 t3 t4 t5 t6 |1 010|1 101|2 11| |2 001|0 000|3 11| |3 011|1 001|3 11| |4 011|3 001|0 00| Relationship: p1i1 t1 |0 0| 1 |0 1| 1 |1 0| 1 |2 0| 2 |3 0| 2 |4 1| 2 |5 1|_2 1 3 Gene Exp 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 customer rates movie as 5 card 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 3 4 people 1 5 items 3 2 1 4 terms 3 2 1 One can form multi-hops with any of these cards. Are there any that provide and interesting setting for ARM data mining? cust item card termdoc card authordoc card genegene card (ppi) docdoc People  expPI card expgene card genegene card (ppi)

  13. Collapse T: TC≡ {gG|T(g,h) hC} That's just 2-hop w TCG replacing C. ( can be replaced by . Collapse T and S: STC≡{fF |S(f,g) gTC} Then it's 1-hop w STC replacing C. 3-hop 4 U(H,I) 3 2 1 C J C I H ct(&eARe  mnsup 0 1 0 1 4 U(H,I) G S(F,G) 0 0 0 1 3 1 0 1 0 0 1 0 1 4 2 0 0 0 1 0 0 0 1 3 1 ct(&f&eAReSf ct( &f&eAReSf &h&iCUiTh) &h&iCUiTh ) / ct(&f&eAReSf) / ct(&f&eAReSf) 1 0 1 0  mncnf 2 V(I,J) I 0 0 0 1 1 H T(G,H) G S(F,G) 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 F 0 1 0 1 4 G ct(&f&eAReSf) U(G,I) 0 0 0 1  mnsp 3 Sn(G,G) 1 0 1 0 0 1 0 0 ... 4 / ( (ct(&eARe))n 2 4 *ct(&iCUi) ) mncnf ct(Sn(&eARe &iCUi)) ) 0 0 0 1 0 0 0 1 C 3 I 1 3 S1(G,G) 0 0 1 0 2 2 T(G,H) 0 0 0 1 1 1 F A ct(S2(&eARe &iCUi))+... (ct(S1(&eARe &iCUi))+ R(E,F) E 1 1 1 1 0 1 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 0 1 0 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 0 1 0 0 1 0 0 4 0 0 0 1 3 0 0 1 0 G 2 0 0 0 1 1 A 0 1 0 0 4 R(E,F) E 0 0 0 1 3 0 0 1 1 2 0 0 1 1 1 A R(E,G) E ct(&eARe / ct(&eARe &g&hCThSg)  mncnf C H G S(F,G) 0 1 0 1 4 0 0 0 1 3 1 0 1 0 2 0 0 0 1 1 T(G,H) ct(&f&eAReSf &hCTh) / ct(&f&eAReSf)  mncnf F 0 1 0 0 4 0 0 0 1 3 0 0 1 0 2 0 0 0 1 1 A R(E,F) E Focus on F antecedent downward closure: A infreq. implies supersets infreq. A 1-hop from F (down consequent upward closure: AC noncnf implies AD noncnf. DC. C 2-hops (up 4-hop APRIORI focus on G: 5-hop Focus on G antecedent upward closure: A infreq. implies all subsets infreq. A 2-hop from G (up) consequent downward closure: AC noncnf impl AD noncnf. DC. C 1-hops (down) Focus on F different because the confidences can be different. Focus on G. ct(&f=2,5Sf &1101 ) / ct(&f=2,5Sf ct( 1000 )/2 = 1/2 ct( 1001 &1001&1000&1100) / 2 = ct( 1001 &g=1,3,4 Sg ) /ct(1001) = 4-hop Focus on G? Replace C by UC; A by RA as above (not different from 2 hop?) Focus on H (RA for A, use 3-hop) or focus on F (UC for C, use 3-hop). / ct(1101 & 0011 ) = 1/1 =1 ct(1101 & 0011 & &1101 ) Another focus on G(main) ct(&f&eAReSf)  mnsup 1. (antecedent upward closure) If A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones) Frequency involves only A, so mine all qualifying antecedents using upward closure. 2. (consequent upward closure) If AC is non-confident, then so is AD for all subsets, D, of C (the "list" will be larger, so the AND over the list will produce fewer ones) So  frequent antecedent, A, use upward closure to mine out all confident consequents, C.

  14. Given any 1-hop labeled relationship (e.g., cells have values from {1,2,…,n} then there is: 1. a natural n-hop transitive relationship, A implies D, by alternating entities for each specific label value relationship. 2. cards for each entity consisting of the bitslices of cell values. E.g., in netflix, Rating(Cust,Movie) has label set {0,1,2,3,4,5}, so in 1. it generates a bonafide 6-hop transitive relationship. In 2. an alternative is to bitmap each label value (rather than bitslicing them). Below Rn-i can be bitslices or bitmaps D C D Types (of Items) Buys(C,T) M 0 1 0 1 4 4 R5(C,M) 0 0 0 1 3 3 1 0 1 0 2 2 0 0 0 1 1 1 R0(M,C) C Customers M A  Items R3(C,M) 0 1 0 1 0 1 0 0 4 20 0 0 0 1 0 0 0 1 3 19 1 0 1 0 0 0 1 0 2 0 0 0 1 18 0 0 0 1 1 17 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 16 R4(M,C) 15 C 14 R1(C,M) D M F 13 0 1 0 0 4 12 0 0 0 1 3 0 0 1 0 11 2 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 10 4 1 0 1 0 1 1 1 1 1 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 3 R2(M,C) 9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 2 A C 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 7 R0(E,F) A  E 6 ... 5 4 Rn-2(E,F) &tDBt)mnsp, etc. ct(&iABBi 3 Rn-1(E,F) 2 1 BoughtBy(I,C,) E.g., equity trading on a given day, QuantityBought(Cust,Stock) w labels {0,1,2,3,4,5} (n means nK shares) generates a bonafide 6-hop: equity trading - moved similarly, (moved similarly on a day --> StockStock(#DaysMovedSimilarlyOfLast10) equity trading - moved similarly2, (moved similarly: stock2 moved similarly stock1 previous day. StockStock(#DaysMovedSimilarlyOfLast10) Bipartite G=((V,W),E) Gene-Experiment, Label values could be "expression level". Intervalize and go! v1 1 1 Has Strong Transitive Rule Mining (STRM) been done? Are their downward/upward closure theorems already for it? Is it useful? That is, are there good examples of use: stocks, gene-experiment, MBR, Netflix predictor,... v2 1 0 Let Types be an entity which clusters Items, E.g., in a store, Types might incl; dairy, hardware, baking, meats, produce, bakery, automotive, electronics, toddler, boys, girls, women, men, pharmacy, garden, toys, farm). Let A be an ItemSet wholly of one Type, TA, and let D by a TypesSet which does not include TA. Then: v3 1 0 0 0 w1 w2 AD: If iA s.t. BB(i,c) then tT, B(c,t) AD: If iA s.t. BB(i,c) then tT, B(c,t) 0 AD: If iA s.t. BB(i,c) then tT, B(c,t) AD: If iA s.t. BB(i,c) then tT, B(c,t) ct( |iABBi)mnsp ct( |tDBt)mnsp ct(&tDBt)mnsp ct(&iABBi)mnsp AD frequent: w2 0 ct(&iABBi &tDBt) / ct(&iABBi)mncf AD confident: ct( | iABBi | tDBt) / ct( | iABBi)mncf ct(&iABBi | tDBt) / ct(&iABBi)mncf v1 w1 ct( | iABBi &tDBt) / ct( | iABBi)mncf G=Unipartite graph (VW, EVW) v1 1 1 v2 v3 v2 1 0 v3 1 0 w1 w2 v1 v2 w1 v3 w2 Big Graph Mining (BipartiteGraphs) Recommenders: N=B, M=M, NM=MB Social Nets: N=M=2B, NM=4BB Closure: An induced Subgraph (ISG), C, of a graph, G, inherits all of G’s edges between its own vertices. A k-ISG (k vertices), C, is a k-clique iff all of its (k-1)-Sub-ISGs are (k-1)-cliques. # E 1 1 : 1 Ct key 1 2 : M Gene-Gene Ints: N=M=25K, NM=625M key 1,1 1,2 : 1,N_ 2,1 2,2 : 2,N_ . . . _ M,1 M,2 : M,N E 0 1 : 0 0 0 : 0 . . . 1 0 : 1 U 0 1 : 0 0 0 : 0 . . . 0 0 : 0 Assume graph is Bipartite G=(I,C,E) (Unipartite iff C=I) |I|=N, |C|=M (|E|=MN) 2 level pTrees stride=N: Ct key 1 2 : : N # E1 0 1 : : 0 # E2 0 0 : : 0 # EM 1 0 : : 1 U=Unique. For Bipartite & Directed graphs, E=U So, Communities in bipartite graphs studied as unipartite? # U 1 1 : 1 tree is bipartite. Cycle graphs w even # of vertices bipartite. Planar graph whose faces all even length is bipartite # U1 0 1 : : 0 # U2 0 0 : : 0 # UM 0 0 : : 0 e.g., UM masks items of cust=M, friends of person=M, genes interacting with gene=M.

  15. 1 2 3 T 5 6 1 2 … 9 D 1 2 3 D 7 7 9 .Doc … … … 3 2 1 1 2 2 2 1 1 1 Term Pos Pos apple always. buy always. always. apple buy buy apple AAPL April AAPL $AAPL an April an an April are are are and and and all all all Text Mining using pTrees P1D1 noun P1D1 adj 1 1 1 1 1 0 0 1 1 0 0 0 0 0 DTtf DocTerm termfreq Data Cube 0 0 0 0 0 0 0 0 0 0 tf is the +rollup of the DTPe datacube along the position dimension. One can use any measurement or data structure of measurements, e.g., DT tfidf in which each cell has a decimal tfidf, which can be bitsliced directly into whole number bitslices plus fractional bitslices (one for each binary digit to the right of the binary point-no need to shift!) using: MOD(INT(x/(2k),2), e.g., a tfidf =3.5 is k: 3 2 1 0 -1 -2 bit: 0 0 1 1 1 0 0 0 0 0 0 0 0 1 DT SR DocTerm StockRating Cube 3 3 2 2 0 0 0 0 0 0 0 2 1 1 0 0 0 1 0 0 0 0 3 1 0 0 .Docs .Docs 0 0 0 0 0 0 0 0 DTPe TpTreeSet index (D,P) Positions 1 2 … 0 0 Rating of T=stock at doc date close: 1=sell, 2=hold,3=buy 0=non-stock Term 0 0 0 0 0 0 0 0 0 0 Doc3 Doc2 Doc1 0 0 0 0 0 0 1 0 0 0 0 2 0 1 DTPe k=1..3 PTCd DTPe k=1..9 PDCd DTPe Term Usage Table: Term P1D1 P1D2 P1D3...P7D1…P7D3 DTPe Term Table: Term P1D1 P1D2 P1D3...P7D1…P7D3 0 0 PT card D=k k=1,2,3 0 0 1 noun verb adj adv …noun 1 1 0 1 ... 0 … 0 . . . . . . Terms Terms 9 0 … 0 . . . 1 … 1 9 adj noun noun adj noun DTPe DocTbl DpTreeSet indexed by (T,P)) Position 1 2 3 4 5 6 7 DT tfidf Doc Table: Doc T1 T2 . . . T9 DTPe Document Table: Doc T1P1…T1P7 . . . T9P1…T9P7 Classical Document Table: Doc Auth… Date . . .Subj1 …Subjm T2k1 T2,R=hold T2,R=sell T1k0 T1k-1 T1k-2 DTPe Data Cube DTPe k=1..7 TDRolodexCd Term 1 .75 0 . . . 1 1 1 1/2/13 . . . 0 … 0 1 1 … 0 . . . 0 … 0 TDcard P=k k=1..7 PDcard T=k k=1..9 T2k2 T2,R=buy Subjm Subj1 Date Auth T1k1 ... Term 1 0 0 0 0 1 1 buy DTPe Position Table Pos T1D1 T1D2 T1D3...T9D1…T9D3 0 . . . 0 2 0 2/2/15 . . . 1 … 0 2 0 1 .25 2 0 … 0 . . . 1 … 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 0 1 0 . . . 1 0 3 0 3/3/14 . . . 1 … 1 3 0 0 0 3 0 … 0 . . . 1 … 1 1 0 0 0 0 0 0 . . . 1 1 0 1 ... 0 … 0 0 1 0 0 0 0 0 . . . AAPL . . . 0 0 0 0 0 0 0 . . . 0 0 0 0 0 0 0 . . . all 7 0 … 0 . . . 1 … 1 DT SR bitmap DpTreeSet DT tfidf DpTreeSet Classical DocTbl DpTreeSet 0 0 0 0 0 0 0 Doc3 Doc2 Doc1 0 0 0 0 0 1 0 . . . always DT SR bitslice DpTreeSet 0 0 0 0 0 0 0 Term buy DTPe in PpTreeSet index (T,D) 0 0 0 0 0 0 0 . . . an 0 0 0 1 0 0 1 0 0 1 0 0 0 0 . . . 0 0 1 0 0 0 1 . . . and 0 0 0 0 0 0 0 . . . 0 1 0 0 0 0 0 . . . 1 0 0 0 0 1 0 . . . 0 0 0 0 0 0 0 . . . apple 0 0 0 0 0 0 0 . . . 7 1 2 4 3 April 0 0 0 0 0 0 0 . . . are Pos

  16. Horiz Vertex data Vertical Vertex data Vkey VLabel 1 2 2 3 3 2 4 3 Stride=4. Two-Level pTrees V2 As Rolodex card E Ekey V1 V2 ELabel 1,3 1 | 3 1 1,4 1 | 4 2 2,4 2 | 4 3 3,4 3 | 4 1 Ekey 1,1 1,2 1,3 1,4_ 2,1 2,2 2,3 2,4_ 3,1 3,2 3,3 3,4_ 4,1 4,2 4,3 4,4 PEL.,1 0 0 0 1_ 0 0 0 1_ 0 0 0 1_ 1 1 0 0 PEL.,1 0 0 1 0_ 0 0 0 1_ 1 0 0 0_ 0 1 1 0 PLT 0 1 1 1_ 0 0 1 1_ 0 0 0 1_ 0 0 0 0 PC 1 0 1 1_ 0 0 0 0_ 1 0 1 1_ 1 0 1 1 Pv1 1 1 1 1_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 Pv2 0 0 0 0_ 1 1 1 1_ 0 0 0 0_ 0 0 0 0 Pv4 0 0 0 0_ 0 0 0 0_ 0 0 0 0_ 1 1 1 1 PE 0 0 1 1_ 0 0 0 1_ 1 0 0 1_ 1 1 1 0 EL 0 0 1 2_ 0 0 0 3_ 0 0 0 1_ 2 3 1 0 Pv3 0 0 0 0_ 0 0 0 0_ 1 1 1 1_ 0 0 0 0 PD 1 1 1 0_ 1 1 1 0_ 1 1 1 0_ 0 0 0 0 PF 1 1 0 1_ 1 1 0 1_ 0 0 0 0_ 1 1 0 1 PH 1 1 0 1_ 1 1 0 1_ 0 0 0 0_ 1 1 0 1 1:2 2:3 3:2 4:3 VL 2 3 2 3 PVL,1 1 1 1 1 PVL,0 0 1 0 1 PC 1 0 1 1 12 2:3 L=0 PE,1 0 0 1 1 L=0 PE,2 0 0 0 1 L=0 PE,3 1 0 0 1 L=0 PE,4 1 1 1 0 1:2 L=1 PE =PLT 1 1 1 1 C 1 2 3 2:3 Fixed Pt Colmn 1 3:2 3:2 4:3 1 4:3 2 3 1 PLT,1 0 1 1 1 PLT,2 0 0 1 1 PLT,3 0 0 0 1 PLT,4 0 0 0 0 V1 E=Adjacency matrix Vertex-Labelled, Edge-Labelled Graph Useful masks A community has more edges inside than linked to the outside. Let Subgraph, C, have nc vertices of a graph, G, having n vertices. Internal degree of v∈C, kvint =# of edges from v to vertices in C 6=kCint 2=|PC&PE&Pv1|=kv1int 2=|PC&PE&Pv3| =kv3int 2=|PC&PE&Pv4|=kv4int kC=7 External degree of v∈C, kvext =# of edges from v to vertices in C’ 1=kCext 0=|P’C&PE&Pv1|=kv1ext 0=|P’C&PE&Pv3|=kv3ext 1=|P’C&PE&Pv4|=kv4ext Intra-cluster density δint(C)=|edges(C,C)|/(nc(nc−1)/2)=|PE&PC&PLT|/(3*2/2)=3/3=1 δintC- δintD - δintH - δintF - δextC=1–1/3=2/3 δextH=2/3-2/3=0 δextF=2/3-2/3=0 δextD=1/3–1=-2/3 kCint Inter-cluster density δext(C)=|edges(C,C’)| / (nc(n-nc)) =|PE&P’C&PLT|=1/(3*1)=1/3 The tradeoff between large δint(C) and small δext(C) is goal of community mining and clustering algorithms. The simple ways is to Maximize Differences, δint(C)−δext(C) = D (or Dk=kCint – kCext) over all clusters (use Sum of Differences for partitions). It is easy to compute each SD and SDk with pTrees, even for BigDataGraphs. One can use downward (upward?) closure properties (precisely) to facilitate maximizing differences over all clusters, C? Graphs are the ubiquitous data structures for complex data in all of science. A table is a graph with no edges, a relationship is a bipartite graph… Extend to multigraphs (edge sets =vertex triples, quadruples, etc.). Ignoring Subgraphs of 1 or 2 vertices, the other three 3subgraphs are D={1,2,3}, F={1,2,4}, H={2,3,4} Total degree of C, kC= +kCext kvint kvext External degree of C, kCext =vC Internal degree of C, kCint =vC δint(F) =|PE&PF&PLT|/(3*2/2)=2/3 δint(D) =|PE&PD&PLT|/(3*2/2)=1/3 D F δext(F)=|PE&P’F&PLT|=1/(3*1)=2/3 δext(D)=|PE&P’D&PLT|=1/(3*1)=3/3=1 δint(H) =|PE&PH&PLT|/(3*2/2)=2/3 H δext(H)=|PE&P’H&PLT|=1/(3*1)=2/3 Maximizing Difference of Cluster Densities: C is strongest community (subgraph/cluster). One could use label values (weights) instead of the 0/1 existence values.

  17. 2-lev, str=|V|=8, pTrees for path analytics? Find all paths of length=3 that start at vertex: key 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 5,1 5,2 5,3 5,4 5,5 5,6 5,7 5,8 6,1 6,2 6,3 6,4 6,5 6,6 6,7 6,8 7,1 7,2 7,3 7,4 7,5 7,6 7,7 7,8 8.1 8,2 8,3 8,4 8,5 8,6 8,7 8.8 E 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 U 0 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 E1key 1 2 3 4 5 6 7 8 E 1 1 1 1 1 1 1 1 2 1 0 1 1 0 0 0 1 4 1 1 1 0 0 0 0 1 6 1 0 0 0 1 0 1 0 8 1 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 3 1 1 0 1 0 0 0 1 5 0 0 0 0 0 1 1 0 7 0 0 0 0 1 1 0 0 E0key 1 2 3 4 5 6 7 8 3rd: P’h&E0kkEOh V=h 1st, 2ndEOh h=4 next h=7 EO7= h=1 EO1 h=6 EO6= h=3 EO3= h=4 EO4= h=2 EO2= h=8 EO8= h=5 EO5= 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 1 0 0 0 1 0 1 0 U1key 1 2 3 4 5 6 7 8 U 1 1 1 1 1 1 0 0 2 0 0 1 1 0 0 0 1 4 0 0 0 0 0 0 0 1 6 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 3 0 0 0 1 0 0 0 1 5 0 0 0 0 0 1 1 0 U0key 1 2 3 4 5 6 7 8 E06= 765 E01= 812 813 814 816 E03= 831 832 834 E05= 756 E02= 821 823 824 E01= 612 613 614 618 E07= 675 E04= 341 342 348 E02= 321 324 328 E01= 312 314 316 318 E07= 576 E06= 561 567 E02= 421 423 428 E01= 412 313 416 418 E03= 431 432 438 E05= 657 E08= 182 183 184 E04= 841 842 843 E03= 132 134 138 E04= 142 143 148 E06= 165 167 E01= 213 214 216 218 E08= 481 482 483 E04= 248 E08= 281 283 284 E02= 123 124 128 E08= 381 382 384 E03= 231 234 238 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 1 1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 8 E1 2 3 4 5 6 7 8 1 0 1 1 1 0 1 0 1 2 1 0 1 1 0 0 0 1 3 1 1 0 1 0 0 0 1 4 1 1 1 0 0 0 0 1 5 0 0 0 0 0 1 1 0 6 1 0 0 0 1 0 1 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 E1key 1 2 3 4 5 6 7 8 E 1 1 1 1 1 1 1 1 2 1 0 1 1 0 0 0 1 4 1 1 1 0 0 0 0 1 6 1 0 0 0 1 0 1 0 8 1 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 3 1 1 0 1 0 0 0 1 5 0 0 0 0 0 1 1 0 7 0 0 0 0 1 1 0 0 E0key 1 2 3 4 5 6 7 8 U1 2 3 4 5 6 7 8 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 5 0 0 0 0 0 0 0 0 6 1 0 0 0 1 0 0 0 7 0 0 0 0 1 1 0 0 8 1 1 1 1 0 0 0 0 U1key 1 2 3 4 5 6 7 8 U 1 1 1 1 0 1 1 1 2 1 0 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 6 1 0 0 0 1 0 0 0 8 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 3 1 1 0 0 0 0 0 0 7 0 0 0 0 1 1 0 0 U0key 1 2 3 4 5 6 7 8 6 5 7 234 238 248 283 284 6123 6124 6128 6132 6134 6138 6142 6143 6148 6182 6183 6184 7123 7124 7128 7132 7134 7138 7142 7143 7148 7165 7182 7183 7184 8123 8124 8132 8134 8142 8143 8165 8167 2134 2138 2143 2148 2165 2183 2184 3124 3128 3142 3148 3165 3167 3182 3184 4123 4128 4132 4128 4165 4167 4182 4183 5123 5124 5128 5132 5143 5138 5154 5143 5148 5167 5182 5183 5184 1 1 1 1 1 2 123 124 128 1 The # of 3paths starting at: 1 2 3 4 5 6 7 8 Tot 14 11 13 13 3 6 2 13 76 132 134 138 Find 4paths that ending with each 3path Concat with each elim if digit duplicates 142 143 148 8 8 8 8 8 213 214 216 231 234 3 3 3 3 3 3 214 216 218 248 281 284 4 4 4 4 4 4 4 213 216 218 231 238 281 283 4 4 4 4 4 4 4 312 316 318 321 328 381 382 8 8 8 8 8 8 8 312 314 316 321 324 341 342 165 167 0 1 1 1 0 1 0 1 1 1 1 1 1 1 324 328 342 348 382 384 2 2 2 2 2 2 2 314 316 318 341 348 381 384 8 8 8 8 8 8 312 314 316 318 321 324 328 341 342 348 381 382 384 4 3 182 183 184 pref E01

  18. 1 0 1 1 1 0 1 0 2 1 0 1 1 0 0 0 3 1 1 0 1 0 0 0 4 1 1 1 0 0 0 0 5 0 0 0 0 0 1 0 6 1 0 0 0 1 0 1 7 0 0 0 0 0 1 0 1 2 0 0 1 1 0 0 0 6 7 0 0 0 0 0 0 0 1 3 0 1 0 1 0 0 0 1 4 0 1 1 0 0 0 0 1 6 0 0 0 0 1 0 1 2 1 0 0 1 1 0 1 0 2 3 1 0 0 1 0 0 0 2 4 1 0 1 0 0 0 0 3 1 0 1 0 1 0 1 0 3 2 1 0 0 1 0 0 0 3 4 1 1 0 0 0 0 0 4 1 0 1 1 0 0 1 0 4 2 1 0 1 0 0 0 0 4 3 1 1 0 0 0 0 0 5 6 1 0 0 0 0 0 1 6 1 0 1 1 1 0 0 0 6 5 0 0 0 0 0 0 0 7 6 1 0 0 0 1 0 0 1 4 3 1 1 0 0 0 0 0 3 1 2 0 0 1 1 0 0 0 3 2 1 0 0 1 1 0 1 0 3 2 4 1 0 1 0 0 0 0 3 4 1 0 1 1 0 0 1 0 3 4 2 1 0 1 0 0 0 0 1 3 2 1 0 0 1 0 0 0 1 3 4 1 1 0 0 0 0 0 1 4 2 1 0 1 0 0 0 0 1 6 5 0 0 0 0 0 0 0 1 2 3 1 0 0 1 0 0 0 1 2 4 1 0 1 0 0 0 0 1 6 7 0 0 0 0 0 0 0 2 1 4 0 1 1 0 0 0 0 2 1 6 0 0 0 0 1 0 1 2 4 1 0 1 1 0 0 1 0 3 1 4 0 1 1 0 0 0 0 4 1 2 0 0 1 1 0 0 0 4 1 3 0 1 0 1 0 0 0 4 1 6 0 0 0 0 1 0 1 4 2 1 0 0 1 1 0 1 0 4 2 3 1 0 0 1 0 0 0 4 3 1 0 1 0 1 0 1 0 4 3 2 1 0 0 1 0 0 0 5 6 1 0 1 1 1 0 0 0 5 6 7 0 0 0 0 0 0 0 6 1 2 0 0 1 1 0 0 0 6 1 3 0 1 0 1 0 0 0 6 1 4 0 1 1 0 0 0 0 7 6 1 0 1 1 1 0 0 0 7 6 5 0 0 0 0 0 0 0 2 1 3 0 1 0 1 0 0 0 2 3 1 0 1 0 1 0 1 0 2 3 4 1 1 0 0 0 0 0 2 4 3 1 1 0 0 0 0 0 3 1 6 0 0 0 0 1 0 1 4 1 3 2 0 0 0 1 0 0 0 4 1 3 4 0 1 0 0 0 0 0 2 3 1 6 0 0 0 0 1 0 1 1 2 3 1 0 0 0 1 0 1 0 1 2 3 4 1 0 0 0 0 0 0 1 2 4 1 0 0 1 0 0 1 0 1 2 4 3 1 0 0 0 0 0 0 1 3 2 4 1 0 0 0 0 0 0 1 3 4 1 0 1 0 0 0 1 0 1 3 4 2 1 0 0 0 0 0 0 1 4 2 3 1 0 0 0 0 0 0 1 4 3 2 1 0 0 0 0 0 0 2 1 3 2 0 0 0 1 0 0 0 2 1 3 4 0 1 0 0 0 0 0 2 1 4 2 0 0 1 0 0 0 0 2 1 4 3 0 1 0 0 0 0 0 2 1 6 5 0 0 0 0 0 0 0 2 1 6 7 0 0 0 0 0 0 0 2 3 1 4 0 1 0 0 0 0 0 2 3 4 1 0 1 0 0 0 1 0 2 3 4 2 1 0 0 0 0 0 0 2 4 1 3 0 1 0 0 0 0 0 2 4 1 6 0 0 0 0 1 0 1 2 4 3 2 1 0 0 0 0 0 0 3 1 2 3 0 0 0 1 0 0 0 3 1 2 4 0 0 1 0 0 0 0 3 1 4 2 0 0 1 0 0 0 0 3 1 4 3 0 1 0 0 0 0 0 3 1 6 5 0 0 0 0 0 0 0 3 1 6 7 0 0 0 0 0 0 0 3 1 2 4 0 0 1 0 0 0 0 3 1 2 6 0 0 0 0 1 0 1 3 4 1 2 0 0 1 0 0 0 0 3 4 1 6 0 0 0 0 1 0 1 3 4 2 1 0 0 1 0 0 1 0 4 1 2 3 0 0 0 1 0 0 0 4 1 2 4 0 0 1 0 0 0 0 4 1 6 5 0 0 0 0 0 0 0 4 1 6 7 0 0 0 0 0 0 0 4 2 1 3 0 0 0 1 0 0 0 4 2 1 6 0 0 0 0 1 0 1 4 2 3 1 0 0 0 1 0 1 0 4 2 3 4 1 0 0 0 0 0 0 4 3 1 2 0 0 0 1 0 0 0 4 3 1 6 0 0 0 0 1 0 1 4 3 2 1 0 0 0 1 0 1 0 5 6 1 2 0 0 1 1 0 0 0 5 6 1 3 0 1 0 1 0 0 0 5 6 1 4 0 1 1 0 0 0 0 6 1 2 3 0 0 0 1 0 0 0 6 1 2 4 0 0 1 0 0 0 0 6 1 3 2 0 0 0 1 0 0 0 6 1 3 4 0 1 0 0 0 0 0 6 1 4 2 0 0 1 0 0 0 0 6 1 4 3 0 1 0 0 0 0 0 7 6 1 2 0 0 1 1 0 0 0 7 6 1 3 0 1 0 1 0 0 0 7 6 1 4 0 1 1 0 0 0 0 3 2 4 1 0 0 1 0 0 1 0 3 2 4 3 1 0 0 0 0 0 0 6 5 7 2 1 4 3

  19. Path pTree Continued stride=|V|=7. L is the longest path length, Path pTree is a L+1 level pTree (Levels 0-L) kListEh, E2hk = Ek & M’h M’h forces off bit=h, lest we repeat it.(Note Ek already has k bit turned off.) E4 1243 1324 1341 1342 1423 1432 2132 2134 2142 2143 2165 2167 2314 2316 2341 2342 2413 2416 2432 3123 3124 3142 3143 3165 3167 3124 3126 3241 3243 3412 3416 3421 1231 1234 1241 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 E5 12341 14231 14321 23142 23167 23416 31243 31267 34123 34167 34213 34216 12341 13241 13421 21432 23165 23412 24132 24167 31243 31265 34165 14321 24165 32413 31423 32416 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 234165 234167 342165 342167 324165 324167 E6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 E4 4132 4134 4123 4124 4165 4167 4213 4231 4234 4312 4316 4321 5612 5613 5614 6123 6124 6132 6134 6142 6143 7612 7613 7614 4216 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 6 41234 5 41324 7 42134 31267 31265 42314 42316 43167 56132 56142 43165 43214 56123 56134 56143 61234 43124 43216 56124 76123 61243 61324 61342 61423 61432 76124 76132 76134 E5 76142 76143 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 1 432165 432167 561234 561243 561423 561432 761234 761243 761423 761432 561324 561342 761324 761342 423165 423167 E6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 234165 234167 342165 342167 423165 423167 432165 432167 561234 561243 561423 561432 761234 761243 761423 761432 561324 561342 761324 761342 324165 324167 23167 56134 31267 34167 43167 23165 23416 24167 31265 32416 34165 31267 43165 43216 56124 24165 34216 31265 56132 56142 56123 56143 61243 61234 61324 61342 61423 61432 76123 76124 76142 76132 76134 76143 3124 3142 4132 4165 4167 4213 4231 4312 4316 4321 5612 5613 5614 6123 6124 6132 6134 6142 6143 4216 7612 7613 7614 1342 1423 1432 2134 2143 2165 2167 2314 2316 2341 2413 2416 3165 3167 3124 3126 3241 3412 3416 3421 4123 165 143 142 132 134 167 412 413 416 421 423 431 432 561 567 612 613 614 761 765 214 216 241 341 342 412 413 416 421 423 431 432 561 567 612 613 614 761 765 123 124 213 231 234 243 312 314 316 321 324 12 13 14 16 23 24 34 56 67 4 3 The COMBO(7,2)=21 endpoint pairs are: 12 13 14 15 16 17 23 24 25 26 27 34 35 36 37 45 46 47 56 57 67 The shortest paths are of lengths: 1 1 1 2 1 2 1 1 3 2 3 1 3 2 3 3 2 3 1 2 1 So the diameter of the graph is 3. For very big graphs, how can we determine the diameter using pTrees only?