
FAUST Analytics: X = (X_1..X_n) ⊆ R^n, |X| = N. X_C = (X_1..X_n, C), x.C ∈ {C_1..C_c}


Presentation Transcript


  1. SPRING PLAN: Develop the Treeminer platform and datasets (plus Matt Piehl's datasets) on pTree1, including Hadoop, plus ingest procedures for new datasets that convert them to pTreeSets. Each student picks a VDMp TopicSet (PPT or book), produces an enhanced, up-to-date version, embeds audio, and builds Blackboard-based AssignmentSets (with solutions) and TestSets (with answers).
Md: Ph.D. CS, VDMp operations; proposal defended. Rajat: MS CS, coursework finished; SE quality metrics (Dr. Magel). Maninder: Ph.D. CS (book needs structure); plan: develop in the Treeminer environment. Spoorthy: MSE (no thesis), changing to MS in SE. Damian: Ph.D. in SE (Dr. Nygard, advisor). Arjun: Ph.D. CS (VDMp algorithms; two tasks for s15: proposal, journal paper). Arijit: Ph.D. in CS, VDMp data mining of financials; proposal defense this term. Bryan: multilevel pTrees.
FAUST Analytics: X = (X_1..X_n) ⊆ R^n, |X| = N. X_C = (X_1..X_n, C), x.C ∈ {C_1..C_c} = TrainSet classes. d = (d_1..d_n), p = (p_1..p_n) ∈ R^n. F: R^n -> R, F = L, S, or R, each mapping a pTreeSet to a Scalar pTreeSet:
S_p ≡ (X-p)o(X-p) = XoX + Xo(-2p) + pop = L_-2p + XoX + pop
L_d,p ≡ (X-p)od = Xod - pod;  L_d ≡ Xod
R_d,p ≡ S_p - (L_d,p)^2 = XoX + L_-2p + pop - (L_d)^2 + 2(pod)(Xod) - (pod)^2 = L_(-2p+(2pod)d) - (L_d)^2 + XoX + pop - (pod)^2
FAUST Top-K Outlier Detector: rank_(n-1) S_x.
TODO:
• Attribute selection prior to FAUST Oblique (for speed and accuracy): use Treeminer methods, high pTree correlation with the class, high info_gain / gini_index / other, ...
• Implement the FAUST Hull Classifier. Enhance and compare it using a good experimental design and real datasets (Treeminer hasn't gotten to that yet, so we can help a lot there!).
• Research pTree density-based clustering (https://bb.ndsu.nodak.edu/bbcswebdav/pid-2819939-dt-content-rid-13579458_2/courses/153-NDSU-20309/dmcluster.htm plus the first ppt set). Related papers:
• DAYVD: Iterative Density-Based Approach for Clusters with Varying Density, International Journal of Computers and Their Applications, V17:1, ISSN 1076-5204, B. Wang and W. Perrizo, March 2010.
• A Hierarchical Approach for Clusters in Different Densities, Proceedings of the International Conference on Software Engineering and Data Engineering, Los Angeles, B. Wang, W. Perrizo, July 2006.
• A Comprehensive Hierarchical Clustering Method for Gene Expression Data, ACM Symposium on Applied Computing, ACM SAC 2005, March 2005, Santa Fe, NM, B. Wang, W. Perrizo.
• A P-tree-based Outlier Detection Algorithm, International Society of Computer Applications Conference on Applications in Industry and Engineering, ISCA CAINE 2004, Orlando, FL, Nov. 2004 (with B. Wang, D. Ren).
• A Cluster-based Outlier Detection Method with Efficient Pruning, ISCA Conference on Applications in Industry and Engineering, ISCA CAINE 2004, Nov. 2004 (with B. Wang, D. Ren).
• A Density-based Outlier Detection Algorithm using Pruning Techniques, ISCA Conference on Applications in Industry and Engineering, ISCA CAINE 2004, Nov. 2004 (with B. Wang, K. Scott, D. Ren).
• Parameter Reduction for Density-based Clustering on Large Data Sets, ISCA Conference on Applications in Industry and Engineering, ISCA CAINE 2004, Nov. 2004 (with B. Wang).
• Outlier Detection with Local Pruning, ACM Conference on Information and Knowledge Management, ACM CIKM 2004, Nov. 2004, Washington, D.C. (with D. Ren).
• RDF: A Density-based Outlier Detection Method using Vertical Data Representation, IEEE International Conference on Data Mining, IEEE ICDM 2004, Nov. 2004, Brighton, U.K. (with D. Ren, B. Wang).
• A Vertical Outlier Detection Method with Clusters as a By-Product, IEEE International Conference on Tools in Artificial Intelligence, IEEE ICTAI 2004, Nov. 2004, Boca Raton, FL (with D. Ren).
TO TRY:
1. Agnes clusterer: start with singleton clusters. For each cluster C, find its closest neighbor C'. If (C, C') qualifies, C <- C ∪ C'. "Qualifies" can mean: always; d(C, C') < T; dens(C ∪ C') < dens(C) - ε; minFdis(F(C), F(C')) < ε (an easy computation by comparing cut points under each F?).
2. Fuzzy classifier: fully replicate the pTreeSet. Start with all singleton classes. Each processor is assigned a class C and does one-class classification on C, producing C'. If C' qualifies, C <- C'.
FAUST Hull classifying: classify y in C_k iff y ∈ Hull_k = {z | minF(C_k) - ε ≤ F(z) ≤ maxF(C_k) + ε}.
FAUST clustering: start with C = X. Until STOP (cluster density > threshold), recursively cut C at F-gaps (at the midpoint, or adjusted with the variance). Use PCC gaps instead for suspected aberrations.
Mark 11/25: FAUST text classification is capable of accuracy as high as anything out there. On the Stanford newsgroup dataset (7.5K docs) FAUST got an 80% boost by eliminating terms that occur in ≤ 2 docs. Chi-squared was used to reduce attributes by 20% (keep the best 80%). Vertical data lets us toss attributes easily before we expend CPU; tossing intelligently improves accuracy and reduces time! We eliminated ~70% of the attributes from the TestSet and achieved better accuracy than the classifiers referenced on the Stanford NLP site. About to turn this loose on datasets approaching 1 TB in size.
Mark 11/26: Adjusting the midpoint based on cluster deviation gives an extra 4% accuracy. The hull is the interesting case; we are already able to predict which members are poor matches to a class.
TEXT MINING COMMENTS: For a text corpus of d docs (rows) and t terms (columns), we have so far recorded only a tiny part of the total information; there is a wealth of other information not yet captured. The 2010-2015 notes tried to capture more than just term existence or weighted term-frequency information (~2012_07-09).
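A minimal sketch of the three functionals above, assuming X is held as an ordinary numpy array of row vectors rather than as a vertical pTreeSet (function and variable names and the toy data are illustrative):

# Sketch of the FAUST functionals L, S, R on a horizontal numpy array X
# (rows = points); in the pTree setting these columns of reals would be
# materialized as Scalar pTreeSets rather than arrays.
import numpy as np

def L(X, d):
    """L_d = X o d (dot product of every row with direction d)."""
    return X @ d

def S(X, p):
    """S_p = (X-p) o (X-p) (squared distance of every row from p)."""
    diff = X - p
    return np.einsum('ij,ij->i', diff, diff)

def R(X, d, p):
    """R_d,p = S_p - (L_d,p)^2, squared distance from the d-line through p."""
    return S(X, p) - (L(X, d) - p @ d) ** 2

# Example: X in R^2, d = e1, p = origin
X = np.array([[1.0, 3.0], [1.5, 3.0], [2.2, 2.1]])
d = np.array([1.0, 0.0])
p = np.zeros(2)
print(L(X, d), S(X, p), R(X, d, p))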

  2. FAUST Oblique, F(x) = D1 o x: a Scalar pTreeSet (column of reals). The SP_F(X) bit-slice pTrees are calculated as mod(int(SP_F(X) / 2^exp), 2), with SP_(D1 o X) = Σ_i SP_(D1,i * X_i). Here D1 = (1, -½).
Points and F values (last column is F minus the minimum):
a (1.0, 3.0): F = 1*1.0 - ½*3.0 = -0.5; F - min = 0.1
b (1.5, 3.0): F = 1*1.5 - ½*3.0 = 0.0; F - min = 0.6
c (1.2, 2.4): F = 1*1.2 - ½*2.4 = 0.0; F - min = 0.6
d (0.6, 2.4): F = 1*0.6 - ½*2.4 = -0.6; F - min = 0
e (2.2, 2.1): F = 1*2.2 - ½*2.1 = 1.15; F - min = 1.75
f (2.3, 3.0): F = 1*2.3 - ½*3.0 = 0.8; F - min = 1.4
g (2.0, 2.4): F = 1*2.0 - ½*2.4 = 0.8; F - min = 1.4
h (2.5, 2.4): F = 1*2.5 - ½*2.4 = 1.3; F - min = 1.9
Sorted along F: d, a, b = c, f = g, e, h.
Interval hulls per functional: SP_(e1 o X): h1 = [.6, 1.5], h2 = [2, 2.5]. SP_(e2 o X): h1 = [2.4, 3], h2 = [2.1, 3]. SP_(D1 o X) - min: h1 = [0, .6], h2 = [1.4, 1.9].
Idea: incrementally build clusters one at a time using all F values. E.g., start with one point, x. Recall F is distance dominated, which means actual distance ≥ F difference. If the hull is close to the convex hull, the max F-difference approximates distance, so the first gap in max F-difference isolates the x-cluster.
[The slide's table of 0/1 bit-slice pTrees pD1,0..pD1,-3, pe1,0..pe1,-3, pe2,0..pe2,-3 for these values appeared here as figure content.]
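A minimal sketch of the bit-slice construction just described: compute F(x) = D1 o x for the eight example points, shift by the minimum, and extract mod(int(F / 2^exp), 2) slices. Assumes numpy; array names are illustrative, not from the original.

# Sketch: F values for points a..h, shifted by the min, then bit-slice "pTrees".
import numpy as np

pts = np.array([[1.0, 3.0], [1.5, 3.0], [1.2, 2.4], [0.6, 2.4],
                [2.2, 2.1], [2.3, 3.0], [2.0, 2.4], [2.5, 2.4]])  # a..h
D1 = np.array([1.0, -0.5])

F = pts @ D1                 # -0.5, 0.0, 0.0, -0.6, 1.15, 0.8, 0.8, 1.3
F_shift = F - F.min()        # 0.1, 0.6, 0.6, 0.0, 1.75, 1.4, 1.4, 1.9

def bit_slice(values, exp):
    """pTree for bit position exp (exp may be negative for fractional bits)."""
    return np.floor(values / 2.0 ** exp).astype(int) % 2

for exp in (0, -1, -2, -3):
    print(exp, bit_slice(F_shift, exp))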

  3. FAUST Oblique, F(x) = D1 o x (continued): maximum F-difference from each seed point, over the functionals e1 o X, e2 o X, and D1 o X. For each seed x, mxFdf(x) lists the max F-difference to a, b, c, d, e, f, g, h, followed by the cluster isolated by the first gap:
mxFdf(a): 0, .5, .6, .6, 1.65, 1.3, 1.3, 1.8 -> {a,b,c,d} a-cluster, gap = .7
mxFdf(b): .5, 0, .6, .9, 1.15, .8, .8, 1.3 -> all in b-cluster
mxFdf(c): .6, .6, 0, .6, 1.15, 1.1, .8, 1.3 -> all in c-cluster
mxFdf(d): .6, .9, .6, 0, 1.75, 1.7, 1.4, 1.9 -> {a,b,c,d} d-cluster, gap = .5
mxFdf(e): 1.65, 1.15, 1.15, 1.75, 0, .9, .35, .3 -> {e,g,h} e-cluster, gap = .55
mxFdf(f): 1.3, .8, 1.1, 1.7, .9, 0, .6, .6 -> all in f-cluster
mxFdf(g): 1.3, .8, .8, 1.4, .35, .6, 0, .5 -> {b,c,e,f,g,h} g-cluster, gap = .5
mxFdf(h): 1.8, 1.3, 1.3, 1.9, .3, .6, .5, 0 -> {e,f,g,h} h-cluster, gap = .7
F values and interval hulls are as on the previous slide (F(a) = -0.5, F(b) = F(c) = 0, F(d) = -0.6, F(e) = 1.15, F(f) = F(g) = 0.8, F(h) = 1.3; F - min: 0.1, 0.6, 0.6, 0, 1.75, 1.4, 1.4, 1.9; D1 = (1, -½)).
Incrementally build clusters one at a time with the F values. E.g., start with one point, x. Recall F is distance dominated, which means actual separation ≥ F separation. If the hull is well developed (close to the convex hull), the max F-difference approximates distance, so the first gap in max F-difference isolates the x-cluster.
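A minimal sketch of the incremental idea above: order the points by maximum F-difference from a seed x over several functionals and cut at the first gap wider than a threshold. It assumes the F columns are available as plain numpy arrays; names and the gap value are illustrative.

# Sketch: seed-based cluster from the first gap in max F-difference.
import numpy as np

def x_cluster(F_cols, seed, gap=0.5):
    F = np.vstack(F_cols)                      # shape (num_functionals, N)
    max_fdiff = np.abs(F - F[:, [seed]]).max(axis=0)
    order = np.argsort(max_fdiff)
    diffs = np.diff(max_fdiff[order])
    cut = np.argmax(diffs > gap) if (diffs > gap).any() else len(order) - 1
    return order[:cut + 1]                     # indices in the seed's cluster

pts = np.array([[1.0, 3.0], [1.5, 3.0], [1.2, 2.4], [0.6, 2.4],
                [2.2, 2.1], [2.3, 3.0], [2.0, 2.4], [2.5, 2.4]])
F_cols = [pts[:, 0], pts[:, 1], pts @ np.array([1.0, -0.5])]
print(x_cluster(F_cols, seed=7))               # h-cluster ~ {e, f, g, h}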

  4. http://www.cs.ndsu.nodak.edu/~perrizo/saturday/teach/879s15/dmcluster.htm  http://www.cs.ndsu.nodak.edu/~perrizo/saturday/teach/879s15/dmlearn.htm
TODO: attribute selection prior to FAUST (for speed/accuracy, clustering/classification): Treeminer methods, high pTree correlation with the class, high info_gain / gini_index / other, ...
Information gain: in information theory and machine learning, information gain is a synonym for Kullback-Leibler divergence. The expected information gain is the mutual information I(X; A) of X and A, i.e., the reduction in the entropy of X achieved by learning the state of the random variable A. In machine learning this can be used to define a preferred sequence of attributes to investigate so as to most rapidly narrow down the state of X; such a sequence (which at each stage depends on the outcomes of previously investigated attributes) is called a decision tree. Usually an attribute with high mutual information is preferred over others. In general terms, the expected information gain is the change in information entropy from a prior state to a state that takes some information as given.
Formal definition: let T denote a set of training examples, each of the form (x, y) = (x_1, ..., x_k, y), where x_a is the value of the a-th attribute of example x and y is its class label. The information gain for an attribute a is defined in terms of entropy H as IG(T, a) = H(T) - Σ_v P(a = v) H(T | a = v). The mutual information equals the total entropy for an attribute if, for each attribute value, a unique classification can be made for the class attribute; in that case the conditional entropies subtracted from the total entropy are 0.
Drawbacks: although information gain is usually a good measure for deciding the relevance of an attribute, it is not perfect. A notable problem occurs when it is applied to attributes that can take on a large number of distinct values. For example, suppose one is building a decision tree for data describing the customers of a business; information gain is often used to decide which attributes are most relevant, so they can be tested near the root of the tree. One input attribute might be the customer's credit card number. It has high mutual information, because it uniquely identifies each customer, but we do not want it in the tree: deciding how to treat a customer based on their credit card number is unlikely to generalize to unseen customers (overfitting). Information gain ratio is sometimes used instead; it biases the decision tree against attributes with many distinct values, though attributes with very low information values then receive an unfair advantage.
Correlation: in statistics, dependence is any statistical relationship between two random variables or two sets of data, and correlation refers to a class of statistical relationships involving dependence. Familiar examples include the correlation between the statures of parents and their offspring, and between the demand for a product and its price. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice: an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In that example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; however, statistical dependence is not sufficient to demonstrate such a causal relationship (correlation does not imply causation). Formally, dependence refers to any situation in which random variables do not satisfy probabilistic independence. In loose usage, correlation can refer to any departure from independence, but technically it refers to several more specialized relationships between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables; other coefficients have been developed to be more robust than Pearson correlation. (The standard figure of several (x, y) point sets shows that the Pearson correlation reflects the noisiness and direction of a linear relationship, but not its slope, nor many aspects of nonlinear relationships; for points on a horizontal line the coefficient is undefined because the variance of Y is 0.)
Pearson's product-moment coefficient: the most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, commonly called simply "the correlation coefficient". It is obtained by dividing the covariance of the two variables by the product of their standard deviations; Karl Pearson developed it from a similar but slightly different idea by Francis Galton. The population correlation coefficient between random variables X and Y with expected values μ_X, μ_Y and standard deviations σ_X, σ_Y is ρ_X,Y = corr(X, Y) = cov(X, Y) / (σ_X σ_Y) = E[(X - μ_X)(Y - μ_Y)] / (σ_X σ_Y), where E is the expectation operator and cov the covariance; it is defined only if both standard deviations are finite and nonzero. It is a corollary of the Cauchy-Schwarz inequality that the correlation cannot exceed 1 in absolute value, and it is symmetric: corr(X, Y) = corr(Y, X). The Pearson correlation is +1 for a perfect increasing linear relationship, -1 for a perfect decreasing (inverse) linear relationship, and some value between -1 and +1 in all other cases, indicating the degree of linear dependence; values near 0 indicate little linear relationship, and the closer the coefficient is to ±1, the stronger the correlation. If the variables are independent, Pearson's correlation is 0, but the converse is not true, because the coefficient detects only linear dependencies. For example, if X is symmetrically distributed about 0 and Y = X^2, then Y is completely determined by X, so X and Y are perfectly dependent, yet their correlation is zero: they are uncorrelated. In the special case that X and Y are jointly normal, uncorrelatedness is equivalent to independence.
Given a series of n measurements of X and Y written x_i and y_i, i = 1..n, the sample correlation coefficient can be used to estimate the population Pearson correlation between X and Y: r = Σ_i (x_i - mean(x))(y_i - mean(y)) / ((n - 1) s_x s_y), where mean(x) and mean(y) are the sample means of X and Y, and s_x and s_y are the sample standard deviations of X and Y.
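The two attribute-selection scores discussed above (information gain against the class label, and Pearson correlation) are easy to check on small arrays. A minimal sketch, assuming plain numpy arrays; the helper names and toy data are purely illustrative.

# Sketch: information gain of an attribute w.r.t. a class label, and Pearson
# correlation between two numeric columns.
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(attr, labels):
    """H(labels) - sum_v P(attr = v) * H(labels | attr = v)."""
    n = len(labels)
    cond = sum((attr == v).sum() / n * entropy(labels[attr == v])
               for v in np.unique(attr))
    return entropy(labels) - cond

def pearson(x, y):
    return np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

attr = np.array(['young', 'young', 'old', 'old', 'old'])
cls = np.array(['no', 'no', 'yes', 'yes', 'no'])
print(info_gain(attr, cls))
print(pearson(np.arange(5.0), np.array([0.0, 1.0, 5.0, 20.0, 40.0])))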

  5. This can also be written as: If x,y are results of meamnts containing error, realistic limits on correlation coef are not −1 to +1 but a smaller range.. Rank correlation coefficients Main articles: Spearman's rank correlation coefficient and Kendall tau rank correlation coefficientRank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient (τ) measure extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship. It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient, and are best seen as measures of a different type of association, rather than as alternative measure of the population correlation coefficient.[7][8] To illustrate rank correlation, and its difference from linear correlation, consider 4 pairs (x, y): (0, 1), (10, 100), (101, 500), (102, 2000). As we go from each pair to the next pair x increases, and so does y. This relationship is perfect, in the sense that an increase in x is always accompanied by an increase in y. This means that we have a perfect rank correlation, and both Spearman's and Kendall's correlation coefficients are 1, whereas in this example Pearson product-moment correlation coefficient is 0.7544, indicating that the points are far from lying on a straight line. In the same way if y always decreases when xincreases, the rank correlation coefficients will be −1, while the Pearson product-moment correlation coefficient may or may not be close to −1, depending on how close the points are to a straight line. Although in the extreme cases of perfect rank correlation the two coefficients are both equal (being both +1 or both −1) this is not in general so, and values of the two coefficients cannot meaningfully be compared.[7] For example, for the three pairs (1, 1) (2, 3) (3, 2) Spearman's coefficient is 1/2, while Kendall's coefficient is 1/3. Other measures of dependence among random variables: The info given by a correlation coef is not enough to define the dependence structure between random variables.[9] Correlation coef completely defines dependence structure only in very particular cases, for ex when the distrib is a multivariate normal distribution. In the case of elliptical distributions it characterizes (hyper-) ellipses of equal density, but, it does not completely characterize dependence structure (for ex, a multivariate t-distribution's degrees of freedom determine level of tail dep). Distance correlation and Brownian covariance / Brownian correlation[10][11] were introduced to address the deficiency of Pearson's correlation that it can be zero for dependent random variables; zero distance correlation and zero Brownian correlation imply independence. The Randomized Dependence Coefficient[12] is a computationally efficient, copula-based measure of dependence between multivariate random variables. RDC is invariant with respect to non-linear scalings of random variables, is capable of discovering a wide range of functional association patterns and takes value zero at independence. 
correlation ratio is able to detect almost any functional dependency and entropy-based mutual information, total correlation and dual total correlation are capable of detecting more general dep. These are sometimes referred to as multi-moment correlation measures,[citation needed] in comparison to those that consider only second moment (pairwise or quadratic) dependence. The polychoric correlation is another correlation applied to ordinal data that aims to estimate the correlation between theorised latent variables. One way to capture a more complete view of dependence structure is to consider a copula between them. coefficient of determination generalizes the correlation coefficient for relationships beyond simple linear regression. Sensitivity to the data distribution: The degree of dependence between variables X and Y does not depend on the scale on which the variables are expressed. That is, if we are analyzing the relationship between X and Y, most correlation measures are unaffected by transforming X to a + bX and Y to c + dY, where a, b, c, and d are constants. This is true of some correlation statistics as well as their population analogues. Some correlation statistics, such as the rank correlation coefficient, are also invariant to monotone transformations of the marginal distributions of X and/or Y. Pearson/Spearman correlation coefficients between X and Y are shown when the two variables' ranges are unrestricted, and when the range of X is restricted to the interval (0,1). Most correlation measures are sensitive to manner in which X and Y are sampled. Dependencies tend to be stronger if viewed over a wider range of values. Thus, the correlation coeff between heights of fathers and their sons over all adult males, and compare it to the same correlation coeff calculated when fathers are selected to be between 165 and 170 cm in height, correlation will be weaker in latter case. See also Association (statistics)AutocorrelationCanonical correlationCoefficient of determinationCointegrationConcordance correlation coefficientCophenetic correlationCopulaCorrelation functionCovariance and correlationCross-correlationEcological correlationFraction of variance unexplainedGenetic correlationGoodman and Kruskal's lambdaIllusory correlationInterclass correlationIntraclass correlationModifiable areal unit problemMultiple correlationPoint-biserial correlation coefficientQuadrant count ratioStatistical arbitrageSubindependence
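The four-pair example above ((0, 1), (10, 100), (101, 500), (102, 2000)) can be checked directly. A minimal sketch, assuming SciPy is available (scipy.stats provides pearsonr, spearmanr, and kendalltau):

# Spearman and Kendall are 1 (perfect rank correlation) while Pearson is ~0.75.
from scipy.stats import pearsonr, spearmanr, kendalltau

x = [0, 10, 101, 102]
y = [1, 100, 500, 2000]
print(pearsonr(x, y))    # statistic ~0.7544
print(spearmanr(x, y))   # statistic 1.0
print(kendalltau(x, y))  # statistic 1.0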

  6. http://www.cs.ndsu.nodak.edu/~perrizo/saturday/teach/879s15/dmcluster.htm  http://www.cs.ndsu.nodak.edu/~perrizo/saturday/teach/879s15/dmlearn.htm
Clustering: partition; hierarchical; density; grid; model-based.
Agnes (Agglomerative Nesting), Kaufmann and Rousseeuw (1990): uses the single-link method (the distance between two sets is the minimum pairwise distance); merge the nodes with the most similarity; eventually all nodes belong to the same cluster. Diana (Divisive Analysis) is the inverse order of AGNES: start with all objects in one cluster and split by some criterion (e.g., maximize some aggregate or pairwise dissimilarity). Major weaknesses of agglomerative clustering: it does not scale well (time complexity at least O(n^2), n = number of objects) and it can never undo what was done previously (a greedy algorithm). Integrations of hierarchical with distance-based clustering: BIRCH (1996) uses a Cluster Feature tree (CF-tree) and incrementally adjusts the quality of sub-clusters; CURE (1998) selects well-scattered points from a cluster and shrinks them toward the cluster center by a fraction; CHAMELEON (1999) does hierarchical clustering using dynamic modeling.
Density clustering: discovers clusters of arbitrary shape, handles noise, needs one scan, and needs density parameters as a stop condition. Several interesting studies: DBSCAN (Ester et al., KDD'96); OPTICS (Ankerst et al., SIGMOD'99); DENCLUE (Hinneburg & Keim, KDD'98); CLIQUE (Agrawal et al., SIGMOD'98).
Decision Tree Classification: a flow-chart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent class labels or class distributions. Tree pruning identifies and removes branches that reflect noise or outliers.
Information Gain (ID3/C4.5): select the attribute with the highest information gain. Assume two classes, P and N (positive/negative), and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide whether an arbitrary example in S belongs to P or N is I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)). In the buys_computer example (class P: buys_computer = "yes", class N: buys_computer = "no"), I(p, n) = I(9, 5) = 0.940. If attribute A partitions S into subsets {S_1, S_2, ..., S_v}, where S_i contains p_i examples of P and n_i examples of N, the entropy (expected information needed to classify objects in all subtrees S_i) is E(A) = Σ_i ((p_i + n_i)/(p + n)) I(p_i, n_i), and the information gained by branching on A is Gain(A) = I(p, n) - E(A); the entropy for age, and similarly for the other attributes, is computed this way.
Bayesian classification is based on Bayes' theorem: let X be a data sample whose class label is unknown and H the hypothesis that X belongs to class H. Then P(H|X), the conditional probability of H given X, is P(H|X) = P(X|H) P(H) / P(X), where P(H) is the prior probability of H. Naive Bayesian: given a training set R(A_1..A_n, C), where C = {C_1..C_m} is the class label attribute, the naive Bayesian classifier predicts the class of an unknown data sample X to be the class C_j with the highest conditional probability conditioned on X: P(C_j|X) ≥ P(C_i|X) for all i ≠ j. From Bayes' theorem, P(C_j|X) = P(X|C_j) P(C_j) / P(X); P(X) is constant for all classes, so we maximize P(X|C_j) P(C_j). To reduce the computational complexity of calculating all the P(X|C_j), the naive assumption of class-conditional independence is made.
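A minimal sketch of the I(p, n) entropy and branching gain just described. The (9, 5) class split is from the slide; the three-way split counts below are illustrative placeholders, not the slide's actual age partition.

# Sketch: I(p, n) and the gain from branching on an attribute that splits S
# into subsets with (p_i, n_i) counts.
import math

def I(p, n):
    t = p + n
    return sum(-x / t * math.log2(x / t) for x in (p, n) if x > 0)

def gain(p, n, splits):
    """splits = [(p_i, n_i), ...]; gain = I(p, n) - sum (p_i+n_i)/(p+n) * I(p_i, n_i)."""
    e = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in splits)
    return I(p, n) - e

print(I(9, 5))                                   # ~0.940
print(gain(9, 5, [(2, 3), (4, 0), (3, 2)]))      # illustrative split counts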

  7. pTree Text Mining (from the 2012_08_04 Data Cube Text Mining notes), DocTrmPos pTreeSet layout: the level-0 pTree has one bit per (doc, term, reading position), so its length is mdl * VocabLen * DocCount, where mdl is the maximum document length; the mdl reading positions repeat for each (doc, term) pair. Level-1 TermFreqPTrees are built per bit position (e.g., the predicate of tfP0 is mod(sum(mdl-stride), 2) = 1), and the level-1 TermExistencePTree uses the predicate NOTpure0 and has length VocabLen * DocCount. The document-frequency slices dfk are not level-2 pTrees here, since they are not predicates on level-1 te strides; the next slides show how to lay the data out differently so that even the dfk's come out as level-2 pTrees.
[The slide shows these structures for a toy 3-document corpus over the vocabulary a, again, all, always, an, and, apple, April, are, ..., with df counts and tf / tf0 / tf1 / te columns per term; only that figure layout is omitted here.]
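A minimal sketch of this roll-up on a toy bit cube, assuming an ordinary numpy array stands in for the level-0 DocTrmPos pTree; dimension sizes and variable names are illustrative.

# Sketch: level-0 bits over (doc, term, reading position), rolled up by
# mdl-length strides into level-1 term-frequency (tf) bit slices and the
# term-existence (te) pTree. Toy random bits, not a real corpus.
import numpy as np

doc_count, vocab_len, mdl = 3, 5, 8          # toy sizes (mdl = max doc length)
rng = np.random.default_rng(0)
level0 = rng.integers(0, 2, size=(doc_count, vocab_len, mdl))  # 1 = term at position

tf = level0.sum(axis=2)                       # term frequency per (doc, term)
te = (tf > 0).astype(int)                     # level-1 term existence (NOTpure0)
tf0 = tf % 2                                  # tf bit-slice 0: mod(sum(stride), 2)
tf1 = (tf // 2) % 2                           # tf bit-slice 1
print(te, tf0, tf1, sep="\n")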

  8. pTree Text Mining, data-cube layout: the overall level-0 pTree, corpusP, has length MaxDocLen * DocCount * VocabLen (reading positions within term within doc). Level-1 pTrees tfPk are built per bit position (e.g., the predicate of tfP0 is mod(sum(mdl-stride), 2) = 1), and the overall level-1 pTree teP (term existence, predicate NOTpure0) has length DocCount * VocabLength. The level-2 pTrees dfPk (document-frequency bit slices) have length VocabLength, and hdfP (high document frequency) is the level-2 pTree with the predicate NOTpure0 applied to tfP1.
[The slide lays out a toy 3-document corpus (terms a, again, all, always, an, and, apple, April, are, ...) with the per-term te, tf, tf0, tf1 columns and the df / hdfP values as a figure.]

  9. pTree Text Mining, data-cube layout (continued): same structure as the previous slide, with the level-2 pTrees dfPk of length VocabLength, level-1 pTrees tfPk (e.g., the predicate of tfP0 is mod(sum(mdl-stride), 2) = 1), the overall level-1 teP of length DocCount * VocabLength, and the overall level-0 corpusP of length MaxDocLen * DocCount * VocabLen. In addition, auxiliary masks (Verb pTree, References pTree, EndOfSentence, Preface pTree, LastChapter pTree, ...) can be ANDed into the P(t=..., d=...) pTrees before they are concatenated as above, or repetitions of the mask can be ANDed in after concatenation.
[The slide repeats the toy 3-document example with these masks as a figure.]
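A minimal sketch of the level-2 roll-up these two slides describe, assuming a toy doc-by-term frequency matrix in numpy; the matrix values and names are illustrative.

# Sketch: document frequency df from the term-existence pTree, its bit-slice
# pTrees dfP0/dfP1, and hdfP = NOTpure0 applied to the tfP1 slices
# (i.e., terms with tf >= 2 in some document).
import numpy as np

tf = np.array([[1, 0, 2, 0, 1],
               [0, 0, 3, 1, 0],
               [1, 0, 1, 0, 0]])              # toy (doc x term) frequencies

te = (tf > 0).astype(int)                     # level-1 term existence
tfP1 = (tf // 2) % 2                          # level-1 tf bit-slice 1

df = te.sum(axis=0)                           # level-2: doc frequency per term
dfP0, dfP1 = df % 2, (df // 2) % 2            # df bit-slice pTrees
hdfP = (tfP1.sum(axis=0) > 0).astype(int)     # NOTpure0 over tfP1 strides
print(df, dfP0, dfP1, hdfP, sep="\n")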

  10. APPENDIX: I have put together a pBase of 75 Mother Goose rhymes and stories, and created a pBase of the 15 documents with ≥ 30 words (Universal Document Length, UDL = 30), using as vocabulary all white-space separated strings. The slide shows, as figures: the Little Miss Muffet document (level-1 term-existence/term-frequency columns te, tf, tf1, tf0 per vocabulary term, and the level-0 reading-position bits, positions 1..182); the corresponding level-1 and level-0 layout for Humpty Dumpty (doc 05); and the level-2 document-frequency pTrees (df and bit slices df3, df2, df1, df0 per vocabulary term) together with the term-existence columns te04, te05, te08, te09, te27, te29, te34 for seven of the documents (for example, "a" has df = 8 and "and" has df = 13).
[The full vocabulary-by-document bit tables from the slide are omitted here.]
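A minimal sketch of building such a pBase from raw rhyme text: white-space separated strings as the vocabulary, a fixed universal document length, and te / tf / df columns per term. The two-document corpus and variable names below are illustrative.

# Sketch: doc x term structure (tf, te, df) from raw text, truncated to UDL words.
import numpy as np

docs = {
    4: "Little Miss Muffet sat on a tuffet eating of curds and whey",
    5: "Humpty Dumpty sat on a wall Humpty Dumpty had a great fall",
}
udl = 30                                          # universal document length
vocab = sorted({w for text in docs.values() for w in text.split()[:udl]})

tf = np.array([[text.split()[:udl].count(w) for w in vocab]
               for text in docs.values()])        # (doc x term) frequencies
te = (tf > 0).astype(int)                         # term existence
df = te.sum(axis=0)                               # document frequency

for w, d in zip(vocab, df):
    print(w, d)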

  11. FAUST Clustering 1, L-Gap Clusterer: cut C mid-gap (of F over C) using the next (d, p) from the dpSet, where F = L | S | R. (HOB: bit 2^-1 separates 7 and 50; bit 2^-2 separates the .27 group.)
With D = d35, the sorted xod column isolates 35, 7, and 50 as outliers. With D = the .27 group, {28, 30, 39, 41, 46} forms a cluster. With D = the .64 group, the 0's, the .25's, and the .51's are clusters, and d10, d11, d17, d21 are outliers. Going back to D = d35: how close does HOB come? Bits 2^1 and 2^0 separate 35.
With D = the sum of all 44 docs and gap threshold .08, the sorted xod values give:
C1 (.17 ≤ xod ≤ .25) = {2, 3, 6, 16, 18, 22, 42, 43, 49}
C2 (.34 ≤ xod ≤ .56) = {1, 4, 5, 8, 9, 12, 14, 15, 23, 25, 27, 32, 33, 36, 37, 38, 44, 45, 47, 48}
C3 (.64 ≤ xod ≤ .86) = {10, 11, 13, 17, 21, 26, 28, 29, 30, 39, 41, 50}
Singletons: 46 (xod = .99); 7 (1.16); 35 (1.47).
Next, on each Ck try D = sum of the Ck docs, threshold .2:
D = sum of all C1 docs: C11 (xod = .42) = {2, 3, 16, 22, 42, 43}; 6, 18, 49 outliers. D = sum of all C11 docs: all six at .57 (no further split).
D = sum of all C2 docs: values run from .27 (d23) to .81 (d14) with no gap ≥ .2.
D = sum of all C3 docs: C31 (.56 ≤ xod ≤ 1.03) = {10, 11, 13, 17, 21, 26, 28, 29, 30, 41, 50}; 39 outlier. D = sum of all C31 docs: C311 (≤ .63) = {11, 17, 29}; C312 (.84) = {13, 30, 50}; C313 (.95) = {10, 26, 28, 41}; 21 outlier.
Other clustering methods later. [The per-document sorted xod columns behind these cuts appeared on the slide as figures.]
C11: 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 3. Diddle diddle dumpling, my son John.
Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 22. Had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a little garters to tie his little hose. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. C2: 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 14. If all seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into great sea, what a splish splash it would be! 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 48. One two, buckle my shoe. Three four, knock at the door. 
Five six, ick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. C311: 11. One misty moisty morning when cloudy was weather, I met an old man clothed all in leather. He began to compliment and I began to grin. How do And how do? And how do again 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. C312: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! C313: 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
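A minimal sketch of the gap-cut step used throughout this slide: sort the documents by F = X o d and cut wherever consecutive F values differ by more than a threshold; singleton pieces are outlier candidates. The document ids, F values, and threshold below are illustrative, loosely echoing the tail of the D = 44-docs column above.

# Sketch: split a sorted F column at gaps wider than a threshold.
import numpy as np

def gap_cut(F, doc_ids, thresh=0.15):
    order = np.argsort(F)
    clusters, current = [], [doc_ids[order[0]]]
    for prev, nxt in zip(order[:-1], order[1:]):
        if F[nxt] - F[prev] > thresh:
            clusters.append(current)
            current = []
        current.append(doc_ids[nxt])
    clusters.append(current)
    return clusters

F = np.array([0.17, 0.21, 0.25, 0.34, 0.64, 0.99, 1.16, 1.47])
ids = ["d22", "d42", "d18", "d23", "d10", "d46", "d7", "d35"]
print(gap_cut(F, ids))   # singleton pieces are outlier candidates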

  12. FAUST Cluster 1.2 OUTLIER: 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. WS0= 2 3 13 20 22 25 38 42 44 49 50 DS1= | WS1= 2 20 25 46 49 51 46 | DS2 46 DS0=|WS1= 7 10 17 23 25 28 33 34 37 40 43 45 50 35 |---| |DS2| |35 | OUTLIER: 35. Sing a song of sixpence, a pocket full of rye. 4 and 20 blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. Queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. WS0= 2 3 13 32 38 42 44 52 DS1 |WS1= 42(Mother) 7 9 |DS2|WS2=WS1 11 |7 27 |9 27 29 45 29 32 29 41 45 C1: Mother theme 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. WS0 22 38 44 52 DS1 WS1= 27 38 44 {fiddle(32 41) man(11 32) old(11 44) 11 DS2 32 11 41 22 44 C2 fiddle old man theme 11. One misty moisty morning when cloudy was weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do How do you do? How do you do again 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many a joyous day my fiddle and I have had 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. DS0|WS1 2 9 12 18 19 21 26 27 30 32 38 39 42 44 45 47 49 52 54 55 57 60 1 |DS1| WS2 12 19 26 39 44 10 |10 | DS2| WS3 13 | | 10 | DS3 10 17 37 14 39 21 41 26 44 28 30 47 50 OUTLIER: 10. Jack and Jill went up hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. DS0| WS1=2 9 18 21 30 38 41 45 47 49 52 54 55 57 60 1 | DS1|WS2=2 9 18 30 39 45 55 13 | 39 |DS2 14 | |39 17 21 39 28 41 30 47 37 50 OUTLIER: 39. A little cock sparrow sat on a green tree. He chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie. Oh no, says the sparrow I will not make a stew. So he flapped his wings\,away he flew C3: men three 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. 
All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. WS0 38 52 DS1 WS1= 38 52 1 ---------- 5 17 23 28 36 48 C4: 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not k 21. Lion and Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 26. Sleep baby sleep. Our cottage valley is deep.Little lamb is on green with woolly fleece so soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. 
This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! WS0=2 5 8 11 14 15 16 22 24 25 29 31 36 41 44 47 48 53 54 57 59 DS1|WS1(17wds)=2 5 11 15 16 22 24 25 29 31 41 44 47 48 54 57 59 4 6 8|DS2=DS1 12 15 18 21 25 26 30 33 37 43 44 47 49 50 DS0|WS1 2 5 8 13 14 15 16 22 24 25 29 36 41 44 47 48 51 54 57 59 13 |DS2|WS2 4 13 47 51 54 14 |13 |DS3 13 21 26 30 37 47 50 OUTLIER: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. OUTLIERS: 2. This little pig went to market. This little pig stayed home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 22. Had little husband no bigger than my thumb. Put him in a pint pot, there I bid him drum. Bought a little handkerchief to wipe his little nose, pair of little garters to tie little hose 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. DS0|WS1=6 7 8 14 43 46 48 51 53 57 2 3|DS2=DS1 16 22 42 Each of the 10 words occur in 1 doc, so all 5 docs are outliers real HOB Alternate WS0, DS0 OUTLIER:38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. Notes Using HOB, the final WordSet is the document cluster theme! When the theme is too long to be meaningful (C4) we can recurse on those (using the opposite DS)|WS0?). The other thing we can note is that DS) almost always gave us an outliers (except for C5) and only WS) almost always gave us clusters (excpt for the first one, 46). What happens if we reverse it? What happens if we just use WS0?

  13. real HOB Alternate WS0, DS0 recuring on C3 and C4 FAUST Cluster 1.2.1 DS0|WS1=41 47 57 (on C4) 21|DS2 WS2=41(morn) 57(way) 26| 37 DS3=DS2 30| 47 . 37 47 50 C4.1 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. WS0=2 5 11 15 16 22 24 25 29 31 44 47 54 59 DS1|WS1=2 5 15 16 22 24 25 44 47 54 59 4 DS2 WS2=2 15 16 24 25 44 47 54 59 6 4 DS3 WS3=WS2 8 6 4 12 8 8 15 12 12 18 21 21 21 25 25 25 26 26 26 30 30 30 43 43 43 50 50 49 50 DS0|WS1= 47 (plum) 21 DS2 WS2=WS1 26 21 30 50 50 C4.2.1 word47(plum) 21. Lion &Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town. 50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! WS0= 2 15 16 23 24 27 36 DS1|WS1 = 2 15 16 25 44 59 4 |DS2 WS2=15 16 25 44 59 8 |4 DS3 WS3=15 16 44 59 12 |8 8 DS4 WS4=15 44 59 25 |12 12 12 DS5 WS544 59 26 |25 25 25 12 DS6=DS5 30 |26 26 26 25 C4.2.2 word44(old) word59(woman) 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 25. There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. Final WordSet is too long. Recurse 4.2 OUTLIER: 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. WS0= 5 11 22 25 29 31 DS1 WS1=5 22 6 1518 49 DS2 6 C4.2.3 (day eat girl) 4. Little Miss Muffet sat on tuffet, eating curd, whey. Came big spider, sat down beside her, frightened Miss Muffet away 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had 2 pigeons bright and gay. They flew from me other day. What was the reason they did go? I can not tell, for I do not know. 33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy ? Buttons, farthing a pair 49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid. DS0|WS1=22 25 29 4 DS2 =WS1 8 |4 8 15|15 18Recursing 18|33 49 no change 33 43 49 DS0|WS1=1 2 3 15 16 23 24 27 30 36 49 60 26 |DS1=DS0 30 Doc26 and doc30 have none of the 12 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 26. Sleep baby sleep. 
Cottage valley is deep.Little lamb is on green with woolly fleece soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! Cat and the fiddle. Cow jumped over moon.Little dog laughed to see such sport, and dish ran away with spoon. DS0=|WS1=21 38 49 52 1 |DS1 |WS2=21 38 49 14 |1 |DS3=DS2 17 |14 28 |17 C31 [21]cut [38]men [49]run 1. Three blind mice! See how run! All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice? 14. If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be! 17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! C32: [38]men [52] three 5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men cannot put Humpty Dumpty together again. 23. How many miles to Babylon? 3 score miles and 10. Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. WS0=38 52 DS1|WS1=WS0 5 | . 23 28 36 48 Doc43 and doc44 have none of the 6 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. recurse on C3:

  14. HOB Alternate WS0, DS0 FAUST Cluster 1.2.2 eat girl day men 33 4 15 5 49 18 36 8 32 men 11 fiddle old 41 1 run cut 17 men 14 28 three 23 three 48 three morn old 37 12 47 25 way woman 16 OUTLIERS: 2 3 6 10 13 16 22 26 30 35 38 39 42 43 44 46 Categorize clusters (hub-spoke, cyclic, chain, disjoint...)? Separate disjoint sub-clusters? Each of the 3 C423 words gives a disjoint cluster!Each of the 2 C32 work gives a disjoint sub-clusters also. C4231 day 15. Great A. little a. This is pancake day. Toss ball high. Throw ball low. Those come after sing heigh ho! 18. I had 2 pigeons bright and gay. They flew from me other day. What was reason they go? I can not tell, I do not know. C4232 eat 4. Little Miss Muffet sat on tuffet, eat curd, whey. Came big spider, sat down beside her, frightened away 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. C4233 girl 33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy ? Buttons, farthing a pair 49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid. C1: mother 7. Old Mother Hubbard went to cupboard to give her poor dog a bone. When she got there cupboard was bare, so poor dog had none. She went to baker to buy some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. C2: fiddle old men {cyclic} 11. 1 misty moisty morning when cloudy was weather, Chanced to meet old man clothed all leather. He began to compliment,I began to grin. How do you do How do? How do again 32. Jack come give me your fiddle, if ever you mean to thrive. No I'll not give fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many joyous day fiddle and I've had 41. Old King Cole was merry old soul. Merry old soul was he. He called for his pipe, he called for his bowl, he called for his fiddlers 3. And every fiddler, had a fine fiddle, a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. C11 cut men run {cyclic} 1. Three blind mice! See how run! All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice? 14. If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be! 17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! C321 men 5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men can't put Humpty together again. 36. Little Tommy Tittlemouse lived in little house. He caught fishes in other mens ditches. C322 three 23. How many miles to Babylon? 3 score 10. 
Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light. 28. Baa baa black sheep, have any wool? Yes sir yes sir, 3 bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. C4.1 morn way 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on cold and frosty morn. This is way wash our hands, wash our hands, wash our hands. This is way wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash r clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. C421 plum 21. Lion &Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town. 50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! C422 old woman 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 25. There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. Let's pause and ask "What are we after?" Of course it depends upon the client. 3 main categories for relatioinship mining? text corpuses, market baskets (includes recommenders), bioinformatics? Others? What do we want from text mining? (anomalies detection, cliques, bicliques?) What do we want from market basket mining? (future purchase predictions, recommendations...) What do we want in bioinformatics? (cliques, strong clusters, ...???)

  15. FAUST Cluster 1.2.3 [word-labeled document graph: the slide draws the documents as nodes and labels edges with shared words (run, cut, three, old, plum, fiddle, men, morn, way, mother, baby, eat, boy, ...); the scattered node and edge labels are omitted here.] We have captured only a few of the salient sub-graphs. Can we capture more of them? Of course we could capture a sub-graph for each word, but that might mean 100,000 of them. Let's stare at what we got and try to see what we might wish we had gotten in addition. A bake-bread sub-corpus would have been strong (docs {7, 21, 35, 42}). There are many others. Using AVG+1 (word columns 2, 9, 10, 25, 45, 47):
d21 0 0 1 0 0 1
d35 0 0 1 1 1 0
d39 1 1 0 0 1 0
d46 1 0 0 1 0 0
d50 0 1 0 1 1 1
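The "Using AVG+1" table above appears to keep only the words (and then the documents) whose counts exceed the average by at least one, which is one way to pull a salient sub-graph out of the full word-labeled document graph. Below is a hedged sketch of that reduction over a 0/1 document-by-word matrix; the rule is a guess at the slide's intent, with the offsets exposed as parameters so variants like the wAvg+2 / dAvg-1 recursion on the next slide can reuse it.

```python
import numpy as np

def salient_submatrix(M, word_offset=1, doc_offset=1):
    """Guess at the 'AVG+1' reduction: keep words whose document counts
    exceed the mean word count by word_offset, then documents whose counts
    over the kept words exceed the mean by doc_offset."""
    word_counts = M.sum(axis=0)                      # documents per word
    keep_words = np.flatnonzero(word_counts >= word_counts.mean() + word_offset)
    doc_counts = M[:, keep_words].sum(axis=1)        # kept words per document
    keep_docs = np.flatnonzero(doc_counts >= doc_counts.mean() + doc_offset)
    return M[np.ix_(keep_docs, keep_words)], keep_docs, keep_words

# Usage sketch: sub, docs, words = salient_submatrix(M, word_offset=1, doc_offset=1)
```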

  16. HOB2 Alt (use other HOBs) FAUST Cluster 1.2.4 [word-labeled document graph, as on the previous slide but built with other high-order-bit cuts; the scattered node and edge labels and the repeated texts of rhymes 12 and 25 are omitted here.]
child, old, woman columns (words 15, 44, 59): d12 1 1 1
wAvg+1, dAvg+1 (away, boy, bread, eat, pie, plum = words 2, 9, 10, 25, 45, 47):
d21 0 0 1 0 0 1
d35 0 0 1 1 1 0
d39 1 1 0 0 1 0
d46 1 0 0 1 0 0
d50 0 1 0 1 1 1
recurse: wAvg+2, dAvg-1 (words 2, 9, 10, 25, 45):
d35 0 0 1 1 1
d39 1 1 0 0 1
d46 1 0 0 1 0
d50 0 1 0 1 1
And if we want to pull out a particular word cluster, just turn the word-pTree into a list:
w=baby (away, baby = words 2, 3): d9 0 1; d26 1 1; d27 0 1; d45 1
w=boy (away, boy = words 2, 9): d28 0 1; d39 1 1; d50 0 1
For a particular doc cluster, just turn the doc-pTree into a list.
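Turning a word-pTree or doc-pTree into a list, as described above, is just reading off the nonzero positions of a bit column or bit row. A minimal sketch, assuming the pTrees are materialized as a 0/1 NumPy matrix; doc_ids/word_ids are optional label arrays introduced here for illustration, not part of any fixed pTree API.

```python
import numpy as np

def word_cluster(M, w, doc_ids=None):
    """Turn the word-pTree (bit column) of word w into the list of
    documents that contain it."""
    rows = np.flatnonzero(M[:, w])
    return rows if doc_ids is None else [doc_ids[i] for i in rows]

def doc_cluster(M, d, word_ids=None):
    """Turn the doc-pTree (bit row) of document d into the list of
    words it contains."""
    cols = np.flatnonzero(M[d, :])
    return cols if word_ids is None else [word_ids[j] for j in cols]
```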

  17. FAUST HULL Classification 1 Using the clustering of FAUST Clustering1 as classes, we extract 80% from each class as TrainingSet (w class=cluster#). How accurate is FAUST Hull Classification on the remaining 20% plus the outliers (which should be "Other"). C11={2,3,16,22,42,43} C2 ={1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48} C11={3} C11={2,16,22,42,43} C311= {11,17,29} C312={13,30,50} C313={10,26,28,41} C2 ={4,14,23,45} C2 ={1,5,8,9,12,15,25,27,32,33,36,37,38,44,47,48} Full classes from slide: FAUST Clustering1 20% Test Set C311= {11,17} C312={30,50} C313={10,28,41} C311= {29} C312={13} C313={26} 80% Training Set OUTLIERS {18,49} {6} {39} {21} {46} {7} {35} O={18 49 6 39 21 46 7 35} .305 .439 C312 D11=C11 p=avC11 L MIN MAX CLASS .63 .63 C11 0 .63 C2 0 0 C311 .31 .31 C312 0 .31 C313 0 C311 .31 C312 0 .22 C11 .44 .66 C313 D2=C2 p=avC2 L MIN MAX CLS 0 .22 C11 .44 .77 C2 .66 .66 C311 .11 .22 C312 .44 .66 C313 -.09 .106 C11 D1=TS p=avTS Lpd MIN MAX CLASS -0.09 .106 C11 0.106 .439 C2 0.572 .572 C311 0.305 .439 C312 0.505 .771 C313 .572 C311 .11 .22 C312 .44 .77 C2 0 .31 C313 .63 C11 .106 .439 C2 .505 .771 C313 .66 C311 0 .63 C2 0 C11 .31 C11 .31 C2 .31 C311 .31 C313 D312=C312 p=avC312 L MN MX CLAS 0 .31 C11 0 .31 C2 0 .31 C311 1.58 1.58 C312 0 .31 C313 0 .22 C11 0 .44 C2 0 .44 C311 D311=C311 p=avC311 L MN MX CLAS 0 0 C11 0 0.66 C2 1.33 1.66 C311 0 0.33 C312 0 0.33 C313 D313=C313 p=avC313 L MN MX CLAS 0 .22 C11 0 .44 C2 0 .44 C311 0.22 .22 C312 1.34 1.56 C313 1.3 1.6 C311 0 .33 C312 0 .33 C313 1.34 1.56 C313 .22 C312 1.58 C312 0 .66 C2 D1=TS p=avTS Sp 4.2 C313 5.4 1.9 C11 2.1 2.4 C311 3.4 4.6 C312 4.7 1.8 C2 3.8 Use Lpd, Sp, Rpd with p=ClassAvg and d=unitized ClassSum. All 6 class hulls separated using Lpd, p=CLavg, D=CLsum. D311 separates C311, D312 separates C312 and D313 separates C313 from all others. D2 separates C11 and C2. Now, remove some false positives with S and R using the same p's and d's: D11=C11 p=avC11 Sp [1.6]C11 [3.4 4 4]C311 [5.4 6]C313 [2.4 4.4]C2 [5]C312 D2=C2 p=avC2 Sp [2 2.3]C11 [4.5 5.8]C313 [1.8 3.5]C2 [5 5.1]C312 [2.5 3.5]C311 D313=C313 p=avC313 Sp [3.5 4.2]C11 [6.5]C312 [2.8 6.2]C2 [3.8 6.2]C311 [2.5 3.5]C313 D311=C311 p=avC311 Sp [1.2]C311 [4.2]C11 [6.2 7.2]C312 [2.2 6.2]C2 [6.2 8.2]C313 D312=C312 p=avC312 Sp [3.5 4.5]C11 [6.5 7.5]C313 [4.5 6.5]C2 [2.5]C312 [5.5]C311 Sp removes a lot of the potential for false positives. (Many of the classes lie a single distance from p.) D11=C11 p=avC11 Rpd [1.2]C11 [1.4 2]C2 [1.7 2]C311 [2.2 2.]]C312 [2.2 2.4]C313 D1=TS p=avTS Rpd [1.3 1.4]C11 [1.3 1.9]C2 [1.5 1.8]C311 [2.1]C312 [2.0 2.2]C313 D2=C2 p=avC2 Rpd [1.3 1.4]C11 [1.3 1.8]C2 [1.6 1.8]C311 [2.2]C312 [2.1 2.4]C313 D313=C313 p=avC313 Rpd [1.3 1.4]C11 [1.3 2]C2 [1.6 2]C311 [2.2]C312 [1.5 1.8]C313 D312=C312 p=avC312 Rpd [1.3 1.4]C11 [1.4 2]C2 [1.7 1.9]C311 [1.5]C312 [2.2 2.4]C313 D311=C311 p=avC311 Rpd [1.4]C11 [1.2 2]C2 [1.1]C311 [2.2]C312 [2.2 2.4]C313 Rpd removes even more of the potential for false positives.
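The hull construction above reduces to a short recipe: for each class take p = the class average and d = the unitized class sum, record the [min, max] interval of each of L, S and R over the class's training points, and predict a test point into a class only if it falls inside all three intervals (otherwise "Other"). The sketch below implements only that single-(p,d)-per-class version; it does not include the extra D's used above to remove false positives, and the function names are illustrative.

```python
import numpy as np

def class_hull(X):
    """One hull per class: p = class average, d = unitized class sum,
    and the [min, max] interval of each of L, S, R over the class."""
    p = X.mean(axis=0)
    d = X.sum(axis=0)
    d = d / np.linalg.norm(d)
    L = (X - p) @ d                        # Lpd = (x-p) o d
    S = ((X - p) ** 2).sum(axis=1)         # Sp  = (x-p) o (x-p)
    R = np.sqrt(np.maximum(S - L**2, 0))   # Rpd = sqrt(Sp - Lpd^2)
    return {"p": p, "d": d,
            "L": (L.min(), L.max()),
            "S": (S.min(), S.max()),
            "R": (R.min(), R.max())}

def hull_predict(x, hulls):
    """Predict the first class whose L, S and R intervals all contain x,
    else 'Other'."""
    for cls, h in hulls.items():
        L = (x - h["p"]) @ h["d"]
        S = ((x - h["p"]) ** 2).sum()
        R = np.sqrt(max(S - L**2, 0.0))
        if all(lo <= v <= hi for v, (lo, hi) in
               ((L, h["L"]), (S, h["S"]), (R, h["R"]))):
            return cls
    return "Other"

# Usage sketch: hulls = {c: class_hull(Xtrain[ytrain == c]) for c in set(ytrain)}
#               pred  = hull_predict(xtest, hulls)
```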

  18. Test Set FAUST Hull Classification 2 (TESTING) D1=TS p=avTS Rpd [1.3 1.4]C11 [1.3 1.9]C2 [1.5 1.8]C311 [2.1]C312 [2.0 2.2]C313 D1=TS p=avTS Sp C11={3} [1.9 2.1]C11 [2.4 3.4]C311 [4.2 5.4]C313 D1=TS p=avTS Lpd [4.6 4.7]C312 [1.8 3.8]C2 C2 ={4,14,23, 45} [.57]C311 [.31 .44]C312 [-.09 .11]C11 [.11 .44]C2 [.51 .77]C313 C311= {29} C312={13} C313={26} D11=C11 p=avC11 Rpd [1.2]C11 [1.4 2]C2 [1.7 2]C311 [2.2 2.]]C312 [2.2 2.4]C313 D11=C11 p=avC11 Sp [1.6]C11 [3.4 4 4]C311 [5.4 6]C313 [2.4 4.4]C2 [5]C312 D11=C11 p=avC11 Lpd [0]C311 [.31]C312 O={18 49 6 39 21 46 7 35} [.63]C11 [0 .31]C313 [0 .63]C2 D2=C2 p=avC2 Lpd D2=C2 p=avC2 Sp [2 2.3]C11 [4.5 5.8]C313 [1.8 3.5]C2 [5 5.1]C312 [2.5 3.5]C311 D2=C2 p=avC2 Rpd [1.3 1.4]C11 [2.1 2.4]C313 [1.3 1.8]C2 [1.6 1.8]C311 [2.2]C312 .[44 .66]C313 [0 .22]C11 [.44 .77]C2 [.66] C311 [.11 .22]C312 D311=C311 p=avC311 Sp [1.2]C311 [4.2]C11 [6.2 7.2]C312 [2.2 6.2]C2 [6.2 8.2]C313 D311=C311 p=avC311 Rpd [1.4]C11 [1.2 2]C2 [1.1]C311 [2.2]C312 [2.2 2.4]C313 D311=C311 p=avC311 Lpd [0]C11 [1.3 1.6]C311 [0 .33]C312 [0 .33]C313 [0 .66]C2 D312=C312 p=avC312 Lpd D312=C312 p=avC312 Sp [3.5 4.5]C11 [6.5 7.5]C313 [4.5 6.5]C2 [2.5]C312 [5.5]C311 D312=C312 p=avC312 Rpd [1.3 1.4]C11 [2.2 2.4]C313 [1.4 2]C2 [1.7 1.9]C311 [1.5]C312 .31 C11 .31 C2 .31 C311 .31 C313 1.58 C312 D313=C313 p=avC313 Rpd [1.3 1.4]C11 [1.3 2]C2 [1.6 2]C311 [2.2]C312 [1.5 1.8]C313 D313=C313 p=avC313 Sp [3.5 4.2]C11 [6.5]C312 [2.8 6.2]C2 [3.8 6.2]C311 [2.5 3.5]C313 D313=C313 p=avC313 Lpd [0 .22]C11 [0 .44]C2 [0 .44]C311 [1.3 1.6]C313 [.22]C312 ε=.8 predicted Class 11 2 2 2 311(all 311|2 all) 312(all 312|313 a Other . . . . . . . Other D=TS Rpd Sp Lpd trueCL Predicted____CLASS Final R S L predicted 1.41 2.19 -0.4 11 d3 2 Oth 11 Other 1.40 2.06 -0.3 2 d4 2 2 11 Other 1.92 3.71 0.01 2 d14 Oth 2 11 Other 1.38 1.99 -0.2 2 d23 2|11 2|11 Oth Other 1.97 3.92 -0.1 311 d29 Oth 312|313 11 Other 2.22 4.99 -0.2 312 d13 313 313 11 Other 2.60 6.78 -0.0 313 d26 Oth Oth 11 Other 1.40 2.13 -0.3 d6 2|11 2 11 Other 2.50 6.37 0.34 d7 313 Oth 2 Other 1.40 2.06 -0.3 d18 2|11 2|11 Oth Other 2.42 5.92 -0.1 d21 313 Oth Oth Other 3.46 12.2 0.47 d35 Oth Oth Oth Other 2.60 6.78 -0.0 d39 Oth Oth 11 Other 2.35 5.57 0.14 d46 Oth Oth 2 Other 1.41 2.19 -0.4 d49 2 2 Oth Other 8/15 = 53% correct just with D=TS p=AvgTS Note: It's likely to get worse as we consider more D's. Let's think about TrainingSet quality resulting from clustering. This a poor quality TrainingSet (from clustering Mother Goose Rythmes. MGR is a difficult corpus to cluster since: 1., in MGR, almost every document is isolated (an outlier), so the clustering is vague (no 2 MGRs deal with the same topic so their word use is quite different.). Instead of tightening the class hulls by replacing CLASSmin and CLASSmax by CLASSfpci (fpci=first percipitous count increase) and CLASSlpcd, we might loosen class hulls (since we know the classes somewhat arbitrary) by expanding the [CLASSmin, CLASSmax] interval as follows: Let A = Avg{ClASSmin, CLASSmax} and R (for radius) = A-CLASSmin (=CLASSmax-A also). Use [A-R-ε, A+R+ε]. Let ε=.8 increases accuracy to 100% (assuming all Other stay Other.). Finally, it occurs to me that Clustering to produce a TrainingSet, then setting aside a TestSet gives a good way to measure the quality of the clustering. If the TestSet part classifies well under the TrainingSet part, the clustering must have been high quality (produced a good TrainingSet for classification). 
This clustering quality test method is probably not new (check the literature?). If it is new, we might have a paper here? (discuss this quality measure and assess using different ε's?)
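The loosening rule above, with A the interval midpoint and R its half-width, turns [CLASSmin, CLASSmax] into [A-R-ε, A+R+ε], which is simply the original interval padded by ε on each side. A one-function sketch, with the C11 interval under D1=TS from the previous slide as a worked check:

```python
def pad_interval(lo, hi, eps):
    """Expand [lo, hi] to [A - R - eps, A + R + eps], where A is the midpoint
    and R the half-width; algebraically this is just [lo - eps, hi + eps]."""
    A = (lo + hi) / 2.0
    R = (hi - lo) / 2.0
    return (A - R - eps, A + R + eps)

# Worked check with the C11 interval [-0.09, 0.106] under D1=TS and eps=0.8:
print(pad_interval(-0.09, 0.106, 0.8))   # approximately (-0.89, 0.906)
```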

  19. WP Wed 11/26 Yes, we have discovered also that one has to think about the quality of the training set. If it is very high quality (expected to fully encompass all borderline cases of all classes), then using exact gap endpoints is probably wise. But if there is reason to worry about the comprehensiveness of the training set (e.g., when there are very few training samples, which is often the case in medical expert systems where getting a sufficient number of training samples is difficult and expensive), then it is probably better to move the cut-points toward the midpoint (reflecting the vagueness of the training set's class boundaries). What does one use to decide how much to move away from the endpoints? That's not an easy question. Cluster deviation seems like a useful measure to employ. One last thought on how to decide whether to cut at the gap midpoint, at the endpoints, or somewhere in between: if one has a time-stamp on training samples, one might assess the "class endpoint" change rate over time. As the training set gets larger and larger, if an endpoint stops moving much and isn't an outlier, then cutting at that endpoint seems wise. If an endpoint is still changing a lot, then moving away from it seems wise (maybe based on the rate of change of that endpoint as well as other measures?).
A complete subgraph is a clique. A maximal clique is not a proper subset of any other clique. In a bipartite graph G=(X,Y,E), a clique (Sx,Sy) is a complete bipartite subgraph induced by the bipartite vertex set (Sx,Sy). The Consensus Set (or clique) of Sx is CLQ(Sx) = the intersection over x in Sx of Ny(x), i.e., the set of all y's that are adjacent (edge-connected) to every x in Sx. Clearly (Sx, CLQ(Sx)) is a clique.
Thm1: (Sx,Sy) is a maximal clique iff Sy=CLQ(Sx) and Sx=CLQ(Sy).
Thm2: for every Sy contained in Y with CLQ(Sy) nonempty, (CLQ(Sy), CLQ(CLQ(Sy))) is maximal.
Find all cliques starting with Sy = singletons. Then examine the doubletons Sy1y2 whose consensus doc sets are nonempty, then the tripletons, etc. Examining the MGRs (x = docs, y = words): all singleton wordsets Sy form a nonempty clique. AND pairwise to find all nonempty doubleton wordset cliques Sy1y2. AND those nonempty doubleton wordsets with each other singleton wordset to find all nonempty tripleton wordset cliques Sy1y2y3, and so on. Start with singleton doc sets, include another, ..., until the result is empty. The last nonempty set is a max-clique and all of its subsets are cliques. Remove them. Iterate.
[The slide then lists the enumerated cliques as (doc set, shared word) pairs, e.g. 7 13 w4; 7 35 42 w7; 7 9 23 29 45 w42; 1 23 28 41 48 w52; ..., together with a small #CLQs / #docs / #words tally; the full listing is omitted here.]
• There is something wrong here.
• This does not find all maximal cliques.
• Next I try the following logic:
• Find all 1WdCs (1-Word Cliques).
• A kWdC contains each of its k (k-1)WdCs, so if a (k-1)-wordset is not the wordset of a clique then none of its supersets are either (the downward closure property).
• Thus the wordset of any 2WdC can be composed by unioning the wordsets of two 1WdCs, and any kWdC wordset is the union of a (k-1)WdC wordset with a 1WdC wordset.
[Another long listing of enumerated (doc set, word) cliques, e.g. 18 32 w22; 13 21 50 w47; 23 28 41 48 w52; 29 30 39 46 w2; ..., is omitted here.]
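The level-wise search sketched in the bullets can be written down directly: hold each word's pTree as a document bitset, AND bitsets to get the consensus document set of a wordset, and only extend wordsets whose consensus set is nonempty (downward closure). A small, deliberately unoptimized sketch over a 0/1 document-by-word matrix; it has none of the parallel/pTree engineering the later slides call for, and the function name is illustrative.

```python
import numpy as np

def wordset_cliques(M):
    """Level-wise enumeration of all wordsets whose consensus document set
    (the AND of their word pTrees) is nonempty.

    M is a 0/1 document-by-word matrix. Returns {frozenset(word_ids):
    frozenset(doc_ids)}. Downward closure: a k-wordset is only built by
    extending a (k-1)-wordset that already had a nonempty consensus set.
    Brute force on purpose; the point of the slides is that this blows up
    without pTree/parallel engineering.
    """
    n_words = M.shape[1]
    col_docs = {w: frozenset(np.flatnonzero(M[:, w])) for w in range(n_words)}
    level = {frozenset([w]): d for w, d in col_docs.items() if d}
    cliques = dict(level)
    while level:
        nxt = {}
        for ws, docs in level.items():
            for w in range(max(ws) + 1, n_words):   # extend by one later word
                new_docs = docs & col_docs[w]
                if new_docs:                        # prune empty consensus sets
                    nxt[ws | frozenset([w])] = new_docs
        cliques.update(nxt)
        level = nxt
    return cliques
```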

  20. [0/1 document-by-word incidence matrix for the corpus: one row per document (d1..d50), one column per word (w1..w60); the raw bit dump, and the triangular table of 1-counts of wh & wk (k > h) that follows it, are omitted here.]
• A first goal of our team might be to implement, in optimized Treeminer parallel code, a maximal complete subgraph finder (maximal clique finder). The benefits would be substantial!
• This would be an exercise in parallel programming (e.g., in a Treeminer/Hadoop environment).
• This is a typical exponential-growth case. If you can find an engineering breakthrough here, it will be a breakthrough for a massive collection of similar existing big-data parallel programming problems.
[The slide also displays the individual w1 & wk result pTrees and further wh & wk rows; those bit listings are omitted here.]
What is the next step? We would have to have all the wh & wk (k > h) results, not just the 1-counts, but that would have taken about 60 more slides to display ;-( Just looking at the pTree results of w1 & wk, k > 1, we see that even though w1&w2 and w1&w3 both have count 1, their AND (which is w1&w2&w3) has count 0, and therefore we need not consider any combination of the type w1&w2&w3&... (by downward closure). In fact, the only wh for which we need to look further is h=42: ct(w1&w2&w42)=1 but all other ct(w1&w2&wh)=0. The only maximal clique involving w1 and w2 is {DocSet={29}, WordSet={1,2,42}}, right? Next we would look at the pTrees of w2&wk, k>2. Clearly we only need to consider the 16 WDpTrees {8,9,18,20,23,24,25,27,30,39,42,45,46,49,51,55}, not all 58 of them. And going down to w30, the WDpTreeSet is {30,48} only. To appreciate that we need engineering breakthroughs here, recall that a typical vocabulary might be 100,000 words, not just 60.
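The step the slide walks through (AND w1 with every later wk, record the 1-counts, and never extend a pair whose count is already 0) is the same downward-closure pruning applied at the pairwise level. A small sketch, assuming the word pTrees are available as boolean NumPy columns; at a realistic vocabulary size this table itself is the scaling problem the slide points at.

```python
import numpy as np

def pairwise_and_counts(M):
    """For a boolean document-by-word matrix M, return {(h, k): count} for
    h < k, where count = ct(w_h AND w_k) is the number of documents that
    contain both words. Pairs with count 0 are dropped and, by downward
    closure, never need to be extended to larger wordsets."""
    counts = {}
    n_words = M.shape[1]
    for h in range(n_words):
        wh = M[:, h]
        for k in range(h + 1, n_words):
            c = int(np.count_nonzero(wh & M[:, k]))
            if c > 0:
                counts[(h, k)] = c
    return counts

# The surviving doc set of a pair is the AND itself, e.g.
# docs = np.flatnonzero(M[:, h] & M[:, k])
```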

  21. [Worked trace of growing wordset cliques from w1: the slide lists the surviving document for wordsets such as {1,2}, {1,8}, {1,2,8}, {1,42}, {1,2,8,42} (doc 29) and {1,3}, {1,15}, {1,3,15}, {1,3,15,16}, {1,60} (doc 26), followed by repeats of the document-by-word incidence matrix, the triangular table of pairwise 1-counts, and the w1 & wk result rows from the previous slide; those bit listings are omitted here.]
We can discard those subWordSets that have the same DocSet.
One possible way to process is to start with the w1CountVector.
[Further wh & wk result rows (w15 through w59) are omitted.]
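The "discard those subWordSets that have the same DocSet" rule is a closure step: if a wordset and a larger wordset produce the same consensus document set, only the larger one needs to be kept. A small sketch over a {wordset: docset} dictionary such as the one produced by the enumeration sketched earlier; the dictionary shape is an assumption for illustration, not a fixed pTree API.

```python
def keep_closed_wordsets(cliques):
    """cliques: {frozenset(word_ids): frozenset(doc_ids)}.
    For each distinct consensus doc set, keep only one largest wordset that
    produces it; any sub-wordset with the same doc set is redundant."""
    best = {}
    for ws, docs in cliques.items():
        if docs not in best or len(ws) > len(best[docs]):
            best[docs] = ws
    return {ws: docs for docs, ws in best.items()}
```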
