Multilinear algebra for analyzing data with multiple linkages
Download
1 / 26

Multilinear Algebra for Analyzing Data with Multiple Linkages - PowerPoint PPT Presentation


  • 268 Views
  • Uploaded on

Multilinear Algebra for Analyzing Data with Multiple Linkages. Tamara G. Kolda In collaboration with: Brett Bader, Danny Dunlavy, Philip Kegelmeyer Sandia National Labs MMDS, Stanford, CA, June 21-24, 2006. PageRank Brin & Page (1998) Page, Brin, Motwani, Winograd (1999)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Multilinear Algebra for Analyzing Data with Multiple Linkages' - wan


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Multilinear algebra for analyzing data with multiple linkages l.jpg

Multilinear Algebra for Analyzing Data with Multiple Linkages

Tamara G. Kolda

In collaboration with: Brett Bader, Danny Dunlavy, Philip Kegelmeyer

Sandia National Labs

MMDS, Stanford, CA, June 21-24, 2006


Linear algebra plays an important role in graph analysis l.jpg

PageRank Linkages

Brin & Page (1998)

Page, Brin, Motwani, Winograd (1999)

HITS (hubs and authorities)

Kleinberg (1998/99)

Latent Semantic Indexing (LSI)

Dumais, Furnas, Landauer, Deerwester, and Harshman (1988)

Deerwester, Dumais, Landauer, Furnas, and Harshman (1990)

Terms

d1

car

d2

service

d3

military

Documents

repair

Linear Algebra plays an important role in Graph Analysis

One Use of LSI: Maps terms and documents to the “same” k-dimensional space.


Multi linear algebra can be used in more complex graph analyses l.jpg

Nodes (one type) connected by multiple types of links Linkages

Node x Node x Connection

Two types of nodes connected by multiple types of links

Node A x Node B x Connection

Multiple types of nodes connected by a single link

Node A x Node B x Node C

Multiple types of nodes connected by multiple types of links

Node A x Node B x Node C x Connection

Etc…

Multi-Linear Algebra can be used in more complex graph analyses


Analyzing publication data term x doc x author l.jpg

term Linkages

author

doc

Analyzing Publication Data:Term x Doc x Author

1999-2004 SIAM Journal Data (except SIREV)

Terms must appear in at least 3 documents and no more than 10% of all documents. Moreover, it must have at least 2 characters and no more than 30.

Form tensor X as:

Element (i,j,k) is nonzero only if author k wrote document j using term i.

6928 terms

4411 documents

6099 authors

464645 nonzeros


A tensor is a multidimensional array l.jpg

J Linkages

An I£J£K tensor

K

xijk

I

J

A tensor is a multidimensional array

An I x J matrix

  • Other names for tensors…

    • Multi-way array

    • N-way array

  • The “order” of a tensor is the number of dimensions

  • Other names for dimension…

    • Mode

    • Way

  • Example

    • The matrix A (at left) has order 2.

    • The tensor X (at left) has order 3 and its 3rd mode is of size K.

aij

I

Notation

scalar

vector

matrix

tensor


Tensor fibers generalize the concept of rows and columns l.jpg
Tensor “fibers” generalize the concept of rows and columns

“Slice”

Column Fibers

Row Fibers

Tube Fibers

Note

There’s no naming scheme past 3 dimensions; instead, we just say, e.g., the 4th-mode fibers.


Tucker decomposition l.jpg
Tucker Decomposition columns

K x T

C

I x J x K

I x R

J x S

A

=

B

R x S x T

  • Proposed by Tucker (1966)

  • Also known as: Three-mode factor analysis, three-mode PCA, orthogonal array decomposition

  • A, B, and C may be orthonormal (generally assume they have full column rank)

  • G is not diagonal

  • Not unique


Candecomp parafac l.jpg
CANDECOMP/PARAFAC columns

  • CANDECOMP = Canonical Decomposition (Carroll and Chang, 1970)

  • PARAFAC = Parallel Factors (Harshman, 1970)

  • Columns of A, B, and C are not orthonormal

  • If R is minimal, then R is called the rank of the tensor (Kruskal 1977)

  • Can have rank(X) > min{I,J,K}

K x R

C

I x J x K

I x R

J x R

B

A

=

+ …

=

+

+

I

R x R x R


Combining tucker and parafac l.jpg

Want: columns

Have:

Combining Tucker and PARAFAC

Step 1: Choose orthonormal compression matrices for each dimension:

¼

Step 2: Form reduced tensor (implicitly)

Step 3: Compute PARAFAC on reduced tensor

¼

+

+

Step 4: Convert to PARAFAC of full tensor


Matricize x n l.jpg

5 7 columns

6 8

1 32 4

Matricize: X(n)

The nth-mode fibers are rearranged to be the columns of a matrix


Tucker and parafac matrix representations l.jpg
Tucker and PARAFAC Matrix Representations columns

Fact 1:

Fact 2:

Khatri-Rao Matrix Product (Columnwise Kronecker Product):

Special pseudu-inverse structure:


Implicit compressed parafac als l.jpg

Want: columns

Have:

with

Implicit Compressed PARAFAC ALS

Consider the problem of fixing the 2nd and 3rd factors and solving just for the 1st.

Update columnwise


Back to the problem term x doc x author l.jpg

term columns

author

doc

Back to the Problem:Term x Doc x Author

Terms must appear in at least 3 documents and no more than 10% of all documents. Moreover, it must have at least 2 characters and no more than 30.

Form tensor X as:

6928 documents

4411 terms

6099 authors

464645 nonzeros

Element (i,j,k) is nonzero only if author k wrote document j using term i.


Original problem is overly sparse l.jpg
Original problem is “overly” sparse columns

Result: Resulting tensor has just a few nonzero columns in each lateral slice.

term

author

doc

Experimentally, PARAFAC seems to overfit such data and not do a good job of “mixing” different authors.


Compression matrices parafac l.jpg
Compression Matrices columns& PARAFAC

(rank 100)

(rank 100)

Run rank-100 PARAFAC on compressed tensor.

Reassemble results.


Three way fingerprints l.jpg
Three-Way Fingerprints columns

  • Each of the Terms, Docs, and Authors has a rank-k (k=100) fingerprint from the PARAFAC approximation

  • All items can be directly compared in “concept space”

  • Thus, we can compare any of the following

    • Term-Term

    • Doc-Doc

    • Term-Doc

    • Author-Author

    • Author-Term

    • Author-Doc

  • The fingerprints can be used as inputs for clustering, classification, etc.


Matlab results l.jpg
MATLAB Results columns

  • Go to MATLAB


Wrap up l.jpg
Wrap-Up columns

  • Higher-order LSI for term-doc-author tensor

  • Tucker-PARAFAC combination for sparse tensors

    • Spasre Tensor Toolbox (release summer 2006)

  • Mathematical manipulations

    • Kolda, Tech. Rep. SAND2006-2081

  • Thanks to Kevin Boyack for journal data

  • For more info: Tammy Kolda, tgkolda@sandia.gov

Dunlavy, Kolda, Kegelmeyer, Tech. Rep. SAND2006-2079

Kolda, Bader, Kenny, ICDM05