
Visualization and Navigation of Document Information Spaces Using a Self-Organizing Map

Daniel X. Pape

Community Architectures for Network Information Systems

dpape@canis.uiuc.edu

www.canis.uiuc.edu

CSNA’98

6/18/98


Overview

  • Self-Organizing Map (SOM) Algorithm

  • U-Matrix Algorithm for SOM Visualization

  • SOM Navigation Application

  • Document Representation and Collection Examples

  • Problems and Optimizations

  • Future Work


Basic SOM Algorithm

  • Input

    • Number (n) of Feature Vectors (x)

    • format:

      vector name: a, b, c, d

    • examples:

      1: 0.1, 0.2, 0.3, 0.4

      2: 0.2, 0.3, 0.3, 0.2


Basic SOM Algorithm

  • Output

    • Neural network Map of (M) Nodes

    • Each node has an associated Weight Vector (m) of the same dimensionality as the input feature vectors

    • Examples:

      m1: 0.1, 0.2, 0.3, 0.4

      m2: 0.2, 0.3, 0.3, 0.2


Basic SOM Algorithm

  • Output (cont.)

    • Nodes are laid out in a two-dimensional grid


Basic SOM Algorithm

  • Other Parameters

    • Number of timesteps (T)

    • Learning Rate (eta)


Basic SOM Algorithm

SOM() {
  foreach timestep t {
    foreach feature vector x {
      wnode = find_winning_node(x)
      update_local_neighborhood(wnode, x)
    }
  }
}

find_winning_node(x) {
  foreach node n {
    compute distance between n's weight vector m and x
  }
  return the node with the smallest distance
}

update_local_neighborhood(wnode, x) {
  foreach node n in the neighborhood of wnode {
    m = m + eta * (x - m)
  }
}
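
The pseudocode leaves the neighborhood function and the training schedule open. As one concrete, hedged illustration, here is a minimal NumPy sketch of the same online loop; the grid size, the Gaussian neighborhood kernel, and the decay of eta and the radius over the T timesteps are assumptions added for the example, not details from the slides.

import numpy as np

def train_som(X, rows=10, cols=10, T=50, eta=0.5, seed=0):
    """Online SOM training; X is an (n, d) array of feature vectors."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    M = rng.random((rows * cols, d))          # one weight vector per map node
    # grid coordinates of each node, used to measure neighborhood distance on the map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    for t in range(T):
        frac = t / T
        lr = eta * (1.0 - frac)                                # assumed decaying learning rate
        radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - frac)  # assumed shrinking neighborhood
        for x in X:
            # find_winning_node: node whose weight vector is closest to x
            winner = int(np.argmin(np.linalg.norm(M - x, axis=1)))
            # update_local_neighborhood: pull nodes near the winner (on the grid) toward x
            grid_dist = np.linalg.norm(grid - grid[winner], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))  # Gaussian neighborhood kernel
            M += lr * h[:, None] * (x - M)
    return M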


U-Matrix Visualization

  • Provides a simple way to visualize cluster boundaries on the map

  • Simple algorithm:

    • for each node in the map, compute the average of the distances between its weight vector and those of its immediate neighbors

  • The average distance is a measure of how similar a node is to its neighbors: small values lie in the interior of a cluster, large values mark a boundary
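
A minimal sketch of this computation, assuming the flat (rows*cols, d) weight array returned by the training sketch above and a 4-neighborhood (the slide only says "immediate neighbors"; an 8-neighborhood would work just as well):

import numpy as np

def u_matrix(M, rows, cols):
    """For each node, average the distance between its weight vector and its grid neighbors'."""
    W = M.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # immediate (4-)neighbors
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(W[r, c] - W[nr, nc]))
            U[r, c] = np.mean(dists)
    return U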


U-Matrix Visualization

  • Interpretation

    • one can encode the U-Matrix measurements as greyscale values in an image, or as altitudes on a terrain

    • the result is a landscape that represents the document space: the valleys (dark areas) are the clusters of data, and the mountains (light areas) are the boundaries between the clusters
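
For instance, with matplotlib the U-Matrix from the sketch above could be shown as a greyscale image; rendering it as an actual 3D terrain is a separate step not shown here.

import matplotlib.pyplot as plt

U = u_matrix(M, rows=10, cols=10)   # U-Matrix from the earlier sketches
plt.imshow(U, cmap="gray")          # dark valleys = clusters, light ridges = boundaries
plt.colorbar(label="average distance to neighbors")
plt.show()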


U-Matrix Visualization

  • Example:

    • a dataset of random three-dimensional points arranged in four obvious clusters


U-Matrix Visualization

Four (color-coded) clusters of three-dimensional points


U-Matrix Visualization

Oblique projection of a terrain derived from the U-Matrix


U-Matrix Visualization

Terrain for a real document collection


Current Labeling Procedure

  • Feature vectors are encoded as 0’s and 1’s

  • Weight vectors have real values from 0 to 1

  • Sort weight vector dimensions by element value

    • the dimension with the greatest value gives the “best” noun phrase for that node

  • Aggregate nodes with the same “best” noun phrase into groups
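
A minimal sketch of this procedure, assuming the trained weight array M from the sketches above and a list noun_phrases giving the phrase behind each vector dimension (both names are placeholders for illustration):

from collections import defaultdict
import numpy as np

def label_nodes(M, noun_phrases):
    """Group nodes by the noun phrase whose dimension has the largest weight."""
    groups = defaultdict(list)
    for node_idx, m in enumerate(M):
        best_dim = int(np.argmax(m))             # dimension with the greatest value
        groups[noun_phrases[best_dim]].append(node_idx)
    return groups                                 # "best" noun phrase -> list of node indices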


U-Matrix Navigation

  • 3D Space-Flight

  • Hierarchical Navigation


Document Data

  • Noun phrases extracted

  • Set of unique noun phrases computed

    • each noun phrase becomes a dimension of the data set

  • Each document is represented by a binary vector, with a 1 or a 0 denoting the presence or absence of each noun phrase


Document Data

  • Example:

    • 10 total noun phrases:

      alexander, king, macedonians, darius, philip, horse, soldiers, battle, army, death

    • each element of the feature vector will be a 1 or a 0:

      • 1: 1, 1, 0, 0, 1, 1, 0, 0, 0, 0

      • 2: 0, 1, 0, 1, 0, 0, 1, 1, 1, 1
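
A minimal sketch of this encoding, using the ten noun phrases above; the particular phrase set given to the example document is made up so that it reproduces example vector 1:

noun_phrases = ["alexander", "king", "macedonians", "darius", "philip",
                "horse", "soldiers", "battle", "army", "death"]

def encode(doc_phrases):
    """Binary feature vector: 1 if the noun phrase occurs in the document, else 0."""
    return [1 if p in doc_phrases else 0 for p in noun_phrases]

# e.g. a document mentioning alexander, king, philip and horse:
print(encode({"alexander", "king", "philip", "horse"}))
# -> [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]   (matches example vector 1 above)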


Document Collection Examples


Problems

  • As document sets get larger, the feature vectors get longer, use more memory, etc.

  • Execution time grows to unrealistic lengths


Solutions?

  • Need algorithm refinements for sparse feature vectors

  • Need a faster way to do the find_winning_node() computation

  • Need a better way to do the update_local_neighborhood() computation


Sparse Vector Optimization

  • Intelligent support for sparse feature vectors

    • saves on memory usage

    • greatly improves speed of the weight vector update computation
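
One way such sparse support might look, storing only the indices of the 1-valued dimensions; the set-of-indices layout and the factored distance formula are assumptions for illustration, not the implementation described in the talk:

def to_sparse(x):
    """Keep only the indices of the nonzero (1-valued) dimensions of a binary feature vector."""
    return {i for i, v in enumerate(x) if v}

def sparse_distance_sq(x_idx, m):
    """Squared Euclidean distance between a sparse binary vector and a dense weight vector m."""
    # sum_i (x_i - m_i)^2 = sum_i m_i^2 + sum_{i: x_i = 1} (1 - 2 * m_i)
    # (in practice the sum of m_i^2 would be cached per node rather than recomputed)
    return sum(mi * mi for mi in m) + sum(1.0 - 2.0 * m[i] for i in x_idx)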


Faster find_winning_node()

  • SOM weight vectors become partially ordered very quickly


Faster find_winning_node()

U-Matrix Visualization of an Initial, Unordered SOM


Faster find_winning_node()

Partially Ordered SOM after 5 timesteps


Faster find_winning_node()

  • Don’t do a global search for the winner

  • Start the search from the last known winner's position

  • Pro:

    • usually finds a new winner very quickly

  • Con:

    • this new search for a winner can sometimes get stuck in a local minimum
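
A minimal sketch of such a local search, formulated as a hill-climb over the grid starting at the last known winner; the exact search strategy is an assumption based on the slide's description, and the local-minimum behaviour noted above shows up in the stopping condition:

import numpy as np

def find_winner_local(x, start, M, rows, cols):
    """Hill-climb from the last known winner instead of scanning every node."""
    current = start
    while True:
        r, c = divmod(current, cols)
        # candidate set: the current node and its immediate grid neighbors
        candidates = [current] + [nr * cols + nc
                                  for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                                  if 0 <= nr < rows and 0 <= nc < cols]
        best = min(candidates, key=lambda n: np.linalg.norm(M[n] - x))
        if best == current:
            return current           # no neighbor is closer: possibly only a local minimum
        current = best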


Better Neighborhood Update

  • Nodes get told to “update” quite often

  • Weight vector is made public only during a find_winner() search

  • With local find_winning_node() search, a lazy neighborhood weight vector update can be performed


Better Neighborhood Update

  • Cache update requests

    • each node will store the winning node and feature vector for each update request

  • The node performs the update computations called for by the stored update requests only when asked for its weight vector

  • Possible reduction of number of requests by averaging the feature vectors in the cache
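
A minimal sketch of this lazy scheme; the Node class, its fields, and the scalar "strength" (learning rate times neighborhood weight) are illustrative assumptions rather than the talk's actual data structures:

import numpy as np

class Node:
    def __init__(self, m):
        self.m = np.asarray(m, dtype=float)   # weight vector
        self.pending = []                     # cached (strength, feature vector) update requests

    def request_update(self, x, strength):
        """Cache the update instead of applying it immediately."""
        self.pending.append((strength, np.asarray(x, dtype=float)))

    def weight_vector(self):
        """Apply all cached updates lazily, only when the weight vector is actually needed."""
        for strength, x in self.pending:
            self.m += strength * (x - self.m)
        self.pending.clear()
        return self.m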


New Execution Times


Future Work

  • Parallelization

  • Label Problem


Label Problem

  • Current Procedure not very good

  • Cluster boundaries

  • Term selection


Cluster Boundaries

  • Image processing

  • Geometric


Cluster Boundaries

  • Image processing example:


Term Selection

  • Too many unique noun phrases

    • Too many dimensions in the feature vector data

  • “Knee” of frequency curve
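
A minimal sketch of a cut along these lines: rank noun phrases by document frequency and keep only a middle band of the curve. The specific thresholds stand in for the "knee", which the slide does not define precisely:

from collections import Counter

def select_terms(docs, min_df=3, max_df_frac=0.5):
    """Keep noun phrases that are neither too rare nor too common across the documents."""
    df = Counter(p for doc_phrases in docs for p in set(doc_phrases))  # document frequency
    max_df = max_df_frac * len(docs)
    ranked = sorted(df.items(), key=lambda kv: -kv[1])                 # frequency curve, descending
    return [p for p, f in ranked if min_df <= f <= max_df]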