Learning the Relative Importance of Features in Image Data

Presentation Transcript


  1. Learning the Relative Importance of Features in Image Data Aparna Varde, Elke Rundensteiner, Giti Javidi, Ehsan Sheybani and Jianyu Liang IEEE ICDE DBRank Workshop, Istanbul, Turkey, April 2007

  2. Introduction • Scientific domains • Images from phenomena • Image Features • Visual Features • Metadata Features • Comparison of Images • Based on features [Example images: silicon nanopore, herb leaf]

  3. Motivation • Consider a similarity search process • Some features more important than others • Experts have subjective notions of comparison • Need to learn feature-based distance function [Figure: target image and source images in a similarity search]

  4. Goals • Given • Training data on images and their applicable features • Learn • Distance function for image comparison • Function should preserve relative importance of features in the domain

  5. Proposed Approach: FeaturesRank • Input • Training samples: pairs of images • Level of similarity for each pair • Distance function: weighted sum of features • Process: Iterative approach • Cluster images in levels using distance function • Error: difference between similarity levels in clusters and samples • Adjust distance function based on error • Output • Distance function giving minimal error

  6. Process of Learning • Use a clustering algorithm • Notion of distance: Δ = Σ_{f=1..F} α_f Δ_f, where α_f is the weight of feature f and Δ_f the distance due to feature f • Features given as inputs • Guess initial weights • Cluster images in L levels • L = number of levels in samples [Figure: clusters]
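
A minimal sketch of this distance function in Python, assuming each feature yields one scalar value per image and taking the per-feature distance Δ_f as the absolute difference of those values (the slides leave the per-feature distance measure open):

    def weighted_distance(feats_a, feats_b, weights):
        """Delta = sum over f of alpha_f * Delta_f.

        feats_a, feats_b: per-feature scalar values for the two images.
        weights: the alpha_f weights, one per feature.
        Delta_f is taken as |feats_a[f] - feats_b[f]| here (an assumption).
        """
        return sum(w * abs(a - b) for w, a, b in zip(weights, feats_a, feats_b))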

  7. Process of Learning Training samples: P1: (I1,I16), LT(P1) = 2; P2: (I5,I14), LT(P2) = 1; P3: (I2,I3), LT(P3) = 0; P4: (I6,I18), LT(P4) = 1; P5: (I7,I9), LT(P5) = 0; P6: (I12,I19), LT(P6) = 2; P7: (I17,I20), LT(P7) = 1; P8: (I4,I11), LT(P8) = 3; P9: (I8,I10), LT(P9) = 2; P10: (I13,I15), LT(P10) = 3 • Error pair: level of similarity in clusters not equal to level of similarity in samples • Error: ratio of number of error pairs over total number of pairs • Error threshold: fraction of total number of pairs allowed to be error pairs [Figures: training samples and clusters]
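
These definitions translate directly into code; a small sketch, with the LC and LT labels passed as parallel lists (hypothetical parameter names):

    def error_ratio(lc_levels, lt_levels):
        """Error = number of error pairs / total number of pairs, where an
        error pair has a cluster level (LC) different from its sample
        level (LT)."""
        error_pairs = sum(lc != lt for lc, lt in zip(lc_levels, lt_levels))
        return error_pairs / len(lt_levels)

For instance, if clustering reproduces the sample levels of 8 of the 10 pairs P1–P10 above, the error is 0.2; learning can stop once this falls below the error threshold.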

  8. Process of Learning (training samples P1–P10 as on the previous slide) • If level of similarity of pair in clusters greater than in samples • Images considered closer to each other in clusters than they should be • To push them apart, increase weights of some features in distance function [Figures: training samples and clusters]

  9. Process of Learning • Step: difference between similarity levels • |Level of similarity in training samples – Level of similarity in clusters| • Step = |LT(Ia, Ib) – LC(Ia, Ib)| • Blame: responsibility of a feature for the error • Distance due to feature f / Total distance between images • Blame = Δ_f(Ia, Ib) / Δ(Ia, Ib) • Feature Weight Heuristic • To increase weights • New weight of feature f = Old weight + Step·Blame • Conversely, to decrease weights • New weight = Old weight – Step·Blame
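
A sketch of this heuristic for one feature weight, folding in the direction rule from the previous slide (increase when the clusters place the pair closer together than the samples do, i.e. LC > LT; decrease otherwise):

    def adjusted_weight(old_weight, lt, lc, delta_f, delta_total):
        """One application of the feature weight heuristic.

        lt, lc: similarity levels of the pair in the samples / clusters.
        delta_f: distance due to feature f; delta_total: Delta(Ia, Ib).
        """
        step = abs(lt - lc)            # Step = |LT(Ia,Ib) - LC(Ia,Ib)|
        blame = delta_f / delta_total  # Blame = Delta_f / Delta
        # LC > LT: the images sit closer in the clusters than they should,
        # so increase the weight to push them apart; otherwise decrease it.
        sign = 1.0 if lc > lt else -1.0
        return old_weight + sign * step * blame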

  10. Process of Learning • Consider effect of each error pair and adjust weights • Use adjusted distance function for another iteration of clustering • Repeat until error below threshold or maximum number of iterations reached • Output the distance function giving lowest error
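
Putting the pieces together, a condensed, runnable sketch of this loop. Two simplifications to keep it self-contained: the clustering step is replaced by a stand-in that buckets each pair's current distance into L equal-width bands (the paper clusters the images themselves with a standard clustering algorithm), and weights are clamped at zero, which the slides do not specify:

    import random

    def weighted_distance(a, b, weights):
        # Delta = sum over f of alpha_f * Delta_f (slide 6)
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

    def features_rank(pairs, num_levels, num_features,
                      threshold=0.1, max_iters=1000):
        """pairs: list of (feats_a, feats_b, lt) tuples, where lt is the
        pair's similarity level in the training samples (higher = more
        similar, matching slide 8)."""
        weights = [random.random() for _ in range(num_features)]
        best_err, best_weights = float("inf"), list(weights)
        for _ in range(max_iters):
            dists = [weighted_distance(a, b, weights) for a, b, _ in pairs]
            span = (max(dists) - min(dists)) or 1.0
            # Clustering stand-in: smaller distance -> higher level.
            lc = [round((max(dists) - d) / span * (num_levels - 1))
                  for d in dists]
            errors = [i for i, ((a, b, lt), l) in enumerate(zip(pairs, lc))
                      if l != lt]
            err = len(errors) / len(pairs)
            if err < best_err:                 # remember the best so far
                best_err, best_weights = err, list(weights)
            if err <= threshold:               # converged
                break
            for i in errors:                   # adjust weights per error pair
                a, b, lt = pairs[i]
                step = abs(lt - lc[i])
                total = dists[i] or 1.0
                sign = 1.0 if lc[i] > lt else -1.0
                # Blame of feature f: its (weighted) share of the distance.
                weights = [max(0.0, w + sign * step * (w * abs(x - y)) / total)
                           for w, x, y in zip(weights, a, b)]
        return best_weights   # the distance function giving the lowest error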

  11. Experimental Evaluation • Real images from nanotechnology and bioinformatics used for evaluation • Parameters: error threshold varied from 0.1 to 0.05, maximum number of iterations = 1000, clustering seeds altered • Training Data • Nanotechnology: 60 images, 3 levels of similarity • Bioinformatics: 40 images, 2 levels of similarity • User Study with Test Data • Similarity search performed using learned distance function • Experts evaluate effectiveness of results

  12. Learning Behavior: Nanotechnology • Convergence to error below threshold in fewer than 300 iterations • Experiments with the 5% threshold take longer to converge than with the 10% threshold • Not much difference in behavior between random and equal initial weights [Plots: random initial weights vs. equal initial weights]

  13. Learning Behavior: Bioinformatics • Error in bioinformatics data fluctuates more than in nanotechnology data • Possible reasons • Fewer images were used as training samples • Fewer levels of similarity were used • Other observations similar to nanotechnology data [Plots: random initial weights vs. equal initial weights]

  14. Similarity Search • Using the learned distance function, the target image is compared with source images in a distinct test set • Top 4 matches ranked in order of similarity • Experts verify that the ranking is accurate [Figures: target image and top 4 matches among source images, for nanotechnology and bioinformatics]
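
The search itself is then a straightforward ranking by the learned distance; a sketch reusing weighted_distance() from the earlier sketches (the (name, feats) layout of the source set is a hypothetical choice):

    def top_matches(target_feats, sources, weights, k=4):
        """Rank source images by learned distance to the target image.

        sources: list of (name, feats) pairs from the test set.
        Returns the k closest sources, most similar first.
        """
        return sorted(sources,
                      key=lambda s: weighted_distance(target_feats, s[1],
                                                      weights))[:k]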

  15. Conclusions • Contributions of this work • FeaturesRank approach proposed to learn a distance function capturing the relative importance of features in images • Learned distance function assessed by ranking images for similarity search with real data from nanotechnology and bioinformatics • Ongoing work • Defining objective measures for accuracy • Performing comparative studies with state-of-the-art approaches
