Learning the Relative Importance of Features in Image Data

Presentation Transcript


  1. Learning the Relative Importance of Features in Image Data Aparna Varde, Elke Rundensteiner, Giti Javidi, Ehsan Sheybani and Jianyu Liang IEEE ICDE DBRank Workshop, Istanbul, Turkey, April 2007

  2. Introduction • Scientific domains • Images from phenomena • Image Features • Visual Features • Metadata Features • Comparison of Images • Based on features [Example images: silicon nanopore, herb leaf]

  3. Motivation • Consider a similarity search process • Some features more important than others • Experts have subjective notions of comparison • Need to learn feature-based distance function [Figure: target image and source images in a similarity search]

  4. Goals • Given • Training data on images and their applicable features • Learn • Distance function for image comparison • Function should preserve relative importance of features in the domain

  5. Proposed Approach: FeaturesRank • Input • Training samples: pairs of images • Level of similarity for each pair • Distance function: weighted sum of features • Process: Iterative approach • Cluster images in levels using distance function • Error: difference between similarity levels in clusters and samples • Adjust distance function based on error • Output • Distance function giving minimal error

  6. Process of Learning • Use a clustering algorithm • Notion of distance: Δ = Σ_{f=1..F} α_f Δ_f, where α_f is the weight of feature f and Δ_f the distance due to feature f • Features given as inputs • Guess initial weights • Cluster images in L levels • L = number of levels in samples [Figure: clusters]
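
A minimal sketch of this distance function in Python, assuming each feature yields one scalar value per image and taking the per-feature distance Δ_f as the absolute difference of those values (the slides leave the per-feature distance measure open):

    def weighted_distance(feats_a, feats_b, weights):
        """Delta = sum over f of alpha_f * Delta_f.

        feats_a, feats_b: per-feature scalar values for the two images.
        weights: the alpha_f weights, one per feature.
        Delta_f is taken as |feats_a[f] - feats_b[f]| here (an assumption).
        """
        return sum(w * abs(a - b) for w, a, b in zip(weights, feats_a, feats_b))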

  7. Process of Learning Training samples: P1: (I1,I16), LT(P1) = 2; P2: (I5,I14), LT(P2) = 1; P3: (I2,I3), LT(P3) = 0; P4: (I6,I18), LT(P4) = 1; P5: (I7,I9), LT(P5) = 0; P6: (I12,I19), LT(P6) = 2; P7: (I17,I20), LT(P7) = 1; P8: (I4,I11), LT(P8) = 3; P9: (I8,I10), LT(P9) = 2; P10: (I13,I15), LT(P10) = 3 • Error pair: level of similarity in clusters not equal to level of similarity in samples • Error: ratio of number of error pairs over total number of pairs • Error threshold: fraction of total number of pairs allowed to be error pairs [Figures: training samples and clusters]
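
These definitions translate directly into code; a small sketch, with the LC and LT labels passed as parallel lists (hypothetical parameter names):

    def error_ratio(lc_levels, lt_levels):
        """Error = number of error pairs / total number of pairs, where an
        error pair has a cluster level (LC) different from its sample
        level (LT)."""
        error_pairs = sum(lc != lt for lc, lt in zip(lc_levels, lt_levels))
        return error_pairs / len(lt_levels)

For instance, if clustering reproduces the sample levels of 8 of the 10 pairs P1–P10 above, the error is 0.2; learning can stop once this falls below the error threshold.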

  8. Process of Learning (training samples P1–P10 as on the previous slide) • If level of similarity of pair in clusters greater than in samples • Images considered closer to each other in clusters than they should be • To push them apart, increase weights of some features in distance function [Figures: training samples and clusters]

  9. Process of Learning • Step: difference between similarity levels • |Level of similarity in training samples – Level of similarity in clusters| • Step = |LT(Ia, Ib) – LC(Ia, Ib)| • Blame: responsibility of a feature for the error • Distance due to feature f / Total distance between images • Blame = Δ_f(Ia, Ib) / Δ(Ia, Ib) • Feature Weight Heuristic • To increase weights • New weight of feature f = Old weight + Step·Blame • Conversely, to decrease weights • New weight = Old weight – Step·Blame
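
A sketch of this heuristic for one feature weight, folding in the direction rule from the previous slide (increase when the clusters place the pair closer together than the samples do, i.e. LC > LT; decrease otherwise):

    def adjusted_weight(old_weight, lt, lc, delta_f, delta_total):
        """One application of the feature weight heuristic.

        lt, lc: similarity levels of the pair in the samples / clusters.
        delta_f: distance due to feature f; delta_total: Delta(Ia, Ib).
        """
        step = abs(lt - lc)            # Step = |LT(Ia,Ib) - LC(Ia,Ib)|
        blame = delta_f / delta_total  # Blame = Delta_f / Delta
        # LC > LT: the images sit closer in the clusters than they should,
        # so increase the weight to push them apart; otherwise decrease it.
        sign = 1.0 if lc > lt else -1.0
        return old_weight + sign * step * blame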

  10. Process of Learning • Consider effect of each error pair and adjust weights • Use adjusted distance function for another iteration of clustering • Repeat until error below threshold or maximum number of iterations reached • Output the distance function giving lowest error
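
Putting the pieces together, a condensed, runnable sketch of this loop. Two simplifications to keep it self-contained: the clustering step is replaced by a stand-in that buckets each pair's current distance into L equal-width bands (the paper clusters the images themselves with a standard clustering algorithm), and weights are clamped at zero, which the slides do not specify:

    import random

    def weighted_distance(a, b, weights):
        # Delta = sum over f of alpha_f * Delta_f (slide 6)
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

    def features_rank(pairs, num_levels, num_features,
                      threshold=0.1, max_iters=1000):
        """pairs: list of (feats_a, feats_b, lt) tuples, where lt is the
        pair's similarity level in the training samples (higher = more
        similar, matching slide 8)."""
        weights = [random.random() for _ in range(num_features)]
        best_err, best_weights = float("inf"), list(weights)
        for _ in range(max_iters):
            dists = [weighted_distance(a, b, weights) for a, b, _ in pairs]
            span = (max(dists) - min(dists)) or 1.0
            # Clustering stand-in: smaller distance -> higher level.
            lc = [round((max(dists) - d) / span * (num_levels - 1))
                  for d in dists]
            errors = [i for i, ((a, b, lt), l) in enumerate(zip(pairs, lc))
                      if l != lt]
            err = len(errors) / len(pairs)
            if err < best_err:                 # remember the best so far
                best_err, best_weights = err, list(weights)
            if err <= threshold:               # converged
                break
            for i in errors:                   # adjust weights per error pair
                a, b, lt = pairs[i]
                step = abs(lt - lc[i])
                total = dists[i] or 1.0
                sign = 1.0 if lc[i] > lt else -1.0
                # Blame of feature f: its (weighted) share of the distance.
                weights = [max(0.0, w + sign * step * (w * abs(x - y)) / total)
                           for w, x, y in zip(weights, a, b)]
        return best_weights   # the distance function giving the lowest error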

  11. Experimental Evaluation • Real images from nanotechnology and bioinformatics used for evaluation • Parameters: error threshold varied from 0.1 to 0.05, maximum number of iterations = 1000, clustering seeds altered • Training Data • Nanotechnology: 60 images, 3 levels of similarity • Bioinformatics: 40 images, 2 levels of similarity • User Study with Test Data • Similarity search performed using learned distance function • Experts evaluate effectiveness of results

  12. Learning Behavior: Nanotechnology • Convergence to error below threshold in fewer than 300 iterations • Experiments with the 5% threshold take longer to converge than with the 10% threshold • Not much difference in behavior between random and equal initial weights [Plots: random initial weights vs. equal initial weights]

  13. Learning Behavior: Bioinformatics • Error in bioinformatics data fluctuates more than in nanotechnology data • Possible reasons • Fewer images were used as training samples • Fewer levels of similarity were used • Other observations similar to nanotechnology data [Plots: random initial weights vs. equal initial weights]

  14. Similarity Search • Using the learned distance function, the target image is compared with source images in a distinct test set • Top 4 matches ranked in order of similarity • Experts verify that the ranking is accurate [Figures: target image and top 4 matches among source images, for nanotechnology and bioinformatics]
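
The search itself is then a straightforward ranking by the learned distance; a sketch reusing weighted_distance() from the earlier sketches (the (name, feats) layout of the source set is a hypothetical choice):

    def top_matches(target_feats, sources, weights, k=4):
        """Rank source images by learned distance to the target image.

        sources: list of (name, feats) pairs from the test set.
        Returns the k closest sources, most similar first.
        """
        return sorted(sources,
                      key=lambda s: weighted_distance(target_feats, s[1],
                                                      weights))[:k]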

  15. Conclusions • Contributions of this work • FeaturesRank approach proposed to learn a distance function capturing the relative importance of features in images • Learned distance function assessed by ranking images for similarity search with real data from nanotechnology and bioinformatics • Ongoing work • Defining objective measures for accuracy • Performing comparative studies with state-of-the-art approaches
