
Visual Computing Theory and Engineering





  1. Visual Computing Theory and Engineering Topic: Descriptors Group Members: 马悦 郭超世 胡欢武 刘国超 宋志超 王丹 肖勖 徐阳 杨一诚 朱璐瑶 白立勋

  2. Descriptors • In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms and applications that produce such descriptions. • They describe elementary characteristics such as shape, color, texture, or motion, among others.

  3. MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching 马悦

  4. Introduction • MatchNet: a patch-matching system • We propose and evaluate a unified approach to patch-based image matching that jointly learns: • A deep convolutional neural network for local patch representation • A network for robust feature comparison
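The two-stage design above can be sketched in numpy. The shapes, random placeholder weights, and function names below are illustrative assumptions, not the trained MatchNet parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "feature tower": maps a flattened 64x64 patch to a descriptor.
# Random weights stand in for the trained convolutional parameters.
W_feat = rng.standard_normal((64 * 64, 128)) * 0.01

def feature_tower(patch):
    """Map a 64x64 patch to a 128-d descriptor (ReLU of a linear map)."""
    return np.maximum(patch.reshape(-1) @ W_feat, 0.0)

# "Metric network": scores a concatenated descriptor pair in (0, 1).
W_metric = rng.standard_normal((256,)) * 0.01

def match_score(desc_a, desc_b):
    """Return a match probability for two descriptors (sigmoid of a linear map)."""
    z = np.concatenate([desc_a, desc_b]) @ W_metric
    return 1.0 / (1.0 + np.exp(-z))

patch_a = rng.standard_normal((64, 64))
patch_b = rng.standard_normal((64, 64))
score = match_score(feature_tower(patch_a), feature_tower(patch_b))
```

Because both patches go through the same (shared) feature tower, the representation is learned once and the comparison network is learned on top of it, which is the "unified" aspect of the approach.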

  5. Contributions • A new state-of-the-art system for patch-based matching using deep convolutional networks that significantly improves on the previous results. • Improved performance over the previous state of the art using smaller descriptors. • Provide a public release of MatchNet trained using our own large collection of patches.

  6. Network Architecture

  7. Result

  8. Local Convolutional Features with Unsupervised Training for Image Retrieval 115034910135 王丹

  9. Summary • Aim: stereo matching and content-based image retrieval • Contribution: patch-level descriptors (patch-CKN) learned in an unsupervised framework • Dataset: RomePatches

  10. RomePatches • Top: examples of matching patches • Bottom: Images of the same bundle

  11. Image Retrieval Pipeline • Interest point detection: regions covariant with changes in viewpoint or illumination (Hessian-Affine detector) • Interest point description: a normalized patch M is mapped to a feature representation φ(M) in a Euclidean space • Patch matching: a fixed-length image descriptor (VLAD representation)
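The VLAD aggregation step of the pipeline above can be sketched as follows (numpy; the descriptor dimensions and random codebook are illustrative assumptions):

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors into a fixed-length VLAD vector.

    descriptors: (N, d) local patch descriptors
    centers:     (K, d) visual-word codebook (e.g. from k-means)
    returns:     (K * d,) L2-normalized VLAD vector
    """
    # Assign each descriptor to its nearest codebook center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    K, d = centers.shape
    v = np.zeros((K, d))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            # Accumulate residuals to the assigned center.
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))   # 50 local descriptors, 8-d each
C = rng.standard_normal((4, 8))    # 4 visual words
image_desc = vlad(X, C)
```

Whatever the number of detected interest points, the output length is fixed at K·d, which is what makes image-to-image comparison a simple vector distance.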

  12. Convolutional Descriptors • Convolutional Neural Networks: three convolutional layers and one fully connected layer; input is a 64x64 patch; produces a 512-dimensional output • Convolutional Kernel Networks: input variants CKN-white, CKN-grad, and CKN-raw

  13. Result

  14. Comparison • Comparison with state-of-the-art image retrieval results.

  15. End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression Speaker: 白立勋

  16. What are DPMs, ConvNets, and non-maximum suppression? • Deformable Parts Models (DPMs) and Convolutional Networks (ConvNets) have each achieved notable performance in object detection. • DPMs are well-versed in object composition, modeling fine-grained spatial relationships between parts. • ConvNets are adept at producing powerful image features, having been discriminatively trained directly on the pixels.

  17. What is the goal of this article? • They propose a new model that combines these two approaches, obtaining the advantages of each. How is this model achieved? • They train it using a new structured loss function that considers all bounding boxes within an image, rather than isolated object instances. • This enables the non-maximum suppression (NMS) operation to be integrated into the model.
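For reference, the NMS operation the model integrates is, in its standard greedy form, the following (numpy sketch; the boxes, scores, and threshold are toy values, and this is the conventional post-processing step rather than the paper's in-model formulation):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) as [x1, y1, x2, y2]
    scores: (N,) detection confidences
    returns indices of kept boxes, highest score first
    """
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # IoU of the current top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)   # the second box is suppressed by the first
```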

  18. Overview

  19. What are the advantages of doing so? • They use a DPM for detection, but replace the HoG features with features learned by a convolutional network. This allows the use of complex image features, while still preserving the spatial relationships between object parts during inference.

  20. Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation Reporter: Zhichao Song

  21. The problems to be investigated • Image style recognition • Aesthetic quality categorization • Image quality estimation

  22. Drawbacks of traditional methods • They ignore fine-grained, high-resolution details in images • The performance of single-column neural networks leaves room for improvement

  23. Improved methods • Multiple image resolutions • Multi-column neural networks

  24. Multi-patch aggregation networks

  25. Statistics aggregation structure

  26. Fully-connected sorting aggregation
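The two aggregation structures named above can be sketched as follows (numpy; the per-patch features, their dimensions, and the chosen statistics are hypothetical stand-ins for the network's actual outputs):

```python
import numpy as np

# Hypothetical per-patch network outputs: 5 patches, each a 3-d feature.
patch_feats = np.array([[0.2, 0.9, 0.1],
                        [0.7, 0.3, 0.5],
                        [0.4, 0.6, 0.8],
                        [0.1, 0.5, 0.2],
                        [0.9, 0.2, 0.6]])

# Statistics aggregation: order-independent statistics across patches,
# concatenated into one fixed-length vector.
stats_agg = np.concatenate([patch_feats.min(axis=0),
                            patch_feats.max(axis=0),
                            patch_feats.mean(axis=0),
                            patch_feats.var(axis=0)])

# Sorting aggregation: sort each feature dimension across patches and
# flatten; a fully-connected layer would consume this in the real model.
sort_agg = np.sort(patch_feats, axis=0).reshape(-1)
```

Both aggregations are invariant to the order in which patches were sampled, which is what makes them suitable for combining an unordered bag of patch-level predictions.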

  27. The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification Lecturer: Chaoshi Guo Student ID: 115034910120

  28. Goal: fine-grained image classification Difficulty: intra-class variance can be larger than inter-class variance, so fine-grained classification is technically challenging Method: two-level attention models in a deep convolutional neural network

  29. Most fine-grained classification systems follow the pipeline: find the foreground object or object parts (where) to extract discriminative features (what). A bottom-up process is commonly used for this. • Advantage: the method has high recall • Disadvantage: low precision, and it requires strong supervision • To address this: finding the foreground object and object parts can be regarded as a two-level attention process, one at the object level and another at the part level

  30. Two-level Attention Models: 1) Object-Level Attention Model • Patch selection using object-level attention • Training a DomainNet • Classification using object-level attention

  31. Two-level Attention Models: 2) Part-Level Attention Model • Building the part detector • Building the part-based classifier

  32. Two-level Attention Models: 3) The Complete Pipeline

  33. Results of the comparison between methods :

  34. Conclusions • We propose a fine-grained classification pipeline combining bottom-up and two top-down attentions. The object-level attention feeds the network with patches relevant to the task domain, at different views and scales. Both levels of attention bring significant gains, and they compensate each other nicely with late fusion. One important advantage of our method is that the attention is derived from a CNN trained on the classification task, so it can be conducted under the weakest supervision setting, where only the class label is provided.

  35. Fully Connected Object Proposals for Video Segmentation 肖勖 115034910141

  36. Fully Connected Object Proposals for Video Segmentation • Introduction: a novel approach to video segmentation using multiple object proposals • Method: combine appearance with long-range point tracks • Advantage: ensure robustness with respect to fast motion and occlusions over longer video sequences

  37. Fully Connected Object Proposals for Video Segmentation • We demonstrate robustness to challenging situations typical of unconstrained videos, such as: fast motion and motion blur (1), color ambiguities between fore- and background (2), and partial occlusions (3).

  38. Fully Connected Object Proposals for Video Segmentation • Algorithm: • First: a rough classification and subsampling of the data is performed using a self-trained Support Vector Machine (SVM) classifier • Next: maximum a posteriori (MAP) inference is performed on a fully connected conditional random field (CRF) • Finally: each labeled proposal casts a vote to all pixels that it overlaps. The aggregate result yields the final foreground-background segmentation.
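The final voting step of the algorithm above can be sketched as follows (numpy; the toy masks and ±1 labels are illustrative assumptions standing in for the CRF-labeled proposals):

```python
import numpy as np

def vote_segmentation(masks, labels, shape):
    """Aggregate per-proposal labels into a pixel-wise segmentation.

    masks:  list of boolean (H, W) proposal masks
    labels: list of +1 (foreground) / -1 (background) proposal labels
    shape:  (H, W) of the frame

    Each labeled proposal casts its vote on every pixel it overlaps;
    pixels whose vote sum is positive become foreground.
    """
    votes = np.zeros(shape)
    for mask, label in zip(masks, labels):
        votes[mask] += label
    return votes > 0

H, W = 4, 6
m1 = np.zeros((H, W), bool); m1[1:3, 1:4] = True   # foreground proposal
m2 = np.zeros((H, W), bool); m2[0:2, 0:2] = True   # background proposal
m3 = np.zeros((H, W), bool); m3[1:3, 2:5] = True   # foreground proposal
seg = vote_segmentation([m1, m2, m3], [+1, -1, +1], (H, W))
```

Pixels covered by several agreeing foreground proposals accumulate votes, so the aggregate is more robust than trusting any single proposal.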

  39. Fully Connected Object Proposals for Video Segmentation • Process: • Object Proposal Generation • Candidate Proposal Pruning • Feature Extraction and Training • Classification and Resampling • Fully Connected Proposal Labeling

  40. Fully Connected Object Proposals for Video Segmentation • Left: Precision-Recall curves and F-score isolines for the SVM and CRF classification of object proposals into foreground and background. • Right: Average, maximum and minimum F-score.

  41. Fully Connected Object Proposals for Video Segmentation • Limitations: • It requires a sufficiently high video resolution such that the computation of proposals using existing techniques produces meaningful results.

  42. Image Based Relighting Using Neural Networks 徐阳 XU Yang 1150349101421

  43. OUTLINE • Relighting the image • Neural network algorithm • Results

  44. Light Transport Reconstruction • Brute-force methods • directly sample all the entries of the light transport matrix from the scene • Sparsity-based methods • model light transport using a sparse representation that is recovered from images of the scene lit with designed illumination patterns • Coherence-based methods • exploit the data coherence in light transport to reconstruct the light transport matrix from a subset of rows/columns sampled from the scene

  45. Neural Networks for Light Transport • Formulate the light transport matrix as discrete samples of a continuous light transport function • Approximate the transport function using neural networks

  46. Light transport function Model the light transport matrix as discrete samples of a continuous light transport function Ψ(p, l): M(i, j) = Ψ(p(i), l(j)), where M(i, j) is the element of the light transport matrix that corresponds to pixel i and light source j, p(i) denotes the image coordinates of pixel i, and l(j) is the position of light source j in the 2D light domain. By expressing the 2D light transport matrix as a continuous 4D light transport function, the coherence of light transport in both the image domain and the light domain can be more readily exploited.
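Given a light transport matrix, relighting itself is a linear operation: a relit image is M times the lighting vector. A minimal numpy sketch, using a random placeholder M of assumed size:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pixels, n_lights = 100, 16

# Light transport matrix: M[i, j] = contribution of light j to pixel i.
# Random values stand in for a reconstructed transport matrix.
M = rng.random((n_pixels, n_lights))

# Image-based relighting: any new lighting condition is a linear
# combination of M's columns.
light = np.zeros(n_lights)
light[3] = 1.0            # switch on only light source 3
relit = M @ light         # equals column 3 of M
```

This linearity is why reconstructing M (or the function Ψ it samples) is sufficient: once M is known, images under arbitrary lighting follow by matrix-vector products.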

  47. Neural network approximation • We approximate the light transport function with multilayer acyclic feed-forward neural networks

  48. Light Transport Reconstruction • To reconstruct the light transport matrix, we recover the light transport function through neural network regression on captured images. • A light transport matrix element is approximated by averaging the outputs of all the base neural networks Φn: M(i, j) ≈ (1/Ne) Σ_{n=1..Ne} Φn(p(i), l(j); wn), where Ne is the number of base neural networks in the ensemble, and wn is the weight vector of base neural network Φn.
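The ensemble average can be sketched as follows (numpy; the tiny random one-hidden-layer networks stand in for trained base networks, and the 2-D pixel and light coordinates follow the parameterization of Ψ(p, l) above):

```python
import numpy as np

Ne = 8   # number of base networks in the ensemble

def make_base_net(seed):
    """A tiny random one-hidden-layer net standing in for a trained base network."""
    r = np.random.default_rng(seed)
    W1 = r.standard_normal((4, 16)) * 0.1
    W2 = r.standard_normal((16,)) * 0.1
    return lambda x: np.maximum(x @ W1, 0.0) @ W2   # ReLU hidden layer

ensemble = [make_base_net(s) for s in range(Ne)]

def transport(p, l):
    """Approximate M(i, j) = Psi(p(i), l(j)) by averaging the ensemble outputs."""
    x = np.concatenate([p, l])   # 2-d pixel coords + 2-d light coords
    return sum(net(x) for net in ensemble) / Ne

value = transport(np.array([0.2, 0.7]), np.array([0.5, 0.5]))
```

Averaging several independently trained regressors reduces the variance of any single network's prediction, which is the usual motivation for an ensemble here.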

  49. Adaptive Fuzzy Clustering • Fuzzy clustering • Adaptive fuzzy clustering
