
Geometric graph-based methods in image processing, networks, and machine learning

Students: Ekaterina Merkurjev, Tijana Kostic, Huiyi Hu, Cristina Garcia (CGU). Postdocs: Yves van Gennip, Braxton Osting, Nestor Guillen. Navy collaborator: Arjuna Flenner. Other collaborators: Allon Percus (CGU), Mason Porter (Oxford), Thomas Laurent (Loyola Marymount).


Presentation Transcript


  1. Geometric graph-based methods in image processing, networks, and machine learning. Andrea Bertozzi, University of California, Los Angeles. Students: Ekaterina Merkurjev, Tijana Kostic, Huiyi Hu, Cristina Garcia (CGU). Postdocs: Yves van Gennip, Braxton Osting, Nestor Guillen. Navy collaborator: Arjuna Flenner. Other collaborators: Allon Percus (CGU), Mason Porter (Oxford), Thomas Laurent (Loyola Marymount). Inspiration: earlier work of Stan Osher, Chris Anderson, Luminita Vese, and Tony Chan. Thanks to NSF, ONR, and AFOSR for support.

  2. Variational functionals for image segmentation: sharp interfaces with a penalty term restricting the regularity of the interface. Mumford-Shah segmentation model (CPAM 1989). Snakes: a Lagrangian curve attracted to edges, where F is an environmental function that pulls the curve toward edges (Kass-Witkin-Terzopoulos, IJCV 1987). Chan-Vese segmentation: binary, with a sharp interface Γ between regions (IEEE Trans. Image Proc. 2001); solved using level sets and the TV functional via a gradient flow.
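
For reference, standard forms of the two energies mentioned above (weights and constants vary between references; this is a sketch, not the slides' exact notation):
\[
E_{MS}(u,\Gamma) = \mu\int_\Omega (u-f)^2\,dx + \int_{\Omega\setminus\Gamma}|\nabla u|^2\,dx + \nu\,|\Gamma|
\]
for Mumford-Shah, where f is the observed image and \Gamma the edge set, and
\[
E_{CV}(\Gamma,c_1,c_2) = \int_{\mathrm{inside}(\Gamma)} (f-c_1)^2\,dx + \int_{\mathrm{outside}(\Gamma)} (f-c_2)^2\,dx + \nu\,|\Gamma|
\]
for the piecewise-constant (binary) Chan-Vese model.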

  3. Diffuse interface methods: total variation and the Ginzburg-Landau functional, where W is a double-well potential with two minima. Total variation measures the length of the boundary between two constant regions; the GL energy is a diffuse-interface approximation of TV for binary functionals.
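
A standard form of the Ginzburg-Landau energy and double-well potential referred to above (a sketch; scaling conventions differ between papers):
\[
GL_\varepsilon(u) = \frac{\varepsilon}{2}\int |\nabla u|^2\,dx + \frac{1}{\varepsilon}\int W(u)\,dx, \qquad W(u) = \tfrac14\,(u^2-1)^2 .
\]
As \varepsilon \to 0, GL_\varepsilon \Gamma-converges to a constant multiple of the total variation \int |\nabla u| on binary-valued functions, which is the sense in which GL approximates TV.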

  4. Diffuse interface equations and their sharp interface limits. Allen-Cahn equation: L2 gradient flow of the GL functional; approximates motion by mean curvature and is useful for image segmentation and image deblurring. Cahn-Hilliard equation: H-1 gradient flow of the GL functional; approximates the (nonlocal) Mullins-Sekerka problem (Pego; Alikakos, Bates, and Chen) and conserves the mean of u. Used in image inpainting: the fourth-order equation allows two boundary conditions to be satisfied at the inpainting region.
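
Schematically, the two gradient flows of the GL energy mentioned above are
\[
u_t = \varepsilon\,\Delta u - \frac{1}{\varepsilon}\,W'(u) \qquad \text{(Allen-Cahn, } L^2 \text{ gradient flow)},
\]
\[
u_t = \Delta\!\Big(\frac{1}{\varepsilon}\,W'(u) - \varepsilon\,\Delta u\Big) \qquad \text{(Cahn-Hilliard, } H^{-1} \text{ gradient flow)}.
\]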

  5. Examples of the Allen-Cahn equation in image processing. Selim Esedoglu: blind deconvolution of bar codes (Inverse Problems 2004); K is a blurring kernel that can be identified as part of the process. A gradient-flow method minimizes E, resulting in a modified Allen-Cahn equation with forcing. Threshold dynamics: Merriman-Bence-Osher show that Allen-Cahn can be approximated by repeatedly solving the heat equation and thresholding, which leads to a numerical solution of motion by mean curvature. Esedoglu-Tsai (JCP 2006) generalize this to the piecewise-constant Mumford-Shah problem.
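
A minimal sketch of the Merriman-Bence-Osher threshold dynamics mentioned above, where the heat-equation step is realized as a Gaussian convolution; the function name and step sizes are illustrative, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mbo_curvature_flow(chi, dt, n_iter=20):
    """Approximate motion by mean curvature of the region {chi == 1}
    by alternating diffusion and thresholding (MBO threshold dynamics).

    chi    : 2D array of 0/1 values, the characteristic function of the region
    dt     : diffusion time per iteration (sets the effective time step)
    n_iter : number of diffusion/threshold cycles
    """
    u = chi.astype(float)
    for _ in range(n_iter):
        # Step 1: run the heat equation for time dt (Gaussian blur with std sqrt(2*dt))
        u = gaussian_filter(u, sigma=np.sqrt(2.0 * dt))
        # Step 2: threshold back to a sharp {0, 1} interface
        u = (u >= 0.5).astype(float)
    return u
```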

  6. Fast Cahn-Hilliard inpainting. US Patent No. 7,840,086. Bertozzi, Esedoglu, Gillette, IEEE Trans. Image Proc. 2007 and SIAM MMS 2007. Transitioned to NGA for road inpainting and to In-Q-Tel for document exploitation. Continues edges in the same direction: a higher-order method for local inpainting. Fast method using convexity splitting and the FFT. H-1 gradient flow for diffuse TV with L2 fidelity on the known data.
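
Schematically, the modified Cahn-Hilliard inpainting equation in the Bertozzi-Esedoglu-Gillette papers cited above takes the form
\[
u_t = \Delta\!\Big(\frac{1}{\varepsilon}\,W'(u) - \varepsilon\,\Delta u\Big) + \lambda(x)\,(f - u), \qquad
\lambda(x) = \begin{cases}\lambda_0, & x \in \Omega\setminus D,\\ 0, & x \in D,\end{cases}
\]
where D is the inpainting region and f the known binary image; convexity splitting gives a stable time step whose linear solves are diagonal in Fourier space, hence the FFT.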

  7. The wavelet Laplacian and diffuse interfaces: sharper interfaces. Total variation; Ginzburg-Landau functional. Dobrosotskaya and Bertozzi, IEEE Trans. Image Proc. 2008.

  8. Wavelet Allen-Cahn image processing. • Dobrosotskaya, Bertozzi, IEEE Trans. Image Proc. 2008; Interfaces and Free Boundaries 2011. • Transitioned to NGA for road inpainting. • Transitioned to In-Q-Tel for document exploitation. • A nonlocal wavelet basis replaces the Fourier basis in the classical diffuse interface method. • Analysis theory in Besov spaces. • Gamma convergence to anisotropic TV. • H-1 gradient flow for diffuse TV with L2 fidelity on the known data.

  9. Diffuse interfaces on graphs. Joint work with Arjuna Flenner, China Lake. Paper: MMS 2012.

  10. Weighted graphs for "big data". In a typical application we have data, possibly high dimensional, supported on the nodes of the graph, and the weights above compare the data. Examples: voting records of the US Congress, where each member has an associated vote vector; nonlocal-means image processing, where each pixel has a patch neighborhood that can be compared with nearby and far-away pixels.
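
As an illustration of one common weight choice (a sketch; the papers above also use local scaling, patch distances for nonlocal means, and sparsified nearest-neighbor graphs):

```python
import numpy as np

def gaussian_weight_matrix(X, sigma):
    """Dense Gaussian similarity graph: w_ij = exp(-||x_i - x_j||^2 / sigma^2).

    X     : (n, d) array with one data vector per node, e.g. a voting record
            or an image patch flattened into a vector
    sigma : scale parameter controlling how fast similarity decays with distance
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W
```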

  11. Graph cuts and total variation. (Figures: minimum cut, maximum cut.) Total variation of a function f defined on the nodes of a weighted graph: min-cut problems can be reformulated as total variation minimization for binary/multivalued functions defined on the nodes of the graph, as written below.
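
In this notation, the graph total variation of a node function f and its relation to graph cuts can be written (a standard form; normalizations vary):
\[
\mathrm{TV}(f) = \frac{1}{2}\sum_{i,j} w_{ij}\,|f_i - f_j|, \qquad
\mathrm{TV}(\chi_A) = \mathrm{Cut}(A, A^c) = \sum_{i\in A,\, j\notin A} w_{ij},
\]
so minimizing a cut over subsets A is the same as minimizing TV over binary indicator functions \chi_A.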

  12. Nonlocal-means graphs and total variation. • Buades, Coll, and Morel (2006) introduced the NL-means functional for imaging applications: patch comparisons between pixels. • Osher and Gilboa (2007-8) developed the nonlocal TV functional for imaging applications; very effective for image inpainting with texture. • Drawback of Osher-Gilboa: slowness of the algorithm. • We accomplish these results with much faster run time and extend them to general machine learning problems. • Suggests an alternative to the NL-means calculus of Gilboa-Osher.

  13. Diffuse interface methods on graphs. Bertozzi and Flenner, MMS 2012. Arjuna Flenner, China Lake (Navy).

  14. Convergence of the graph GL functional. van Gennip and ALB, Adv. Diff. Eq. 2012.

  15. Diffuse interfaces on graphs. Joint work with Arjuna Flenner. Replaces the Laplace operator with a weighted graph Laplacian in the Ginzburg-Landau functional. Allows for segmentation using L1-like metrics due to the connection between GL and TV. Comparison with the Hein-Buehler 1-Laplacian (2010).
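
Schematically, the graph Ginzburg-Landau functional of Bertozzi-Flenner replaces the Dirichlet energy by a graph-Laplacian quadratic form; with a semi-supervised fidelity term it reads (constants schematic):
\[
GL_\varepsilon(u) = \frac{\varepsilon}{2}\,\langle u, L_s u\rangle + \frac{1}{\varepsilon}\sum_i W(u_i) + \sum_i \frac{\lambda_i}{2}\,\big(u_i - u_i^{0}\big)^2,
\]
where L_s = I - D^{-1/2}[w_{ij}]D^{-1/2} is the symmetric normalized graph Laplacian (D the diagonal degree matrix), W(\cdot) is the double-well potential, and \lambda_i is nonzero only on nodes whose label u_i^{0} is known.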

  16. US House of Representatives voting record: classification of party affiliation from the voting record of the 98th US Congress (1984). Assume knowledge of the party affiliation of 5 of the 435 members of the House; infer the party affiliation of the remaining 430 members from their voting records. Gaussian similarity weight matrix on the vote vectors (entries 1, 0, -1).

  17. Machine learning identification of similar regions in images. High-dimensional, fully connected graph; use Nystrom extension methods for fast computation.

  18. Recall convex splitting schemes. Schönlieb and Bertozzi, Comm. Math. Sci. 2011. Basic idea: project onto eigenfunctions of the operator appearing in the gradient (first variation); for the GL functional this operator is the graph Laplacian.
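
The convex splitting idea recalled above, in generic form: write the energy as a difference of convex functionals and treat the convex part implicitly, the concave part explicitly,
\[
E = E_1 - E_2 \quad (E_1, E_2 \text{ convex}), \qquad
\frac{u^{n+1} - u^{n}}{\Delta t} = -\,\nabla E_1(u^{n+1}) + \nabla E_2(u^{n}),
\]
which for suitable splittings is stable for large time steps; since the implicit operator for the graph GL functional is (a shift of) the graph Laplacian, each solve is diagonal in the precomputed eigenvector basis.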

  19. An MBO scheme on graphs for segmentation and image processing. E. Merkurjev, T. Kostic, and A. L. Bertozzi, SIAM J. Imaging Sci. 2013. Instead of minimizing the GL functional, apply an MBO scheme: a simple algorithm alternating the graph heat equation with thresholding.

  20. Two-step minimization procedure based on the classical MBO scheme for motion by mean curvature, now on graphs: 1) propagation by the graph heat equation plus a forcing term; 2) thresholding. Simple, and often converges in just a few iterations (e.g., 4 for the MNIST dataset).
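
A minimal sketch of this two-step scheme for binary semi-supervised labeling, using simple explicit substeps for the diffusion (the papers instead expand u in a truncated graph-Laplacian eigenbasis and use a semi-implicit step; all names and parameter values here are illustrative, not the authors' code):

```python
import numpy as np

def graph_mbo(L, u0, fid_idx, fid_vals, dt=0.1, C=30.0, n_iter=50, inner=5):
    """Two-step graph MBO scheme with fidelity on known labels.

    L        : (n, n) symmetric normalized graph Laplacian (dense here for clarity)
    u0       : initial labels in [-1, 1]^n
    fid_idx  : indices of nodes with known labels (semi-supervised data)
    fid_vals : known labels (+1 or -1) at those nodes
    """
    n = L.shape[0]
    lam = np.zeros(n)
    lam[fid_idx] = C              # fidelity strength, only where labels are known
    ref = np.zeros(n)
    ref[fid_idx] = fid_vals
    u = u0.copy()
    for _ in range(n_iter):
        # Step 1: graph heat equation with a fidelity forcing term,
        # integrated with a few explicit Euler substeps of size dt/inner
        for _ in range(inner):
            u = u - (dt / inner) * (L @ u + lam * (u - ref))
        # Step 2: threshold back to the two wells +1 / -1
        u = np.where(u >= 0.0, 1.0, -1.0)
    return u
```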

  21. Algorithm. I) Create a graph from the data, choose a weight function, and form the symmetric graph Laplacian. • II) Calculate the eigenvectors and eigenvalues of the symmetric graph Laplacian; only a portion of the eigenvectors is needed*. • III) Initialize u. • IV) Iterate the two-step scheme described above until a stopping criterion is satisfied. • *Fast linear algebra routines are necessary: either a Rayleigh-Chebyshev procedure or the Nystrom extension. A sketch of steps I-II appears below.
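
The sketch below illustrates steps I-II with SciPy's sparse eigensolver (illustrative only; for fully connected graphs the authors use the Nystrom extension, and for sparse graphs a Rayleigh-Chebyshev routine, rather than eigsh):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplacian_spectrum(W, k=100):
    """Form the symmetric normalized graph Laplacian L_s = I - D^{-1/2} W D^{-1/2}
    from a sparse weight matrix W and return its k smallest eigenpairs,
    which are the only ones the two-step scheme needs.
    """
    d = np.asarray(W.sum(axis=1)).ravel()            # node degrees
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    L = sp.identity(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = spla.eigsh(L, k=k, which='SM')      # k smallest eigenvalues/vectors
    return vals, vecs
```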

  22. Two moons segmentation. (Figures: second-eigenvector segmentation vs. our method's segmentation.)

  23. Image segmentation. Original image 1; original image 2. Hand-labeled grass region; grass label transferred.

  24. Image segmentation. Hand-labeled cow region; cow label transferred. Hand-labeled sky region; sky label transferred.

  25. Generalization: multiclass machine learning problems (MBO). Garcia-Cardona, Merkurjev, Bertozzi, Percus, Flenner, IEEE TPAMI, 2014. Semi-supervised learning. Instead of a double well we have an N-class well with minima on a simplex in N dimensions.
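
In the multiclass setting, the MBO thresholding step becomes projection to the nearest vertex of the simplex; a tiny illustrative sketch (not the authors' code):

```python
import numpy as np

def threshold_to_simplex_vertices(U):
    """Multiclass MBO thresholding: each row of U holds soft class assignments
    for one node; replace it with the simplex vertex (one-hot indicator)
    of its largest component, i.e. the nearest vertex.
    """
    V = np.zeros_like(U)
    V[np.arange(U.shape[0]), U.argmax(axis=1)] = 1.0
    return V
```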

  26. MNIST database comparisons: semi-supervised vs. supervised learning. We do semi-supervised learning with only 3.6% of the digits as the known data. Supervised methods use 60,000 digits for training and test on 10,000 digits.

  27. Performance on COIL and WebKB.

  28. Hyperspectral video segmentation. Merkurjev, Sunu, and Bertozzi, 2014, preprint. Eigenfunctions computed using the Nystrom extension. "Ground truth" obtained by thresholding eigenfunctions; random initialization otherwise. Four-class hyperspectral pixel segmentation of gas plume, ground, mountain, and sky.

  29. Nystrom extension. Fowlkes, Belongie, Chung, and Malik, IEEE PAMI 2004.
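
A simplified sketch of the Nystrom idea (names illustrative; the Fowlkes-Belongie-Chung-Malik procedure adds a normalization and orthogonalization step omitted here): sample a small set of landmark nodes, compute only the affinity blocks involving them, and extend the landmark eigenvectors to the rest of the graph.

```python
import numpy as np

def nystrom_eigs(affinity, landmarks, rest, k):
    """Approximate the top-k eigenvectors of a large affinity matrix without
    ever forming it, using a small set of landmark nodes.

    affinity(I, J)  : callable returning similarities between node index lists I and J
    landmarks, rest : index lists partitioning the nodes (landmark rows come first)
    """
    A = affinity(landmarks, landmarks)      # small m x m block among landmarks
    B = affinity(landmarks, rest)           # m x (n - m) cross block
    evals, U = np.linalg.eigh(A)            # exact eigenpairs of the small block
    evals, U = evals[-k:], U[:, -k:]        # keep the k largest (assumed nonzero)
    U_rest = (B.T @ U) / evals              # extend eigenvectors to the remaining nodes
    V = np.vstack([U, U_rest])              # approximate eigenvectors, landmark rows first
    return evals, V
```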

  30. Community detection: modularity optimization. Joint work with Huiyi Hu (UCLA), Thomas Laurent (Loyola Marymount), and Mason Porter (Oxford); to appear in SIAP. The modularity of a partition measures the fraction of total edge weight within each community minus the edge weight expected if edges were placed at random according to some null model (Newman-Girvan, Phys. Rev. E 2004):
\[
Q = \frac{1}{2m}\sum_{i,j}\big(w_{ij} - \gamma\,P_{ij}\big)\,\delta(g_i, g_j), \qquad P_{ij} = \frac{k_i k_j}{2m},
\]
where [w_{ij}] is the graph adjacency matrix, P is the Newman-Girvan probability null model, k_i = \sum_j w_{ij} is the strength of node i, 2m = \sum_i k_i = \sum_{i,j} w_{ij} is the total volume of the graph, \gamma is the resolution parameter, and g_i is the group assignment of node i. This is a maximization problem, combinatorially complex: optimizing over all possible group assignments is very expensive computationally.

  31. Bipartition of a graph. Given a subset A of the nodes, define \mathrm{vol}(A) = \sum_{i \in A} k_i. Given a binary function f on the graph taking values +1 and -1, define A to be the set where f = 1; then maximizing Q is equivalent to minimizing a total-variation-based functional of f.

  32. Equivalence to L1 compressive sensing. Thus modularity optimization restricted to two groups is equivalent to a TV minimization problem, and this generalizes quite naturally to n-class optimization. Because the TV minimization problem involves functions with values on the simplex, we can directly use the MBO scheme to solve it.

  33. Modularity optimization: moons and clouds.

  34. MNIST 4-9 digit segmentation. 13,782 handwritten digits; graph created from a similarity score between each pair of digits; weighted graph with 194,816 connections. Modularity MBO performs comparably to GenLouvain but in about a tenth of the run time. The advantage of the MBO-based scheme will be for very large datasets with moderate numbers of clusters.

  35. 4-9 MNIST Segmentation

  36. MNIST 4-9 digit segmentation. 13,782 handwritten digits; graph created from a similarity score between each pair of digits. Modularity MBO performs comparably to GenLouvain but in about a tenth of the run time. The advantage of the MBO-based scheme will be for very large datasets with moderate numbers of clusters.

  37. LFR benchmark. Lancichinetti, Fortunato, Radicchi, Phys. Rev. E 2008. Synthetic graphs with a power-law distribution of community sizes. Mixing parameter: fraction of a node's edges shared with other communities vs. its own community.

  38. Conclusions and future work. Yves van Gennip, Nestor Guillen, Braxton Osting, and Andrea L. Bertozzi, Mean curvature, threshold dynamics, and phase field theory on finite graphs, 2013. The diffuse interface formulation provides competitive algorithms for machine learning applications, including nonlocal-means imaging, and extends PDE-based methods to a graphical framework. Future work includes community detection algorithms (very computationally expensive). Speedups include fast spectral methods and the use of a small subset of eigenfunctions rather than the complete basis. Competitive with or faster than split-Bregman methods and other L1-TV based methods. Extension of the work to convex optimization methods (joint work with Egil Bae and Ekaterina Merkurjev).

  39. Preprints and reprints.
A. L. Bertozzi and A. Flenner, Multiscale Modeling and Simulation, 10(3), 2012.
Tijana Kostic and Andrea Bertozzi, J. Sci. Comp., 2012.
Y. van Gennip and A. L. Bertozzi, Adv. Diff. Eq., 2012.
H. Hu, Y. van Gennip, B. Hunter, A. L. Bertozzi, M. A. Porter, IEEE ICDM'12, 2012.
Y. van Gennip et al., SIAP (spectral clustering of gang data), 2013.
E. Merkurjev, T. Kostic, and A. L. Bertozzi, An MBO Scheme on Graphs for Segmentation and Image Processing, SIAM J. Imaging Sci., 2013.
Huiyi Hu, Thomas Laurent, Mason A. Porter, Andrea L. Bertozzi, A Method Based on Total Variation for Network Modularity Optimization using the MBO Scheme, SIAM J. Appl. Math., 2013.
C. Garcia-Cardona, E. Merkurjev, A. L. Bertozzi, A. Flenner, and A. G. Percus, Fast Multiclass Segmentation Using Diffuse Interface Methods on Graphs, IEEE PAMI, 2014.
Y. van Gennip, N. Guillen, B. Osting, and A. L. Bertozzi, Mean curvature, threshold dynamics, and phase field theory on finite graphs, Milan J. Math., 2014.
