Vincent Duval (INRIA Paris, France):
The BLASSO: continuous dictionaries for sparser reconstructions

In this talk, I will give an overview of the main properties of the Beurling LASSO (BLASSO), a sparse reconstruction method which has drawn a lot of attention since the pioneering works of De Castro and Gamboa, Bredies and Pikkarainen, Candès and Fernandez-Granda… The method consists in performing an analogue of the ℓ1 minimization in the space of Radon measures. Using this continuous framework instead of introducing an artificial finite grid for sparse recovery is not only relevant when modelling many physical problems, but it also provides interesting properties such as support stability, sparsity of the solutions, and efficient minimization algorithms.
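
For reference, a common formulation of the BLASSO reads as follows (a sketch with assumed notation, not taken verbatim from the abstract):

    \min_{\mu \in \mathcal{M}(X)} \frac{1}{2} \| \Phi\mu - y \|^2 + \lambda \, |\mu|(X),

where \mathcal{M}(X) is the space of Radon measures on the domain X, \Phi is the measurement operator, y the observed data, and |\mu|(X) the total variation (total mass) norm, the natural analogue of the ℓ1 norm for measures.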

Mark Plumbley (Centre for Vision, Speech and Signal Processing, University of Surrey, UK):
AI for Sound: From Independent Component Analysis and Sparse Representations to Deep Learning

Imagine you are standing on a street corner in a city. Close your eyes: what do you hear? Perhaps some cars and buses driving on the road, footsteps of people on the pavement, beeps from a pedestrian crossing, rustling and clonks from shopping bags and boxes, and the hubbub of talking shoppers. You can do the same in a kitchen as someone is making breakfast, or as you are travelling in a vehicle. Now, following the success of machine learning technologies for speech and image recognition, we are beginning to build computer systems to tackle this challenging task: to automatically recognize real-world sound scenes and events. In this talk, I will discuss some of the techniques and approaches that we have been using to analyze and recognize different types of sounds, including independent component analysis, nonnegative matrix factorization, sparse representations and deep learning. I will also discuss how we are using data challenges to help develop a community of researchers in recognition of real-world sound scenes and events, explore some of the work going on in this rapidly expanding research area, and touch on some of the key issues for the future, including privacy for sound sensors and the need for low-complexity models. We will discuss some of the potential applications emerging for sound recognition, from home security and assisted living to exploring sound archives, and we will close with some pointers to more information about this research area.
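
As a minimal illustration of one of the techniques mentioned above, the sketch below applies nonnegative matrix factorization to an audio spectrogram; the file name and parameter values are hypothetical placeholders, and the talk does not prescribe this particular pipeline.

    import numpy as np
    import librosa  # assumed here for audio loading and the STFT
    from sklearn.decomposition import NMF

    # Load an audio clip and compute a nonnegative magnitude spectrogram.
    y, sr = librosa.load("street_corner.wav", sr=None)  # hypothetical file
    S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

    # Factorize S ≈ W @ H: columns of W are spectral templates
    # (e.g. engine hum, footsteps), rows of H their activations over time.
    model = NMF(n_components=8, init="nndsvd", max_iter=500)
    W = model.fit_transform(S)  # shape (n_freq_bins, 8)
    H = model.components_       # shape (8, n_time_frames)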

Christopher J Rozell (Georgia Institute of Technology, USA):
Keep your eye on the ball: measurement and tracking of dynamical systems models with low-dimensional structure

Low-dimensional signal models such as sparsity and manifolds have been very powerful in modern signal processing and machine learning methods. Despite many advances based around these modeling approaches, much of the associated research in algorithms and analysis has focused on static test cases such as single images. However, many areas in science and engineering now have access to technologies enabling rapid collection of increasing volumes of time-varying data that is best described with some type of dynamical systems model. While classic fields such as statistics and physics have similar notions of low-dimensional signal models, it has remained relatively uncommon to integrate the modern tools of optimization and dimensionality reduction into algorithms and analysis for inference in dynamical signal models.

In this lecture, I will overview our recent advances in building dynamical filtering algorithms and establishing fundamental observability guarantees for dynamical systems models with low-dimensional structure, drawing upon tools such as reweighted optimization and optimal transport. I will highlight the utility of these approaches with a range of applications, including building robotic systems to perform neuroscience experiments, target tracking in remote sensing data, and training brain machine interfaces.
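
As a schematic example of this kind of formulation (one common variant from the literature, not necessarily the exact algorithm of the talk), a sparsity-aware dynamical filter can estimate the state at each time step by solving

    \hat{x}_t = \arg\min_x \frac{1}{2} \| y_t - A x \|_2^2 + \lambda \| x \|_1 + \kappa \| x - F \hat{x}_{t-1} \|_1,

where A is the measurement operator, F a state-transition model, and the weights λ and κ balance data fit, sparsity, and temporal consistency; reweighted variants adapt these penalties coordinate-wise using the prediction F \hat{x}_{t-1}.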

Karin Schnass (University of Innsbruck, Austria):
The landscape of dictionary learning

In this talk we will visit the landscape of dictionary learning via iterative thresholding and K residual means (ITKrM). For a given generating dictionary we will have a look at the basin of attraction, the regions of contraction, and spurious attractive points.

Time permitting, we will also discuss heuristics for how to escape from spurious attractive points and jump directly into the basin of attraction.
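
For orientation, here is a schematic Python sketch of one ITKrM iteration, under simplifying assumptions (unit-norm atoms, a fixed sparsity level S); it is meant to convey the structure of the update, not to reproduce the reference implementation.

    import numpy as np

    def itkrm_step(D, Y, S):
        """One simplified ITKrM iteration (schematic sketch).
        D: (d, K) dictionary with unit-norm columns
        Y: (d, N) training signals
        S: assumed sparsity level
        """
        D_new = np.zeros_like(D)
        corr = D.T @ Y  # (K, N) atom-signal inner products
        for n in range(Y.shape[1]):
            I = np.argsort(-np.abs(corr[:, n]))[:S]  # thresholding: top-S atoms
            P = D[:, I] @ np.linalg.pinv(D[:, I])    # projection onto their span
            res = Y[:, n] - P @ Y[:, n]              # residual of signal n
            for k in I:
                # residual mean: add back atom k's own rank-one contribution,
                # with the sign aligned to the current atom
                D_new[:, k] += np.sign(corr[k, n]) * (res + D[:, k] * corr[k, n])
        # renormalize the updated atoms
        return D_new / np.maximum(np.linalg.norm(D_new, axis=0), 1e-12)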

Irene Waldspurger (CNRS, University Paris-Dauphine, France):
Rank optimality for the Burer-Monteiro factorization

The Burer-Monteiro factorization is a classical heuristic used to speed up the solution of large-scale semidefinite programs when the solution is expected to be low rank: one writes the solution as the product of thinner matrices, and optimizes over the (low-dimensional) factors instead of over the full matrix. Even though the factorized problem is non-convex, one observes that standard first-order algorithms can often solve it to global optimality. This has been rigorously proved by Boumal, Voroninski and Bandeira, but only under the assumption that the factorization rank is large enough, in fact larger than what numerical experiments suggest is necessary. We will describe this result and investigate its optimality. More specifically, we will show that, up to a minor improvement, it is optimal: without additional hypotheses on the semidefinite problem at hand, first-order algorithms can fail if the factorization rank is smaller than predicted by current theory.
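
In symbols (with notation assumed for this summary), given a semidefinite program

    \min_{X \succeq 0} \langle C, X \rangle \quad \text{subject to} \quad \mathcal{A}(X) = b,

the Burer-Monteiro heuristic substitutes X = Y Y^\top with Y \in \mathbb{R}^{n \times p} for some small p and optimizes over Y instead. The guarantee of Boumal, Voroninski and Bandeira applies, roughly, when p(p+1)/2 exceeds the number m of linear constraints, i.e. p on the order of \sqrt{2m}.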

Rebecca Willett (University of Chicago, USA):
A function space view of overparameterized neural networks

Contrary to classical bias/variance tradeoff intuition, deep learning practitioners have observed that vastly overparameterized neural networks with the capacity to fit virtually any labels nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is being achieved by implicitly or explicitly controlling the magnitude of the weights of the network. This raises the question: What functions are well-approximated by neural networks whose weights are bounded in norm? In this talk, I will give some partial answers to this question. In particular, I will give a precise characterization of the space of functions realizable as a two-layer (i.e., one hidden layer) neural network with ReLU activations having an unbounded number of units, but where the Euclidean norm of the weights in the network remains bounded. Surprisingly, this characterization is naturally posed in terms of the Radon transform as used in computational imaging, and I will show how tools from Radon transform analysis yield novel insights about learning with two- and three-layer ReLU networks. This is joint work with Greg Ongie, Daniel Soudry, and Nati Srebro.
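
To fix ideas, the networks in question can be written as (a sketch of the setup; the notation is not taken verbatim from the talk)

    f(x) = \sum_i a_i \max(0, \langle w_i, x \rangle + b_i) + (\text{linear part}),

with the complexity measure \sum_i |a_i| \| w_i \|_2 kept bounded; after rescaling each unit, this quantity coincides with the minimal Euclidean weight norm \frac{1}{2} \sum_i ( a_i^2 + \| w_i \|_2^2 ), and it is this functional that the Radon-transform characterization describes.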

David Wipf (Visual Computing Group, Microsoft Research, Beijing, China):
On the underappreciated role of sparsity in deep variational autoencoder models

This talk will trace the progression of Bayesian-inspired models for finding low-dimensional structure in data, from simple frameworks like robust PCA and Bayesian compressive sensing, to more complex heirs such as the variational autoencoder (VAE). The latter represents a popular, flexible form of deep generative model that can be stochastically fit to observed samples from a given random process using a variational bound on the underlying log-likelihood. Although originally motivated as a way of generating new samples that approximate an unknown distribution, the VAE can also be leveraged to find low-dimensional manifold structure in training data.
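
For reference, the variational bound in question is the standard evidence lower bound (ELBO),

    \log p_\theta(y) \ge \mathbb{E}_{q_\phi(z|y)} [ \log p_\theta(y|z) ] - \mathrm{KL}( q_\phi(z|y) \,\|\, p(z) ),

where q_\phi(z|y) is the encoder, p_\theta(y|z) the decoder, and p(z) the prior over the low-dimensional latent code z.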

Despite the lack of a canonical sparsity-promoting penalty as commonly adopted by classical methods, I will highlight how parsimony naturally emerges from the VAE and its predecessors, often with distinct provable advantages over deterministic alternatives. For example, subtle mechanisms will be discussed that allow such models to robustly dismiss outliers and smooth away bad local minima, all while adapting to an unknown inlier manifold of arbitrary dimension. And as a byproduct of this process, in certain settings the VAE in particular can also generate realistic samples that mirror the data distribution within such manifolds, free of outliers.