Tutorial #1: Deep learning for music separation
List of presenters:
- Antoine Liutkus (INRIA Nancy – Grand Est, France)
- Fabian-Robert Stöter (INRIA, LIRMM, University of Montpellier, France)
In this tutorial, we present the recent state of the art in music separation, which is a cornerstone problem for many applications in the entertainment industry. The particularity of this tutorial is that it approaches the topic both from a theoretical perspective and through an interactive demonstration of how to implement the described ideas in practice.
In a first introductory part, we will summarize the necessary signal processing concepts and review the model-based methods that were proposed in the last 40 years. These notably include sinusoidal modeling, nonnegative matrix factorization, and kernel methods. Then, we will give an overview of the related datasets that were proposed, as well as the metrics used for evaluation. From a practical perspective, we will see how to browse and use data in Python.
In the second part, we will present deep neural networks (DNNs), and in particular the models and methods that are suited to time series with long-term structures, as in music. This involves a crash course on DNNs and dynamic models (LSTM, CNN and WaveNet), and a recap of the related vocabulary: datasets, samples, epoch, batch, loss, gradient, generative, discriminative, adversarial, etc.
In the third part, we will show the importance of design and will bring a basic LSTM separation model to state-of-the-art performance. We will analyze the impact of many design choices: input representation, dimensionality reduction, depth, hidden size, context length, skip connections.
This tutorial is intended, first, as an introduction to this topic for Ph.D. students and engineers, explaining how to obtain state-of-the-art performance in audio separation, along with the required technical background. Second, it will hopefully be of interest to researchers wondering how to conduct actual investigations on audio with DNNs, without being just users of high-level black-box systems.
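Evaluation metrics for separation are typically variants of the signal-to-distortion ratio (SDR). The following is a minimal sketch of a simplified SDR, assuming time-aligned reference and estimated signals given as plain lists of samples. The function name `sdr_db` is illustrative, and this is not the full BSS Eval decomposition used in evaluation campaigns.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent; SDR is undefined")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy usage: an estimate that is 10% too quiet on every sample.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(sdr_db(reference, estimate))
```

In this toy case the error energy is one hundredth of the signal energy, so the score is about 20 dB. Higher values indicate a better estimate.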
Tutorial #2: Portfolio Optimization in Financial Markets
List of presenters:
- Daniel P. Palomar (Hong Kong University of Science and Technology)
Financial engineering and electrical engineering are seemingly different areas that share strong underlying connections. Both areas rely on statistical analysis and modeling of systems and the underlying time series, whether modeling price fluctuations in the financial markets or, say, channel fluctuations in wireless communication systems. A core part of financial engineering is portfolio optimization, which closely resembles many problems in signal processing. This allows the use of similar mathematical tools from signal processing, optimization theory, and estimation theory, such as beamforming design, Kalman filtering, covariance matrix estimation, etc.
Modern portfolio theory started with Harry Markowitz’s seminal 1952 paper “Portfolio Selection,” for which he would later receive the Nobel prize in 1990. He put forth the idea that risk-averse investors should optimize their portfolio based on a combination of two objectives: expected return and risk. To this day, that idea has remained central in portfolio optimization. However, the vanilla Markowitz portfolio formulation does not seem to behave as expected in practice, and most practitioners tend to avoid it. Over the past half century, researchers and practitioners have reconsidered the Markowitz portfolio formulation and have proposed countless improvements and variations, including robust optimization methods, alternative measures of risk (e.g., CVaR or ES), regularization via sparsity, improved estimators of the covariance matrix via random matrix theory, robust estimators for heavy tails, factor models, mean models, volatility clustering models, risk-parity formulations, etc.
This tutorial will explore different portfolio formulations, starting from the basic signal model and the Markowitz portfolio formulation and slowly increasing the level of complexity, exploring robust portfolios, index tracking portfolios, risk-parity portfolios, CVaR-based portfolios, and pairs-trading portfolios.
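As an illustration of the return-versus-risk trade-off behind the Markowitz formulation, here is a minimal sketch for two assets. It assumes the common mean-variance objective w'mu - (lambda/2) w'Sigma w with the budget constraint sum(w) = 1 and short selling allowed. The function names and the numbers are hypothetical, and this is not necessarily the exact formulation used in the tutorial.

```python
def solve_2x2(matrix, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    if det == 0.0:
        raise ValueError("singular covariance matrix")
    x0 = (rhs[0] * matrix[1][1] - matrix[0][1] * rhs[1]) / det
    x1 = (matrix[0][0] * rhs[1] - matrix[1][0] * rhs[0]) / det
    return [x0, x1]


def mean_variance_weights(mu, sigma, risk_aversion):
    """Maximize w'mu - (risk_aversion / 2) w'Sigma w subject to sum(w) = 1.

    Stationarity of the Lagrangian gives w = Sigma^{-1}(mu - nu * 1) / risk_aversion,
    where the multiplier nu is chosen so that the weights sum to one.
    """
    inv_mu = solve_2x2(sigma, mu)            # Sigma^{-1} mu
    inv_one = solve_2x2(sigma, [1.0, 1.0])   # Sigma^{-1} 1
    nu = (sum(inv_mu) - risk_aversion) / sum(inv_one)
    return [(m - nu * o) / risk_aversion for m, o in zip(inv_mu, inv_one)]


# Hypothetical inputs: expected returns and an (uncorrelated) covariance matrix.
mu = [0.10, 0.05]
sigma = [[0.04, 0.0], [0.0, 0.01]]
print(mean_variance_weights(mu, sigma, risk_aversion=5.0))
```

With these inputs the weights are approximately 0.4 and 0.6. As the risk aversion grows, they approach the minimum-variance portfolio (0.2, 0.8), which ignores expected returns entirely. In practice mu and Sigma must be estimated, which is one source of the sensitivity issues mentioned above.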
Tutorial #3: Determinantal Point Processes in Signal Processing and Machine Learning
List of presenters:
- Simon Barthelme (CNRS, University of Grenoble, France)
- Nicolas Tremblay (CNRS, University of Grenoble, France)
Determinantal point processes (DPPs) are a class of point processes that generate high-diversity subsets through repulsion. The first examples arose in statistical physics in the work of Wigner and Dyson, and DPPs were first theorised by O. Macchi during her PhD, resulting in a landmark 1975 paper. DPPs then waited thirty years to come back into the light, first at the hands of probabilists, and around 2010 in the works of some statisticians and computer scientists.
The usefulness of DPPs in statistics and machine learning is based on their repulsiveness and their appealing analytic properties. “Repulsiveness”, in the context of point processes, means that points cannot be too close to one another. It is an important feature of some natural or man-made point processes, e.g. the position of trees in a forest, or the position of antennas in communication networks. In machine learning, repulsiveness is used to enforce diversity, and sampling DPPs can be used to produce representative sets of items from a large database.
On a more theoretical side, DPPs possess attractive analytic properties. The intensity functions (a.k.a. inclusion probabilities in the discrete case) are known at any order (contrary to e.g. Gibbsian point processes), and can be evaluated as determinants of positive definite matrices. The last five years have seen an explosion of studies in machine learning where DPPs are used and/or studied. Their use in signal processing is in its infancy. Some applications have been reported in subsampling of graphs; links between DPPs and zeros of the spectrogram of Gaussian processes exist and are beginning to be revealed; applications in Monte Carlo integration and survey sampling have been reported; applications in communications are appearing.
The tutorial intends to give an impetus to research on DPPs in Signal Processing.
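To make the determinantal form of the inclusion probabilities concrete, here is a minimal sketch for a discrete DPP. It assumes a symmetric marginal kernel K with eigenvalues in [0, 1], so that the probability that a set of items is included in the random subset is the determinant of K restricted to those items. The kernel values and function names are hypothetical.

```python
def determinant(matrix):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    size = len(matrix)
    if size == 0:
        return 1.0
    if size == 1:
        return matrix[0][0]
    total = 0.0
    for j in range(size):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * determinant(minor)
    return total


def inclusion_probability(kernel, items):
    """P(all `items` are in the sampled subset) = det of the kernel restricted to `items`."""
    restricted = [[kernel[i][j] for j in items] for i in items]
    return determinant(restricted)


# Hypothetical kernel: items 0 and 1 are similar, item 2 is unrelated to both.
kernel = [
    [0.5, 0.4, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]
print(inclusion_probability(kernel, [0]))      # single-item probability
print(inclusion_probability(kernel, [0, 1]))   # similar pair: repelled
print(inclusion_probability(kernel, [0, 2]))   # unrelated pair: independent
```

Each item is included with probability 0.5. The similar pair (0, 1) appears together with probability about 0.09, well below the 0.25 that independence would give, while the unrelated pair (0, 2) keeps the independent value 0.25. This is the repulsion described above.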
Tutorial #4: Misspecified and Semiparametric lower bounds and their application to inference problems with Complex Elliptically Symmetric distributed data
List of presenters:
- Stefano Fortunati (University of Pisa, Italy)
- Fulvio Gini (University of Pisa, Italy)
Inferring information from a set of acquired data is the main objective of any signal processing (SP) method. Developing an estimation algorithm often begins by assuming a statistical model for the measured data, i.e., a set of probability density functions (pdfs), which, if correct, fully characterizes the behaviour of the collected data/measurements. However, a certain amount of mismatch between the true and the assumed data model is often inevitable in practice, leading to the model misspecification problem.
The first part of the tutorial provides a comprehensive overview of the misspecified estimation framework with a particular focus on the Misspecified Cramér-Rao bound (MCRB). Misspecified bounds generalize the classical bounds on the Mean Squared Error (MSE) by allowing the true, but possibly unknown, data model and the model assumed to derive the estimator of the parameter vector to differ, yet establishing performance limits in a way that makes the SP practitioner aware of the potential losses due to the model misspecification.
The second part of the tutorial introduces a possible approach to minimize the misspecification risk. Specifically, we address the more general semiparametric characterization of the statistical behaviour of the collected data. A semiparametric model is a set of pdfs characterized by a finite-dimensional parameter vector along with some infinite-dimensional parameter, i.e. a function. We show how to obtain an MSE inequality for the estimation of the finite-dimensional parameter vector in the presence of nuisance functions. Furthermore, we introduce the Semiparametric CRB (SCRB) and describe its relation with the classical CRB and the MCRB.
Throughout the tutorial, as a common thread and as an example to clarify the theoretical findings, inference problems with Complex Elliptically Symmetric (CES) distributed data will be analysed. Particular attention will be devoted to two radar applications: direction of arrival estimation in CES distributed noise and covariance matrix estimation for adaptive radar detection in heavy-tailed disturbance.
Tutorial #5: Proximal Gradient Algorithms: Applications in Signal Processing
List of presenters:
- Niccolò Antonello (KU Leuven, Belgium)
- Panagiotis Patrinos (KU Leuven, Belgium)
- Toon van Waterschoot (KU Leuven, Belgium)
- Lorenzo Stella (Amazon Development Center, Berlin, Germany)
Many signal processing tasks increasingly rely on numerical optimization. For example, in the context of compressed sensing (CS), signals can be reconstructed by means of sparse representations that are typically obtained by solving large-scale optimization problems. In this and many other frameworks, it is of fundamental importance to rely on efficient and reliable optimization algorithms. One of the most popular optimization algorithms used in CS is the proximal gradient (PG) algorithm. This is a first-order optimization algorithm, suited for large-scale problems and capable of addressing the nonsmooth cost functions that typically appear in CS problems.
This tutorial will focus on the PG algorithm and will illustrate how this algorithm is not only useful in the context of CS but also in many other frameworks that involve structured optimization problems. In particular, the PG algorithm has several favourable properties: its iterations consist of computationally cheap operations that can often avoid matrix factorizations and can be easily combined with fast transformations. Additionally, convergence to local minima is guaranteed even in the case of nonconvex problems. A recent accelerated variant of the PG algorithm will be presented that substantially improves its convergence rate using quasi-Newton methods, yet without altering the aforementioned favourable properties.
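As a concrete instance of the PG iteration, here is a minimal sketch for the sparse least-squares (lasso) problem that is common in CS: minimize 0.5 * ||A x - b||^2 + lam * ||x||_1. Each iteration takes a gradient step on the smooth term and then applies the proximal operator of the l1 norm (soft thresholding). This is the plain, unaccelerated algorithm, not the quasi-Newton variant presented in the tutorial and not the StructuredOptimization.jl interface. Function names are illustrative, and the step size is assumed to be at most 1/L, with L the largest eigenvalue of A'A.

```python
import math


def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]


def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def proximal_gradient_lasso(a, b, lam, step, iterations=500):
    """Minimize 0.5 * ||a x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    a_transposed = [list(column) for column in zip(*a)]
    x = [0.0] * len(a[0])
    for _ in range(iterations):
        residual = [r - bi for r, bi in zip(matvec(a, x), b)]
        gradient = matvec(a_transposed, residual)            # gradient of the smooth term
        forward = [xi - step * gi for xi, gi in zip(x, gradient)]
        x = soft_threshold(forward, step * lam)              # proximal (backward) step
    return x


# Toy problem with a diagonal matrix, so the solution can be checked by hand.
a = [[1.0, 0.0], [0.0, 2.0]]
b = [3.0, 0.2]
print(proximal_gradient_lasso(a, b, lam=1.0, step=0.25))
```

Here L = 4, so the step is 0.25. The first coordinate converges to 2, and the second is set exactly to zero by the thresholding. Only matrix-vector products with A and its transpose are needed, which is why the iterations avoid matrix factorizations.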
The versatility of the PG algorithm and the efficiency of its accelerated variant make it an attractive alternative in many signal processing tasks where numerical optimization is necessary. To illustrate this, different examples, such as line-spectral estimation, video background removal, and audio declipping, will be presented with the aid of StructuredOptimization.jl, a new open-source software package based on the Julia language. The package makes it possible not only to efficiently solve optimization problems using the most recent variants of the PG algorithm, but also to fully exploit their versatility thanks to a high-level modeling language.
Tutorial #6: Communication Networks Design: Model-Based, Data-Driven, or Both?
List of presenters:
- Alessio Zappone (University of Cassino and Southern Lazio, Italy)
- Marco Di Renzo (Paris-Saclay University, France)
- Merouane Debbah (Paris-Saclay University, France)
Recently, deep learning has received significant attention as a technique to design and optimize wireless communication systems and networks. The usual approach to using deep learning consists of acquiring large amounts of empirical data about the system behavior and employing them for performance optimization (data-driven approach). We believe, however, that the application of deep learning to communication network design and optimization offers more possibilities. As opposed to other fields of science, such as image classification and speech recognition, mathematical models for communication network optimization are very often available, even though they may be simplified and inaccurate. We believe that this a priori expert knowledge, which has been acquired over decades of intense research, cannot be dismissed and ignored. In this tutorial, in particular, we put forth a new approach that capitalizes on the availability of (possibly simplified or inaccurate) theoretical models, in order to reduce the amount of empirical data to use and the complexity of training artificial neural networks (ANNs). We provide several recent examples to show that synergistically combining prior expert knowledge based on analytical models and data-driven methods constitutes a suitable approach towards the design and optimization of communication systems and networks with the aid of deep learning based on ANNs.
Tutorial #7: Connecting the Dots: Identifying Network Structure of Complex Data via Graph Signal Processing
List of presenters:
- Gonzalo Mateos (University of Rochester, USA)
- Santiago Segarra (Rice University, USA)
Under the assumption that the signals are related to the topology of the graph where they are supported, the goal of graph signal processing (GSP) is to develop algorithms that fruitfully leverage this relational structure, and can make inferences about these relationships even when they are only partially observed. Many GSP efforts to date assume that the underlying network is known, and then analyze how the graph’s algebraic and spectral characteristics impact the properties of the graph signals of interest. However, such an assumption is often untenable in practice, and arguably most graph construction schemes are largely informal, distinctly lacking an element of validation. In this tutorial we offer an overview of network topology inference methods developed to bridge the aforementioned gap, by using information available from graph signals along with innovative GSP tools and models to infer the underlying graph structure. The tutorial will also introduce attendees to challenges and opportunities for SP research in emerging topic areas at the crossroads of modeling, prediction, and control of complex behavior arising in large-scale networked systems that evolve over time. Accordingly, this tutorial stretches all the way from (nowadays rather mature) statistical approaches, including correlation analyses, to recent GSP advances in a comprehensive and unifying manner. Through rigorous problem formulations and intuitive reasoning, concepts are made accessible to SP researchers not well versed in network-analytic issues. A diverse gamut of network inference challenges and application domains will be selectively covered, based on importance and relevance to SP expertise, as well as the authors’ own experience and contributions.
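The simplest of the statistical approaches mentioned above is a correlation analysis: connect two nodes when the signals observed on them are strongly correlated. The following is a minimal sketch under that assumption. The threshold, the node names, and the function names are hypothetical, and this informal construction is precisely the kind of baseline that more principled GSP methods aim to improve upon.

```python
import math


def correlation(x, y):
    """Sample Pearson correlation between two equally long observation lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def correlation_graph(signals, threshold):
    """Return the set of edges (u, v) whose absolute correlation reaches the threshold."""
    nodes = sorted(signals)
    edges = set()
    for index, u in enumerate(nodes):
        for v in nodes[index + 1:]:
            if abs(correlation(signals[u], signals[v])) >= threshold:
                edges.add((u, v))
    return edges


# Hypothetical observations: one list of samples per node.
signals = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [1.0, -1.0, 1.0, -1.0],
}
print(correlation_graph(signals, threshold=0.8))
```

Only the pair ("a", "b") is connected here, since its signals are almost perfectly correlated while "c" is only weakly correlated with the others. The result depends directly on the chosen threshold, which illustrates the lack of validation noted above.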
Tutorial #8: Multi-Microphone Source Localization on Manifolds
List of presenters:
- Sharon Gannot (Bar-Ilan University, Israel)
- Bracha Laufer-Goldshtein (Bar-Ilan University, Israel)
- Ronen Talmon (Technion, Israel)
Speech enhancement is a core problem in audio signal processing, with commercial applications in devices as diverse as mobile phones, hands-free systems, human-car communication, smart homes or hearing aids. An essential component in the design of speech enhancement algorithms is acoustic source localization. Speaker localization is also directly applicable to many other audio related tasks, e.g. automated camera steering, teleconferencing systems and robot audition.
Driven by its large number of applications, the localization problem has attracted significant research attention, resulting in a plethora of localization methods proposed during the last two decades. Nevertheless, robust localization in adverse conditions, namely in the presence of background noise and reverberation, remains a major challenge. Until recently, the main paradigm in localization research was based on certain statistical and physical assumptions regarding the propagation of sound sources, and mainly focused on robust methods to extract the direct path. However, valuable information on the source location can also be extracted from the reflection pattern in the enclosure. We will show in this tutorial that the intricate reflection patterns of the sound source define a fingerprint, uniquely characterizing the source location, and that meaningful location information can be inferred from the data by harnessing the principles of manifold learning.
In this tutorial, we will expose the audience to a new data-driven paradigm for analyzing reverberant environments, and demonstrate its applicability to the source localization problem. This tutorial will be an opportunity to become familiar with novel mathematical tools from the field of manifold learning, including diffusion maps theory, semi-supervised learning, optimization in reproducing kernel Hilbert spaces, Gaussian process inference and more. We will apply these mathematical and algorithmic models to the source localization problem, covering the following applications: localization with a single pair of microphones, localization using ad hoc microphone arrays and tracking of moving sources.
Tutorial #9: Point cloud coding: The status quo
List of presenters:
- Joao Ascenso (Instituto Superior Técnico, Portugal)
- Fernando Pereira (Instituto Superior Técnico, Portugal)
Recently, 3D visual representation models such as light fields and point clouds have become popular due to their capability to represent the real world in a more complete and immersive way, paving the road for new and more advanced visual experiences. The point cloud (PC) representation model is able to efficiently represent the surface of objects/scenes by means of a set of 3D points and associated attributes, and is increasingly being used in applications ranging from autonomous cars to augmented reality. Emerging imaging sensors have made it easier to perform richer and denser point cloud acquisitions, notably with millions of points, making it impractical to store and transmit such large amounts of data. This bottleneck has raised the need for efficient point cloud coding solutions that can offer immersive visual experiences and a good quality of experience.
This tutorial will provide a complete survey and summary of the main PC compression techniques, along with the most relevant foundations of this field. Regarding the content of this tutorial, it is important to highlight: 1) a new classification taxonomy for PC coding solutions to more easily identify and abstract their differences, commonalities and relationships; 2) representative static and dynamic PC coding solutions available in the literature, such as octree-, transform- and graph-based PC coding, among others; 3) two MPEG PC coding solutions which have been recently developed, notably Video-based Point Cloud Coding (V-PCC), for dynamic content, and Geometry-based Point Cloud Coding (G-PCC), for static and dynamically acquired content; 4) subjective and objective point cloud quality evaluation procedures, which are fundamental to evaluate the impact and performance of several processing steps in a PC-based system, notably denoising, coding, streaming and so on.
Tutorial #10: Random Matrix Advances in Large Dimensional Statistics, Machine Learning and Neural Nets
List of presenters:
- Romain Couillet (Paris–Saclay University, France)
- Mohamed Seddik (Paris–Saclay University, France)
- Malik Tiomoko (Paris–Saclay University, France)
Machine learning algorithms, starting from elementary yet popular ones, are difficult to analyze theoretically as (i) they are data-driven, and (ii) they rely on non-linear tools (kernels, activation functions). These theoretical limitations are exacerbated in large dimensional datasets, where standard algorithms behave quite differently than predicted, if they do not fail completely.
In this tutorial, we will show how random matrix theory (RMT) answers all these problems at once. After a brief introduction to the elementary required RMT tools, we will show that RMT provides a new understanding and various directions of improvement for kernel methods, semi-supervised learning, SVMs, community detection on graphs, spectral clustering, etc. Besides, we will show that RMT can explain observations made on real complex datasets in methods as advanced as deep neural networks.
The outline of the tutorial is as follows:
- Part 1: Basics of Random Matrix Theory for Sample Covariance Matrices (~1h): the Stieltjes transform method; the Marcenko–Pastur law; basics of G-estimation; spiked models.
- Part 2: Applications to Large Dimensional Statistical Inference (~1h): covariance eigenvalue and eigenvector retrieval; blind source power estimation; covariance distance estimation.
- Part 3: Applications to Machine Learning and Neural Nets (~1h): kernel random matrices and applications (analysis and improvement) to sampling, kernels and sparse covariance estimation; spectral clustering; semi-supervised learning; random feature maps; neural networks; universality considerations.
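As a small illustration of the Marcenko–Pastur law listed in Part 1, here is a minimal sketch of its density for unit-variance data and a dimension-to-sample ratio c = p/n in (0, 1], together with a numerical check of its first moments. The function names and the midpoint-rule integration are illustrative choices, not material from the tutorial.

```python
import math


def mp_support(ratio):
    """Edges of the Marcenko-Pastur support for ratio = p / n in (0, 1]."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("this sketch assumes 0 < p/n <= 1")
    return (1.0 - math.sqrt(ratio)) ** 2, (1.0 + math.sqrt(ratio)) ** 2


def mp_density(x, ratio):
    """Limiting eigenvalue density of a sample covariance matrix (identity population covariance)."""
    lower, upper = mp_support(ratio)
    if x <= lower or x >= upper:
        return 0.0
    return math.sqrt((upper - x) * (x - lower)) / (2.0 * math.pi * ratio * x)


def mp_moment(order, ratio, steps=100_000):
    """Moment of the given order, integrated numerically with the midpoint rule."""
    lower, upper = mp_support(ratio)
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += x ** order * mp_density(x, ratio)
    return total * width


ratio = 0.25  # e.g. p = 100 dimensions estimated from n = 400 samples
print(mp_support(ratio))
print(mp_moment(0, ratio), mp_moment(1, ratio), mp_moment(2, ratio))
```

For c = 0.25 the support is [0.25, 2.25]: even though every population eigenvalue equals 1, the sample eigenvalues spread over this whole interval. The moments come out close to 1, 1 and 1 + c = 1.25. This spread is one reason why standard estimators behave differently than classical large-sample theory predicts when p and n are comparable.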