Personal web page: https://www.ipht.fr/Phocea/Membres/Annuaire/index.php
Laboratory link: https://www.ipht.fr
There is a long history of statistical physics bringing ideas to machine learning. Some of the commonly used terms, such as Boltzmann machine or Gibbs sampling, are witnesses of that. Notably, the 1980s-90s were a very fruitful period in which research in statistical physics brought a range of theoretical results about models of neural networks, see e.g. [AGS85, GD88, EVB01]. Those results concentrate on probabilistic models of data (both the data distribution and the map to labels are modelled) in a way complementary to mainstream learning theory. Nowadays, the wide use of deep learning brings up a range of open theoretical questions and challenges that will likely require a synergy of theoretical ideas from several areas, including theoretical physics. In terms of modelling artificial neural networks, the existing physics literature mostly considers fully connected feedforward neural networks for supervised learning and restricted Boltzmann machines for unsupervised learning; data are typically modelled as random i.i.d. vectors. LZ currently holds an ERC Starting grant focusing on the statistical physics study of fully connected feedforward neural networks and auto-encoders, and related theoretical and algorithmic questions.
This PhD project will apply statistical physics analysis to two classes of neural networks that, as far as we know, have not yet been studied within this framework (and are not part of the above ERC project): convolutional neural networks (for supervised learning) and generative adversarial networks (for unsupervised learning). We will evaluate analytically the optimal performance of such networks in the modelled situations and compare it to the performance of efficient algorithms (message passing and gradient-based algorithms), studied analytically and numerically. The main goal is to understand the behaviour, advantages and limitations of such networks and to improve the algorithmic procedures used for training.
Convolutional neural networks (ConvNets) stand at the basis of the majority of modern state-of-the-art image processing systems. Compared to fully connected networks, the hidden neurons in ConvNets are connected to only a subset of variables in the previous layer (a so-called receptive field), and usually many of the hidden neurons connected to different receptive fields share the corresponding vector of weights, thus leading to a computational speed-up. ConvNet architectures are also chosen for their ability to impose symmetries (e.g. translational invariance). The model for ConvNets we have in mind is related to the committee machine. Both supervisors recently worked on committee machine neural networks: LZ studied its fully connected version [AMB18], while PU studied a tree version in which the weights of each hidden neuron are independent [FHU18]. The tree committee machine in which all hidden neurons share the same weights is a simple (possibly the simplest) model for a convolutional neural network, previously studied in the theoretical statistics and computer science literature, see e.g. [DLT17]. This model has not yet been analyzed using the statistical physics methods that can access a broad range of open questions. Its solution should not be more complicated than what has already been done in [AMB18, FHU18], making it an ideal subject for a PhD student. With the solution at hand, many questions about ConvNets, their performance and learning with efficient algorithms will become analytically accessible and will be investigated. Extensions to models where the receptive fields overlap will be the next case to study.
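To make the model concrete, here is a minimal sketch (in Python with NumPy; the dimensions and variable names are our own illustrative choices) of a tree committee machine with shared weights, i.e. the simplest ConvNet-like model described above: the input is cut into K non-overlapping receptive fields, every hidden unit applies the same weight vector to its own field, and the output is the majority vote of the hidden units.

```python
import numpy as np

rng = np.random.default_rng(0)

def tree_committee_machine(x, w, K):
    """Tree committee machine with shared weights: a minimal ConvNet model.

    The input x (dimension N = K*d) is split into K non-overlapping
    receptive fields of size d. Every hidden unit applies the SAME
    weight vector w to its own field (weight sharing); the output is
    the majority vote (sign of the sum) of the hidden activations.
    """
    fields = x.reshape(K, -1)      # K receptive fields, each of size d
    hidden = np.sign(fields @ w)   # shared weights across all fields
    return np.sign(hidden.sum())   # committee (majority) vote

# Toy usage in the modelled setting: i.i.d. Gaussian input data
K, d = 5, 20
w = rng.standard_normal(d)
x = rng.standard_normal(K * d)
y = tree_committee_machine(x, w, K)  # label in {-1, +1}
```

An extension where the receptive fields overlap would simply slide the same weight vector w over overlapping windows of x, which is the next case mentioned above.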
Generative adversarial networks (GANs) [GPM14] are often cited as the most influential idea in deep learning of the past five years. Their purpose is to generate data samples that look statistically indistinguishable from samples in the training set. The principle of GANs is very simple: a generator network is trained to minimise the accuracy of a discriminator network, which in turn tries to maximise the number of samples correctly classified as coming from the true data rather than from the generator. The min-max nature of the training problem, however, causes serious problems for the training algorithms, and it is not understood mathematically when such learning is reliable and leads to convergence and when it does not. A very recent work [WHL18] proposed an elegant, simple model of GANs and analyzed the behavior of online learning in the model. The goal of this thesis project is to analyze batch learning in this model of GANs, the underlying algorithms, and their convergence and properties. From the point of view of existing research, the above model of GANs can be seen as a combination of a low-rank matrix factorisation model and a perceptron problem with structured disorder. Both of these models were studied extensively by the two supervisors, e.g. [LKZ17, FPSUZ17], and we are confident that their combination corresponding to the above model of GANs can also be analyzed in closed form.
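The min-max principle can be illustrated in a toy one-dimensional example (a sketch of our own, not the model of [WHL18]): real data are Gaussian with mean 2, the generator only has to learn a shift theta, and the generator and discriminator take alternating gradient steps on their respective objectives.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy 1-D GAN (illustrative parameters and names):
# real data ~ N(2, 1); generator g(z) = theta + z with z ~ N(0, 1),
# so it only has to learn the mean; discriminator D(x) = sigmoid(w*x + c).
theta, w, c = 0.0, 0.1, 0.0
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(2.0, 1.0, batch)
    fake = theta + rng.normal(0.0, 1.0, batch)

    # Discriminator ascends its objective:
    #   maximise E[log D(real)] + E[log(1 - D(fake))]
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean(real * (1.0 - d_real) - fake * d_fake)
    c += lr * np.mean((1.0 - d_real) - d_fake)

    # Generator ascends the non-saturating objective:
    #   maximise E[log D(g(z))]; its gradient w.r.t. theta is w*(1 - D)
    fake = theta + rng.normal(0.0, 1.0, batch)
    d_fake = sigmoid(w * fake + c)
    theta += lr * np.mean(w * (1.0 - d_fake))
```

In this toy setting the alternating dynamics typically drives theta towards the true mean, whereas the very same alternating scheme can oscillate or fail in richer models; characterising analytically when batch learning of this kind converges is precisely the aim of this part of the project.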
The longer-term goal of this project is to understand the underlying principles of why current deep learning methods work well, what their limitations are, and how they can be further improved. The two supervisors combine expertise in the powerful methodology coming from the statistical physics of disordered systems, which is applicable to the study of high-dimensional non-convex problems, with experience in multidisciplinary applications of this methodology. The student will be trained in this vibrant field, which has applications in modern data analysis and machine learning and should be a great asset for their future career prospects.
References
[AGS85] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of neural networks. Phys. Rev. A, 32(2), 1007.
[AMB18] Aubin, B., Maillard, A., Barbier, J., Krzakala, F., Macris, N., & Zdeborová, L. (2018). The committee machine: Com-
putational to statistical gaps in learning a two-layers neural network. arXiv preprint arXiv:1806.05451.
[DLT17] Du, S.S., Lee, J.D., Tian, Y., Poczos, B., & Singh, A. (2017). Gradient Descent Learns One-hidden-layer CNN: Don’t be
Afraid of Spurious Local Minima. arXiv:1712.00779
[EVB01] Engel, A., & Van den Broeck, C. (2001). Statistical Mechanics of Learning. Cambridge University Press.
[FPSUZ17] Franz, S., Parisi, G., Sevelev, M., Urbani, P. & Zamponi, F. (2017) Universality of the SAT-UNSAT (jamming)
threshold in non-convex continuous constraint satisfaction problems, SciPost Phys. 2, 019.
[FHU18] Franz, S., Hwang, S., & Urbani, P. (2018). Jamming in multilayer supervised learning models. arXiv:1809.09945.
[GD88] Gardner, E., & Derrida, B. (1988). Optimal storage properties of neural network models. J. Phys. A: Math. and Gen.
[GPM14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative
adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
[LKZ17] Lesieur, T., Krzakala, F., & Zdeborová, L. (2017). Constrained low-rank matrix estimation: Phase transitions, approxi-
mate message passing and applications. J. Stat. Mech.: Th. and Exp. 073403
[WHL18] Wang, C., Hu, H., & Lu, Y. M. (2018). A Solvable High-Dimensional Model of GAN. Preprint arXiv:1805.08349.