THEORETICAL GUARANTEES FOR GENERALIZATION
This is the most challenging topic of the project, and a large research effort is ongoing worldwide. Part of the effort will be dedicated to prospective research, while the other part will focus on the state of the art and on benchmarking new solutions arising from the large scientific community actively working on the subject.
Faculty members: Sébastien Gerchinovitz, François Malgouyres, Jean-Michel Loubes, Mathieu Serrurier, Edouard Pauwels
Data scientists: TBD
Students: 5 PhD students
SCOPE
Neural networks have demonstrated an impressive ability to generalize despite their large number of parameters. This ability is still poorly understood from a theoretical point of view, so that no generalization guarantee can currently be given.
The goal of this challenge is to define new methods, conditions and tools that provide stronger guarantees on the behavior of neural networks.
Under given assumptions, these new methods will give better confidence in the robustness of deep learning algorithms, together with several indicators. These indicators could be used to convince the competent authorities during the certification process (one candidate indicator is sketched below).
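As an illustration of what such an indicator could look like, the sketch below computes the product of layer spectral norms, the quantity driving the spectrally-normalized margin bounds of Bartlett et al. [7]. This is only one candidate among the indicators the challenge would investigate, and the two weight matrices are random placeholders, not part of any deliverable.

    # Minimal sketch (Python/NumPy) of one candidate indicator: the product
    # of layer spectral norms appearing in the margin bounds of [7].
    # The two weight matrices below are random placeholders for illustration.
    import numpy as np

    def spectral_complexity(weight_matrices):
        """Product over layers of the largest singular value of each weight matrix."""
        return float(np.prod([np.linalg.norm(W, ord=2) for W in weight_matrices]))

    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(128, 64)), rng.normal(size=(10, 128))]
    print("spectral complexity:", spectral_complexity(layers))

A small value of this product, relative to the classification margins achieved on the data, is what the bounds of [7] reward; tracking it during training is cheap and architecture-agnostic.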
Typical neural networks used in applications have a number of parameters that is larger than, or comparable to, the number of training samples. This property leads to a loss function whose landscape is simple enough for descent algorithms to find a low-cost solution, and it also guarantees that the network is expressive enough to fit the data. Classical statistical learning theory predicts that such a configuration should lead to overfitting and a large generalization error, but this does not occur in practice.
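To make the tension with classical theory concrete, here is a standard uniform-convergence bound, stated informally (see e.g. [1]): with probability at least $1-\delta$ over $n$ i.i.d. samples, every classifier $f$ in a class $\mathcal{F}$ of VC dimension $d$ satisfies

$$ R(f) \;\le\; \widehat{R}_n(f) \,+\, C \sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}, $$

where $R$ is the risk, $\widehat{R}_n$ the empirical risk and $C$ a universal constant. Since the VC dimension of piecewise-linear networks grows essentially with the number of parameters, up to depth and logarithmic factors [5], this bound becomes vacuous precisely in the over-parameterized regime described above.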
Understanding the phenomena behind the success of optimization, both supervised and unsupervised (e.g. GANs), the expressivity of the networks, the stability of the networks' features and the size of the generalization error is key to designing network topologies and regularization strategies that lead to robust neural networks.
These guarantees will be very helpful for certifying neural networks embedded in critical systems. Both neural networks and other machine learning methods may be investigated in this challenge.
Examples of industrial use cases
Any application requiring trust, or even certification, may be concerned. Benchmarks will be defined over the course of the project, based on the topics that are available and sharable.
State of the art and limitations
Scientific approach to solve the challenge
Expected scientific outcomes
- Bounds on the generalization error under the weakest possible assumptions
- Conditions on the architecture and the dataset for the design of stable neural networks
- New tractable algorithms, with proofs of convergence / finite-time error bounds, for GANs (whose min-max formulation is recalled below)
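For reference, GAN training [10] is the two-player min-max problem

$$ \min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}} \big[ \log D(x) \big] + \mathbb{E}_{z \sim p_z} \big[ \log \big( 1 - D(G(z)) \big) \big], $$

whose saddle-point structure is what makes convergence proofs delicate; the algorithms targeted here would come with equilibrium or finite-time guarantees in the spirit of [17, 18, 19].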
Data required to implement the challenge
To be defined over the course of the project.
Success criteria for the challenge
The challenge will be successful if we have developed:
- A methodology and conditions (on the architecture and the dataset) to design stable neural networks with theoretical bounds
- Benchmarks demonstrating the performance of these algorithms on various datasets
- More stable training techniques for GANs
- A tool providing indicators to evaluate robustness within a certification perspective (a minimal illustration is sketched below)
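As a minimal illustration of what such a certification-oriented tool could compute, the sketch below propagates an input box through a small ReLU network with interval arithmetic, a deliberately coarse relative of the convex relaxations of [24] and [26]. The architecture, weights and perturbation radius are hypothetical placeholders, not a project deliverable.

    # Minimal sketch (Python/NumPy) of interval bound propagation, a coarse
    # relative of the certification relaxations of [24, 26]. The weights and
    # the perturbation radius eps are hypothetical placeholders.
    import numpy as np

    def ibp_bounds(weights, biases, lower, upper):
        """Propagate the input box [lower, upper] through affine + ReLU layers."""
        for i, (W, b) in enumerate(zip(weights, biases)):
            center = (lower + upper) / 2.0
            radius = (upper - lower) / 2.0
            # Affine layer: |W| @ radius bounds the worst-case output deviation.
            center, radius = W @ center + b, np.abs(W) @ radius
            lower, upper = center - radius, center + radius
            if i < len(weights) - 1:  # ReLU on hidden layers only
                lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
        return lower, upper

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
    biases = [np.zeros(8), np.zeros(2)]
    x, eps = rng.normal(size=4), 0.1
    lo, up = ibp_bounds(weights, biases, x - eps, x + eps)
    # Class 0 is certified on the whole box if its output lower bound
    # exceeds the upper bound of the competing class everywhere.
    print("class 0 certified robust:", bool(lo[0] > up[1]))

If the printed check succeeds, no perturbation of the input within the given box can flip the prediction; the relaxations of [24, 26] tighten this test at a higher computational cost.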
References
[1] M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations, Cambridge University Press, 2009
[2] D. Yarotsky. Error bounds for approximations with deep ReLU networks, Neural Networks, vol. 94, pp. 103-114, 2017
[3] D. Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks, Proceedings of COLT’18, pp. 639-649, 2018
[4] R. Eldan and O. Shamir. The Power of Depth for Feedforward Neural Networks, Proceedings of COLT’16, pp. 907-940, 2016
[5] P. L. Bartlett, N. Harvey, C. Liaw, and A. Mehrabian. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks, arXiv:1703.02930, 2017
[6] K. Kawaguchi. Deep learning without poor local minima, Advances in Neural Information Processing Systems (NIPS), 2016
[7] P. L. Bartlett, D. J. Foster and M. Telgarsky. Spectrally-normalized margin bounds for neural networks, Advances in Neural Information Processing Systems 30 (NIPS), 2017
[8] N. Golowich, A. Rakhlin and O. Shamir. Size-Independent Sample Complexity of Neural Networks, Proceedings of COLT’18, pp. 297-299, 2018
[9] J. Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function, arXiv:1708.06633, 2017
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio. Generative Adversarial Nets, Advances in Neural Information Processing Systems (NIPS), 2014
[11] D. P. Kingma and M. Welling. Auto-Encoding Variational Bayes, Proceedings of ICLR’14, 2014
[12] D. P. Kingma and P. Dhariwal. Glow: Generative Flow with Invertible 1×1 Convolutions, arXiv:1807.03039, 2018
[13] F. Malgouyres and J. Landsberg. Multilinear compressive sensing and an application to convolutional linear networks, https://www.math.univ-toulouse.fr/~fmalgouy/tmp/identifiable_long.pdf
[14] P. Buehlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer, 2011
[15] F. Bunea. Honest variable selection in linear and logistic regression models via L1 and L1+L2 penalization, Electronic Journal of Statistics, vol. 2, pp. 1153-1194, 2008
[16] M. Kabanava and H. Rauhut. Cosparsity in compressed sensing, in Compressed Sensing and Its Applications, pp. 315-339, 2015
[17] H. Ge, Y. Xia, X. Chen, R. Berry and Y. Wu. Fictitious GAN: Training GANs with Historical Models, Proceedings of ECCV’18, pp. 122-137, 2018
[18] P. Grnarova, K. Y. Levy, A. Lucchi, T. Hofmann and A. Krause. An Online Learning Approach to Generative Adversarial Networks, Proceedings of ICLR’18, 2018
[19] Y.-P. Hsieh, C. Liu and V. Cevher. Finding Mixed Nash Equilibria of Generative Adversarial Networks, Proceedings of ICLR’19, 2019
[20] A. Athalye, N. Carlini and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, Proceedings of ICML’18, 2018
[21] Z. Kolter and A. Madry. Adversarial Robustness: Theory and Practice, NIPS tutorial, adversarial-ml-tutorial.org, 2018
[22] J.-B. Lasserre. Global optimization with polynomials and the problem of moments, SIAM Journal on Optimization, vol. 11(3), pp. 796-817, 2001
[23] V. Magron, X. Allamigeon, S. Gaubert and B. Werner. Formal Proofs for Nonlinear Optimization, Journal of Formalized Reasoning, vol. 8(1), pp. 1-24, 2015
[24] A. Raghunathan, J. Steinhardt and P. Liang. Semidefinite relaxations for certifying robustness to adversarial examples, Advances in Neural Information Processing Systems (NeurIPS), 2018
[25] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus. Intriguing properties of neural networks, Proceedings of ICLR’14, 2014
[26] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope, Proceedings of ICML’18, 2018
[27] E. Pauwels and J.-B. Lasserre. The empirical Christoffel function with applications in data analysis, to appear in Advances in Computational Mathematics, 2019