ROBUSTNESS VIA DETECTION AND ADAPTATION TO UNKNOWN DATA

Faculty members: Sébastien Gerchinovitz, Jean-Michel Loubes, Mathieu Serrurier

Data scientists: David Bertoin, David Vigouroux, Franck Mamalet

Students: M2 internship

SCOPE

A major limitation of machine learning algorithms is their inability to process inputs outside the training domain. The challenge is to detect unknown observations, and to take these new observations into account in order to update and increase the capacity of the algorithms (including deep neural networks).

The whole machinery of machine learning techniques relies on the assumption that the training sample conveys all the information about the data. Hence an algorithm can only learn what is observed in its training data: no novelty can be forecast using standard methods, while in practice new behaviors may appear in the observed data.

For instance, this situation may occur when dealing with:

  • an unknown-unknown example (not even in the test dataset), whose processing is not required
  • an example close to the training distribution (like adversarial examples), which must be processed
  • new observations acquired by the validation team or the final user, which have to be learned. For example, for autonomous cars, traffic sign classification networks must be able to adapt to new signs without a long and expensive retraining

In all cases, the system should handle these events to avoid becoming over-confident. The neural network should be able to adapt after encountering these unknown observations.

Examples of industrial use cases

This challenge aims to identify an unknown observation, create a new class from this observation, and update the classifier with the new classes.

Industrial applications of novelty detection are quite obvious, whatever the domain: rejecting input data, avoiding wrong decisions, or raising alarms for anomaly detection.

Robustness to adversarial examples is an equally obvious application for safety-critical systems, where alarms must be raised when confidence in the output is low.

The capacity to learn from few data, or to incrementally update a system with new classes or data, has many advantages from an industrial point of view. In many domains, the whole dataset may not be available at once, or the data distribution can shift over time; maintaining the system therefore requires simple methods to upgrade the neural networks.

Examples of unknown observations could be:

  • a new traffic sign (80 km/h)
  • a new type of turn light for automotive systems
  • a new type of object in remote sensing image processing

State of the art and limitations

Flow-based models [11], [12], [13] are able to estimate the density function of a dataset, and could therefore be seen as a good tool for detecting inputs that lie outside the training distribution (a minimal sketch of this detection scheme is given below). However, it has recently been shown [14] that some flow-based models can assign high probability to inputs that are actually outside the distribution. Additionally, the features produced by flow-based models have no clear meaning, except for face dataset generation.

Once an unknown input has been identified, it must be handled. This problem falls within a classical research topic, zero/one/few-shot learning, for which many approaches have been proposed: [16] and [17] retrain a new model on a very small dataset using knowledge from other networks (transfer learning) trained on images [16] or text [17]. Siamese networks have also been proposed [18] to solve this problem. Although these strategies have demonstrated good results on certain types of images, especially faces [19], their performance on unseen characters or general object types still has to be improved.

Furthermore, the field of incremental learning [20] studies the capacity of a system to adapt its network structure in order to handle new data or classes. The machine learning model has to be robust to the introduction of new data or new classes, with no access to the whole initial dataset. The main challenge is to avoid “catastrophic forgetting” [21], i.e. to give guarantees of non-regression on previously acquired knowledge. A lot of work has been done in this field, combining weak learners [22], [23], using a hierarchical system based on the confusion matrix with previous classes [24], fine-tuning part of the network [25], or keeping a representative subset of data for each class [26]. Many current state-of-the-art solutions suffer from difficulties in learning new data, or from a performance decrease on former data. Another limitation can be the inability of former networks to reject new classes.

In the domain of neural network robustness, different methods have been developed to resist adversarial attacks. Some strategies try to project the adversarial examples back into the original distribution, as in [1], [2], [3]. Other approaches add adversarial examples to the training dataset [4], [5], [6], similarly to data augmentation. Finally, new methods like [7] transform the input data so that it is less sensitive to perturbations. The problem with these methods is that they are attack-specific, and hence not efficient against new or simply different attack methods. Recently, new methods have proposed to learn the attacking and defending concepts using neural networks, some of them using Generative Adversarial Networks (GANs) to generate adversarial examples, as in [8], [9], [10].
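To make the first point concrete, here is a minimal, illustrative sketch of flow-based out-of-distribution detection on toy 2-D data, assuming PyTorch is available. The tiny RealNVP-style coupling flow, the training setup, and the likelihood threshold are our own assumptions for illustration; they are not the Glow/FFJORD models of [11], [12], [13].

```python
# Illustrative sketch: train a small normalizing flow by maximum likelihood,
# then flag inputs whose log-likelihood is abnormally low.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling layer: transforms half of the dimensions
    conditioned on the other half, with a tractable log-determinant."""
    def __init__(self, dim, hidden=64, flip=False):
        super().__init__()
        self.flip = flip
        half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - half)),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        if self.flip:
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                       # bound the log-scale for stability
        y2 = x2 * torch.exp(s) + t
        y = torch.cat([y2, x1] if self.flip else [x1, y2], dim=1)
        return y, s.sum(dim=1)                  # log|det J| = sum of log-scales

class Flow(nn.Module):
    def __init__(self, dim=2, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [AffineCoupling(dim, flip=(i % 2 == 1)) for i in range(n_layers)])
        self.base = torch.distributions.MultivariateNormal(
            torch.zeros(dim), torch.eye(dim))

    def log_prob(self, x):
        # Change-of-variables formula: log p(x) = log p_base(f(x)) + log|det J|
        log_det = torch.zeros(x.shape[0])
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
        return self.base.log_prob(x) + log_det

# Train by maximum likelihood on in-distribution data (a toy 2-D Gaussian blob).
torch.manual_seed(0)
flow = Flow()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
in_dist = torch.randn(2048, 2) * 0.5 + torch.tensor([2.0, 0.0])
for _ in range(500):
    opt.zero_grad()
    loss = -flow.log_prob(in_dist).mean()       # negative log-likelihood
    loss.backward()
    opt.step()

# Detection rule: flag inputs whose log-likelihood falls below a low quantile
# of the training log-likelihoods (the threshold choice is an assumption).
with torch.no_grad():
    threshold = torch.quantile(flow.log_prob(in_dist), 0.01)
    ood = torch.tensor([[-3.0, 3.0]])           # far from the training blob
    print(flow.log_prob(ood) < threshold)       # expected: tensor([True])
```

On real image data, [14] shows precisely that such a likelihood threshold can fail, which motivates the regularizations proposed in the research program below.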

Scientific approach to solve the challenge

This topic is a core issue in machine learning, and a crucial one for AI certification; several methods exist to tackle it. Hereafter we present some new ways of dealing with novelty detection.

The first direction of the research program (robustness to unknown data with flow-based models) is to use flow-based models to identify unknown data. To resolve the issue raised by [14], we propose to introduce two regularizations in the loss, in order to obtain three properties. First, compression: concentrate the whole information of the distribution on a subset of the features. Second, expansion: the other features are used to fill the “remaining space”. Third, orthogonality: the features of an observation close to the distribution must be close to the distribution boundary.

The second direction of the research program is to study zero/one/few-shot learning techniques and to propose new solutions for quickly adapting the system when faced with unknown data. An M2 internship will explore this direction.

The third direction of the proposed research will focus on finding new strategies for incremental learning that balance the trade-off between learning new data, forgetting previous knowledge, and the proportion of data to be kept for future learning, either by working on the distributional implications for the dataset or by designing new models that can maintain performance. We plan to explore Siamese networks [27] or triplet-loss learning [28] to introduce the ability to reject unseen classes; a sketch of this rejection mechanism follows.
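As a hedged illustration of the rejection mechanism mentioned in the third direction, the sketch below learns an embedding with a triplet loss in the spirit of [27], [28], then classifies by nearest class prototype and rejects inputs that are too far from every known class. The network size, margin, rejection radius, and stand-in data are illustrative assumptions, assuming PyTorch; this is not the program's final design.

```python
# Illustrative sketch: triplet-loss embedding with distance-based rejection
# of unseen classes.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """Pull same-class pairs together, push different-class pairs apart."""
    opt.zero_grad()
    loss = triplet(embed(anchor), embed(positive), embed(negative))
    loss.backward()
    opt.step()
    return loss.item()

def classify_or_reject(x, class_prototypes, radius=2.0):
    """Nearest-prototype classification in embedding space; reject as an
    unseen class when the closest known prototype is farther than an
    (assumed) radius."""
    with torch.no_grad():
        z = embed(x)                                  # (N, 32) embeddings
        dists = torch.cdist(z, class_prototypes)      # (N, n_classes)
        min_dist, label = dists.min(dim=1)
        label[min_dist > radius] = -1                 # -1 marks "unknown"
    return label

# Example usage with random stand-in data (assumed flattened 28x28 inputs).
# In practice, class_prototypes would be the mean embedding of each known class.
a, p, n = torch.randn(32, 784), torch.randn(32, 784), torch.randn(32, 784)
train_step(a, p, n)
prototypes = torch.randn(5, 32)    # stand-in for per-class mean embeddings
print(classify_or_reject(torch.randn(4, 784), prototypes))
```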

Success criteria for the challenge

The challenge will be successful if we have developed:

  1. Algorithms able to detect unknown data
  2. Algorithms able to classify unknown objects with few labelled examples
  3. A methodology to incrementally update the architecture of a network to handle new observations or new classes

Datasets required for the challenge

  1. Standard datasets: for comparison with state-of-the-art methods, we will use standard datasets such as MNIST, CIFAR-10 or CIFAR-100, learning on part of the dataset in order to detect unknown inputs or to upgrade the model with new classes (a sketch of such a split follows this list)
  2. Traffic sign:
    We can either use standard datasets such as GTSRB (German Traffic Sign Recognition Benchmark), or run the challenge on industrial datasets (such as the Airbus taxiway traffic sign dataset)
  3. Turn sign:
    Renault may provide a dataset of cars with annotated turn-signal activity. The challenge can address either new kinds of turn signals (such as turn signals with a chaser effect) or warning-class detection
  4. Remote sensing:
    We could use the same datasets as those used in a related challenge, with the aim of detecting a new object or adapting the system to a new kind of sensor
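As referenced in item 1, here is a minimal sketch of a known/unknown class split on CIFAR-10, assuming torchvision is installed. The 6 known / 4 unknown class partition is an arbitrary illustrative choice, not a prescribed benchmark protocol.

```python
# Illustrative sketch: open-set split of CIFAR-10 into trained ("known")
# classes and held-out ("unknown") classes that the system must reject.
import torch
from torchvision import datasets, transforms

known_classes = {0, 1, 2, 3, 4, 5}             # classes the model is trained on
data = datasets.CIFAR10(root="./data", train=True, download=True,
                        transform=transforms.ToTensor())

known_idx = [i for i, y in enumerate(data.targets) if y in known_classes]
unknown_idx = [i for i, y in enumerate(data.targets) if y not in known_classes]

train_known = torch.utils.data.Subset(data, known_idx)     # used for training
eval_unknown = torch.utils.data.Subset(data, unknown_idx)  # must be rejected
print(len(train_known), len(eval_unknown))                 # 30000 20000
```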

References

[1] Dongyu Meng and Hao Chen. MagNet: A Two-Pronged Defense against Adversarial Examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17), pages 135–147, Dallas, Texas, USA, 2017. ACM Press.

[2] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. 2018.

[3] Gokula Krishnan Santhanam and Paulina Grnarova. Defending Against Adversarial Attacks by Leveraging an Entire GAN. arXiv:1805.10652 [cs, stat], May 2018.

[4] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv:1312.6199 [cs], December 2013.

[5] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [cs, stat], December 2014.

[6] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv:1607.02533 [cs, stat], July 2016.

[7] Vishaal Munusamy Kabilan, Brandon Morris, and Anh Nguyen. VectorDefense: Vectorization as a Defense to Adversarial Examples. April 2018.

[8] Nicholas Carlini and David Wagner. Defensive Distillation is Not Robust to Adversarial Examples. arXiv:1607.04311 [cs], July 2016.

[9] Shiwei Shen, Guoqing Jin, Ke Gao, and Yongdong Zhang. APE-GAN: Adversarial Perturbation Elimination with GAN. arXiv:1707.05474 [cs], July 2017.

[10] Hyeungill Lee, Sungyeob Han, and Jungwoo Lee. Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN. arXiv:1705.03387 [cs, stat], May 2017.

[11] Diederik P. Kingma and Prafulla Dhariwal. Glow: Generative Flow with Invertible 1×1 Convolutions. July 2018. arXiv:1807.03039.

[12] Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.

[13] Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. Augmented Neural ODEs. April 2019. arXiv:1904.01681.

[14] Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do Deep Generative Models Know What They Don't Know? ICLR 2019.

[15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. arXiv:1511.04599 [cs], November 2015.

[16] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra. Matching Networks for One Shot Learning. arXiv:1606.04080 [cs, stat], June 2016.

[17] Y. Xian, C. H. Lampert, B. Schiele, and Z. Akata. Zero-Shot Learning – A Comprehensive Evaluation of the Good, the Bad and the Ugly. arXiv:1707.00600 [cs], July 2017.

[18] G. Koch, R. Zemel, and R. Salakhutdinov. Siamese Neural Networks for One-shot Image Recognition. ICML Deep Learning Workshop, 2015.

[19] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, June 2015.

[20] R. Polikar, L. Upda, S. S. Upda, and V. Honavar. Learn++: an incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 31, no. 4, pages 497–508, November 2001.

[21] M. McCloskey and N. J. Cohen. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. In Psychology of Learning and Motivation, vol. 24, G. H. Bower, Ed., Academic Press, 1989, pages 109–165.

[22] Y. Freund and R. E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, vol. 55, no. 1, pages 119–139, August 1997.

[23] D. Medera and Š. Babinec. Incremental Learning of Convolutional Neural Networks. 2009.

[24] T. Xiao, J. Zhang, K. Yang, Y. Peng, and Z. Zhang. Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification. In Proceedings of the ACM International Conference on Multimedia (MM '14), Orlando, Florida, USA, 2014, pages 177–186.

[25] Z. Li and D. Hoiem. Learning without Forgetting. arXiv:1606.09282 [cs, stat], June 2016.

[26] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert. iCaRL: Incremental Classifier and Representation Learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pages 5533–5542.

[27] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

[28] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, MIT Press, 2006.