Outlier Detection as a Safety Measure for Safety Critical Deep Learning

Jens Henriksson

Summary

The past decade has seen groundbreaking algorithmic improvements on the back of immense amounts of data. For computer vision and perception, deep learning has significantly improved segmentation and classification performance. However, as automation increases, testing and verification of deep learning models remain difficult due to the increased complexity of model development and scenario testing. Both the deep learning and safety engineering fields lack ways of presenting realistic system designs and safety case fragments for the use of deep learning in safety-critical applications.

During the course of this thesis, the Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS) and ISO 21448 - Safety Of The Intended Functionality (SOTIF) have been studied. Both documents provide a starting point for validating the performance of machine learning-based systems in automotive safety applications in a structured manner. However, both are purposely vague with regard to implementation details and evaluation metrics. A key message from SOTIF is to consider a confusion matrix with four states: 1) known safe states, 2) known unsafe states, 3) unknown unsafe states and 4) unknown safe states. This objective shares similarities with a well-studied issue in the deep learning field: handling uncertain inputs or samples outside the scope of the model. The issue stems from the fact that every input sample yields an output prediction, and several subfields exist to mitigate the impact of undesired inputs, e.g. generative models and adversarial training, outlier training and outlier detection.

This thesis has focused on utilizing knowledge from the field of outlier detection as a method to estimate uncertainty and as a rejection strategy for predictions when operating on data that are too far from the training domain. The technique has been constructed and tested on several image-based datasets of varying difficulty, ranging from simple black-and-white digits to real automotive front-facing camera data, and shows how a safety measure can be applied to a deep learning function to reduce the risk of misclassification. The benefit for safety argumentation is that the approach provides confidence metadata to subsequent elements in the system. The approach is referred to as a supervisor, or safety cage, throughout the papers, where it has the role of acting on incoming data (cf. monitoring in safety design, which is incorporated as an observer but without an acting role). The supervisor is motivated through the study of nominal performance via AUROC (area under the receiver operating characteristic curve) and a risk-versus-coverage metric that allows a user-defined accepted risk threshold as a connection to safety requirements.
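
The rejection strategy and the two metrics can be illustrated with a small sketch. The snippet below is a minimal illustration rather than the implementation used in the papers: it assumes a hypothetical outlier score per input (higher meaning more anomalous), uses synthetic data, and shows how AUROC over in-/out-of-distribution labels and a risk-versus-coverage curve can be computed, from which the coverage achievable at an accepted risk level can be read.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical, synthetic inputs (not from the thesis):
#   outlier_score: higher means the sample looks more out-of-distribution
#   is_ood:        1 if the sample truly is out-of-distribution, else 0
#   is_error:      1 if the model's prediction on the sample is wrong, else 0
rng = np.random.default_rng(0)
outlier_score = np.concatenate([rng.normal(0.2, 0.1, 900), rng.normal(0.7, 0.1, 100)])
is_ood = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
is_error = (rng.random(1000) < (0.05 + 0.6 * is_ood)).astype(int)

# Nominal supervisor performance: how well the outlier score separates
# in-distribution from out-of-distribution samples.
auroc = roc_auc_score(is_ood, outlier_score)

def risk_coverage(score, error):
    """Risk-versus-coverage curve: accept the lowest-scoring fraction of
    samples (coverage) and measure the error rate among them (risk)."""
    order = np.argsort(score)              # most in-distribution samples first
    sorted_error = error[order]
    n = len(score)
    coverage = np.arange(1, n + 1) / n
    risk = np.cumsum(sorted_error) / np.arange(1, n + 1)
    return coverage, risk

coverage, risk = risk_coverage(outlier_score, is_error)

# Connection to safety requirements: given a user-defined accepted risk,
# report the largest coverage the supervisor can provide while staying below it.
accepted_risk = 0.05
feasible = coverage[risk <= accepted_risk]
max_coverage = feasible.max() if feasible.size else 0.0

print(f"AUROC: {auroc:.3f}")
print(f"Coverage at risk <= {accepted_risk:.0%}: {max_coverage:.1%}")
```

In the thesis, the outlier scores come from the studied out-of-distribution detection methods applied to a trained network; here they are synthetic purely to illustrate how the rejection threshold links supervisor performance to an accepted risk requirement.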

Papers included in the thesis

  1. Automotive safety and machine learning: Initial results from a study on how to adapt the ISO 26262 safety standard, Jens Henriksson, Markus Borg, Cristofer Englund. IEEE/ACM International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS) 2018
  2. Towards Structured Evaluation of Deep Neural Network Supervisors, Jens Henriksson, Christian Berger, Markus Borg, Lars Tornberg, Cristofer Englund, Sankar Raman Sathyamoorthy, Stig Ursing. IEEE International Conference On Artificial Intelligence Testing (AITest) 2019
  3. Performance Analysis of Out-of-Distribution Detection on Trained Neural Networks, Jens Henriksson, Christian Berger, Markus Borg, Lars Tornberg, Sankar Raman Sathyamoorthy, Cristofer Englund. In Information and Software Technology 2020
  4. Understanding the Impact of Edge Cases from Occluded Pedestrians for ML Systems, Jens Henriksson, Christian Berger, Stig Ursing. In 47th Euromicro Conference on Software Engineering and Advanced Applications 2021
  5. Ergo, SMIRK is Safe: A Safety Case for a Machine Learning Component in a Pedestrian Automatic Emergency Brake System, Markus Borg, Jens Henriksson, Kasper Socha, Olof Lennartsson, Elias Sonnsjö Lönegren, Thanh Bui, Piotr Tomaszewski, Sankar Raman Sathyamoorthy, Sebastian Brink and Mahshid Helali Moghadam. In Software Quality Journal 2023
  6. Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets, Jens Henriksson, Christian Berger, Stig Ursing and Markus Borg. In Software Quality Journal 2023

Additional papers