Deep Neural Networks (DNNs) have achieved near-human, and in some cases super-human, accuracy in tasks such as machine translation, image classification, and speech processing. However, despite their enormous success, these models are often used as black boxes, with very little visibility into their inner workings. This opacity hinders the adoption of these models in mission-critical and human-machine hybrid networks.
In this paper, we explore the role of influence functions in opening up these black-box models and providing interpretability for their outputs. Influence functions characterize the impact of individual training points on the model parameters. We use these functions to analytically understand how the parameters are adjusted during the model training phase to embed the information contained in the training dataset; in other words, influence functions allow us to capture the change in the model parameters attributable to the training data. We then use this characterization to interpret the model's output on test data points.
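As a concrete illustration of the idea above, the sketch below computes the classical influence of a training point on a test loss, I(z, z_test) = -∇L(z_test, θ̂)ᵀ H⁻¹ ∇L(z, θ̂), for ridge regression, where the Hessian H is available in closed form. The model, data, and regularization strength are all illustrative assumptions, not from the paper; the point is only that the influence-predicted change in test loss closely tracks the change obtained by actually retraining without the point.

```python
import numpy as np

# Minimal sketch (illustrative setup, not the paper's model): influence
# functions for ridge regression. We use the standard definition
#   I(z, z_test) = -grad L(z_test, theta)^T H^{-1} grad L(z, theta),
# with H the Hessian of the regularized empirical risk.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 1e-2  # ridge strength (arbitrary illustrative value)

def fit(X, y):
    # Minimizer of (1/n) sum 0.5*(x.theta - y)^2 + (lam/2)*||theta||^2
    n = len(y)
    H = X.T @ X / n + lam * np.eye(X.shape[1])
    return np.linalg.solve(H, X.T @ y / n), H

theta, H = fit(X, y)
x_test, y_test = rng.normal(size=d), 0.0
g_test = (x_test @ theta - y_test) * x_test  # grad of the test loss

def influence(i):
    # Influence of upweighting training point i on the test loss.
    g_i = (X[i] @ theta - y[i]) * X[i]
    return -g_test @ np.linalg.solve(H, g_i)

def actual_delta(i):
    # Exact change in test loss when point i is removed and we retrain.
    mask = np.arange(n) != i
    theta_i, _ = fit(X[mask], y[mask])
    loss = lambda t: 0.5 * (x_test @ t - y_test) ** 2
    return loss(theta_i) - loss(theta)

# Removing point i corresponds to upweighting it by -1/n, so the
# predicted change in test loss is -(1/n) * influence(i).
pred = np.array([-influence(i) / n for i in range(20)])
act = np.array([actual_delta(i) for i in range(20)])
print(np.corrcoef(pred, act)[0, 1])  # close to 1
```

For this quadratic objective the only approximation error comes from the Hessian shifting slightly when a point is dropped, so the correlation between the predicted and actual loss changes is nearly perfect; for deep networks the same formula is applied with Hessian-vector products rather than an explicit inverse.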