[OpenReview] [arXiv] [Code]
TL;DR: The most important principal components provide more complete and interpretable explanations than the most important neurons.
High-quality explanations of neural networks (NNs) should exhibit two key properties. Completeness ensures that they accurately reflect a network's function, and interpretability makes them understandable to humans. The most complete explanation would be to simply display the equation for a layer's forward pass; however, this explanation has poor interpretability. At the opposite extreme, many popular NN explanation methods make choices that increase interpretability at the expense of completeness; in particular, many explain individual neurons within a network. We provide evidence that for AlexNet, neuron-based explanation methods sacrifice both completeness and interpretability compared to activation principal components (PCs). Neurons are a poor basis for AlexNet embeddings because they do not account for the distributed nature of these representations.
The problem of explaining a NN can be decomposed into understanding the nonlinear transformation applied by each layer in terms of the NN's input space. To facilitate this for a particular layer, we sample activations, fit a basis for activation space, and visualize points along each basis vector.
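As a rough, hedged illustration of this pipeline (not the paper's released code), the sketch below hooks one AlexNet layer, collects activations over a few batches, and fits a PCA basis for that layer's activation space. The pretrained torchvision AlexNet, the choice of classifier[1] as the "fc1" layer, and the random placeholder images are all assumptions made to keep the example self-contained.

```python
# Sketch only: collect activations from one AlexNet layer via a forward hook,
# then fit a PCA basis for that layer's activation space.
import torch
import torchvision
from sklearn.decomposition import PCA

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

activations = []

def save_activations(module, inputs, output):
    # Flatten to (batch, features) so each sample is a point in activation space.
    activations.append(output.detach().flatten(start_dim=1))

# Assumption: classifier[1] (the first fully connected layer) plays the role of "fc1".
handle = model.classifier[1].register_forward_hook(save_activations)

with torch.no_grad():
    for _ in range(10):
        # Placeholder inputs; in practice these would be real dataset images.
        model(torch.randn(32, 3, 224, 224))
handle.remove()

A = torch.cat(activations).numpy()   # (n_samples, n_neurons) activation matrix
pca = PCA().fit(A)                   # principal-component basis for activation space
print(pca.components_.shape)         # (n_components, n_neurons) basis vectors
```

Each row of pca.components_ is one basis vector; finding inputs whose activations lie far along a given row is one way to realize the "visualize points along each basis vector" step described above.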
Using the interface below, you can interpret visualizations of the following:
Explanation completeness is an abstract concept that could be measured in a variety of ways. We use two complementary measures of subspace completeness below.
One measure of completeness is the fraction of activation variance explained by a set of basis vectors. Below, we plot the cumulative explained variance ratio of the top-k basis vectors. Much of the activation variance is concentrated in the most important PCs (blue line) whereas explained variance is far less concentrated in the neuron basis (orange line). For example, to explain 80% of the activation variance for fc1, one could either study the first 42 PCs, or the 2782 highest variance neurons.
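Below is a minimal sketch of how the two curves above could be computed, assuming an activation matrix A like the one collected in the earlier snippet: the PC curve uses PCA's explained variance ratios, while the neuron curve uses per-neuron activation variances sorted in descending order.

```python
# Sketch: cumulative explained variance for the PC basis vs. the neuron basis.
import numpy as np
from sklearn.decomposition import PCA

def cumulative_explained_variance(A):
    """Return cumulative variance ratios for top-k PCs and top-k highest-variance neurons."""
    A = A - A.mean(axis=0)
    pc_ratio = np.cumsum(PCA().fit(A).explained_variance_ratio_)

    neuron_var = np.sort(A.var(axis=0))[::-1]             # highest-variance neurons first
    neuron_ratio = np.cumsum(neuron_var) / neuron_var.sum()
    return pc_ratio, neuron_ratio

pc_ratio, neuron_ratio = cumulative_explained_variance(A)
# How many basis vectors are needed to explain 80% of the activation variance?
print("PCs needed:", int(np.searchsorted(pc_ratio, 0.8)) + 1)
print("Neurons needed:", int(np.searchsorted(neuron_ratio, 0.8)) + 1)
```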
Another measure of completeness is to cumulatively ablate basis vectors and observe how much accuracy degrades. Ablating basis vectors that are more important to the network's function should degrade accuracy more rapidly than ablating less important ones. For most layers in AlexNet, ablating the highest-variance PCs (solid blue line) damages accuracy more than ablating the highest-variance neurons (solid orange line).
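Below is one hedged way such an ablation could be implemented, not necessarily the authors' exact procedure: a forward hook projects the top-k basis directions out of the layer's activations, and accuracy is then re-measured with the hook attached. The helper ablate_topk, the orthonormal basis argument, and the commented usage with a labeled loader are all illustrative assumptions.

```python
# Sketch: cumulative ablation of basis directions in a layer's activation space.
import torch

def ablate_topk(basis, k):
    """Return a forward hook that removes the top-k basis directions from a layer's output.
    `basis` is an orthonormal (n_vectors, n_neurons) array such as pca.components_;
    for the neuron basis it would be one-hot rows for the highest-variance neurons."""
    V = torch.as_tensor(basis[:k], dtype=torch.float32)   # (k, n_neurons)

    def hook(module, inputs, output):
        coords = output @ V.T         # components along the ablated directions
        return output - coords @ V    # returning a value replaces the layer's output
    return hook

# Illustrative usage (assumes `model` and `pca` from the earlier sketch and a labeled
# `loader` yielding (images, labels) batches):
# handle = model.classifier[1].register_forward_hook(ablate_topk(pca.components_, k=10))
# with torch.no_grad():
#     correct = sum((model(x).argmax(dim=1) == y).sum().item() for x, y in loader)
# handle.remove()
```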
Nolan Dey, Eric Taylor, Alexander Wong, Bryan Tripp, Graham Taylor. Neuron-based explanations of neural networks sacrifice completeness and interpretability. Transactions on Machine Learning Research, 2025.
@article{dey2025neurons,
  author  = {Dey, Nolan and Taylor, Eric and Wong, Alexander and Tripp, Bryan and Taylor, Graham},
  title   = {Neuron-based explanations of neural networks sacrifice completeness and interpretability},
  year    = {2025},
  journal = {Transactions on Machine Learning Research},
  url     = {https://openreview.net/forum?id=UWNa9Pv6qA}
}
This research was supported by funding from BMO Bank of Montreal through the Waterloo Artificial Intelligence Institute (SRA #081648). The authors thank Thomas Fortin for helping to run experiments with ResNet and ViT. This research was supported, in part, by the Province of Ontario and the Government of Canada through the Canadian Institute for Advanced Research (CIFAR), and companies sponsoring the Vector Institute. GWT is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program, and the Canada CIFAR AI Chairs program. This research was conducted with approval from the University of Guelph Research Ethics Board (REB #20-12-003).