Date: 2025-09-21

5 DISCUSSION

The empirical results we present demonstrate that our loss term is effective in its goal of boosting consensus among explainers. As with any first attempt at introducing a new objective to neural network training, we see modest results in some settings and evidence that hyperparameters can likely be tuned on a case-by-case basis. It is not our aim to leave practitioners with a how-to guide, but rather to begin exploring how practitioners can control where a model lies along the accuracy-agreement trade-off curve.

We introduce a loss term that measures two types of correlation between explainers, which unfortunately adds complexity to the machine learning engineer’s job of tuning models. However, we show conclusively that there are settings in which using both types of correlation is better than using only one when encouraging explanation consensus.
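As a rough illustration of how two correlation measures can be combined into one consensus term, the sketch below scores the agreement between two attribution vectors. It assumes that the two types of correlation are a linear (Pearson) and a rank-based (Spearman) measure, which this section does not spell out. The function names, the weights `lam` and `mu`, and the way the terms are mixed are hypothetical choices made for the example. The rank computation here is not differentiable, so this shows only what is being measured, not a trainable loss.

```python
import math


def pearson(a, b):
    """Pearson (linear) correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # convention for this sketch: a constant vector has no correlation
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(a, b):
    """Spearman (rank) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def consensus_penalty(attr_a, attr_b, mu=0.5):
    """Penalty in [0, 2]; 0 means both correlations equal 1 (hypothetical form)."""
    agreement = mu * pearson(attr_a, attr_b) + (1 - mu) * spearman(attr_a, attr_b)
    return 1.0 - agreement


def total_loss(task_loss, attr_a, attr_b, lam=0.25, mu=0.5):
    """Mix the task loss with the consensus penalty (hypothetical weighting)."""
    return (1 - lam) * task_loss + lam * consensus_penalty(attr_a, attr_b, mu)
```

In this sketch, `lam` moves a model along the accuracy-agreement trade-off and `mu` balances the two correlation types; setting `mu` to 0 or 1 recovers the single-correlation case.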

Another limitation of these experiments as a guide on how to train for consensus is that we only trained with one pair of explainers. Our loss is defined for any pair, and perhaps another choice would better suit specific applications.

In light of the contentious debate on whether deep models or decision-tree-based methods are better for tabular data [10, 31, 38], we argue that developing new tools for training deep models can help promote wider adoption of tabular deep learning. Moreover, with the results we present in this work, it is our hope that future work improves these trends, which could possibly lead to neural models that have more agreement (and possibly more accuracy) than their tree-based counterparts (such as XGBoost).

5.1 Future Work

Armed with the knowledge that training for consensus with PEAR is possible, we describe several exciting directions for future work. First, as alluded to above, we explored training with only one pair of explainers, but other pairs may help data scientists who have a specific type of target agreement. Work to better understand how a given pair of explainers in the loss affects the agreement of other explainers at test time could lead to principled decisions about how to use our loss in practice. Indeed, PEAR could fit into larger learning frameworks [22] that aim to select user- and task-specific explanation methods automatically.

It will be crucial to study the quality of explanations produced with PEAR from a human perspective. Ultimately, both the efficacy of a single explanation and the efficacy of agreement between multiple explanations are tied to how the explanations are used and interpreted. Since our work only takes a quantitative approach to demonstrate improvement when regularizing for explanation consensus, it remains to be seen whether human practitioners would make better judgments about models trained with PEAR versus models trained naturally.

In terms of model architecture, we chose standard-sized MLPs for the experiments on our tabular datasets. Recent work proposes transformers [35] and even ResNets [10] for tabular data, so completely different architectures could also be examined in future work.

Finally, research into developing better explainers could lead to an even more powerful consensus loss term. Recall that IntGrad integrates the gradients over a path in input space. The designers of that algorithm point out that a straight path is the canonical choice due to its simplicity and symmetry [37]. Other paths through input space that include more realistic data points, instead of paths of points constructed via linear interpolation, could lead to even better agreement metrics on actual data.
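To make the role of the path concrete, the following minimal sketch approximates path-integrated gradients along any piecewise-linear path given as a list of waypoints, with the straight line as one special case. It is an illustration under stated assumptions rather than the implementation used in our experiments: the helper names (`straight_path`, `path_integrated_gradients`, `toy_grad`) are hypothetical, the gradient is supplied as a plain Python function instead of coming from a neural network, and the integral is approximated by evaluating the gradient at the midpoint of each segment.

```python
def straight_path(baseline, x, steps=50):
    """Waypoints linearly interpolated from `baseline` to `x` (the canonical path)."""
    return [
        [b + (k / steps) * (xi - b) for b, xi in zip(baseline, x)]
        for k in range(steps + 1)
    ]


def path_integrated_gradients(grad_fn, points):
    """Approximate per-feature attributions along a piecewise-linear path.

    grad_fn maps a point (list of floats) to the gradient of the model output
    at that point. Each segment contributes the gradient at its midpoint times
    the change in each coordinate over the segment.
    """
    dim = len(points[0])
    attributions = [0.0] * dim
    for start, end in zip(points, points[1:]):
        midpoint = [(s + e) / 2 for s, e in zip(start, end)]
        grad = grad_fn(midpoint)
        for i in range(dim):
            attributions[i] += grad[i] * (end[i] - start[i])
    return attributions


def toy_grad(p):
    """Gradient of the toy function f(x) = x0**2 + 3*x1 (hypothetical example)."""
    return [2.0 * p[0], 3.0]


# Straight path from the zero baseline to the input [2, 1].
straight = path_integrated_gradients(toy_grad, straight_path([0.0, 0.0], [2.0, 1.0]))

# An alternative path through an intermediate waypoint, standing in for a path
# that visits other points in input space.
detour = path_integrated_gradients(toy_grad, [[0.0, 0.0], [2.0, 0.0], [2.0, 1.0]])
```

For either path, the attributions sum to the change in the toy function between the baseline and the input, here 7. Only the list of waypoints would need to change to experiment with paths that pass through more realistic data points.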

5.2 Conclusion

In the quest for fair and accessible deep learning, balancing interpretability and performance is key. It is known that common explainers may return conflicting results on the same model and input, to the detriment of an end user. The gains in explainer consensus we achieve with our method, however modest, serve to kick-start others to improve on our work in aligning machine learning models with the practical challenge of interpreting complex models for real-life stakeholders.

ACKNOWLEDGEMENTS

We thank Teresa Datta and Daniel Nissani at Arthur for their insights throughout the course of the project. We also thank Satyapriya Krishna, one of the authors of the original Disagreement Problem paper, for informative email exchanges that helped shape our experiments.

:::info Authors:

(1) Avi Schwarzschild, University of Maryland, College Park, Maryland, USA; work completed while working at Arthur (avi1@umd.edu);

(2) Max Cembalest, Arthur, New York City, New York, USA;

(3) Karthik Rao, Arthur, New York City, New York, USA;

(4) Keegan Hines, Arthur, New York City, New York, USA;

(5) John Dickerson†, Arthur, New York City, New York, USA.

:::

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

