2021__KDD__TimeSHAP - Explaining Recurrent Models through Sequence Perturbations

## Memo

## Abstract

Although recurrent neural networks ([[RNN]]s) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions contributes on average only 41% of the model's score; iv) the client's age receives notably high attribution, suggesting potentially discriminatory reasoning, which was later confirmed by higher false positive rates for older clients.

## 1. Introduction