Anomaly Detection Datasets - yuuk1's Digital Garden

- Yahoo: S5 - A Labeled Anomaly Detection Dataset
  - [A Benchmark Dataset for Time Series Anomaly Detection](https://yahooresearch.tumblr.com/post/114590420346/a-benchmark-dataset-for-time-series-anomaly)
- Numenta: Numenta Anomaly Benchmark (NAB)
  - [Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark](https://ieeexplore.ieee.org/document/7424283)
  - [numenta/NAB](https://github.com/numenta/NAB)
- A paper published in 2020 argues that the existing datasets are flawed.
  - The paper proposes the "[UCR Time Series Anomaly Datasets](https://wu.renjie.im/research/anomaly-benchmarks-are-flawed/arxiv/)" as an alternative.
  - [Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress](https://arxiv.org/abs/2009.13807)
    > Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo [5], Numenta [6], NASA [2] or Pei’s Lab (OMNI) [3], etc.

    > The majority of the individual exemplars in these datasets suffer from one or more of four flaws. These flaws are triviality, unrealistic anomaly density, mislabeled ground truth and run-to-failure bias.
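- AIOps Challenge
  - NetMan Lab at Tsinghua University appears to run a data analysis competition called the "AIOps Challenge" every year.
  - The task is anomaly detection on KPIs (that is the term they use, but it is equivalent to metrics), so it looks usable as a benchmark for anomaly detection on multivariate time series data.
  - Judging from the competition page, they have built a site with a leaderboard and discussion boards, much like Kaggle, which is impressive (although everything is in Chinese).
    - [首页 (home page)](http://iops.ai/)
  - Downloading data from that site requires registration, but some of the data seems to be available via NetMan Lab's GitHub.
    - [https://github.com/NetManAIOps/AIOps-Challenge-2020-Data](https://github.com/NetManAIOps/AIOps-Challenge-2020-Data)
    - [https://github.com/NetManAIOps/MultiDimension-Localization](https://github.com/NetManAIOps/MultiDimension-Localization)
  - Separately from the competition, datasets tied to the anomaly detection papers published by NetMan Lab are available in their repositories, and these look usable too.
    - [https://github.com/NetManAIOps/OmniAnomaly](https://github.com/NetManAIOps/OmniAnomaly)

Benchmarks on these datasets assume that no prior knowledge is given to the detector. If prior knowledge such as the system configuration is incorporated into anomaly detection, the results are likely to differ substantially from benchmark results on the datasets above. Still, a speaker in the Kaggle tutorial at IBIS2020 said, "when a new method or optimizer comes out, I benchmark it and, if it looks good, try it in practice." In the same spirit, benchmarking a range of anomaly detection methods on these datasets looks interesting, and knowing the results may be useful regardless of whether the methods turn out to fit our own data.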
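As a rough illustration of what "taking a benchmark" across several detectors could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the series is synthetic (a stand-in for a labeled series from one of the datasets above, not real data), the two detectors are simple baselines rather than methods from the cited papers, and the helper names (`make_series`, `zscore_detector`, `rolling_median_detector`, `point_scores`, `run_benchmark`) are hypothetical. It scores predictions with point-wise precision, recall, and F1, which is only one possible choice (NAB, for example, defines its own scoring scheme). It also reports the anomaly density of the labels, since unrealistic density is one of the flaws named in the quoted paper.

```python
import numpy as np


def make_series(n=1000, n_anomalies=5, seed=0):
    """Synthetic stand-in for a labeled benchmark series (not real data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    values = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(n)
    labels = np.zeros(n, dtype=bool)
    idx = rng.choice(n, size=n_anomalies, replace=False)
    values[idx] += 5.0  # inject obvious spikes as the "anomalies"
    labels[idx] = True
    return values, labels


def zscore_detector(values, threshold=3.0):
    """Flag points whose global z-score exceeds the threshold."""
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold


def rolling_median_detector(values, window=25, threshold=5.0):
    """Flag points far from a centered rolling median (window must be odd)."""
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    residual = np.abs(values - np.median(windows, axis=1))
    scale = np.median(residual) + 1e-12  # robust scale of the residuals
    return residual > threshold * scale


def point_scores(pred, labels):
    """Point-wise precision, recall, and F1 for boolean arrays."""
    tp = int(np.sum(pred & labels))
    fp = int(np.sum(pred & ~labels))
    fn = int(np.sum(~pred & labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def run_benchmark(values, labels, detectors):
    """Run every detector on the same series and collect its scores."""
    return {name: point_scores(fn(values), labels) for name, fn in detectors.items()}


if __name__ == "__main__":
    values, labels = make_series()
    print(f"anomaly density: {labels.mean():.3%}")
    detectors = {"zscore": zscore_detector, "rolling_median": rolling_median_detector}
    for name, scores in run_benchmark(values, labels, detectors).items():
        print(name, scores)
```

To try this on a real dataset, `make_series` would be replaced by a loader that returns a value array and a boolean label array of the same length. The file formats differ between the datasets listed above, so that loader is deliberately left out here.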