| Name | Task | Application | Data Type | Data Size | Injected Failure | Num Trials | Fault Time | Workload |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [[2018__ICSOC__Microscope―Pinpoint Performance Issues with Causal Graphs in Micro-service Environments\|Microscope]] | RCA | [[Sock Shop]] (8 services) | Metrics, dependencies | 1 sec sampling | CPU hog, Network jam, Container pause | 5 (240) | 1 min | 5000 QPS |
| [[2020__WWW__AutoMAP - Diagnose Your Microservice-based Web Application\|AutoMAP]] | RCA | [[Pymicro]] (16 services), IBM Cloud (1732 services) | Metrics | 1.5M series, -1/+1 hour, Pymicro (2 sec sampling), IBM Cloud (5 sec sampling) | Container shutdown, DoS attack, 20 incidents | 20 |  | NA |
| [[2018__CCGRID__CloudRanger―Root Cause Identification for Cloud Native Systems\|CloudRanger]] | RCA | [[Pymicro]] (16 services), IBM Bluemix (1000+ services) | Metrics | +10M series, 2 hours | NA | NA | NA | NA |
| [[2020__NOMS__MicroRCA - Root Cause Localization of Performance Issues in Microservices\|MicroRCA]] | RCA | [[Sock Shop]] | Metrics, dependencies (edge metrics) | 5 sec sampling | Network latency, CPU hog, Memory leak | 5 (95) | 1 min | 500 users, 600 QPS |
| [[2020__ICSOC__Localization of Operational Faults in Cloud Applications by Mining Causal Dependencies in Logs using Golden Signals]] | RCA | [[TrainTicket]] (41 services) | Logs, dependencies (edge metrics) | 164,740 log lines (average) | HTTP request error | NA | NA | NA |
| [[2020__IWQoS__Localizing Failure Root Causes in a Microservice through Causality Inference\|MicroCause]] | RCA | Online Shopping (400 services) | Metrics |  | 86 online incidents | NA | NA | NA |
| [[2019__ESEC-FSE__Latent Error Prediction and Fault Localization for Microservice Applications by Learning from System Trace Logs\|MEPFL]] | RCA | [[Sock Shop]], [[TrainTicket]] | Traces |  | [[Sock Shop]] 32 faults, [[TrainTicket]] 142 faults, 10 cases (total 20 cases) | NA | NA | NA |
| [[2020__UCC__MicroRAS - Automatic Recovery in the Absence of Historical Failure Data for Microservice Systems\|MicroRAS]] | AD, Recovery | [[Sock Shop]] | Metrics | 5 min time range | CPU hog, Memory leak | 3-5 (23) | 3 min | 500 users, 600 QPS (2 min running normally) |
| [[2020__PRDC__Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing]] | Recovery | [[Cassandra]] | Metrics |  |  | NA | NA | [[YCSB]] |
| [[2020__ISSRE__Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks\|TraceAnomaly]] | AD | [[TrainTicket]] | Traces | Training set: 1 day of traces, 380k traces, 40 peaks; test set: 30,356 normal, 26,999 anomaly (response), 2,380 anomaly (route) | Training set: not injected; test set: network latency -> (30 s wait) -> pod delete -> restart system -> wait 10 min | 1 | 5 min | NA |
| [[2020__Applied Science__A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications]] | RCA | [[Sock Shop]] | Metrics |  | CPU burnout, Memory overload, Disk I/O block, Network jam | 20 | NA | [[Locust]] |
| [[2017__CCS__DeepLog - Anomaly Detection and Diagnosis from System Logs through Deep Learning\|DeepLog]] | AD | [[OpenStack]] | Logs | HDFS 11,197,954 entries, 100 EC2 2.9% anomaly, OpenStack 1,335,318 entries (7% anomaly) | VM create timeout, VM destroy error, VM cleanup error | NA | NA |  |
| [[2020__ICWS__Root-Cause Metric Location for Microservice System via Log Anomaly Detection]] | RCA | [[TrainTicket]] | Logs, Metrics |  | Resource exhaustion (CPU, memory, disk...), HTTP network tw delay, HTTP network tw abortion, 16 failure cases | NA | NA | NA |
| [[2021__CODS-COMAD__Evaluation of Causal Inference Techniques for AIOps]] | RCA | [[TrainTicket]] | Logs |  | NA | NA | NA | NA |
| [[2018__ICST__Localizing Faults in Cloud Systems\|LOUD]] | RCA |  |  |  |  |  |  |  |

## Others

- [[2018__Middleware__Sieve Actionable Insights from Monitored Metrics|Sieve]]
- [[2020__ACCESS__A Framework of Virtual War Room and Matrix Sketch-Based Streaming Anomaly Detection for Microservice Systems]]