## Proposals on SLIs and SLOs
- [[2024__SOSE__Diffusing High-level SLO in Microservice Pipelines]]
- [[2024__ApPLIED__DeepSLOs for the Computing Continuum]]
- [[2024__IEEE Internet Computing__On Causality in Distributed Continuum Systems]]
- [[2023__SSE__Towards a Prime Directive of SLOs]]
- A review of SLO-related papers.
- [[2021__CLOUD__A Novel Middleware for Efficiently Implementing Complex Cloud-Native SLO]]
- [[2021__ICWS__SLO Script - A Novel Language for Implementing Complex Cloud-Native Elasticity-Driven SLOs]]
- [[2020__NSDI__Meaningful Availability|Meaningful Availability]]
    - Describes an SLI called User Uptime that was introduced in G Suite.
- [[2019__HotOS__Nines are Not Enough Meaningful Metrics for Clouds]]
    - Discusses why defining SLOs is difficult for cloud providers.
    - The authors are co-authors of Chapter 4 of the [[Site Reliability Engineering - Google|srebook]].
- [[2017__HotOS__Thinking about Availability in Large Service Infrastructures]]
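## Root cause diagnosis in incident response
All of the following share the same framework: an SLO violation triggers the execution of a data analysis algorithm for root cause diagnosis.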
- [[2021__CLOUD__Causal Modeling based Fault Localization in Cloud Systems using Golden Signals]]
- [[2021__ISSRE__Identifying Root-Cause Metrics for Incident Diagnosis in Online Service Systems]]
- [[2020__WWW__AutoMAP - Diagnose Your Microservice-based Web Application]]
    - A paper from the top-tier web conference formerly known as WWW (now The Web Conference).
    - The paper mentions Site Reliability Engineers.
- [[2020__NOMS__MicroRCA - Root Cause Localization of Performance Issues in Microservices]]
- [[2020__Applied Science__A Causality Mining and Knowledge Graph Based Method of Root Cause Diagnosis for Performance Anomaly in Cloud Applications]]
- [[2018__CCGRID__CloudRanger―Root Cause Identification for Cloud Native Systems]]
- [[2018__ICSOC__Microscope―Pinpoint Performance Issues with Causal Graphs in Micro-service Environments|Microscope]]
- [[2014__INFOCOM__CauseInfer―Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems]]
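
The shared framework of the papers above (an SLO violation triggers a diagnosis algorithm) can be sketched roughly as follows. This is a minimal illustration, not the method of any listed paper: the names `SLO`, `check_and_diagnose`, and `rank_by_deviation` are hypothetical, and the ranking step is a simple placeholder standing in for the causal-graph or correlation-based algorithms those papers propose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class SLO:
    name: str
    target: float  # e.g. 0.99 means 99% of requests must be "good"


def sli(good: int, total: int) -> float:
    """Ratio of good events to all events; treated as 1.0 when there is no traffic."""
    return 1.0 if total == 0 else good / total


def rank_by_deviation(metrics: Dict[str, Sequence[float]]) -> List[str]:
    """Placeholder diagnosis (assumption): rank metrics by how far the latest
    sample deviates from the mean of the earlier samples.
    Each series must contain at least two samples."""
    scores = {}
    for name, series in metrics.items():
        baseline = series[:-1]
        mean = sum(baseline) / len(baseline)
        scores[name] = abs(series[-1] - mean) / (abs(mean) + 1e-9)
    return sorted(scores, key=scores.get, reverse=True)


def check_and_diagnose(
    slo: SLO,
    good: int,
    total: int,
    metrics: Dict[str, Sequence[float]],
    diagnose: Callable[[Dict[str, Sequence[float]]], List[str]] = rank_by_deviation,
) -> List[str]:
    """Run the diagnosis algorithm only when the SLO is violated."""
    if sli(good, total) >= slo.target:
        return []  # SLO met: no diagnosis is triggered
    return diagnose(metrics)


if __name__ == "__main__":
    slo = SLO(name="availability", target=0.99)
    metrics = {"cpu": [0.5, 0.5, 0.5], "latency_ms": [100.0, 100.0, 400.0]}
    print(check_and_diagnose(slo, good=950, total=1000, metrics=metrics))
```

Passing the diagnosis step in as the `diagnose` callable keeps the trigger (SLO evaluation) separate from the analysis, so a different algorithm can be swapped in without touching the trigger logic.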
## Automated control with SLIs as the objective function
- [[2025__arXiv__Tempo - Application-aware LLM Serving with Mixed SLO Requirements]]
- [[2025__CLOUD__SLO-Aware Container Orchestration on Kubernetes Clusters]]
- MSARS: [[2024__arXiv__MSARS - A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices]]
- Octopus: [[2024__CLOUD__Intent-Driven Multi-Engine Observability Dataflows for Heterogeneous Geo-Distributed Clouds]]
- [[2024__DSN__When Green Computing Meets Performance and Resilience SLOs]]
- [[2018__ATC__SLAOrchestrator - Reducing the cost of performance SLAs for cloud data analytics]]
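    - Targets OLAP systems for data analytics such as Redshift. In a multi-tenant environment where tenants issue queries, it adjusts the number and size of instances while guaranteeing SLAs such as query execution time.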
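
The general idea of adjusting instance count and size under a query-time SLA can be illustrated with a toy search for the cheapest configuration that satisfies the SLA. This is only a sketch under strong assumptions and is not SLAOrchestrator's actual algorithm: the `InstanceType` fields, the linear-scaling latency model in `predicted_latency`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    speedup: float      # hypothetical per-instance speed relative to a baseline
    hourly_cost: float  # hypothetical price per instance-hour


def predicted_latency(base_seconds: float, itype: InstanceType, count: int) -> float:
    """Toy performance model (assumption): query time scales linearly
    with total compute capacity."""
    return base_seconds / (itype.speedup * count)


def cheapest_config(
    base_seconds: float,
    sla_seconds: float,
    types: Sequence[InstanceType],
    max_count: int,
) -> Optional[Tuple[InstanceType, int, float]]:
    """Exhaustively search (type, count) pairs and return the cheapest one
    whose predicted query time meets the SLA, or None if none does."""
    best = None
    for itype, count in product(types, range(1, max_count + 1)):
        if predicted_latency(base_seconds, itype, count) > sla_seconds:
            continue  # this configuration would violate the SLA
        cost = itype.hourly_cost * count
        if best is None or cost < best[2]:
            best = (itype, count, cost)
    return best


if __name__ == "__main__":
    types = [
        InstanceType("small", speedup=1.0, hourly_cost=1.0),
        InstanceType("large", speedup=4.0, hourly_cost=2.5),
    ]
    print(cheapest_config(base_seconds=120.0, sla_seconds=20.0, types=types, max_count=8))
```

A real system would replace the linear model with measured or learned performance data and would account for multi-tenant contention, which this sketch ignores.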