Map of Contents page for SRE.

## Terminology and Overview

- [[notes/sre/SRE]]
- [[信頼性]]
- [[SREの信頼性の定義]]
- [[SLO]]
- [[根本原因 - SRE]]
- [[Ironies of Automation]]
- [[DevOps]]
- [[State of DevOps Report]]
- [[SRE in the Third Age - SREcon19EMEA]]
- [[ソフトウェア異常の用語とプロセス]]
- [[SRE vs Platform Engineering]]
- [[ウェブオペレーション]]

## Papers

- [[2023__OSDI__Defcon - Preventing Overload with Graceful Feature Degradation]]
- [[2023__SIGCSE__Teaching Site Reliability Engineering as a Computer Science Elective]]
- [[2020__NSDI__Meaningful Availability]]
- [[2019__HotOS__Nines are Not Enough Meaningful Metrics for Clouds]]
- [[2010__SoCC__Characterizing Cloud Computing Hardware Reliability]]

## SLI/SLO

- [[The Art of SLOs]]
- [[Adopting SLOs]]
- [[Quality as an SLI]]
- [[Mackerel SLO API Quolity]]
- [[SLOの品質の改善]]
- [[SLOの起源]]
- [[SLOによる本番投入や切り戻し基準の設定]]
- [[2020__NSDI__Meaningful Availability]]
- [[メルカリのSLO運用]]
- [[はてなSLOモデル]]
- [[SLOツール]]

## [[Observability]]

- [[Telemetry - MOC]]

## Incident Management

- [[Incident Management - MOC]]

## [[Infrastructure as Code]]

## Conferences

### SREcon

- [[SREcon25 Americasまとめ]]
- [[SREcon24 Americas]]
- [[SREcon23 EMEA Watch List]]
- [[SREcon23 Americas Watch List]]
- [[SRECon22 America Watch List]]
- [[SRECon21 Watch List]]
- [[SRECon20 America]]

### [[SRE NEXT]]

- [[SRE NEXT 2024]]
- [[SRE NEXT 2023]]
- [[SRE NEXT 2022]]

## Software Reliability Engineering

- [[Software Reliability Engineering]]

## Reliability Engineering

- [[信頼性工学]]
- [[レジリエンス]]
- [[レジリエンス工学]]
- [[Resilience Engineering - Learning to Embrace Failure]]

## Books

- [[Site Reliability Engineering - Google]]
- [[📘Site Reliability Workbook]]
- [[Seeking SRE]]
- [[SREの格言]]
- [[Reliable Machine Learning - Applying SRE Principles to ML in Production]]

## Case Studies

- [[はてなのSRE - MOC]]
- [[SRE in Nikkei]]

## [[AIOps]]

- [[AIOps - MOC]]

## Others

- [[Principles of Software Engineering, Part 1]]
- [[2023__CLOSER__Semi-Automated Smell Resolution in Kubernetes-Deployed Microservices]]
- [[2021 SRE Report - Catchpoint]]
- [[The Morning Paper on Operability]]
- [[Platform Engineering]]
- [[Awesome Load Management]]
- [[Systems Empirical Study Papers]]
- [[A Conceptual Framework for System Fault Tolerance]]