## [[SREcon24 Americas]]
- [[Measuring Reliability Culture to Optimize Tradeoffs - Perspectives from an Anthropologist - SREcon24 Americas]]
- [Hard Choices, Tight Timelines: A Closer Look at Skip-level Tradeoff Decisions during Incidents](https://www.usenix.org/conference/srecon24americas/presentation/maguire)
- [Defence at the Boundary of Acceptable Performance](https://www.usenix.org/conference/srecon24americas/presentation/hatch)
- [System Performance and Queuing Theory - Concepts and Application](https://www.usenix.org/conference/srecon24americas/presentation/poole)
- [Cross-System Interaction Failures: Don't Fail through the Cracks](https://www.usenix.org/conference/srecon24americas/presentation/xu)
- [Gray Failure: The Achilles’ Heel of Cloud-Scale Systems](https://www.usenix.org/conference/srecon24americas/presentation/li)
- [Real Talk: What We Think We Know — That Just Ain’t So](https://www.usenix.org/conference/srecon24americas/presentation/allspaw)
  - Author of [[ウェブオペレーション]] (*Web Operations*)

## SREcon23 EMEA
- [Improving Kafka Resilience - Gray Failures Mitigation](https://www.usenix.org/conference/srecon23emea/presentation/valentinova)

## SREcon23 Americas
- [We're Still Down: A Metastable Failure Tale](https://www.usenix.org/conference/srecon23americas/presentation/lexmond)
- [Turning an Incident Report into a Design Issue with TLA+](https://www.usenix.org/conference/srecon23americas/presentation/hackett)
- [Far from the Shallows: The Value of Deeper Incident Analysis](https://www.usenix.org/conference/srecon23americas/presentation/nash)
- [An Organizational Response to Incidents: Designing for Smooth Coordination in High Tempo, Large Scale Software Incident Response](https://www.usenix.org/conference/srecon23americas/presentation/maguire)

## SREcon22 APAC
- [Move Fast and Learn Things: Principles of Cognition, Teaming, and Coordination to Support High Performance and Resilient Site Reliability Engineering](https://www.usenix.org/conference/srecon22apac/presentation/maguire)
- [The Math behind the Incident Aftermath: A Practical Guide to Measuring Incident Impacts](https://www.usenix.org/conference/srecon22apac/presentation/patel)

## SREcon22 Americas
- [The Scientific Method for Resilience](https://www.usenix.org/conference/srecon22americas/presentation/yakomin)
- [Tales from the VOID: The Scary Truth about Incident Metrics](https://www.usenix.org/conference/srecon22americas/presentation/nash)

## SREcon22 EMEA
- [Principled Identification of "Root Causes" Using Techniques from Safety Engineering](https://www.usenix.org/conference/srecon22emea/presentation/devesine)
- [Honey, I Broke the Things: Debugging Gray Failures in Production!](https://www.usenix.org/conference/srecon22emea/presentation/kumari)
- [Over Nine Billion Dollars of SRE Lessons - the James Webb Space Telescope](https://www.usenix.org/conference/srecon22emea/presentation/barron)
- [The Math of Scalability](https://www.usenix.org/conference/srecon22emea/presentation/ish-shalom)
- [Knowledge and Power: A Sociotechnical Systems Discussion on the Future of SRE](https://www.usenix.org/conference/srecon22emea/presentation/maguire)
- [Statistics for Engineers](https://www.usenix.org/conference/srecon22emea/presentation/hartmann)

## SREcon21
- [[A Political Scientist's View on Site Reliability - SREcon21]]
- [A Political Scientist's View on Site Reliability](https://www.usenix.org/conference/srecon21/presentation/krax)
- [You've Lost That Process Feeling: Some Lessons from Resilience Engineering](https://www.usenix.org/conference/srecon21/presentation/woods)
- [Need for SPEED: Site Performance Efficiency, Evaluation and Decision](https://www.usenix.org/conference/srecon21/presentation/chow)
- [[Spike Detection in Alert Correlation at LinkedIn - SREcon21]]
- [[User Uptime in Practice - SREcon21]]

## SREcon20 Americas
- [Avoiding Goodhart's Law - Use SLO's as Tools Not Cudgels](https://www.usenix.org/conference/srecon20americas/presentation/coulter)
- [Weeks of Debugging Can Save You Hours of TLA+](https://www.usenix.org/conference/srecon20americas/presentation/kuppe)
- [The Secret Lives of SREs - Controlling the Costs of Coordination across Remote Teams](https://www.usenix.org/conference/srecon20americas/presentation/maguire)
- [[2020__Dissertation__Controlling the Costs of Coordination in Large-scale Distributed Software Systems]]

## SREcon19 EMEA
- [The Unmonitored Failure Domain: Mental Health](https://www.usenix.org/conference/srecon19emea/presentation/woo)
- [Control Theory for SRE](https://www.usenix.org/conference/srecon19emea/presentation/hahn)
- [Support Operations Engineering: Scaling Developer Products to the Millions](https://www.usenix.org/conference/srecon19emea/presentation/ali)

## SREcon19 Asia
- [Ironies of Automation: A Comedy in Three Parts](https://www.usenix.org/conference/srecon19asia/presentation/lund-comedy)

## SREcon19 Americas
- [The Curse of SRE Autonomy and How to Manage It](https://www.usenix.org/conference/srecon19americas/presentation/bondi)
- [Fault Tree Analysis Applied to Apache Kafka](https://www.usenix.org/conference/srecon19americas/presentation/falko)
- [Resilience Engineering Mythbusting](https://www.usenix.org/conference/srecon19americas/presentation/gallego)

## SREcon18 Asia
- [PV Monitoring Based on Linear Regression](https://www.usenix.org/conference/srecon18asia/presentation/bo)

## SREcon15 Europe
- [Distributed Consensus Algorithms for Extreme Reliability](https://www.usenix.org/conference/srecon15europe/program/presentation/nolan)