[SREcon24 Americas Conference Program | USENIX](https://www.usenix.org/conference/srecon24americas/program)
## General remarks (Opinions)
- 20 Years of SRE: Highs and Lows
- Build vs. Buy in the Midst of Armageddon
- When Your Open Source Turns To The Dark Side
- Real Talk: What We Think We Know — That Just Ain’t So
- Sustainable Reliability Engineering
- Cloudy with a Chance of Operational Excellence
- Frontend Design in SRE
- What Can You See from Here?
## Case Studies
- Product Reliability for Google Maps
## [[Observability]]
- Using Generative AI Patterns for Better Observability
- The Ticking Time Bomb of Observability Expectations
- Synthesizing Sanity with, and in Spite of, Synthetic Monitoring
- [[99.99% of Your Traces are (Probably) Trash - SREcon24 Americas]]
- Kube, Where’s My Metrics? The Challenges of Scaling Multi-Cluster Prometheus
- Workshop: Cloud-Native Observability with OpenTelemetry
- The Invisible Door: Reliability Gaps in the Front End
## [[Incident Response]]
- Thawing the Great Code Slush
- Autopsy of a Cascading Outage from a MySQL Crashing Bug
- "Logs Told Us It Was Kernel – It Wasn't"
- What Is Incident Severity, but a Lie Agreed Upon?
- Hard Choices, Tight Timelines: A Closer Look at Skip-level Tradeoff Decisions during Incidents
- [[Storytelling as an Incident Management Skill]]
## FinOps
- Scam or Savings? A Cloud vs. On-Prem Economic Slapfight
## Distributed systems
- Capacity Constraints Unveiled: Navigating Cloud Scaling Realities
- Kubernetes: The Most Graceful Termination™
- System Performance and Queuing Theory - Concepts and Application
- It Is OK to be Metastable
- Cross-System Interaction Failures: Don't Fail through the Cracks
- Gray Failure: The Achilles’ Heel of Cloud-Scale Systems
- From Chaos to Clarity: Deciphering Cache Inconsistencies in a Distributed Environment
## Database
- Sharding: Growing Systems from Node-scale to Planet-scale
- Migrating a Large Scale Search Dataset in Production in a Highly Available Manner
- The Sins of High Cardinality
- Strengthening Apache Pinot's Query Processing Engine with Adaptive Server Selection and Runtime Query Killing
## CI/CD
- OIDC and CICD: Why Your CI Pipeline Is Your Greatest Security Threat
## Security
- What We Want Is 90% the Same: Using Your Relationship with Security for Fun and Profit
## Migration
- Optimizing Resilience and Availability by Migrating from JupyterHub to the Kubeflow Notebook Controller
- Handling the Largest Domains Migration, Ever!
- Navigating the Kubernetes Odyssey: Lessons from Early Adoption and Sustained Modernization
- Taming the Linux Distribution Sprawl: A Journey to Standardization and Efficiency
## Data management
- Quash: Patterns for Data Lifecycle Management
## Technical debt
- Patching Your Way to Compliance with a Small Team and a Pile of Technical Debt
## Government
- Demystifying FedRAMP
## Management
- Meeting the Challenge of Burnout
- Triage with Mental Models
- Defence at the Boundary of Acceptable Performance
- Teaching SRE
- [[Measuring Reliability Culture to Optimize Tradeoffs - Perspectives from an Anthropologist - SREcon24 Americas]]