Have you ever wished there were a book on [[notes/sre/SRE|site reliability engineering]] that started from the beginning? This book addresses three needs: how to become an SRE, how to think like an SRE, and how to build and grow an SRE function in your organization. To meet them, it is organized into three sections: the fundamental background for understanding SRE and SRE culture, personal advice on becoming an SRE, and guidance on making SRE succeed and grow in an organization. Author David Blank-Edelman acts as your personal guide, discussing topics such as the SRE mindset, SRE culture, and SRE advocacy.

The book covers:

- What it takes to get started in SRE work and get hired, and what the job involves once you are hired
- What it takes to introduce SRE into an organization, and what SRE needs to succeed there
- How to work with the business stakeholders and executives around SRE
- How SRE grows and matures in an organization over time

Ready to become an SRE, or to introduce SRE into your organization? This book is here to help.

## Table of Contents

### Preface

- Where Are You Right Now?
- Navigating This Book
- We Are Going to Need a Bigger Boat
- I’m Not the Lorax
- Ready?
- Convention Used in This Book
- O’Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Coping

### I. Introduction to SRE

1. First Things First
    - What Is SRE?
        - Reliability
        - Appropriate
        - Sustainable
        - (Other Words)
    - Origin Story
    - SRE and Its Relationship to DevOps
        - Part 1: SRE Implements Class DevOps
        - Part 2: SRE Is to Reliability as DevOps Is to Delivery
        - Part 3: It’s All About the Direction of Attention
    - Onward to SRE Fundamentals
2. SRE Mindset
    - Zooming Out to Maintain a Systems Perspective
    - Creating and Nurturing Feedback Loops
    - Keeping the Focus on the Customer
    - Relationships (to People and Things)
        - SRE’s Relationship to (Other) People
        - SRE’s Relationship to Failure and Errors
    - The Mindset in Motion
3. SRE Culture
    - Happy Fish, um, People
    - How to Create a Supportive Culture for SRE
    - Culture as a Vehicle or a Lever
    - What Do You Want SRE to Be/Do?
    - Thinking About Assembling the Culture You Want and Need
    - I Still Don’t Know Where to Start
    - Nurturing Your Nascent SRE Culture
    - Keep On Keeping On
4. Talking About SRE (SRE Advocacy)
    - Why It Matters, Even Early in Your Experience with SRE
    - When It Matters
    - Get Your Story (and Audience) Straight
    - Some Story Ideas
    - Other People’s Stories
    - Secondary Stories
    - The Challenges the Stories Present
    - One Last Tip

### II. Becoming SRE for the Individual

5. Preparing to Become an SRE
    - Do You Need to Know How to Code?
    - Do You Need a Computer Science Degree?
    - Fundamentals
        - Single/Basic Systems (and Their Failure Modes)
        - Distributed Systems (and Their Failure Modes)
        - Statistics and Data Visualization
        - Storytelling
        - Be a Good Person
    - Bonus Round
        - Non-Abstract Large System Design (NALSD)
        - Resilience Engineering
        - Chaos Engineering and Performance Engineering
        - Machine Learning and Artificial Intelligence
    - What Else?
6. Getting to SRE from…
    - Are You Already an SRE?
    - From Student to SRE
    - From Dev/SWE to SRE
    - From Sysadmin/IT to SRE
    - Generic Advice
    - Technical Role X to SRE
    - Nontechnical Role X to SRE
    - Track Your Progress to Keep On Keeping On
7. Hints for Getting Hired as an SRE
    - Scrutinizing the Job Posting
    - Preparing for an SRE Interview
    - What to Ask at the SRE Interview
    - Win!
8. A Day in the Life of an SRE
    - Modes of an SRE’s Day
        - Incident/Outage Mode
        - Postincident Learning Mode
        - Builder/Project/Learn Mode
        - Architecture Mode
        - Management Mode
        - Planning Mode
        - Collaboration Mode
        - Recovery and Self-Care Mode
    - Balance
    - Make a Day in the Life a Good Day
9. Establishing a Relationship to Toil
    - Defining Toil with More Precision
    - Whose Toil Are We Talking About?
    - Why Do SREs Care About Toil?
    - The Dynamics of Toil: Early Versus Established
    - Dealing with Toil
    - Intermediate to Advanced Toil Reduction
    - What Are You Going to Do About It?
10. Learning from Failure
    - Talking About Failure
    - Postincident Reviews
        - Postincident Reviews: The Basics
        - Postincident Reviews: The Process
        - Postincident Reviews: Common Traps
    - Learning from Failure Through Resilience Engineering
    - Learning from Failure via Chaos Engineering
    - Learning from Failure: Next Steps

### III. Becoming SRE for the Organization

11. Organizational Factors for Success
    - Contributing Factor 1: What’s the Problem?
    - Contributing Factor 2: What Is the Org Willing to Do to Get There?
    - Contributing Factor 3: Does the Org Have the Requisite Patience?
    - Contributing Factor 4: Can We Collaborate?
    - Contributing Factor 5: Does the Org Make Decisions Based on Data?
    - Contributing Factor 6: Can the Org Learn and Act on What It Learns?
    - Contributing Factor 7: Can You Make a Difference?
    - Contributing Factor 8: Can You See (and Address) the Friction in the System?
    - The Fine Print
    - It’s All About Organizational Values
12. How SRE Can Fail
    - Contributing Factor 1: Title Flipping to Create SREs
    - Contributing Factor 2: Converting Tier 3 Support to SRE
    - Contributing Factor 3: On Call and That’s All
    - Contributing Factor 4: Wrong Org Chart
    - Contributing Factor 5: SRE by Rote
    - Contributing Factor 6: Gatekeeping
    - Contributing Factor 7: Death Through Success
    - Contributing Factor 8: A Collection of Smaller Factors
    - How to “SRE” Your SRE Failure
13. SRE from a Business Perspective
    - Communicating About SRE
    - Talking to the Business About Reliability
    - Selling SRE
    - Communicating Success Back to the Business
    - Proving the Success of an SRE Group to Others
    - Budgeting for SRE
    - First Budget Request
    - Talking About Funding
    - Re-Up Conversations
    - Funding Models
    - SRE Alignment
    - Models for Engagement
    - Why Not the Embedded Model?
    - Why a Separate Org?
    - Avoiding the Pager Monkey or Toil Bucket Traps
    - SRE Teams
    - Choosing Headcount Sizes
    - How Do You Know When an SRE Team Might Be in Trouble?
    - Alert Noise as a Signal of Team Health
    - SRE Promotions
    - Turning Teams Down
    - From the Author: I Would Like to Hear from You
14. The Dickerson Hierarchy of Reliability (A Good Place to Start)
    - The Dickerson Hierarchy of Reliability
        - Level 1: Monitoring/Observability
        - Level 2: Incident Response
        - Level 3: Postincident Review
        - Level 4: Testing/Release (Deployment)
        - Level 5: Provisioning/Capacity Planning
        - Levels 6 and 7: Development Process and Product Design
    - Wrong Turns
    - You Know You’ve Taken a Wrong Turn When…
    - Positive Signs
15. Fitting SRE into Your Organization
    - Pre-role and Pre-team Practices
    - Integration Models
        - Centralized/Partnered Model
        - Distributed/Embedded Model
        - Hybrid Model
        - How to Choose Between These Models
    - Creating and Nurturing the Right Feedback Loops
        - Feedback Loops and Data
        - Feedback Loops and Iteration
        - Feedback Loops and Planning for Iteration
        - How and Where to Insert These Feedback Loops into the Organization
    - Signs of Success
16. SRE Organizational Evolutionary Stages
    - Stage 1: The Firefighter
    - Stage 2: The Gatekeeper
    - Stage 3: The Advocate
    - Stage 4: The Partner
    - Stage 5: The Engineer
    - Caveat Implementer
17. Growing SRE in Your Org
    - How Do You Know When to Scale?
    - Scaling 0 to 1
    - Scaling 1 to 6
    - Scaling 6 to 18
    - Scaling 18 to 48
    - Scaling 48 to 108 (and Beyond)
    - Growing SRE’s Leadership Representation
18. Conclusion

### A. Letters to a Young SRE (Apologies to Rilke)

- John Amori
- Fred Hebert
- Aju Tamang
- Daniel Gentleman
- Joanna Wijntjes
- Fabrizio Waldner
- Graham Poulter
- Jamie Wilkinson
- Andrew Howden
- Pedro Alves
- Balasundaram N
- Eduardo Spotti
- Ian Bartholomew
- Olivier Duquesne
- Ralph Pritchard
- David Caudill
- Alex Hidalgo
- Effie Mouzeli

### B. Advice from Former SREs

- Dina Levitan
- Sara Smollett
- Andrew Fong
- Scott MacFiggen

### C. SRE Resources

- Core Books
- “SRE and…” Books
- Events
    - SREcon
    - Vendor SRE Single-Day Events
    - DevOps Event Tracks/Sessions
    - SRE-Adjacent Niche Events
- SRE Video Content
- SRE-Specific Podcasts
- SRE-Specific Email Newsletters
- Online Forums
- Historical Document
- Curated Link Collections