[Book: Handbook of Software Reliability Engineering](https://www.cse.cuhk.edu.hk/~lyu/book/reliability/)
A book on [[Software Reliability Engineering]].
## Table of Contents
Preface xxiii
### Chapter 1. Introduction
#### 1.1 The Need for Reliable Software 3
#### 1.2 Software Reliability Engineering Concepts 5
#### 1.3 Book Overview 8
#### 1.4 Basic Definitions 12
![[Pasted image 20220706070541.png]]
#### 1.5 Technical Areas Related to the Book 19
1.5.1 Fault Prevention 19
1.5.2 Fault Removal 20
1.5.3 Fault Tolerance 20
1.5.4 Fault/Failure Forecasting 21
1.5.5 Scope of this Handbook 21
#### 1.6 Summary 22
Problems 22
### Chapter 2. Software Reliability and System Reliability
Jean-Claude Laprie and Karama Kanoun (LAAS-CNRS, France)
2.1 Introduction 27
2.2 The Dependability Concept 28
2.2.1 Basic Definitions 28
2.2.2 On the Impairments to Dependability 28
2.2.3 On the Attributes of Dependability 32
2.2.4 On the Means for Dependability 33
2.3 Failure Behavior of an X-Ware System 35
2.3.1 Atomic Systems 35
2.3.2 Systems Made up of Components 41
2.4 Failure Behavior of an X-Ware System with Service Restoration 49
2.4.1 Characterization of System Behavior 50
2.4.2 Maintenance Policies 51
2.4.3 Reliability Modeling 53
2.4.4 Availability Modeling 60
2.5 Situation with Respect to the State-of-the-Art in Reliability Evaluation 64
2.6 Summary 68
Problems 68
### Chapter 3. Software Reliability Modeling Survey
William Farr (Naval Surface Warfare Center)
3.1 Introduction 71
3.2 Historical Perspective and Implementation 72
3.2.1 Historical Background 72
3.2.2 Model Classification Scheme 73
3.2.3 Model Limitations and Implementation Issues 76
3.3 Exponential Failure Time Class of Models 77
3.3.1 Jelinski-Moranda "De-Eutrophication" Model 77
3.3.2 Nonhomogeneous Poisson Process Model 80
3.3.3 Schneidewind's Model 82
3.3.4 Musa's Basic Execution Time Model 87
3.3.5 Hyperexponential Model 90
3.3.6 Others 92
3.4 Weibull and Gamma Failure Time Class of Models 93
3.4.1 Weibull Model 93
3.4.2 S-Shaped Reliability Growth Model 95
3.5 Infinite Failure Category Models 98
3.5.1 Duane's Model 98
3.5.2 Geometric Model 99
3.5.3 Musa-Okumoto Logarithmic Poisson 102
3.6 Bayesian Models 104
3.6.1 Littlewood-Verrall Reliability Growth Model 105
3.6.2 Other Bayesian Models 109
3.7 Model Relationships 109
3.7.1 Generalized Exponential Model Class 109
3.7.2 Exponential Order Statistic Model Class 111
3.8 Software Reliability Prediction in Early Phases of the Life Cycle 111
3.8.1 Phase-Based Model 111
3.8.2 Predicting Software Defects from Ada Design 112
3.8.3 Rome Laboratory Work 113
3.9 Summary 114
Problems 115
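As a working note on Chapter 3: Section 3.3.2's nonhomogeneous Poisson process (Goel-Okumoto) model takes the expected cumulative number of failures by execution time t as μ(t) = a(1 − e^(−bt)), with failure intensity λ(t) = ab·e^(−bt). A minimal sketch in Python; the parameter values are illustrative, not taken from the book:

```python
import math

def nhpp_mean_failures(t, a, b):
    """Goel-Okumoto NHPP mean value function: expected cumulative
    failures by execution time t, where a = expected total number
    of failures and b = per-fault detection rate."""
    return a * (1.0 - math.exp(-b * t))

def nhpp_failure_intensity(t, a, b):
    """Failure intensity lambda(t) = d mu/dt = a*b*exp(-b*t)."""
    return a * b * math.exp(-b * t)

# Illustrative parameters (not from the book): 100 expected total
# failures, detection rate 0.05 per hour of execution time.
a, b = 100.0, 0.05
print(round(nhpp_mean_failures(10.0, a, b), 2))  # → 39.35
```

So with these made-up parameters, about 39 of the 100 expected failures would surface in the first 10 hours of execution, and the intensity at t = 0 is a·b = 5 failures per hour.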
### Chapter 4. Techniques for Prediction Analysis and Recalibration
Sarah Brocklehurst, Bev Littlewood (City University of London)
4.1 Introduction 119
4.2 Examples of Model Disagreement and Inaccuracy 120
4.2.1 Simple Short Term Predictions 120
4.2.2 Longer Term Predictions 123
4.2.3 Model Accuracy Varies from Data Source to Data Source 126
4.2.4 Why We Cannot Select the Best Model a Priori 126
4.2.5 Discussion - a Possible Way Forward 127
4.3 Methods of Analyzing Predictive Accuracy 128
4.3.1 Basic Ideas - Recursive Comparison of Predictions with Eventual Outcomes 128
4.3.2 The Prequential Likelihood Ratio (PLR) 131
4.3.3 The U-Plot 135
4.3.4 The Y-Plot 140
4.3.5 Discussion: the Likely Nature of Prediction Errors, and How We can Detect Inaccuracy 141
4.4 Recalibration 145
4.4.1 The U-Plot as a Means of Detecting 'Bias' 145
4.4.2 The Recalibration Technique 146
4.4.3 Examples of the Power of Recalibration 147
4.5 A Worked Example 150
4.6 Discussion 156
4.6.1 Summary of the Good News: Where We Are Now 156
4.6.2 Limitations of Present Techniques 159
4.6.3 Possible Avenues for Improvement of Methods 160
4.6.4 Best Advice to Potential Users 162
4.7 Summary 163
Problems 164
### Chapter 5. The Operational Profile
John Musa, Bruce Juhlin, Gene Fuoco, Diane Kropfl, and Nancy Irving (AT&T Bell Labs.)
5.1 Introduction 167
5.2 Concepts 168
5.3 Development Procedure 170
5.3.1 Customer Type List 173
5.3.2 User Type List 173
5.3.3 System Mode List 174
5.3.4 Functional Profile 176
5.3.5 Operational Profile 183
5.4 Test Selection 194
5.4.1 Selecting Operations 195
5.4.2 Regression Test 196
5.5 Special Issues 197
5.5.1 Indirect Input Variables 197
5.5.2 Updating the Operational Profile 197
5.5.3 Distributed Systems 198
5.6 Other Uses 199
5.7 Application to DEFINITY 200
5.7.1 Project Description 200
5.7.2 Development Process Description 200
5.7.3 Describing Operational Profiles 201
5.7.4 Implementing Operational Profiles 203
5.7.5 Conclusion 204
5.8 Application to FASTAR (Fast Automated Restoration) 204
5.8.1 System Description 204
5.8.2 FASTAR: SRE Implementation 206
5.8.3 FASTAR: SRE Benefits 210
5.9 Application to the Power Quality Resource System 210
5.9.1 Project Description 210
5.9.2 Developing the Operational Profile 211
5.9.3 Testing 213
5.9.4 Conclusion 214
5.10 Summary 215
Problems 215
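A quick sketch of Chapter 5's core idea (Sections 5.3.5 and 5.4.1): once the operational profile assigns each operation an occurrence probability, test cases can be selected by weighted random sampling so that test effort mirrors expected field usage. The operation names and probabilities below are hypothetical:

```python
import random

# Hypothetical operational profile: operation -> occurrence probability
# (names and probabilities are illustrative, not from the book).
profile = {
    "process_call": 0.60,
    "add_subscriber": 0.25,
    "generate_report": 0.10,
    "audit_database": 0.05,
}

def select_test_operations(profile, n, rng=None):
    """Select n operations to test, sampled with probabilities
    proportional to the operational profile."""
    rng = rng or random.Random()
    ops = list(profile)
    weights = [profile[o] for o in ops]
    return rng.choices(ops, weights=weights, k=n)

selected = select_test_operations(profile, 1000, random.Random(42))
print(selected.count("process_call"))  # roughly 600 of 1000
```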
### Chapter 6. Best Current Practice of SRE
Mary Donnelly, Bill Everett, John Musa, and Geoff Wilson (AT&T Bell Labs.)
6.1 Introduction 219
6.2 Benefits and Approaches to SRE 220
6.2.1 Importance and Benefits 221
6.2.2 An SRE Success Story 221
6.2.3 SRE Costs 222
6.2.4 SRE Activities 223
6.2.5 Implementing SRE Incrementally 223
6.2.6 Implementing SRE on Existing Projects 224
6.2.7 Implementing SRE on Short-Cycle Projects 226
6.3 SRE During Feasibility and Requirements Phase 226
6.3.1 Feasibility Stage 226
6.3.2 Requirements Stage 228
6.4 SRE during Design and Implementation Phase 232
6.4.1 Design Stage 232
6.4.2 Implementation Stage 233
6.5 SRE during the System Test and Field Trial Phase 235
6.5.1 Determine Operational Profile 236
6.5.2 System Test Stage 237
6.5.3 Field Trial Stage 241
6.6 SRE during Post-Delivery and Maintenance Phase 242
6.6.1 Project Post-Release Staff Needs 242
6.6.2 Monitor Field Reliability vs. Objectives 243
6.6.3 Track Customer Satisfaction 245
6.6.4 Time New Feature Introduction by Monitoring Reliability 245
6.6.5 Guide Product and Process Improvement with Reliability Measures 246
6.7 Getting Started with SRE 246
6.7.1 Prepare Your Organization for SRE 247
6.7.2 Find More Information or Support 250
6.7.3 Do an SRE Self-Assessment 250
6.8 Summary 252
Problems 253
### Chapter 7. Software Reliability Measurement Experience
Allen Nikora (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)
7.1 Introduction 255
7.2 Measurement Framework 256
7.2.1 Establishing Software Reliability Requirements 259
7.2.2 Setting up a Data Collection Process 266
7.2.3 Defining Data to be Collected 267
7.2.4 Choosing a Preliminary Set of Software Reliability Models 272
7.2.5 Choosing Reliability Modeling Tools 273
7.2.6 Model Application and Application Issues 273
7.2.7 Dealing with Evolving Software 276
7.2.8 Practical Limits in Modeling Ultrareliability 277
7.3 Investigation at JPL 278
7.3.1 Project Selection and Characterization 278
7.3.2 Characterization of Available Data 280
7.3.3 Experimental Results 280
7.4 Investigation at Bellcore 281
7.4.1 Project Characteristics 281
7.4.2 Data Collection 284
7.4.3 Application Results 285
7.5 Linear Combination of Model Results 289
7.5.1 Statically-Weighted Linear Combinations 290
7.5.2 Weight Determination Based on Ranking Model Results 290
7.5.3 Weight Determination Based on Changes in Prequential Likelihood 291
7.5.4 Modeling Results 291
7.5.5 Overall Project Results 292
7.5.6 Extensions and Alternatives 295
7.5.7 Long-Term Prediction Capability 298
7.6 Summary 299
Problems 300
### Chapter 8. Measurement-Based Analysis of Software Reliability
Ravi K. Iyer (University of Illinois) and Inhwan Lee (Tandem, Inc.)
8.1 Introduction 303
8.2 Framework 304
8.2.1 Overview 304
8.2.2 Operational vs. Development Phase Evaluation 306
8.2.3 Past Work 306
8.3 Measurement Techniques 307
8.3.1 On-Line Machine Logging 308
8.3.2 Manual Reporting 310
8.4 Preliminary Analysis of Data 312
8.4.1 Data Processing 312
8.4.2 Fault and Error Classification 314
8.4.3 Error Propagation 317
8.4.4 Error and Recovery Distributions 320
8.5 Detailed Analysis of Data 323
8.5.1 Dependency Analysis 324
8.5.2 Hardware-Related Software Errors 327
8.5.3 Evaluation of Software Fault Tolerance 328
8.5.4 Recurrences 329
8.6 Model Identification and Analysis of Models 333
8.6.1 Impact of Failures on Performance 333
8.6.2 Reliability Modeling in the Operational Phase 335
8.6.3 Failure/Error/Recovery Model 339
8.6.4 Multiple Error Model 344
8.7 Impact of System Activity 345
8.7.1 Statistical Models from Measurements 345
8.7.2 Overall System Behavior Model 348
8.8 Summary 352
Problems 353
### Chapter 9. Orthogonal Defect Classification
Ram Chillarege (IBM Research)
9.1 Introduction 359
9.2 Measurement and Software 360
9.2.1 Software Defects 361
9.2.2 The Spectrum of Defect Analysis 364
9.3 Principles of ODC 367
9.3.1 The Intuition 367
9.3.2 The Design of Orthogonal Defect Classification 370
9.3.3 Necessary Condition 371
9.3.4 Sufficient Conditions 373
9.4 The Defect-Type Attribute 374
9.5 Relative Risk Assessment Using Defect Types 376
9.5.1 Subjective Aspects of Growth Curves 377
9.5.2 Combining ODC and Growth Modeling 379
9.6 The Defect Trigger Attribute 384
9.6.1 The Trigger Concept 384
9.6.2 System Test Triggers 387
9.6.3 Review and Inspection Triggers 387
9.6.4 Function Test Triggers 388
9.6.5 The Use of Triggers 389
9.7 Multidimensional Analysis 393
9.8 Deploying ODC 396
9.9 Summary 398
Problems 399
### Chapter 10. Trend Analysis
Karama Kanoun and Jean-Claude Laprie (LAAS-CNRS, France)
10.1 Introduction 401
10.2 Reliability Growth Characterization 402
10.2.1 Definitions of Reliability Growth 403
10.2.2 Graphical Interpretation of the Subadditive Property 404
10.2.3 Subadditive Property Analysis 406
10.2.4 Subadditive Property and Trend Change 407
10.2.5 Some Particular Situations 408
10.2.6 Summary 409
10.3 Trend Analysis 410
10.3.1 Trend Tests 410
10.3.2 Example 419
10.3.3 Typical Results That Can Be Drawn from Trend Analyses 422
10.3.4 Summary 424
10.4 Application to Real Systems 424
10.4.1 Software of System SS4 425
10.4.2 Software of System S27 427
10.4.3 Software of System SS1 427
10.4.4 Software of System SS2 429
10.4.5 SAV 429
10.5 Extension to Static Analysis 431
10.5.1 Static Analysis Conduct 431
10.5.2 Application 433
10.6 Summary 433
Problems 435
### Chapter 11. Field Data Analysis
Wendell Jones (BNR, Inc.) and Mladen Vouk (NCSU)
11.1 Introduction 439
11.2 Data Collection Principles 441
11.2.1 Introduction 441
11.2.2 Failures, Faults, and Related Data 442
11.2.3 Time 444
11.2.4 Usage 445
11.2.5 Data Granularity 446
11.2.6 Data Maintenance and Validation 447
11.2.7 Analysis Environment 448
11.3 Data Analysis Principles 449
11.3.1 Plots and Graphs 450
11.3.2 Data Modeling and Diagnostics 454
11.3.3 Diagnostics for Model Determination 455
11.3.4 Data Transformations 458
11.4 Important Topics in Analysis of Field Data 459
11.4.1 Calendar Time 461
11.4.2 Usage Time 461
11.4.3 An Example 462
11.5 Calendar-Time Reliability Analysis 463
11.5.1 Case Study (IBM Corp.) 464
11.5.2 Case Study (Hitachi) 466
11.5.3 Further Examples 468
11.6 Usage-Based Reliability Analysis 469
11.6.1 Case Study (Northern Telecom Telecommunication Systems) 469
11.6.2 Further Examples 470
11.7 Special Events 472
11.7.1 Rare Event Models 473
11.7.2 Case Study (Space Shuttle Flight Software) 476
11.8 Availability 479
11.8.1 Introduction 479
11.8.2 Measuring Availability 480
11.8.3 Empirical Unavailability 481
11.8.4 Models 483
11.9 Summary 486
Problems 487
### Chapter 12. Software Metrics for Reliability Assessment
John Munson (University of Idaho) and Taghi Khoshgoftaar (Florida Atlantic University)
12.1 Introduction 493
12.2 Static Program Complexity 495
12.2.1 Software Metrics 495
12.2.2 A Domain Model of Software Attributes 496
12.2.3 Principal Components Analysis 497
12.2.4 The Usage of Metrics 499
12.2.5 Relative Program Complexity 500
12.2.6 Software Evolution 502
12.3 Dynamic Program Complexity 504
12.3.1 Execution Profile 505
12.3.2 Functional Complexity 505
12.3.3 Dynamic Aspects of Functional Complexity 507
12.3.4 Operational Complexity 509
12.4 Software Complexity and Software Quality 510
12.4.1 An Overview 510
12.4.2 An Application and Its Metrics 512
12.4.3 Multivariate Analysis in Software Quality Control 514
12.4.4 Fault Prediction Models 518
12.4.5 Enhancing Predictive Models with Increased Domain Coverage 520
12.5 Software Reliability Modeling 523
12.5.1 Reliability Modeling with Software Complexity Metrics 524
12.5.2 The Incremental Build Problem 526
12.6 Summary 527
Problems 527
### Chapter 13. Software Testing and Reliability
Joseph R. Horgan (Bellcore) and Aditya P. Mathur (Purdue University)
13.1 Introduction 531
13.2 Overview of Software Testing 532
13.2.1 Kinds of Software Testing 532
13.2.2 Concepts from White-Box and Black-Box Testing 532
13.3 Operational Profiles 534
13.3.1 Difficulties in Estimating the Operational Profile 535
13.3.2 Estimating Reliability 537
13.4 Time/Structure Based Software Reliability Estimation 539
13.4.1 Definitions and Terminology 539
13.4.2 Basic Assumptions 540
13.4.3 Testing Methods and Saturation Effect 541
13.4.4 Testing Effort 541
13.4.5 Limits of Testing Methods 542
13.4.6 Empirical Basis of the Saturation Effect 543
13.4.7 Reliability Overestimation due to Saturation 545
13.4.8 Incorporating Coverage in Reliability Estimation 546
13.4.9 Filtering Failure Data Using Coverage Information 547
13.4.10 Selecting the Compression Ratio 551
13.4.11 Handling Rare Events 553
13.5 A Microscopic Model of Software Risk 554
13.5.1 A Testing-Based Model of Risk Decay 554
13.5.2 Risk Assessment: An Example 555
13.5.3 A Simple Risk Computation 558
13.5.4 A Risk Browser 560
13.5.5 The Risk Model and Software Reliability 561
13.6 Summary 563
Problems 563
### Chapter 14. Fault-Tolerant Software Reliability Engineering
David McAllister and Mladen Vouk (NCSU)
14.1 Introduction 567
14.2 Present Status 568
14.3 Principles and Terminology 569
14.3.1 Result Verification 570
14.3.2 Redundancy 574
14.3.3 Failures and Faults 575
14.3.4 Adjudication by Voting 577
14.3.5 Tolerance 578
14.4 Basic Techniques 581
14.4.1 Recovery Blocks 581
14.4.2 N-Version Programming 582
14.5 Advanced Techniques 583
14.5.1 Consensus Recovery Block 583
14.5.2 Acceptance Voting 584
14.5.3 N Self-Checking Programming 584
14.6 Reliability Modeling 585
14.6.1 Diversity and Dependence of Failures 586
14.6.2 Data-Domain Modeling 589
14.6.3 Time-Domain Modeling 594
14.7 Reliability in the Presence of Inter-Version Failure Correlation 596
14.7.1 An Experiment 596
14.7.2 Failure Correlation 598
14.7.3 Consensus Voting 599
14.7.4 Consensus Recovery Block 601
14.7.5 Acceptance Voting 603
14.8 Development and Testing of Multi-Version Fault-Tolerant Software 604
14.8.1 Requirements and Design 605
14.8.2 Verification, Validation and Testing 606
14.8.3 Cost of Fault-Tolerant Software 607
14.9 Summary 609
Problems 609
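A note on Chapter 14: the adjudication-by-voting idea of Section 14.3.4, as used in N-Version Programming (Section 14.4.2), can be sketched as a strict-majority voter over the versions' outputs. This is a simplification assuming exact output matching; real adjudicators often need inexact comparison of floating-point or structured results:

```python
from collections import Counter

def majority_vote(outputs):
    """N-version programming adjudicator sketch: return the output
    agreed on by a strict majority of versions, or None when no
    majority exists (the system signals failure rather than guess)."""
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else None

# Three independently developed versions compute the same result;
# one faulty version is masked by the agreeing majority.
print(majority_vote([42, 42, 41]))  # → 42
print(majority_vote([1, 2, 3]))     # → None (no majority: detected failure)
```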
### Chapter 15. Software Reliability Analysis Using Fault Trees
Joanne Bechta Dugan (University of Virginia)
15.1 Introduction 615
15.2 Fault Tree Modeling 615
15.2.1 Cutset Generation 617
15.2.2 Fault Tree Analysis 619
15.3 Fault Trees as a Design Aid for Software Systems 622
15.4 Safety Validation Using Fault Trees 623
15.5 Analysis of Fault Tolerant Software Systems 627
15.5.1 Fault Tree Model for Recovery Block System 629
15.5.2 Fault Tree Model for N-Version Programming System 630
15.5.3 Fault Tree Model for N Self-Checking Programming System 632
15.6 Quantitative Analysis of Fault Tolerant Software 635
15.6.1 Methodology for Parameter Estimation from Experimental Data 635
15.6.2 A Case Study in Parameter Estimation 639
15.6.3 Comparative Analysis of Three Software Fault Tolerant Systems 642
15.7 System-Level Analysis of Hardware and Software System 645
15.7.1 System Reliability/Safety Model for DRB 647
15.7.2 System Reliability/Safety Model for NVP 648
15.7.3 System Reliability/Safety Model for NSCP 650
15.7.4 A Case Study in System-Level Analysis 651
15.8 Summary 657
Problems 657
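A note on Chapter 15: for a fault tree whose basic events are assumed independent, failure probability propagates through OR and AND gates by the standard product rules. A minimal sketch with made-up probabilities (the recovery-block and NVP fault tree models of Sections 15.5 and 15.7 are far more detailed):

```python
def gate_or(probs):
    """OR gate: output fails if any input event occurs,
    assuming independent basic events."""
    p_none = 1.0
    for q in probs:
        p_none *= (1.0 - q)
    return 1.0 - p_none

def gate_and(probs):
    """AND gate: output fails only if all input events occur,
    assuming independent basic events."""
    p_all = 1.0
    for q in probs:
        p_all *= q
    return p_all

# Illustrative two-version sketch (probabilities are made up):
# each version fails if its software OR its host hardware fails;
# the system fails only if both versions fail.
p_version = gate_or([0.01, 0.001])
p_system = gate_and([p_version, p_version])
print(round(p_system, 6))
```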
### Chapter 16. Software Reliability Simulation
Robert Tausworthe (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)
16.1 Introduction 661
16.2 Reliability Simulation 662
16.2.1 The Need for Dynamic Simulation 663
16.2.2 Dynamic Simulation Approaches 664
16.3 The Reliability Process 665
16.3.1 The Nature of the Process 666
16.3.2 Structures and Flows 667
16.3.3 Interdependencies among Elements 668
16.3.4 Software Environment Characteristics 669
16.4 Artifact-Based Simulation 669
16.4.1 Simulator Architecture 670
16.4.2 Results 675
16.5 Rate-Based Simulation 676
16.5.1 Event Process Statistics 677
16.5.2 Single-Event Process Simulation 678
16.5.3 Recurrent Event Statistics 679
16.5.4 Recurrent Event Simulation 681
16.5.5 Secondary Event Simulation 682
16.5.6 Limited Growth Simulation 683
16.5.7 The General Simulation Algorithm 684
16.6 Rate-Based Reliability 686
16.6.1 Rate Functions of Conventional Models 686
16.6.2 Simulator Architecture 687
16.6.3 Display of Results 689
16.7 The Galileo Project Application 690
16.7.1 Simulation Experiments and Results 691
16.7.2 Comparisons with Other Software Reliability Models 694
16.8 Summary 696
Problems 697
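A note on Chapter 16's rate-based simulation (Section 16.5): failure times of a nonhomogeneous Poisson process can be drawn by Lewis-Shedler thinning against a constant bounding rate. Sketched here for the Goel-Okumoto intensity λ(t) = ab·e^(−bt), with illustrative parameters; this is a generic sketch, not the book's simulator:

```python
import math
import random

def simulate_nhpp_failures(a, b, t_end, rng):
    """Draw failure times on (0, t_end] of an NHPP with intensity
    lambda(t) = a*b*exp(-b*t), via Lewis-Shedler thinning: generate
    candidates at the bounding rate a*b (the maximum, at t = 0) and
    accept each with probability lambda(t) / (a*b)."""
    lam_max = a * b
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)   # candidate from bounding process
        if t > t_end:
            return times
        if rng.random() < (a * b * math.exp(-b * t)) / lam_max:
            times.append(t)             # accept with prob lambda(t)/lam_max

rng = random.Random(1)
times = simulate_nhpp_failures(100.0, 0.05, 10.0, rng)
# Count should be near mu(10) = 100*(1 - e^-0.5), about 39 failures.
print(len(times))
```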
### Chapter 17. Neural Networks for SRE
Nachimuthu Karunanithi (Bellcore) and Yashwant Malaiya (Colorado State University)
17.1 Introduction 699
17.2 Neural Networks 700
17.2.1 Processing Unit 700
17.2.2 Architecture 702
17.2.3 Learning Algorithms 705
17.2.4 Backpropagation Learning 705
17.2.5 Cascade-correlation Learning Architecture 707
17.3 Application of Neural Networks for Software Reliability 709
17.3.1 Dynamic Reliability Growth Modeling 709
17.3.2 Identifying Fault-Prone Modules 710
17.4 Software Reliability Growth Modeling 710
17.4.1 Training Regimes 712
17.4.2 Data Representation Issue 712
17.4.3 A Prediction Experiment 713
17.4.4 Analysis of Neural Network Models 718
17.5 Identification of Fault-Prone Software Modules 718
17.5.1 Identification of Fault-Prone Modules Using Software Metrics 719
17.5.2 Data Set Used 719
17.5.3 Classifiers Compared 720
17.5.4 Data Representation 722
17.5.5 Training Data Selection 723
17.5.6 Experimental Approach 723
17.5.7 Results 723
17.6 Summary 726
Problems 726
### Appendix A. Software Reliability Tools 729
### Appendix B. Review of Reliability Theory, Analytical Techniques, and Basic Statistics