[Book: Handbook of Software Reliability Engineering](https://www.cse.cuhk.edu.hk/~lyu/book/reliability/)
A book on [[Software Reliability Engineering]].
## Table of Contents
Preface xxiii
### Chapter 1. Introduction
#### 1.1 The Need for Reliable Software 3
#### 1.2 Software Reliability Engineering Concepts 5
#### 1.3 Book Overview 8
#### 1.4 Basic Definitions 12
![[Pasted image 20220706070541.png]]
#### 1.5 Technical Areas Related to the Book 19
1.5.1 Fault Prevention 19
1.5.2 Fault Removal 20
1.5.3 Fault Tolerance 20
1.5.4 Fault/Failure Forecasting 21
1.5.5 Scope of this Handbook 21
#### 1.6 Summary 22
Problems 22
### Chapter 2. Software Reliability and System Reliability
Jean-Claude Laprie and Karama Kanoun (LAAS-CNRS, France)
2.1 Introduction 27
2.2 The Dependability Concept 28
2.2.1 Basic Definitions 28
2.2.2 On the Impairments to Dependability 28
2.2.3 On the Attributes of Dependability 32
2.2.4 On the Means for Dependability 33
2.3 Failure Behavior of an X-Ware System 35
2.3.1 Atomic Systems 35
2.3.2 Systems Made up of Components 41
2.4 Failure Behavior of an X-Ware System with Service Restoration 49
2.4.1 Characterization of System Behavior 50
2.4.2 Maintenance Policies 51
2.4.3 Reliability Modeling 53
2.4.4 Availability Modeling 60
2.5 Situation with Respect to the State-of-the-Art in Reliability Evaluation 64
2.6 Summary 68
Problems 68
### Chapter 3. Software Reliability Modeling Survey
William Farr (Naval Surface Warfare Center)
3.1 Introduction 71
3.2 Historical Perspective and Implementation 72
3.2.1 Historical Background 72
3.2.2 Model Classification Scheme 73
3.2.3 Model Limitations and Implementation Issues 76
3.3 Exponential Failure Time Class of Models 77
3.3.1 Jelinski-Moranda "De-Eutrophication" Model 77
3.3.2 Nonhomogeneous Poisson Process Model 80
3.3.3 Schneidewind's Model 82
3.3.4 Musa's Basic Execution Time Model 87
3.3.5 Hyperexponential Model 90
3.3.6 Others 92
3.4 Weibull and Gamma Failure Time Class of Models 93
3.4.1 Weibull Model 93
3.4.2 S-Shaped Reliability Growth Model 95
3.5 Infinite Failure Category Models 98
3.5.1 Duane's Model 98
3.5.2 Geometric Model 99
3.5.3 Musa-Okumoto Logarithmic Poisson 102
3.6 Bayesian Models 104
3.6.1 Littlewood-Verrall Reliability Growth Model 105
3.6.2 Other Bayesian Models 109
3.7 Model Relationships 109
3.7.1 Generalized Exponential Model Class 109
3.7.2 Exponential Order Statistic Model Class 111
3.8 Software Reliability Prediction in Early Phases of the Life Cycle 111
3.8.1 Phase-Based Model 111
3.8.2 Predicting Software Defects from Ada Design 112
3.8.3 Rome Laboratory Work 113
3.9 Summary 114
Problems 115
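As a working note on Chapter 3: Section 3.3.2's nonhomogeneous Poisson process (Goel-Okumoto) model takes the expected cumulative number of failures by execution time t as μ(t) = a(1 − e^(−bt)), with failure intensity λ(t) = ab·e^(−bt). A minimal sketch in Python; the parameter values are illustrative, not taken from the book:

```python
import math

def nhpp_mean_failures(t, a, b):
    """Goel-Okumoto NHPP mean value function: expected cumulative
    failures by execution time t, where a = expected total number
    of failures and b = per-fault detection rate."""
    return a * (1.0 - math.exp(-b * t))

def nhpp_failure_intensity(t, a, b):
    """Failure intensity lambda(t) = d mu/dt = a*b*exp(-b*t)."""
    return a * b * math.exp(-b * t)

# Illustrative parameters (not from the book): 100 expected total
# failures, detection rate 0.05 per hour of execution time.
a, b = 100.0, 0.05
print(round(nhpp_mean_failures(10.0, a, b), 2))  # → 39.35
```

So with these made-up parameters, about 39 of the 100 expected failures would surface in the first 10 hours of execution, and the intensity at t = 0 is a·b = 5 failures per hour.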
### Chapter 4. Techniques for Prediction Analysis and Recalibration
Sarah Brocklehurst, Bev Littlewood (City University of London)
4.1 Introduction 119
4.2 Examples of Model Disagreement and Inaccuracy 120
4.2.1 Simple Short Term Predictions 120
4.2.2 Longer Term Predictions 123
4.2.3 Model Accuracy Varies from Data Source to Data Source 126
4.2.4 Why We Cannot Select the Best Model a Priori 126
4.2.5 Discussion - a Possible Way Forward 127
4.3 Methods of Analyzing Predictive Accuracy 128
4.3.1 Basic Ideas - Recursive Comparison of Predictions with Eventual Outcomes 128
4.3.2 The Prequential Likelihood Ratio (PLR) 131
4.3.3 The U-Plot 135
4.3.4 The Y-Plot 140
4.3.5 Discussion: the Likely Nature of Prediction Errors, and How We can Detect Inaccuracy 141
4.4 Recalibration 145
4.4.1 The U-Plot as a Means of Detecting 'Bias' 145
4.4.2 The Recalibration Technique 146
4.4.3 Examples of the Power of Recalibration 147
4.5 A Worked Example 150
4.6 Discussion 156
4.6.1 Summary of the Good News: Where We Are Now 156
4.6.2 Limitations of Present Techniques 159
4.6.3 Possible Avenues for Improvement of Methods 160
4.6.4 Best Advice to Potential Users 162
4.7 Summary 163
Problems 164
### Chapter 5. The Operational Profile
John Musa, Bruce Juhlin, Gene Fuoco, Diane Kropfl, and Nancy Irving (AT&T Bell Labs.)
5.1 Introduction 167
5.2 Concepts 168
5.3 Development Procedure 170
5.3.1 Customer Type List 173
5.3.2 User Type List 173
5.3.3 System Mode List 174
5.3.4 Functional Profile 176
5.3.5 Operational Profile 183
5.4 Test Selection 194
5.4.1 Selecting Operations 195
5.4.2 Regression Test 196
5.5 Special Issues 197
5.5.1 Indirect Input Variables 197
5.5.2 Updating the Operational Profile 197
5.5.3 Distributed Systems 198
5.6 Other Uses 199
5.7 Application to DEFINITY 200
5.7.1 Project Description 200
5.7.2 Development Process Description 200
5.7.3 Describing Operational Profiles 201
5.7.4 Implementing Operational Profiles 203
5.7.5 Conclusion 204
5.8 Application to FASTAR (Fast Automated Restoration) 204
5.8.1 System Description 204
5.8.2 FASTAR: SRE Implementation 206
5.8.3 FASTAR: SRE Benefits 210
5.9 Application to the Power Quality Resource System 210
5.9.1 Project Description 210
5.9.2 Developing the Operational Profile 211
5.9.3 Testing 213
5.9.4 Conclusion 214
5.10 Summary 215
Problems 215
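A quick sketch of Chapter 5's core idea (Sections 5.3.5 and 5.4.1): once the operational profile assigns each operation an occurrence probability, test cases can be selected by weighted random sampling so that test effort mirrors expected field usage. The operation names and probabilities below are hypothetical:

```python
import random

# Hypothetical operational profile: operation -> occurrence probability
# (names and probabilities are illustrative, not from the book).
profile = {
    "process_call": 0.60,
    "add_subscriber": 0.25,
    "generate_report": 0.10,
    "audit_database": 0.05,
}

def select_test_operations(profile, n, rng=None):
    """Select n operations to test, sampled with probabilities
    proportional to the operational profile."""
    rng = rng or random.Random()
    ops = list(profile)
    weights = [profile[o] for o in ops]
    return rng.choices(ops, weights=weights, k=n)

selected = select_test_operations(profile, 1000, random.Random(42))
print(selected.count("process_call"))  # roughly 600 of 1000
```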
### Chapter 6. Best Current Practice of SRE
Mary Donnelly, Bill Everett, John Musa, and Geoff Wilson (AT&T Bell Labs.)
6.1 Introduction 219
6.2 Benefits and Approaches to SRE 220
6.2.1 Importance and Benefits 221
6.2.2 An SRE Success Story 221
6.2.3 SRE Costs 222
6.2.4 SRE Activities 223
6.2.5 Implementing SRE Incrementally 223
6.2.6 Implementing SRE on Existing Projects 224
6.2.7 Implementing SRE on Short-Cycle Projects 226
6.3 SRE During Feasibility and Requirements Phase 226
6.3.1 Feasibility Stage 226
6.3.2 Requirements Stage 228
6.4 SRE during Design and Implementation Phase 232
6.4.1 Design Stage 232
6.4.2 Implementation Stage 233
6.5 SRE during the System Test and Field Trial Phase 235
6.5.1 Determine Operational Profile 236
6.5.2 System Test Stage 237
6.5.3 Field Trial Stage 241
6.6 SRE during Post-Delivery and Maintenance Phase 242
6.6.1 Project Post-Release Staff Needs 242
6.6.2 Monitor Field Reliability vs. Objectives 243
6.6.3 Track Customer Satisfaction 245
6.6.4 Time New Feature Introduction by Monitoring Reliability 245
6.6.5 Guide Product and Process Improvement with Reliability Measures 246
6.7 Getting Started with SRE 246
6.7.1 Prepare Your Organization for SRE 247
6.7.2 Find More Information or Support 250
6.7.3 Do an SRE Self-Assessment 250
6.8 Summary 252
Problems 253
### Chapter 7. Software Reliability Measurement Experience
Allen Nikora (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)
7.1 Introduction 255
7.2 Measurement Framework 256
7.2.1 Establishing Software Reliability Requirements 259
7.2.2 Setting up a Data Collection Process 266
7.2.3 Defining Data to be Collected 267
7.2.4 Choosing a Preliminary Set of Software Reliability Models 272
7.2.5 Choosing Reliability Modeling Tools 273
7.2.6 Model Application and Application Issues 273
7.2.7 Dealing with Evolving Software 276
7.2.8 Practical Limits in Modeling Ultrareliability 277
7.3 Investigation at JPL 278
7.3.1 Project Selection and Characterization 278
7.3.2 Characterization of Available Data 280
7.3.3 Experimental Results 280
7.4 Investigation at Bellcore 281
7.4.1 Project Characteristics 281
7.4.2 Data Collection 284
7.4.3 Application Results 285
7.5 Linear Combination of Model Results 289
7.5.1 Statically-Weighted Linear Combinations 290
7.5.2 Weight Determination Based on Ranking Model Results 290
7.5.3 Weight Determination Based on Changes in Prequential Likelihood 291
7.5.4 Modeling Results 291
7.5.5 Overall Project Results 292
7.5.6 Extensions and Alternatives 295
7.5.7 Long-Term Prediction Capability 298
7.6 Summary 299
Problems 300
### Chapter 8. Measurement-Based Analysis of Software Reliability
Ravi K. Iyer (University of Illinois) and Inhwan Lee (Tandem, Inc.)
8.1 Introduction 303
8.2 Framework 304
8.2.1 Overview 304
8.2.2 Operational vs. Development Phase Evaluation 306
8.2.3 Past Work 306
8.3 Measurement Techniques 307
8.3.1 On-Line Machine Logging 308
8.3.2 Manual Reporting 310
8.4 Preliminary Analysis of Data 312
8.4.1 Data Processing 312
8.4.2 Fault and Error Classification 314
8.4.3 Error Propagation 317
8.4.4 Error and Recovery Distributions 320
8.5 Detailed Analysis of Data 323
8.5.1 Dependency Analysis 324
8.5.2 Hardware-Related Software Errors 327
8.5.3 Evaluation of Software Fault Tolerance 328
8.5.4 Recurrences 329
8.6 Model Identification and Analysis of Models 333
8.6.1 Impact of Failures on Performance 333
8.6.2 Reliability Modeling in the Operational Phase 335
8.6.3 Failure/Error/Recovery Model 339
8.6.4 Multiple Error Model 344
8.7 Impact of System Activity 345
8.7.1 Statistical Models from Measurements 345
8.7.2 Overall System Behavior Model 348
8.8 Summary 352
Problems 353
### Chapter 9. Orthogonal Defect Classification
Ram Chillarege (IBM Research)
9.1 Introduction 359
9.2 Measurement and Software 360
9.2.1 Software Defects 361
9.2.2 The Spectrum of Defect Analysis 364
9.3 Principles of ODC 367
9.3.1 The Intuition 367
9.3.2 The Design of Orthogonal Defect Classification 370
9.3.3 Necessary Condition 371
9.3.4 Sufficient Conditions 373
9.4 The Defect-Type Attribute 374
9.5 Relative Risk Assessment Using Defect Types 376
9.5.1 Subjective Aspects of Growth Curves 377
9.5.2 Combining ODC and Growth Modeling 379
9.6 The Defect Trigger Attribute 384
9.6.1 The Trigger Concept 384
9.6.2 System Test Triggers 387
9.6.3 Review and Inspection Triggers 387
9.6.4 Function Test Triggers 388
9.6.5 The Use of Triggers 389
9.7 Multidimensional Analysis 393
9.8 Deploying ODC 396
9.9 Summary 398
Problems 399
### Chapter 10. Trend Analysis
Karama Kanoun and Jean-Claude Laprie (LAAS-CNRS, France)
10.1 Introduction 401
10.2 Reliability Growth Characterization 402
10.2.1 Definitions of Reliability Growth 403
10.2.2 Graphical Interpretation of the Subadditive Property 404
10.2.3 Subadditive Property Analysis 406
10.2.4 Subadditive Property and Trend Change 407
10.2.5 Some Particular Situations 408
10.2.6 Summary 409
10.3 Trend Analysis 410
10.3.1 Trend Tests 410
10.3.2 Example 419
10.3.3 Typical Results That Can Be Drawn from Trend Analyses 422
10.3.4 Summary 424
10.4 Application to Real Systems 424
10.4.1 Software of System SS4 425
10.4.2 Software of System S27 427
10.4.3 Software of System SS1 427
10.4.4 Software of System SS2 429
10.4.5 SAV 429
10.5 Extension to Static Analysis 431
10.5.1 Static Analysis Conduct 431
10.5.2 Application 433
10.6 Summary 433
Problems 435
### Chapter 11. Field Data Analysis
Wendell Jones (BNR, Inc.) and Mladen Vouk (NCSU)
11.1 Introduction 439
11.2 Data Collection Principles 441
11.2.1 Introduction 441
11.2.2 Failures, Faults, and Related Data 442
11.2.3 Time 444
11.2.4 Usage 445
11.2.5 Data Granularity 446
11.2.6 Data Maintenance and Validation 447
11.2.7 Analysis Environment 448
11.3 Data Analysis Principles 449
11.3.1 Plots and Graphs 450
11.3.2 Data Modeling and Diagnostics 454
11.3.3 Diagnostics for Model Determination 455
11.3.4 Data Transformations 458
11.4 Important Topics in Analysis of Field Data 459
11.4.1 Calendar Time 461
11.4.2 Usage Time 461
11.4.3 An Example 462
11.5 Calendar-Time Reliability Analysis 463
11.5.1 Case Study (IBM Corp.) 464
11.5.2 Case Study (Hitachi) 466
11.5.3 Further Examples 468
11.6 Usage-Based Reliability Analysis 469
11.6.1 Case Study (Northern Telecom Telecommunication Systems) 469
11.6.2 Further Examples 470
11.7 Special Events 472
11.7.1 Rare Event Models 473
11.7.2 Case Study (Space Shuttle Flight Software) 476
11.8 Availability 479
11.8.1 Introduction 479
11.8.2 Measuring Availability 480
11.8.3 Empirical Unavailability 481
11.8.4 Models 483
11.9 Summary 486
Problems 487
### Chapter 12. Software Metrics for Reliability Assessment
John Munson (University of Idaho) and Taghi Khoshgoftaar (Florida Atlantic University)
12.1 Introduction 493
12.2 Static Program Complexity 495
12.2.1 Software Metrics 495
12.2.2 A Domain Model of Software Attributes 496
12.2.3 Principal Components Analysis 497
12.2.4 The Usage of Metrics 499
12.2.5 Relative Program Complexity 500
12.2.6 Software Evolution 502
12.3 Dynamic Program Complexity 504
12.3.1 Execution Profile 505
12.3.2 Functional Complexity 505
12.3.3 Dynamic Aspects of Functional Complexity 507
12.3.4 Operational Complexity 509
12.4 Software Complexity and Software Quality 510
12.4.1 An Overview 510
12.4.2 An Application and Its Metrics 512
12.4.3 Multivariate Analysis in Software Quality Control 514
12.4.4 Fault Prediction Models 518
12.4.5 Enhancing Predictive Models with Increased Domain Coverage 520
12.5 Software Reliability Modeling 523
12.5.1 Reliability Modeling with Software Complexity Metrics 524
12.5.2 The Incremental Build Problem 526
12.6 Summary 527
Problems 527
### Chapter 13. Software Testing and Reliability
Joseph R. Horgan (Bellcore) and Aditya P. Mathur (Purdue University)
13.1 Introduction 531
13.2 Overview of Software Testing 532
13.2.1 Kinds of Software Testing 532
13.2.2 Concepts from White-Box and Black-Box Testing 532
13.3 Operational Profiles 534
13.3.1 Difficulties in Estimating the Operational Profile 535
13.3.2 Estimating Reliability 537
13.4 Time/Structure Based Software Reliability Estimation 539
13.4.1 Definitions and Terminology 539
13.4.2 Basic Assumptions 540
13.4.3 Testing Methods and Saturation Effect 541
13.4.4 Testing Effort 541
13.4.5 Limits of Testing Methods 542
13.4.6 Empirical Basis of the Saturation Effect 543
13.4.7 Reliability Overestimation due to Saturation 545
13.4.8 Incorporating Coverage in Reliability Estimation 546
13.4.9 Filtering Failure Data Using Coverage Information 547
13.4.10 Selecting the Compression Ratio 551
13.4.11 Handling Rare Events 553
13.5 A Microscopic Model of Software Risk 554
13.5.1 A Testing-Based Model of Risk Decay 554
13.5.2 Risk Assessment: An Example 555
13.5.3 A Simple Risk Computation 558
13.5.4 A Risk Browser 560
13.5.5 The Risk Model and Software Reliability 561
13.6 Summary 563
Problems 563
### Chapter 14. Fault-Tolerant Software Reliability Engineering
David McAllister and Mladen Vouk (NCSU)
14.1 Introduction 567
14.2 Present Status 568
14.3 Principles and Terminology 569
14.3.1 Result Verification 570
14.3.2 Redundancy 574
14.3.3 Failures and Faults 575
14.3.4 Adjudication by Voting 577
14.3.5 Tolerance 578
14.4 Basic Techniques 581
14.4.1 Recovery Blocks 581
14.4.2 N-Version Programming 582
14.5 Advanced Techniques 583
14.5.1 Consensus Recovery Block 583
14.5.2 Acceptance Voting 584
14.5.3 N Self-Checking Programming 584
14.6 Reliability Modeling 585
14.6.1 Diversity and Dependence of Failures 586
14.6.2 Data-Domain Modeling 589
14.6.3 Time-Domain Modeling 594
14.7 Reliability in the Presence of Inter-Version Failure Correlation 596
14.7.1 An Experiment 596
14.7.2 Failure Correlation 598
14.7.3 Consensus Voting 599
14.7.4 Consensus Recovery Block 601
14.7.5 Acceptance Voting 603
14.8 Development and Testing of Multi-Version Fault-Tolerant Software 604
14.8.1 Requirements and Design 605
14.8.2 Verification, Validation and Testing 606
14.8.3 Cost of Fault-Tolerant Software 607
14.9 Summary 609
Problems 609
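A note on Chapter 14: the adjudication-by-voting idea of Section 14.3.4, as used in N-Version Programming (Section 14.4.2), can be sketched as a strict-majority voter over the versions' outputs. This is a simplification assuming exact output matching; real adjudicators often need inexact comparison of floating-point or structured results:

```python
from collections import Counter

def majority_vote(outputs):
    """N-version programming adjudicator sketch: return the output
    agreed on by a strict majority of versions, or None when no
    majority exists (the system signals failure rather than guess)."""
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else None

# Three independently developed versions compute the same result;
# one faulty version is masked by the agreeing majority.
print(majority_vote([42, 42, 41]))  # → 42
print(majority_vote([1, 2, 3]))     # → None (no majority: detected failure)
```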
### Chapter 15. Software Reliability Analysis Using Fault Trees
Joanne Bechta Dugan (University of Virginia)
15.1 Introduction 615
15.2 Fault Tree Modeling 615
15.2.1 Cutset Generation 617
15.2.2 Fault Tree Analysis 619
15.3 Fault Trees as a Design Aid for Software Systems 622
15.4 Safety Validation Using Fault Trees 623
15.5 Analysis of Fault Tolerant Software Systems 627
15.5.1 Fault Tree Model for Recovery Block System 629
15.5.2 Fault Tree Model for N-Version Programming System 630
15.5.3 Fault Tree Model for N Self-Checking Programming System 632
15.6 Quantitative Analysis of Fault Tolerant Software 635
15.6.1 Methodology for Parameter Estimation from Experimental Data 635
15.6.2 A Case Study in Parameter Estimation 639
15.6.3 Comparative Analysis of Three Software Fault Tolerant Systems 642
15.7 System-Level Analysis of Hardware and Software System 645
15.7.1 System Reliability/Safety Model for DRB 647
15.7.2 System Reliability/Safety Model for NVP 648
15.7.3 System Reliability/Safety Model for NSCP 650
15.7.4 A Case Study in System-Level Analysis 651
15.8 Summary 657
Problems 657
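A note on Chapter 15: for a fault tree whose basic events are assumed independent, failure probability propagates through OR and AND gates by the standard product rules. A minimal sketch with made-up probabilities (the recovery-block and NVP fault tree models of Sections 15.5 and 15.7 are far more detailed):

```python
def gate_or(probs):
    """OR gate: output fails if any input event occurs,
    assuming independent basic events."""
    p_none = 1.0
    for q in probs:
        p_none *= (1.0 - q)
    return 1.0 - p_none

def gate_and(probs):
    """AND gate: output fails only if all input events occur,
    assuming independent basic events."""
    p_all = 1.0
    for q in probs:
        p_all *= q
    return p_all

# Illustrative two-version sketch (probabilities are made up):
# each version fails if its software OR its host hardware fails;
# the system fails only if both versions fail.
p_version = gate_or([0.01, 0.001])
p_system = gate_and([p_version, p_version])
print(round(p_system, 6))
```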
### Chapter 16. Software Reliability Simulation
Robert Tausworthe (Jet Propulsion Laboratory) and Michael R. Lyu (AT&T Bell Labs.)
16.1 Introduction 661
16.2 Reliability Simulation 662
16.2.1 The Need for Dynamic Simulation 663
16.2.2 Dynamic Simulation Approaches 664
16.3 The Reliability Process 665
16.3.1 The Nature of the Process 666
16.3.2 Structures and Flows 667
16.3.3 Interdependencies among Elements 668
16.3.4 Software Environment Characteristics 669
16.4 Artifact-Based Simulation 669
16.4.1 Simulator Architecture 670
16.4.2 Results 675
16.5 Rate-Based Simulation 676
16.5.1 Event Process Statistics 677
16.5.2 Single-Event Process Simulation 678
16.5.3 Recurrent Event Statistics 679
16.5.4 Recurrent Event Simulation 681
16.5.5 Secondary Event Simulation 682
16.5.6 Limited Growth Simulation 683
16.5.7 The General Simulation Algorithm 684
16.6 Rate-Based Reliability 686
16.6.1 Rate Functions of Conventional Models 686
16.6.2 Simulator Architecture 687
16.6.3 Display of Results 689
16.7 The Galileo Project Application 690
16.7.1 Simulation Experiments and Results 691
16.7.2 Comparisons with Other Software Reliability Models 694
16.8 Summary 696
Problems 697
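A note on Chapter 16's rate-based simulation (Section 16.5): failure times of a nonhomogeneous Poisson process can be drawn by Lewis-Shedler thinning against a constant bounding rate. Sketched here for the Goel-Okumoto intensity λ(t) = ab·e^(−bt), with illustrative parameters; this is a generic sketch, not the book's simulator:

```python
import math
import random

def simulate_nhpp_failures(a, b, t_end, rng):
    """Draw failure times on (0, t_end] of an NHPP with intensity
    lambda(t) = a*b*exp(-b*t), via Lewis-Shedler thinning: generate
    candidates at the bounding rate a*b (the maximum, at t = 0) and
    accept each with probability lambda(t) / (a*b)."""
    lam_max = a * b
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)   # candidate from bounding process
        if t > t_end:
            return times
        if rng.random() < (a * b * math.exp(-b * t)) / lam_max:
            times.append(t)             # accept with prob lambda(t)/lam_max

rng = random.Random(1)
times = simulate_nhpp_failures(100.0, 0.05, 10.0, rng)
# Count should be near mu(10) = 100*(1 - e^-0.5), about 39 failures.
print(len(times))
```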
### Chapter 17. Neural Networks for SRE
Nachimuthu Karunanithi (Bellcore) and Yashwant Malaiya (Colorado State University)
17.1 Introduction 699
17.2 Neural Networks 700
17.2.1 Processing Unit 700
17.2.2 Architecture 702
17.2.3 Learning Algorithms 705
17.2.4 Backpropagation Learning 705
17.2.5 Cascade-correlation Learning Architecture 707
17.3 Application of Neural Networks for Software Reliability 709
17.3.1 Dynamic Reliability Growth Modeling 709
17.3.2 Identifying Fault-Prone Modules 710
17.4 Software Reliability Growth Modeling 710
17.4.1 Training Regimes 712
17.4.2 Data Representation Issue 712
17.4.3 A Prediction Experiment 713
17.4.4 Analysis of Neural Network Models 718
17.5 Identification of Fault-Prone Software Modules 718
17.5.1 Identification of Fault-Prone Modules Using Software Metrics 719
17.5.2 Data Set Used 719
17.5.3 Classifiers Compared 720
17.5.4 Data Representation 722
17.5.5 Training Data Selection 723
17.5.6 Experimental Approach 723
17.5.7 Results 723
17.6 Summary 726
Problems 726
### Appendix A. Software Reliability Tools 729
### Appendix B. Review of Reliability Theory, Analytical Techniques, and Basic Statistics