> Formally, let $S = \{s_1, s_2, \ldots, s_n\}$ represent the set of all components or services in the system, and let $L = \{l_1, l_2, \ldots, l_m\}$ represent the set of log entries or metric data points. The goal of failure localization is to identify the subsets $S' \subseteq S$ and $L' \subseteq L$ where the anomalies are most likely to have originated. This process can be represented as Equation 6, where $A$ represents the observed anomalies, and $P(S', L' \mid A)$ is the probability that the components in $S'$ and the log entries or metrics in $L'$ are responsible for the observed anomalies.
$$
\left(S^{\prime}, L^{\prime}\right)=\underset{S^{\prime}, L^{\prime}}{\arg \max } P\left(S^{\prime}, L^{\prime} \mid A\right) \tag{6}
$$
Reprinted from [[2025__CSUR__A Survey of AIOps for Failure Management in the Era of Large Language Models]].
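
To make the arg max objective concrete, here is a minimal Python sketch that enumerates small candidate subsets and keeps the pair with the highest score. It is an illustration under stated assumptions, not the survey's method: the service and log names, the per-item probabilities in `service_p` and `log_p`, the independence assumption inside `posterior`, and the `max_size` cap are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import prod

# Hypothetical per-item probabilities of being responsible, given the anomalies A.
service_p = {"auth": 0.1, "db": 0.8, "cache": 0.3}
log_p = {"l1": 0.2, "l2": 0.7}


def nonempty_subsets(items, max_size):
    """Yield every non-empty subset of `items` with at most `max_size` elements."""
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def subset_probability(subset, probs):
    """Probability that exactly `subset` is responsible, assuming independent items."""
    return prod(p if item in subset else 1.0 - p for item, p in probs.items())


def posterior(s_sub, l_sub):
    """Toy stand-in for P(S', L' | A): services and logs are treated as independent."""
    return subset_probability(s_sub, service_p) * subset_probability(l_sub, log_p)


def localize(services, logs, score, max_size=2):
    """Brute-force arg max of score(S', L') over the candidate subsets."""
    best, best_score = None, float("-inf")
    for s_sub in nonempty_subsets(services, max_size):
        for l_sub in nonempty_subsets(logs, max_size):
            value = score(s_sub, l_sub)
            if value > best_score:
                best, best_score = (s_sub, l_sub), value
    return best, best_score


best, best_score = localize(list(service_p), list(log_p), posterior)
print(sorted(best[0]), sorted(best[1]), best_score)
```

Exhaustive enumeration grows exponentially with $n$ and $m$, so the sketch only illustrates the objective. The `score` argument is kept separate from the search so that a different estimate of $P(S', L' \mid A)$ could be swapped in without touching `localize`.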