
# Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

Authors: Yu Gan¹, Mingyu Liang¹, Sundar Dev², David Lo², and Christina Delimitrou¹

Bibtex: @article{gan2021sage, title={Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices}, author={Gan, Yu and Liang, Mingyu and Dev, Sundar and Lo, David and Delimitrou, Christina}, journal={arXiv preprint arXiv:2101.00267}, year={2021} }

Conference/Journal: arXiv

Created: January 12, 2021 4:47 PM

URL: https://arxiv.org/pdf/2101.00267.pdf

Year: 2021

## Abstract

Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations. We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service's QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine we show that Sage consistently achieves over 93% accuracy in correctly identifying the root cause of QoS violations, and improves performance predictability.

Cloud applications are moving from large monolithic services to complex graphs of loosely coupled microservices. Although microservices offer modularity and elasticity, they complicate cluster management and performance debugging. Sage uses unsupervised ML models to avoid the overhead of trace labeling, captures the effect of dependencies between microservices to identify the root cause of unpredictable performance online, and applies corrective actions to restore the cloud service's QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine, Sage consistently identified the root cause of QoS violations with over 93% accuracy and improved performance predictability.

[[Sage Using Unsupervised Learning for Scalable Perf__translations]]
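
The abstract notes that dependencies between tiers cause backpressure, so a slowdown in one microservice can surface as latency violations in every tier that calls it. The sketch below is a toy illustration of that idea only. It is not Sage's model, which the abstract does not detail. It assumes a hypothetical call graph, hypothetical per-tier latency samples in milliseconds, and a simple label-free rule (median latency above a multiple of its baseline) standing in for a learned unsupervised model. All names (`call_graph`, `baseline`, `current`, `root_cause_candidates`) are invented for illustration.

```python
from statistics import median


def anomalous(samples, reference, factor=2.0):
    """Label-free check: median latency exceeds `factor` times the baseline median."""
    return median(samples) > factor * median(reference)


def root_cause_candidates(call_graph, baseline, current, entry, factor=2.0):
    """Return anomalous tiers that have no anomalous downstream dependency.

    An anomalous tier whose callees are also anomalous is assumed to be a
    victim of backpressure rather than the source of the slowdown.
    """
    flagged = {
        svc for svc in call_graph
        if anomalous(current[svc], baseline[svc], factor)
    }
    candidates, seen, stack = [], set(), [entry]
    while stack:
        svc = stack.pop()
        if svc in seen:
            continue
        seen.add(svc)
        callees = call_graph.get(svc, [])
        if svc in flagged and not any(c in flagged for c in callees):
            candidates.append(svc)
        stack.extend(callees)
    return sorted(candidates)


# Hypothetical application: each key calls the services in its list.
call_graph = {
    "frontend": ["search", "cart"],
    "search": ["index-db"],
    "cart": ["cart-db"],
    "index-db": [],
    "cart-db": [],
}
# Hypothetical latency samples (ms) from a healthy period and from a QoS violation.
baseline = {
    "frontend": [12, 11, 13, 12],
    "search": [8, 9, 8, 9],
    "cart": [10, 11, 10, 9],
    "index-db": [4, 5, 4, 5],
    "cart-db": [5, 5, 6, 5],
}
current = {
    "frontend": [40, 42, 39, 41],
    "search": [35, 36, 34, 37],
    "cart": [10, 11, 10, 9],
    "index-db": [30, 31, 29, 32],
    "cart-db": [5, 5, 6, 5],
}

result = root_cause_candidates(call_graph, baseline, current, entry="frontend")
print(result)
```

In this made-up scenario `frontend`, `search`, and `index-db` all look slow, but only `index-db` has no slow dependency of its own, so it is the single candidate reported. A real system would need to handle noisy measurements, shared resources, and non-latency metrics, none of which this sketch attempts.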