fold-k4-from-2026-06-16-to-2026-06-17-n16

Level-4 fold of 16 log entries spanning 2026-06-16 to 2026-06-17. Dominant themes: systematic cataloging of alert management (a bulk ingest of 5+9+3+10=27 papers settled the five-layer structure of intervention points); the branching of LLM × time-series anomaly detection (role differentiation across the four lines LLMAD, ChatTS, Time-RA, and ARGOS); and the completion of the GLM family tree together with the industrialization of LLM evaluation methods (CursorBench, the problems with SWE-bench).

## Child Entries

| Date | Op | Title | Page | Summary (extractive) |
|---|---|---|---|---|
| 2026-06-17 | ingest-slides | AI時代に向けたクラウドにおける信頼性エンジニアリングの未来構想 | [[@2022__DICOMO__AI時代に向けたクラウドにおける信頼性エンジニアリングの未来構想]] | As of 2022, extends SRE's reliability-control philosophy to user-driven self-crafting in the 2040s, and presents Interactive AIOps (experimentability + interpretability) as the engineer-AI collaboration stage. |
| 2026-06-17 | ingest-paper | Ironies of Automation: two follow-up papers | [[@2012__ECCE__The Ironies of Automation Still Going Strong at 30]] and others | Bainbridge's (1983) ironies have remained structurally unresolved for more than 40 years. Strauch systematizes new ironies (skill masking, repetition of identical errors, excess functionality); Baxter et al. identify the circumvention of quality enabled by the cloud's low cost. |
| 2026-06-17 | ingest | ペパボ研究所 gpt-oss serving performance evaluation | [[@2025__ペパボ研究所__gpt-ossモデルのサービング性能評価]] | Parallel scaling is effective only on H100; the number of output tokens dominates throughput; reasoning effort matters as much as the choice of model size. |
| 2026-06-17 | ingest-paper | Microservice benchmarks/datasets, 4 papers in bulk | [[@2019__ASPLOS__An Open-Source Benchmark Suite for Cloud and IoT Microservices]] and others | DeathStarBench (2019) is the original academic benchmark; TrainTicketTrace (2026) is the modern fault localization dataset. Train-Ticket appears as the shared benchmark system in three of the papers. |
| 2026-06-17 | ingest-paper | Time-RA (ACL Findings 2026) | [[@2026__ACL Findings__Time-RA - Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback]] | Shifts TSAD from binary discrimination to generative reasoning (detection + classification + causal explanation). First demonstration that a Qwen2.5-7B tuned with SFT + LoRA transfers plug-and-play to unseen domains. |
| 2026-06-17 | ingest-paper | LLMAD + ChatTS | [[@2025__KDD__Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection]] and others | LLMAD uses an LLM directly as the judge, reaching Best F1=0.759 at $65.70 per year. ChatTS is the first TS-MLLM to treat time series as a native modality, surpassing GPT-4o vision by +46.0% on alignment and +25.8% on reasoning. |
| 2026-06-17 | wiki-query | wiki-query addendum: Ganatra+ merged into the chronological review | [[アラーティングの進歩-年代別]] | Confirmed that Yang+ DSN2022 (six anti-patterns of existing alerts) and Ganatra+ FSE2023 (six categories of missing alerts) form a complementary pair of taxonomies. |
| 2026-06-17 | wiki-query | Advances in alerting: chronological review | [[アラーティングの進歩-年代別]] | Reconstructs the layering of 5+1 intervention points from 1980s commercial NMS to 2026 agentic SRE. Aggregation algorithms went through a three-stage generational succession; the LLM adoption boundary is made explicit in SkyNet. |
| 2026-06-17 | ingest-paper | GLM family: origin → GLM-4.5 → GLM-5 → GLM-OCR | [[@2022__ACL__GLM - General Language Model Pretraining with Autoregressive Blank Infilling]] and others | The four GLM-lineage papers are now in the wiki as a single paper family. The 0.9B GLM-OCR outperforms a 235B model, and DSA is adopted in GLM-5 as a next-generation sparsity technique. |
| 2026-06-17 | ingest | CursorBench | [[@2026__Cursor__CursorBench - How Cursor Evaluates Model Quality]] | Cursor published the hybrid evaluation method of CursorBench 3.1. OpenAI stopped reporting SWE-bench Verified, suggesting an industry shift toward internal benchmarks + online evaluation. |
| 2026-06-17 | ingest-paper | Alert management and time-series anomaly detection, 10 papers in bulk | [[@2012__NOMS__Optimizing System Monitoring Configurations for Non-Actionable Alerts]] and others | Settled the lineage of the Fudan alert aggregation trilogy (OAS→DyAlert→ProAlert). EVT is the root of modern alert storm detection. Confirmed the contrasting structure of the three roots of alert ranking and the 12-year alert quality roadmap of the IBM line. |
| 2026-06-17 | ingest-paper | Harp: VPC Network Availability | [[@2026__NSDI__Harp - Improving VPC Network Availability via Efficient Failure Detection and Rerouting in Tencent Cloud]] | Achieves sub-second VPC failure recovery in a production environment of hundreds of thousands of machines, using deterministic ECMP path control via the UDP source port and embedded in-band probes (78-99.97% reduction in downtime). |
| 2026-06-17 | ingest-paper | Alert management, 3 papers (Zha+ / VOCE / SkyNet) | [[@2024__Electronics__Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs]] and others | Confirmed that the boundary between adopting and not adopting LLMs is drawn along failure severity × scale. The role of LLMs within RCA splits into three lines: external knowledge reader, graph mapper, and multi-factor analyzer. |
| 2026-06-16 | ingest-slides | Reliability in the Age of AI: Engineering for AI Velocity | [[@2026__SpeakerDeck__Reliability in the Age of AI - Engineering for AI Velocity]] | The reliability challenge of the AI era lies not in development velocity itself but in quality control of generated artifacts, production observation, and the scaling of SRE judgment all failing to keep pace at the same time. |
| 2026-06-16 | ingest-paper | Alert management, 9 papers bulk ingest | [[@2020__ICSE-SEIP__Understanding and Handling Alert Storm for Online Service Systems]] and others | The nine papers divide across five intervention points: suppression, filtering, aggregation, ranking, and RCA. Continuous overload in HPC and intermittent storms in the cloud are separate problems that call for different aggregation strategies. |
| 2026-06-16 | ingest-paper | Lineage of alert management, aggregation, and prediction, 5 papers in bulk | [[@2022__DSN__Characterizing and Mitigating Anti-patterns of Alerts in Industrial Cloud Systems]] and others | Yang+ 2022 and Kuang+ 2024, both from the same CUHK + Huawei Cloud collaboration, form a two-stage loop: demonstrating the limits of SOPs, then reusing SOPs with LLMs. AirAlert's Bayesian network + XGBoost precedes PAGER (2026) by 7 years. |

## Key Outcomes

- The five-layer structure of alert-management intervention points (suppression, filtering, aggregation, ranking, RCA) was settled by the bulk ingest of 27 papers. In addition, the chronological review reconstructed the diachronic shift from the unified "threshold + notification" of 2007 to the 6+1-layer decomposition of 2024, and confirmed that EVT (KDD2017) is the root of alert storm detection and that the Fudan aggregation trilogy (OAS→DyAlert→ProAlert) evolved incrementally over a 3-year span (from 2026-06-16 ingest-paper x5, 2026-06-16 ingest-paper x9, 2026-06-17 ingest-paper x3, 2026-06-17 ingest-paper x10, 2026-06-17 wiki-query entries)
- LLM × time-series anomaly detection branched into four lines. LLMAD is the univariate + interpretation line that uses an LLM directly as the judge (Best F1=0.759); ChatTS is the first TS-MLLM to treat time series as a native modality (surpassing GPT-4o vision by +46.0%); Time-RA shifts TSAD to generative reasoning (detection + classification + causal explanation); ARGOS uses the LLM only for training-time rule generation and runs inference on a rule basis. The way LLMs are integrated splits fundamentally into "training-time rule extraction" and "inference-time detection" (from 2026-06-17 LLMAD+ChatTS, Time-RA, ingest-paper x10 entries)
- The four GLM-lineage papers (2022 ACL → 2025 GLM-4.5 → 2026 GLM-5 → 2026 GLM-OCR) are now in the wiki as a single family. Confirmed that the 0.9B GLM-OCR outperforms a 235B model, that DSA was implemented at 744B scale as a next-generation sparsity technique, and that the asynchronous agent RL infrastructure (the slime framework) arrived at the same problem framing as MiniMax-M2's Forge independently (from 2026-06-17 GLM family entry)
- Two follow-up papers confirmed that Bainbridge's (1983) ironies of automation have remained structurally unresolved for more than 40 years. Strauch systematizes new ironies (skill masking, repetition of identical errors, excess functionality), and Baxter et al. identify the circumvention of quality enabled by the cloud's low cost. Also confirmed the connection to the future vision presented at DICOMO in 2022, which extends SRE's reliability control from Interactive AIOps to self-crafting (from 2026-06-17 Ironies of Automation, ingest-slides DICOMO entries)
- Two slide decks on the relationship between SRE and AI were ingested. The 2022 DICOMO talk presents Interactive AIOps (experimentability + interpretability); the 2026 Reliability in the Age of AI points out that quality control, production observation, and the scaling of SRE judgment cannot keep pace with rising development velocity. AI-service-specific extensions of SLI/SLO and error budgets were proposed (from 2026-06-16 Reliability in the Age of AI, 2026-06-17 DICOMO entries)
- Across the four microservice benchmark/dataset papers, DeathStarBench (2019) is positioned as the original academic benchmark and TrainTicketTrace (2026) as the modern fault localization dataset, corroborating that Train-Ticket has become the de facto common platform (from 2026-06-17 microservice benchmarks entry)
- The publication of CursorBench 3.1's hybrid evaluation method, together with OpenAI's halt of SWE-bench Verified reporting (test defects in 60% of the unsolved problems), suggested an industry shift toward internal benchmarks + online evaluation (from 2026-06-17 CursorBench entry)
- Harp achieves sub-second VPC failure recovery without requiring specific hardware, in a Tencent Cloud production environment of hundreds of thousands of machines, using deterministic ECMP path control via the UDP source port and embedded in-band probes (78-99.97% reduction in downtime) (from 2026-06-17 Harp entry)

## Cross-entry Themes

- **Layer-by-layer cataloging of alert management is complete**: 5+9+3+10=27 papers were ingested in bulk over two days, settling the five-layer structure of intervention points (suppression, filtering, aggregation, ranking, RCA). The chronological review reconstructed the diachronic shift from 1980s commercial NMS to 2026 agentic SRE. Yang+ DSN2022 (six anti-patterns of existing alerts) and Ganatra+ FSE2023 (six categories of missing alerts) form a complementary pair of taxonomies, and aggregation algorithms were confirmed to have gone through a three-stage generational succession: pairwise similarity (2014) → dynamic graph representation learning (2023) → unsupervised topology semantics (2025, ProAlert) (supported by: 2026-06-16 ingest-paper x5, 2026-06-16 ingest-paper x9, 2026-06-17 ingest-paper x3, 2026-06-17 ingest-paper x10, 2026-06-17 wiki-query, 2026-06-17 wiki-query addendum entries)
- **Role differentiation in LLM × time-series anomaly detection has come into focus**: The lines LLMAD (univariate + interpretation), ChatTS (multivariate + reasoning, TS-MLLM), Time-RA (generative reasoning), ARGOS (training-time rule generation only), and VisualTimeAnomaly (inference-time detection with an MLLM) are now all represented. The fundamental split between "training-time rule extraction" and "inference-time detection" emerged, and the boundary between adopting and not adopting LLMs (failure severity × scale) was made explicit in SkyNet (supported by: 2026-06-17 LLMAD+ChatTS, 2026-06-17 Time-RA, 2026-06-17 ingest-paper x10, 2026-06-17 ingest-paper x3 entries)
- **Structural continuity of the ironies of SRE × automation**: Two papers confirmed that Bainbridge's (1983) ironies remain unresolved after more than 40 years. Placing the 2022 DICOMO Interactive AIOps vision alongside the velocity-tracking problem from the 2026 Reliability in the Age of AI revealed a pattern in which the ironies of automation can recur when AI is applied to SRE. In the agentic era, the semantics of alerts themselves are shifting from notifications for humans to inputs for autonomous handlers, and no research yet theorizes this change (supported by: 2026-06-17 Ironies of Automation, 2026-06-17 DICOMO ingest-slides, 2026-06-16 Reliability in the Age of AI ingest-slides, 2026-06-17 wiki-query entries)
- **Industrialization of LLM evaluation methods and the exposed limits of academic benchmarks**: While CursorBench 3.1 is cited as an industrial benchmark by an academic LLM paper (GLM-5), OpenAI went as far as halting its SWE-bench Verified reporting. The result that the 0.9B GLM-OCR outperforms a 235B model points to the potential of small task-specialized VLMs and, at the same time, suggests performance distributions that general-purpose benchmarks cannot capture (supported by: 2026-06-17 GLM family, 2026-06-17 CursorBench entries)

## Contradictions or Corrections

- None detected.

## Child Pages

- [[@2022__DICOMO__AI時代に向けたクラウドにおける信頼性エンジニアリングの未来構想]]
- [[@2012__ECCE__The Ironies of Automation Still Going Strong at 30]]
- [[@2017__IEEE THMS__Ironies of Automation - Still Unresolved After All These Years]]
- [[@2025__ペパボ研究所__gpt-ossモデルのサービング性能評価]]
- [[@2019__ASPLOS__An Open-Source Benchmark Suite for Cloud and IoT Microservices]]
- [[@2023__arXiv__Benchmarks for End-to-End Microservices Testing]]
- [[@2024__MSR__A Dataset of Microservices-based Open-Source Projects]]
- [[@2026__SANER-C__TrainTicketTrace - A Multi-Fault Distributed Dataset for Microservice Fault Detection and Localization]]
- [[@2026__ACL Findings__Time-RA - Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback]]
- [[@2025__KDD__Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection]]
- [[@2025__VLDB__ChatTS - Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning]]
- [[アラーティングの進歩-年代別]]
- [[@2022__ACL__GLM - General Language Model Pretraining with Autoregressive Blank Infilling]]
- [[@2025__arXiv__GLM-4.5 - Agentic Reasoning and Coding Foundation Models]]
- [[@2026__arXiv__GLM-5 - From Vibe Coding to Agentic Engineering]]
- [[@2026__arXiv__GLM-OCR Technical Report]]
- [[@2026__Cursor__CursorBench - How Cursor Evaluates Model Quality]]
- [[@2012__NOMS__Optimizing System Monitoring Configurations for Non-Actionable Alerts]]
- [[@2018__CIKM__Collaborative Alert Ranking for Anomaly Detection]]
- [[@2022__ICSE__Online Summarizing Alerts through Semantic and Behavior Information]]
- [[@2025__arXiv__ARGOS - Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models]]
- [[@2025__arXiv__Can Multimodal LLMs Perform Time Series Anomaly Detection]]
- [[@2009__ICAC__Ranking the Importance of Alerts for Problem Determination in Large Computer Systems]]
- [[@2017__KDD__Anomaly Detection in Streams with Extreme Value Theory]]
- [[@2020__CLOUD__DEAR - Distributed Evaluation of Alerting Rules]]
- [[@2025__FSE__Alert Summarization for Online Service Systems by Validating Propagation Paths of Faults]]
- [[@2024__FSE__ChangeRCA - Finding Root Causes from Software Changes in Large Online Systems]]
- [[@2026__NSDI__Harp - Improving VPC Network Availability via Efficient Failure Detection and Rerouting in Tencent Cloud]]
- [[@2024__Electronics__Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs]]
- [[@2025__FASE__VOCE - A Virtual On-Call Engineer for Automated Alert Incident Analysis Using a Large Language Model]]
- [[@2025__SIGCOMM__SkyNet - Analyzing Alert Flooding from Severe Network Failures in Large Cloud Infrastructures]]
- [[@2026__SpeakerDeck__Reliability in the Age of AI - Engineering for AI Velocity]]
- [[@2020__ICSE-SEIP__Understanding and Handling Alert Storm for Online Service Systems]]
- [[@2020__ISSRE__AlertRank - Automatically and Adaptively Identifying Severe Alerts for Online Service Systems]]
- [[@2023__arXiv__ESRO - Experience Assisted Service Reliability against Outages]]
- [[@2023__ASE__Dynamic Graph Neural Networks-Based Alert Link Prediction for Online Service Systems]]
- [[@2023__JCC__Filtering Alerts on Cloud Monitoring Systems]]
- [[@2023__ICSE-SEIP__TraceArk - Towards Actionable Performance Anomaly Alerting for Online Service Systems]]
- [[@2024__CCGRID__AlertRCA - Causality Enhanced Graph Representation Learning for Alert-Based Root Cause Analysis]]
- [[@2024__ICSE-SEIP__Dynamic Alert Suppression Policy for Noise Reduction in AIOps]]
- [[@2024__ISSRE__Exploring Hierarchical Patterns for Alert Aggregation in Supercomputers]]
- [[@2022__DSN__Characterizing and Mitigating Anti-patterns of Alerts in Industrial Cloud Systems]]
- [[@2024__ICSE-SEIP__Knowledge-aware Alert Aggregation in Large-scale Cloud Systems - a Hybrid Approach]]
- [[@2025__arXiv__Metric Criticality Identification for Cloud Microservices]]
- [[@2014__KDD__Unveiling Clusters of Events for Alert and Incident Management in Large-Scale Enterprise IT]]
- [[@2019__WWW__Outage Prediction and Diagnosis for Cloud Service Systems]]
- [[Interactive AIOps]]
- [[セルフクラフト]]
- [[LLMAD]]
- [[ChatTS]]
- [[AnoCoT]]
- [[TSEvol]]
- [[マイクロサービスベンチマーク]]
- [[コーディングエージェント評価]]
- [[VPCネットワーク可用性]]
- [[アラートインシデント分析]]
- [[LLMによる根本原因分析]]
- [[サービス依存グラフ]]
- [[ネットワーク監視]]
- [[アラートストーム]]
- [[アラート抑制]]
- [[アクショナブルアラート]]
- [[Quality of Alerts]]
- [[アラートアンチパターン]]
- [[COLA]]
- [[KIMetrix]]
- [[情報量基準メトリクス選定]]
- [[AirAlert]]

## Related

- [[DragonScale Memory]] - fold-operator spec
- [[log]] - source entries
- [[index]] - vault catalog