## [[LLM4SRE]]
- [[2023__arXiv__Empowering Practical Root Cause Analysis by Large Language Models for Cloud Incidents]]
- [[2023__arXiv__Assess and Summarize - Improve Outage Understanding with Large Language Models]]
- [[2023__ICSE__Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models]]
- [[Large-language models for automatic cloud incident management]]
## Others
- [[2025__arXiv__An Empirical Study of Production Incidents in Generative AI Cloud Services]]
- [[2024__ICSE__Intelligent Monitoring Framework for Cloud Services - A Data-Driven Approach]]
- [[2023__ESEC-FSE__Detection Is Better Than Cure - A Cloud Incidents Perspective]]
- [[2022__SoCC__How to Fight Production Incidents? An Empirical Study on a Large-scale Cloud Service]]
- [[2021__ISSRE__How Long Will it Take to Mitigate this Incident for Online Service Systems?]]
- [[2020__ESEC-FSE__How to Mitigate the Incident? An Effective Troubleshooting Guide Recommendation Technique for Online Service Systems]]
- [[2020__ESEC-FSE__Towards Intelligent Incident Management - Why We Need It and How We Make It]]
- [[2019__ASE__Continuous Incident Triage for Large-Scale Online Service Systems]]
- [[2019__HotOS__What bugs cause production cloud incidents?]]