# P-D-Serve

P-D-Serve is a large-scale disaggregated LLM serving system proposed by Huawei Technologies. It is implemented on Ascend and MindSpore, and is reported to have been in commercial deployment for more than eight months across tens of thousands of NPUs. Its main components are fine-grained P/D groups organized per scenario, RoCE-aware dynamic P/D organization, rejection-based on-demand forwarding, and block-free D2D transfer, which sends discrete KVCache blocks as a single contiguous buffer. The paper reports improvements of 60% in E2E throughput, 42% in TTFT SLO, and 46% in D2D transfer time, along with 6.7x the throughput of aggregated LLMs. (Source: [[@2024__arXiv__P-D-Serve - Serving Disaggregated Large Language Model at Scale]])

## Related

- Concepts: [[Prefill-Decode分離]] / [[KVキャッシュ管理]]
- People: [[Yibo Jin]]
- Sources: [[@2024__arXiv__P-D-Serve - Serving Disaggregated Large Language Model at Scale]]
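
## Sketch: block-free transfer

To make the block-free D2D transfer idea concrete, here is a toy sketch in plain Python. It is not the paper's implementation: `BLOCK_SIZE`, `gather_blocks`, `scatter_blocks`, and the byte-array "pools" are all hypothetical names chosen for illustration, and host memory stands in for device memory. The sketch only shows the general pattern of packing non-adjacent KVCache blocks into one contiguous buffer on the sender, moving that buffer once, and unpacking it into whatever block slots the receiver has free.

```python
from typing import List

BLOCK_SIZE = 4  # bytes per KV block in this toy example (hypothetical value)


def gather_blocks(pool: bytearray, block_ids: List[int]) -> bytes:
    """Copy scattered blocks into one contiguous buffer, in the given order."""
    buf = bytearray()
    for bid in block_ids:
        start = bid * BLOCK_SIZE
        buf += pool[start:start + BLOCK_SIZE]
    return bytes(buf)


def scatter_blocks(pool: bytearray, block_ids: List[int], buf: bytes) -> None:
    """Write a contiguous buffer back into the receiver's own block slots."""
    if len(buf) != len(block_ids) * BLOCK_SIZE:
        raise ValueError("buffer length does not match the number of blocks")
    for i, bid in enumerate(block_ids):
        start = bid * BLOCK_SIZE
        pool[start:start + BLOCK_SIZE] = buf[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]


# Sender pool: 8 blocks, block k filled with the byte value k.
sender_pool = bytearray(b for k in range(8) for b in [k] * BLOCK_SIZE)
sender_ids = [5, 1, 6]      # blocks holding one request's KVCache (non-adjacent)

# One contiguous payload replaces three separate per-block transfers.
payload = gather_blocks(sender_pool, sender_ids)

# Receiver pool: same size, but different free slots.
receiver_pool = bytearray(8 * BLOCK_SIZE)
receiver_ids = [0, 3, 2]
scatter_blocks(receiver_pool, receiver_ids, payload)
```

The sender and receiver block IDs differ on purpose: only the order of blocks inside the payload has to be agreed on, so each side can keep its own block allocation.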