Dynamic Instrumentation

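# Dynamic Instrumentation

## Definition

Dynamic instrumentation is a technique for observing a program's behavior by inserting measurement code while the program is running (or immediately before it runs), without modifying or recompiling its source. Examples include eBPF kprobes/uprobes/tracepoints, PTX injection for GPUs, and SASS binary rewriting with NVBit. This note also covers compile-time instrumentation via LLVM passes as a point of comparison, although it requires a rebuild and so falls outside the strict definition. Non-invasiveness (observing the application without stopping or modifying it) is the key to production use. ([[@2025__eBPF__eInfer - Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF]], [[@2025__HCDS__eGPU - Extending eBPF Programmability and Observability to GPUs]])

## Cross-cutting findings

- **The abstraction level at which instrumentation is inserted determines the implementation family and the overhead**: the surveyed systems split into four families.
  - eBPF uprobe/uretprobe/tracepoint on host-side runtime functions ([[@2026__arXiv__ProfInfer - An eBPF-based Fine-Grained LLM Inference Profiler]], [[@2025__eBPF__eInfer - Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF]])
  - PTX injection into the GPU intermediate representation ([[@2025__HCDS__eGPU - Extending eBPF Programmability and Observability to GPUs]])
  - GPU binary (SASS) rewriting with NVBit (used by [[@2024__arXiv__Microsecond-scale Dynamic Validation of Idempotency for GPU Kernels]])
  - LLVM compile-time instrumentation ([[@2024__TOPC__Low-Overhead Trace Collection and Profiling on GPU Compute Kernels]])

  The insertion point thus ranges from a high-level IR down to a low-level binary, or moves to compile time altogether. eGPU claims that PTX (high-level IR) injection has lower overhead than NVBit (SASS rewriting), so the choice of abstraction level translates directly into cost. (Source: [[@2025__HCDS__eGPU - Extending eBPF Programmability and Observability to GPUs]], [[@2024__arXiv__Microsecond-scale Dynamic Validation of Idempotency for GPU Kernels]])
- **No source changes and always-on production operation are the shared selling points**: eInfer, ProfInfer, and eGPU all place "no source edits, no recompilation" at the center of their value. ProfInfer goes further with adaptive control that disables some probes when a QoS violation is detected, in order to recover speed. Dynamic instrumentation is therefore used not only for ease of adoption but also for operational flexibility: granularity can be adjusted while the system is running. (Source: [[@2026__arXiv__ProfInfer - An eBPF-based Fine-Grained LLM Inference Profiler]], [[@2025__eBPF__eInfer - Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF]])
- **Recovering semantics by walking the structure of operator arguments**: ProfInfer walks the `ggml_tensor` struct with `bpf_probe_read_user` to extract the operator name, type, and tensor dimensions, and for MoE it reads the activated expert IDs through a two-level pointer dereference. Dynamic instrumentation has moved beyond collecting timestamps to reconstructing operator semantics from runtime data structures. (Source: [[@2026__arXiv__ProfInfer - An eBPF-based Fine-Grained LLM Inference Profiler]])

## Open questions

- How can long-term stability and safety be ensured when rewriting GPU kernels in a running system (PTX injection, SASS rewriting)? Validation for large-scale production adoption is lacking.
- Instrumentation that depends on runtime function symbols incurs maintenance cost whenever an inference-engine upgrade changes those symbols (a limitation of eInfer). How much CO-RE-style portability can be achieved?
- Porting cost to other inference engines: ProfInfer estimates porting to MNN-LLM as "moderate effort", but no measurement has been confirmed. Can the approach of locating analogous functions with `perf` call-graph profiling (`perf record --call-graph`) and `pahole` be generalized?

## Related

- Sources: [[@2025__HCDS__eGPU - Extending eBPF Programmability and Observability to GPUs]] / [[@2025__eBPF__eInfer - Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF]] / [[@2026__arXiv__ProfInfer - An eBPF-based Fine-Grained LLM Inference Profiler]] / [[@2024__TOPC__Low-Overhead Trace Collection and Profiling on GPU Compute Kernels]] / [[@2024__arXiv__Microsecond-scale Dynamic Validation of Idempotency for GPU Kernels]]
- Concepts: [[eBPF]] / [[GPU観測性|GPU observability]] / [[ハードウェアカウンタ|Hardware counters]] / [[テレメトリ|Telemetry]]
- Entities: [[NVBit]] / [[CUPTI]] / [[PTX]] / [[bpftime]] / [[BCC]] / [[libbpf]]
- Related MOC: [[AI Infra Telemetry - MOC]]

## References

- [[@2025__HCDS__eGPU - Extending eBPF Programmability and Observability to GPUs]] (PTX injection vs. NVBit SASS rewriting)
- [[@2026__arXiv__ProfInfer - An eBPF-based Fine-Grained LLM Inference Profiler]] (uprobe/tracepoint, tensor-structure extraction, adaptive control)
- [[@2025__eBPF__eInfer - Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF]] (runtime-adaptive tracing)
- [[@2024__TOPC__Low-Overhead Trace Collection and Profiling on GPU Compute Kernels]] (LLVM compile-time instrumentation)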
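
## Illustrative sketch: walking a tensor struct

The struct-walking idea noted in this page (ProfInfer reading `ggml_tensor` fields with `bpf_probe_read_user`, then following two pointers to reach MoE expert IDs) can be roughly illustrated in user space. The sketch below is not eBPF and does not reproduce ProfInfer's code: `FakeTensor` is a hypothetical layout (not the real `ggml_tensor`), and `probe_read`, `describe`, and `read_expert_ids` are names invented here. It only shows the pattern of copying a fixed number of bytes from a raw address, decoding them as a struct, and repeating that for each pointer hop.

```python
import ctypes


class FakeTensor(ctypes.Structure):
    """Hypothetical layout for illustration only; NOT the real ggml_tensor."""
    _fields_ = [
        ("op", ctypes.c_int32),        # operator kind (made-up numbering)
        ("dtype", ctypes.c_int32),     # element type (made-up numbering)
        ("ne", ctypes.c_int64 * 4),    # tensor dimensions
        ("name", ctypes.c_char * 16),  # NUL-terminated operator name
        ("src", ctypes.c_void_p),      # pointer to another FakeTensor
        ("data", ctypes.c_void_p),     # pointer to the raw payload
    ]


def probe_read(ctype, addr):
    """Stand-in for a bounded read helper: copy sizeof(ctype) bytes from addr."""
    if not addr:  # NULL pointer: nothing to read
        return None
    raw = ctypes.string_at(addr, ctypes.sizeof(ctype))
    return ctype.from_buffer_copy(raw)


def describe(tensor_addr):
    """Recover operator name, type, and dimensions from a raw tensor address."""
    t = probe_read(FakeTensor, tensor_addr)
    if t is None:
        return None
    return {"op": t.op, "dtype": t.dtype, "dims": list(t.ne), "name": t.name.decode()}


def read_expert_ids(tensor_addr, n_ids):
    """Two pointer hops: tensor -> src tensor -> int32 payload of expert IDs."""
    t = probe_read(FakeTensor, tensor_addr)
    if t is None:
        return None
    ids_tensor = probe_read(FakeTensor, t.src)                 # first hop
    if ids_tensor is None:
        return None
    ids = probe_read(ctypes.c_int32 * n_ids, ids_tensor.data)  # second hop
    return None if ids is None else list(ids)


# Demo objects standing in for the traced process's memory.
expert_ids = (ctypes.c_int32 * 2)(3, 7)
ids_tensor = FakeTensor(op=0, dtype=1, ne=(ctypes.c_int64 * 4)(2, 1, 1, 1),
                        name=b"expert_ids", data=ctypes.addressof(expert_ids))
moe_tensor = FakeTensor(op=42, dtype=0, ne=(ctypes.c_int64 * 4)(64, 8, 1, 1),
                        name=b"moe_ffn", src=ctypes.addressof(ids_tensor))

print(describe(ctypes.addressof(moe_tensor)))
print(read_expert_ids(ctypes.addressof(moe_tensor), 2))
```

In a real probe, each hop would be a separate bounded read that can fail, so checking every intermediate pointer before dereferencing it (as the `None` checks do here) is the part of the pattern that carries over.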