[Performance — NVIDIA NeMo Framework User Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-summary.html)
Pre-training on [[H100]]: 230–854 Model TFLOP/sec/GPU, 320–14,744 tokens/sec/GPU
[GitHub - NVIDIA/Megatron-LM: Ongoing research training transformer models at scale](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#performance-benchmarking)
![[Pasted image 20250831230508.png]]
- Scales out while maintaining [[MFU]] [[Strong Scaling and Weak Scaling]]
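A minimal sketch of how the Model TFLOP/sec/GPU figures relate to [[MFU]], assuming an H100 dense BF16 peak of 989 TFLOP/sec (the peak value and the `mfu` helper are assumptions, not taken from the linked docs):

```python
# Assumed H100 dense BF16 peak (no sparsity); check the datasheet for your SKU.
H100_PEAK_BF16_TFLOPS = 989.0

def mfu(achieved_tflops_per_gpu: float,
        peak_tflops: float = H100_PEAK_BF16_TFLOPS) -> float:
    """Model FLOPs Utilization: achieved model TFLOP/s over hardware peak."""
    return achieved_tflops_per_gpu / peak_tflops

# The 230-854 Model TFLOP/sec/GPU range above corresponds to roughly:
low, high = mfu(230.0), mfu(854.0)
print(f"MFU range: {low:.1%} - {high:.1%}")
```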