paper: [[2019__arXiv__Megatron-LM - Training Multi-Billion Parameter Language Models Using Model Parallelism]]
code: [NVIDIA/Megatron-LM: Ongoing research training transformer models at scale](https://github.com/nvidia/megatron-lm)