- [How do you manage your Machine Learning Experiments? | by Hady Elsahar | Medium](https://hadyelsahar.medium.com/how-do-you-manage-your-machine-learning-experiments-ab87508348ac)
- [Thinking about experiment management - Re:ゼロから始めるML生活](https://www.nogawanogawa.com/entry/experiment_management#%E5%90%84%E3%82%B5%E3%83%BC%E3%83%93%E3%82%B9%E3%82%92%E7%A2%BA%E8%AA%8D%E3%81%99%E3%82%8B)

## Knobs

- Code: model architecture, bug fixes, evaluation code, adding or fixing a hyperparameter.
- Datasets: changes to the datasets, preprocessing, manually fixing some examples.
- Debugging: the minor changes you always make to debug a certain model behaviour.
- Training: hyperparameter tuning, either manual or automatic via hyperparameter optimization systems.
- Meta: experiment name, tags, time, and what you were doing at the time.

## Watchlists

- Evaluation metrics: accuracy, [[ROC]], [[BLEU]], [[ROUGE]], etc. Record not only which metric you use but also which implementation of it.
- Debugging and intermediate metrics: training and dev loss and accuracy, gradients per layer per epoch, and system info such as hostname, GPU memory %, and GPU utilization %.

## Tools

- [[Neptune.ai]]
- [[MLFlow]]
- [[Comet.ml]]
- [[WandB]]
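
Whichever of these tools is used, the knobs and watchlist items listed in this note can be captured as one plain record per run. The following is a minimal sketch using only the Python standard library. The class `ExperimentRecord`, its field names, and the `log_metric` helper are hypothetical names chosen for illustration and are not the API of any of the tools listed here.

```python
import json
import platform
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExperimentRecord:
    """One record per run: the knobs that were turned and the metrics watched."""

    # Meta knobs: what the run is, when it started, what you were doing.
    name: str
    tags: List[str] = field(default_factory=list)
    notes: str = ""
    started_at: float = field(default_factory=time.time)

    # Code and dataset knobs (free-form identifiers supplied by the caller,
    # e.g. a commit hash or a dataset snapshot name; assumed, not enforced).
    code_version: str = "unknown"
    dataset_version: str = "unknown"

    # Training knobs.
    hyperparams: Dict[str, float] = field(default_factory=dict)

    # Watchlist: system info, metric histories, and which implementation
    # produced each metric.
    hostname: str = field(default_factory=platform.node)
    metrics: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    metric_impl: Dict[str, str] = field(default_factory=dict)

    def log_metric(self, key: str, value: float, step: int, impl: str = "") -> None:
        """Append (step, value) to the history of `key`; optionally note the implementation."""
        self.metrics.setdefault(key, []).append((step, value))
        if impl:
            self.metric_impl[key] = impl

    def to_json(self) -> str:
        """Serialize the whole record so it can be stored next to the run's outputs."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    run = ExperimentRecord(
        name="baseline-lr-sweep",
        tags=["debug"],
        notes="checking whether a lower learning rate fixes the dev-loss plateau",
        hyperparams={"lr": 3e-4, "batch_size": 32},
    )
    run.log_metric("dev/loss", 0.71, step=1)
    run.log_metric("dev/loss", 0.52, step=2)
    run.log_metric("dev/accuracy", 0.80, step=2, impl="in-house eval script")
    print(run.to_json())
```

Keeping the implementation name next to each metric reflects the point in the watchlist that the same metric can differ between implementations. A dedicated tracking tool would typically add storage, comparison views, and GPU statistics on top of a record like this.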