ML Data Verification - yuuk1's Digital Garden

[Machine Learning Systems](https://thegradient.pub/systems-for-machine-learning/)

> Data verification serves as a natural follow-up to data collection. Data quality is a critical problem for machine learning pipelines. To use the common phrase, “garbage in, garbage out.” To produce high quality models for their system, the maintainer has to ensure that the data they’re feeding in is also high quality.
>
> To quote the TensorFlow Data Validation (TFDV) paper:
>
> > “Data validation is neither a new problem nor unique to ML, and so we borrow solutions from related fields (e.g., database systems). However, we argue that the problem acquires unique challenges in the context of ML and hence we need to rethink existing solutions”

from [akira on Twitter](https://twitter.com/AkiraTOSEI/status/1436311236614885377): "Explains the elements of building a system that uses machine learning from four perspectives: data collection, data verification, training, and deployment. As for data verification, since it is not a new problem, it recommends using existing tools https://t.co/bddgWhtSLy"
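
As a rough illustration of what "ensuring the data being fed in is high quality" can look like, here is a minimal sketch in plain Python. It is not TFDV's API: the `SCHEMA` format, the `validate_rows` helper, and the sample rows are all hypothetical, and a real pipeline would more likely rely on an existing tool.

```python
# Hypothetical schema: column name -> (expected type, whether None is allowed).
SCHEMA = {
    "user_id": (int, False),
    "age": (int, True),
    "country": (str, False),
}


def validate_rows(rows, schema):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, nullable) in schema.items():
            # A column that is absent entirely is reported once and skipped.
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
                continue
            value = row[column]
            if value is None:
                if not nullable:
                    problems.append(f"row {i}: '{column}' is null")
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{column}' has type {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


# Made-up sample data: the last row has a string id and no country.
rows = [
    {"user_id": 1, "age": 34, "country": "JP"},
    {"user_id": 2, "age": None, "country": "US"},
    {"user_id": "3", "age": 27},
]

for problem in validate_rows(rows, SCHEMA):
    print(problem)
```

Running this flags only the last row. The point of the sketch is that bad records are caught before they reach training, instead of surfacing later as a degraded model.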