Tri Dao - yuuk1's Digital Garden

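# Tri Dao

The central author throughout the FlashAttention series (FA1 to FA4) and a proponent of IO-aware attention algorithms. He developed FA1 in Christopher Ré's lab at Stanford University and published FA2 as sole author after moving to Princeton University. On FA3 and FA4 he is a co-author, affiliated with [[Together AI]] and [[Princeton University]].

FA1 (2022) presented an IO-complexity analysis of exact attention computed with tiling plus recomputation. FA2 (2023) improved work partitioning and reached 72% MFU (model FLOPs utilization) on A100 in end-to-end GPT-style training. FA3 (2024), with [[Jay Shah]] et al., added warp specialization and FP8 support for Hopper GPUs. FA4 (2026), with Ted Zadouri et al., is an algorithm and kernel-pipelining co-design for asymmetric hardware scaling on Blackwell GPUs.

He runs the GitHub repository `Dao-AILab/flash-attention`, and through its integration into PyTorch and Hugging Face the work is widely adopted in both industry and academia. He is also one of the developers of Mamba (a state space model).

(Source: [[@2022__arXiv__FlashAttention - Fast and Memory-Efficient Exact Attention with IO-Awareness]], [[@2023__arXiv__FlashAttention-2 - Faster Attention with Better Parallelism and Work Partitioning]], [[@2024__arXiv__FlashAttention-3 - Fast and Accurate Attention with Asynchrony and Low-precision]], [[@2026__arXiv__FlashAttention-4 - Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling]])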