2:31
Scientific ML · CUDA · Containers
Why Containers Don't Guarantee AI Reproducibility
CUDA parallelizes operations across thousands of GPU threads, and the order in which those threads accumulate values isn't fixed. Floating-point addition is not associative, so a different accumulation order can round to a different result. This video walks through how that breaks scientific reproducibility, even inside identical containers.
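The ordering effect can be seen without a GPU. The sketch below is a minimal illustration in plain Python, not code from the video: the helper `accumulate` and the sample values are hypothetical, and the `order` argument stands in for whatever order GPU threads happen to finish in.

```python
def accumulate(values, order):
    """Sum `values` in the given index order, like a sequential reduction."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Hypothetical values chosen so that rounding is easy to see:
# 1.0 is smaller than the spacing between doubles near 1e16.
values = [1e16, 1.0, -1e16]

# Same three numbers, two accumulation orders.
first = accumulate(values, [0, 1, 2])   # (1e16 + 1.0) - 1e16
second = accumulate(values, [0, 2, 1])  # (1e16 - 1e16) + 1.0

print(first, second)    # 0.0 1.0
print(first == second)  # False
```

The inputs and the software environment are identical in both calls; only the order of additions differs. A container pins the environment, but it cannot pin that order.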
CUDA
Floating-Point
Containers
Reproducibility
Scientific ML
1:52
HPC · Distributed Training · GPU Scaling
More GPUs, Slower AI?
The assumption that adding GPUs accelerates training is deeply embedded in how teams think about AI infrastructure. This video walks through what happens when that assumption breaks: communication overhead, gradient reduction complexity, and how scaling hardware can actively degrade convergence.
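A rough way to see how communication overhead can outweigh added hardware is a toy cost model. The sketch below is an assumption-laden illustration, not the analysis from the video: it uses the standard latency-plus-bandwidth estimate for a ring all-reduce, assumes a fixed amount of work per step split evenly across GPUs, assumes no overlap between compute and communication, and every number (compute time, gradient size, bandwidth, latency) is made up. It models step time only and says nothing about convergence.

```python
def ring_allreduce_seconds(num_gpus, grad_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimated time for one ring all-reduce of the gradients."""
    if num_gpus == 1:
        return 0.0  # nothing to exchange on a single GPU
    # A ring all-reduce takes 2 * (N - 1) steps, each moving 1/N of the data.
    steps = 2 * (num_gpus - 1)
    chunk_bytes = grad_bytes / num_gpus
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


def step_seconds(num_gpus,
                 compute_s=1.0,      # hypothetical single-GPU compute time per step
                 grad_bytes=400e6,   # hypothetical: 100M float32 parameters
                 bandwidth=10e9,     # hypothetical link bandwidth in bytes/s
                 latency_s=50e-6):   # hypothetical per-message latency
    """Modeled time per training step: split compute plus gradient reduction."""
    compute = compute_s / num_gpus
    comm = ring_allreduce_seconds(num_gpus, grad_bytes, bandwidth, latency_s)
    return compute + comm


for n in (1, 8, 64, 256, 1024):
    print(f"{n:5d} GPUs: {step_seconds(n):.4f} s per step")
```

With these made-up parameters the modeled step time falls until the GPU count reaches the order of a hundred, then rises again as the latency term grows with the number of ring steps. Real systems overlap communication with compute and use other reduction topologies, so the crossover point here is illustrative only.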
Distributed Training
GPU Scaling
HPC
Performance
Reproducibility
More videos in progress
Upcoming topics include cuDNN algorithm selection, physics-informed model validation, and distributed training reproducibility on HPC clusters.