A hitchhiker’s guide to CUDA programming
How to write a CUDA kernel to achieve 95% cuBLAS performance
My opinions on SSMs
Takeaways from DDIM
Walk the walk and walk the dog
If you were to design an HF
A quick rundown on expectation maximization