Walkthrough of writing an SGEMM kernel that achieves 95% of cuBLAS performance
Every architecture contains some implicit trade-offs. My impression is that SSMs are a good sequential architecture for modalities where interactions within a sequence matter less than a good compression of past states.
Imagine that you have two curves in 2-D space. How would you measure their similarity?
Explaining the EM algorithm in a nutshell