This is really impressive! Do you have any metrics on long-context benchmarks such as RULER or NIAH (needle-in-a-haystack)? Long-context recall seems to be the last advantage an attention mechanism would hold over a state-space approach like this.