Discussion about this post

Subendhu Rongali:

This is really impressive! Do you have any metrics on long context benchmarks such as RULER or NIAH? That seems to be the last advantage an attention mechanism would hold, compared to a state-space approach like this.

Howard:

Nice work, really close to Qwen2.5 this time.

