Is your model faster than Mistral 7B AWQ on say T4?
I am not sure how to compare. Mistral Super Fast (https://huggingface.co/spaces/osanseviero/mistral-super-fast) seems to output at the same speed, but the HF Space does not say exactly what kind of hardware it runs on.
With the exact same settings, from an architecture standpoint, ours should be faster overall.
However, while we do support quantization, we do not support speculative decoding for now.
As a result, transformer models with speculative decoding are able to match or beat our model. We do plan to add support for speculative decoding in the future as well.
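If you want a like-for-like number, one rough way is to time a fixed-length generation on the same hardware and compute tokens per second. A minimal sketch; `generate_fn` is a hypothetical stand-in for whichever model you are benchmarking:

```
import time

def tokens_per_second(generate_fn, prompt, n_tokens=128):
    # generate_fn is a hypothetical stand-in: any callable that takes
    # (prompt, n_tokens) and produces exactly n_tokens tokens
    start = time.perf_counter()
    generate_fn(prompt, n_tokens)
    return n_tokens / (time.perf_counter() - start)
```

Running the same harness against both models on the same T4 gives a direct throughput comparison.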
Is there a Google Colab to run this model, or a way to run it on a CPU? I ran this model on CPU on a high-RAM machine and it works, but it is very slow: something like 1 minute per token or more.
You can try it online today on:
- our Hugging Face Space: https://huggingface.co/spaces/recursal/EagleX-7B-1.7T-Gradio-Demo
- our new cloud platform: https://recursal.ai
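To run it locally on CPU, you can load the weights with the `rwkv` pip package, using the `fp32i8` strategy to keep the weights in int8 and reduce memory usage: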
```
import os
os.environ["RWKV_JIT_ON"] = "1"  # must be set before importing rwkv.model

from huggingface_hub import hf_hub_download
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Download the weights, then load them on CPU with int8-quantized weights
model_path = hf_hub_download(repo_id="recursal/EagleX_1-7T", filename="EagleX-1_7T.pth")
model = RWKV(model=model_path, strategy="cpu fp32i8")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```
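From there, generation goes through the pipeline. A minimal sketch, with an arbitrary prompt and sampling settings:

```
from rwkv.utils import PIPELINE_ARGS

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
# Generate 64 tokens and print them as they stream in
pipeline.generate("The quick brown fox", token_count=64,
                  args=args, callback=lambda s: print(s, end=""))
```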