Earlier this year, we took the world by storm when we announced that our Eagle model had beaten Meta’s Llama-2 while requiring less training time, making it the world’s most efficient model.
While Eagle still packs a powerful punch, and has been powering diverse use cases from multilingual applications to content moderation, gaming, and role-play, we’ve been working on something new to bring our insights on efficiency to a much broader realm.
Just this Friday, we launched Featherless AI, which enables serverless inference of every Llama-3 8B and 70B model on Hugging Face we could get our hands on.
That’s over 475 models, with many more being added daily, allowing anyone to quickly experiment with, try, and choose the latest and best models from Hugging Face. Plans start at $10/month.
Previously, using even the smallest fine-tune required dedicated hardware, which translated to real hosting costs, whether you were experimenting with a model or ramping up production use. This has been a barrier to a host of use cases, particularly agents, where each step in an agent’s computation might benefit from a different model.
The goal of Featherless is to make every model on Hugging Face available serverlessly, and with these Llama- and RWKV-based models, we’re a big step of the way there.
With Featherless, you can experiment with an entirely new range of models at completely different economics.
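To make the workflow concrete, here is a minimal sketch of what calling a serverless model by its Hugging Face identifier might look like. The base URL, model name, and request shape below are illustrative assumptions (an OpenAI-style chat-completions endpoint), not confirmed API details; consult the Featherless documentation for the actual interface. The example builds the request without sending it, so the structure is easy to inspect:

```python
import json
import urllib.request

# Placeholder values -- assumptions for illustration, not confirmed API details.
API_BASE = "https://api.featherless.ai/v1"            # assumed endpoint
API_KEY = "YOUR_API_KEY"                              # substitute your own key
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"         # example Hugging Face model id

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request (without sending it)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarise this release in one sentence.")
# Sending the request would be one call away: urllib.request.urlopen(req)
```

Because every model is addressed by its repository name, switching between fine-tunes is a one-line change to `MODEL` rather than a redeployment of dedicated hardware.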