<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Featherless AI - recursive dev blog: RWKV News]]></title><description><![CDATA[Updates on RWKV open model development (and other open models) done by the featherless AI team]]></description><link>https://substack.recursal.ai/s/rwkv-news</link><image><url>https://substackcdn.com/image/fetch/$s_!RY89!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png</url><title>Featherless AI - recursive dev blog: RWKV News</title><link>https://substack.recursal.ai/s/rwkv-news</link></image><generator>Substack</generator><lastBuildDate>Sat, 18 Apr 2026 09:35:39 GMT</lastBuildDate><atom:link href="https://substack.recursal.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Recursal AI]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[featherless@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[featherless@substack.com]]></itunes:email><itunes:name><![CDATA[Featherless AI - dev blog]]></itunes:name></itunes:owner><itunes:author><![CDATA[Featherless AI - dev blog]]></itunes:author><googleplay:owner><![CDATA[featherless@substack.com]]></googleplay:owner><googleplay:email><![CDATA[featherless@substack.com]]></googleplay:email><googleplay:author><![CDATA[Featherless AI - dev blog]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[RADLADS: Dropping the cost of AI architecture experiment by 250x]]></title><description><![CDATA[Unlocking and accelerating the next wave of AI architecture 
research]]></description><link>https://substack.recursal.ai/p/radlads-dropping-the-cost-of-ai-architecture</link><guid isPermaLink="false">https://substack.recursal.ai/p/radlads-dropping-the-cost-of-ai-architecture</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Mon, 12 May 2025 18:13:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8b0ab9da-e776-4944-9881-fc754d946fa3_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Why do most large AI research labs swear by scaling and avoid architecture research?</p><ul><li><p><strong>What works small often fails big</strong> &#8212; Architectural innovations that show promise at 1M parameters may break down at 1B or 50B.</p></li><li><p><strong>Validating at scale is expensive</strong> &#8212; Training from scratch to test a new architecture at meaningful scale can cost at least $5&#8211;10M.</p></li><li><p><strong>High risk, uncertain reward</strong> &#8212; You&#8217;re just as likely to degrade performance as improve it&#8212;making architecture exploration financially unsustainable for most labs.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5149" height="2191" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2191,&quot;width&quot;:5149,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;burning banknotes&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="burning banknotes" title="burning banknotes" 
srcset="https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1554768803-2ae381da5645?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxidXJuJTIwbW9uZXl8ZW58MHx8fHwxNzQ2OTM1MDI0fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 
24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Money Burning Photo - by <a href="true">Jp Valery</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Training a state-of-the-art language model from scratch costs roughly $5&#8211;10M, just to validate a new attention mechanism, recurrence scheme, or memory system.</p><p>In our team's experience, it typically takes 20&#8211;80 architecture iterations to achieve a 10%+ improvement. We've done this four times over the past two years.</p><p>For most AI labs, that level of experimentation (roughly 50 iterations at $5 million each) would cost around $250 million in research GPU time. From that perspective, it's often more rational to invest in scaling model parameters and datasets for a near-guaranteed performance gain of ~10%.</p><p>At Featherless, we believe this bottleneck in architecture validation has slowed progress, not only in capabilities but also in reliability.</p><p>But what if the cost to validate an architecture dropped from $5 million to $20K?</p><p>With that same $250 million, we could run over 12,500 iterations, uncovering 100+ architecture improvements, each with 10%+ gains. 
Compounded, 100 gains of 10% each multiply out to 1.1<sup>100</sup> &#8776; 13,780&#215;, a theoretical 1,378,000% improvement in performance.</p><p>That&#8217;s why we&#8217;re excited about RADLADS.</p><div><hr></div><h1><strong>Introducing RADLADS</strong></h1><p>RADLADS (Rapid Attention Distillation to Linear Attention Decoders at Scale) is a new method for converting massive transformer models (e.g., Qwen-72B) into new AI models with alternative attention mechanisms&#8212;at a fraction of the original training cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XWoX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XWoX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 424w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 848w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 1272w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XWoX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png" width="701" height="461" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:461,&quot;width&quot;:701,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XWoX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 424w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 848w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 1272w, https://substackcdn.com/image/fetch/$s_!XWoX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F585341ab-6138-41ce-ab1f-2ea2578bec2c_701x461.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Total cost: $2,000&#8211;$20,000</p></li><li><p>Tokens used: ~500 million</p></li><li><p>Training time: A few days on accessible cloud GPUs (8&#215; MI300)</p></li><li><p>Cost reduction: ~250&#215; lower cost per architecture experiment</p></li></ul><p>Instead of training from scratch, we convert existing models to new attention architectures in three steps:</p><ol><li><p>Align hidden states between the original transformer and the target attention architecture</p></li><li><p>Distill output behavior (logits) from the original model</p></li><li><p>Fine-tune for long-context performance</p></li></ol><p>You can read the details of the process in our paper, available on <a href="https://huggingface.co/papers/2505.03005">Hugging Face</a> and <a 
href="https://www.arxiv.org/abs/2505.03005">arXiv</a>. This is the same technique that allowed us to train our latest 72B<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> attention-free model with only 8 GPUs.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bd373744-7aac-4094-b1f5-1042c1742031&quot;,&quot;caption&quot;:&quot;We are proud to announce the updated QRWKV-72B and 32B.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;&#129727;QRWKV-72B and 32B : Training large attention free models, with only 8 GPU's&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:99170118,&quot;name&quot;:&quot;Eugene Cheah&quot;,&quot;bio&quot;:&quot;Builds Attention-Free Transformer AI models (http://wiki.rwkv.com) from scratch, CEO @ featherless.ai (prv recursal.ai) - Also known for k8s infra &amp; UI testing tools, webapps, and GPU.js, Hot-takes/Views are my own&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8dcb57e-6203-4be3-ae29-03732db5c5f7_460x460.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://substack.tech-talk-cto.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://substack.tech-talk-cto.com&quot;,&quot;primaryPublicationName&quot;:&quot;Tech Talk 
CTO&quot;,&quot;primaryPublicationId&quot;:1004639}],&quot;post_date&quot;:&quot;2025-03-24T17:30:34.860Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!iyqS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large&quot;,&quot;section_name&quot;:&quot;RWKV News&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:159379897,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:14,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Featherless AI - recursive dev blog&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!RY89!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2mkT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2mkT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!2mkT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2mkT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2mkT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2mkT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png" width="296" height="296" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:296,&quot;bytes&quot;:2410734,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/163307360?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2mkT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2mkT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2mkT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2mkT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56983e6d-be9a-4e39-b8c7-95b1ab746cd4_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>What does this mean for research?</h1><p>RADLADS is already changing how we explore AI architecture. We can now:</p><ul><li><p>Rapidly test novel attention mechanisms and hybrid designs</p></li><li><p>Iterate on model structures in days, not months</p></li><li><p>Validate alignment and interpretability hypotheses at scale</p></li></ul><p>This isn&#8217;t just about RWKV: it opens doors for advancing Transformers, state space models, xLSTMs, and architectures yet to be imagined. It&#8217;s about accelerating our pace of research.</p><p>And we&#8217;re not doing it alone. Since announcing our work, we've collaborated with other researchers to validate multiple attention mechanisms, including Transformer-based variants.</p><blockquote><p><em>Reach out to us if your research team or university lab is working on an attention alternative and would like to validate it in collaboration.</em></p></blockquote><p>It&#8217;s all part of our mission to make personalized, reliable AI, and eventually AGI, a reality.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bdbc8ded-382b-4c87-9cce-8cdf1701c321&quot;,&quot;caption&quot;:&quot;If you want to find out more about the latest Qwerky model, that makes all of this possible, it is recommended to read this first:&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;&#128739;&#65039; Our roadmap to Personalized AI and AGI&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:99170118,&quot;name&quot;:&quot;Eugene 
Cheah&quot;,&quot;bio&quot;:&quot;Builds Attention-Free Transformer AI models (http://wiki.rwkv.com) from scratch, CEO @ featherless.ai (prv recursal.ai) - Also known for k8s infra &amp; UI testing tools, webapps, and GPU.js, Hot-takes/Views are my own&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8dcb57e-6203-4be3-ae29-03732db5c5f7_460x460.jpeg&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://substack.tech-talk-cto.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://substack.tech-talk-cto.com&quot;,&quot;primaryPublicationName&quot;:&quot;Tech Talk CTO&quot;,&quot;primaryPublicationId&quot;:1004639}],&quot;post_date&quot;:&quot;2025-03-24T17:40:14.797Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/139429cb-892d-41ef-9894-5ca461ef1510_1646x1058.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://substack.recursal.ai/p/our-roadmap-to-personalized-ai-and&quot;,&quot;section_name&quot;:&quot;RWKV News&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:159708830,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:4,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Featherless AI - recursive dev blog&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="pullquote"><p><strong>One more thing:</strong><br>QRWKV2, based on the RWKV architecture &amp; Qwen 3 models, 
is already training...</p><p><strong>Translation:</strong><br>A linear GPT-4o class text model is on its way...<br>After that, it&#8217;s o1 and o3 class</p></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This model was originally published as Qwerky-72B. 
However, because of confusion with a similarly named company/model, we were asked to stop using the Qwerky name, so the model is now called QRWKV-72B.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[🛣️ Our roadmap to Personalized AI and AGI]]></title><description><![CDATA[We will not train a trillion-parameter model: <100B active parameters is all you need.]]></description><link>https://substack.recursal.ai/p/our-roadmap-to-personalized-ai-and</link><guid isPermaLink="false">https://substack.recursal.ai/p/our-roadmap-to-personalized-ai-and</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Mon, 24 Mar 2025 17:40:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/139429cb-892d-41ef-9894-5ca461ef1510_1646x1058.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you want to find out more about the latest QRWKV<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> model, which makes all of this possible, we recommend reading this first:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:159379897,&quot;url&quot;:&quot;https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large&quot;,&quot;publication_id&quot;:2073186,&quot;publication_name&quot;:&quot;Featherless AI - recursive dev blog&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!RY89!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png&quot;,&quot;title&quot;:&quot;&#129727;QRWKV-72B and 32B : Training large attention free models, with only 8 GPU's&quot;,&quot;truncated_body_text&quot;:&quot;We are proud to announce the updated QRWKV-72B and 
32B.&quot;,&quot;date&quot;:&quot;2025-03-24T17:30:34.860Z&quot;,&quot;like_count&quot;:14,&quot;comment_count&quot;:3,&quot;bylines&quot;:[{&quot;id&quot;:99170118,&quot;name&quot;:&quot;Eugene Cheah&quot;,&quot;handle&quot;:&quot;techtalkcto&quot;,&quot;previous_name&quot;:&quot;PicoCreator&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8dcb57e-6203-4be3-ae29-03732db5c5f7_460x460.jpeg&quot;,&quot;bio&quot;:&quot;Builds Attention-Free Transformer AI models (http://wiki.rwkv.com) from scratch, CEO @ featherless.ai (prv recursal.ai) - Also known for k8s infra &amp; UI testing tools, webapps, and GPU.js, Hot-takes/Views are my own&quot;,&quot;profile_set_up_at&quot;:&quot;2022-07-17T02:17:11.664Z&quot;,&quot;reader_installed_at&quot;:null,&quot;twitter_screen_name&quot;:&quot;picocreator&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationId&quot;:1004639,&quot;primaryPublicationName&quot;:&quot;Tech Talk CTO&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://substack.tech-talk-cto.com&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://substack.tech-talk-cto.com/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:false,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!RY89!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png"><span 
class="embedded-post-publication-name">Featherless AI - recursive dev blog</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">&#129727;QRWKV-72B and 32B : Training large attention free models, with only 8 GPU's</div></div><div class="embedded-post-body">We are proud to announce the updated QRWKV-72B and 32B&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 14 likes &#183; 3 comments &#183; Eugene Cheah</div></a></div><p>In the past quarter, we have seen the following breakthroughs in open-source AI:</p><ul><li><p>GPT-4o Mini class open models (DeepSeek, Qwen-32B) on our platform</p></li><li><p>A 72B attention-free QRWKV model built on 8 GPUs</p></li><li><p>Both with more knowledge and capability than an average human</p></li><li><p>A tipping point with RWKV v6 and v7 improvements</p></li></ul><p>As such, my prediction (Eugene Cheah) for 2025 is that we are at the inflection point. 
</p><ul><li><p>Where we scale towards model reliability, with better memories</p></li><li><p>Instead of scaling to a trillion parameters</p></li></ul><div><hr></div><h1>Our vision for AGI in summary</h1><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e4b0675a-c277-4f83-bdb9-80325f9b3203&quot;,&quot;duration&quot;:null}"></div><blockquote><p>The video above is the condensed version of our vision for personalized AI &amp; AGI.<br>The writing below is the longer form, with some additional details.</p></blockquote><div><hr></div><h2>The scaling wall for bigger models</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aqC1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aqC1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 424w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 848w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 1272w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!aqC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png" width="1456" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afec588c-225c-4e56-952b-f733448cfa72_4116x2356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4482172,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aqC1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 424w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 848w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 1272w, https://substackcdn.com/image/fetch/$s_!aqC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafec588c-225c-4e56-952b-f733448cfa72_4116x2356.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://futurism.com/ai-researchers-tech-industry-dead-end">[Futurism Link]</a>, <a href="https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/">[Techcrunch Link]</a></figcaption></figure></div><p>The problem with scaling today lies in its fundamental promise: a step increase in capability for every 10x increase in parameter size.</p><p>It has remained somewhat true that each step up brings an improvement in capability. 
<a href="https://arxiv.org/abs/2411.13055">The problem is the diminishing returns on both training and inference costs.</a></p><p>We are now in the range of billions of investment dollars for the current GPT 4.5 model, with no clear answer as to whether OpenAI&#8217;s Super AGI goal is achievable with another 10x, 100x, or 1000x. Beyond that, we start talking in trillions of dollars, just for training, with even more dollars required to actually run the model.</p><p>This path might still make sense, potentially, for &#8220;Super AGI&#8221;.</p><p>It makes no sense for &#8220;Human-level AGI&#8221;. Because &#8230;</p><div><hr></div><h2>Short Context AGI is already here (sort of)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_udp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_udp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_udp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_udp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!_udp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_udp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg" width="1024" height="529" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:529,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_udp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_udp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_udp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!_udp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb42b0738-1bab-427d-b29f-5f9b33295273_1024x529.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Character.AI showcases a vast array of unique personalities</figcaption></figure></div><p>For the vast majority of users on applications like Character.AI, within a context window of approximately 8,000 tokens, these AI characters are AGI: perhaps with some flaws, but a unique character experience regardless. 
It&#8217;s what drives the 28 million-plus users to keep engaging with the platform.</p><p>The limits, however, are clear:</p><ul><li><p>They are unable to reliably follow instructions held in memory</p></li><li><p>Nor are they able to gracefully handle memories beyond their context length</p></li></ul><div><hr></div><h2>The lack of reliable AI memory</h2><div class="pullquote"><p>What holds back AI (and AI agents) is &#8230;<br><strong>The lack of reliable understanding in memories</strong></p></div><p>Herein lies an irony: these models are no doubt knowledgeable and capable.</p><p>Today&#8217;s best models, both open source (e.g. DeepSeek R1) and closed source, are no doubt capable of doing PhD-level math and physics with some degree of reliability (let&#8217;s say 1-out-of-30 times).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QulY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QulY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 424w, https://substackcdn.com/image/fetch/$s_!QulY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 848w, https://substackcdn.com/image/fetch/$s_!QulY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 1272w, 
https://substackcdn.com/image/fetch/$s_!QulY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QulY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png" width="593" height="388.13804945054943" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:953,&quot;width&quot;:1456,&quot;resizeWidth&quot;:593,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QulY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 424w, https://substackcdn.com/image/fetch/$s_!QulY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 848w, https://substackcdn.com/image/fetch/$s_!QulY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 1272w, 
https://substackcdn.com/image/fetch/$s_!QulY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1fed6aa-cf38-4c87-b843-48dd57029510_2007x1313.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Grok, demonstrating its ability to calculate a Mars-to-Earth orbital transfer window</figcaption></figure></div><p>Yet they lack the reliability to do basic college-level tasks (30-out-of-30 times), be it as a cashier or as any simple agent that needs to carry out a long multi-step process.</p><p>This is also known as the &#8220;Compounding AI Agent 
Error&#8221; problem</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GfzF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GfzF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 424w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 848w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 1272w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GfzF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png" width="560" height="344.3415077202543" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:677,&quot;width&quot;:1101,&quot;resizeWidth&quot;:560,&quot;bytes&quot;:242164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GfzF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 424w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 848w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 1272w, https://substackcdn.com/image/fetch/$s_!GfzF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff942a0b9-608a-4fc0-ab86-4372bfb6f6ec_1101x677.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Even Google DeepMind founder and CEO Demis Hassabis warned about the compounding AI error problem just a few days ago, at the Google Vertex event.</figcaption></figure></div><p>And fixing the reliability problem does not need a bigger model. 
We already have proof of this, as observed across our platform&#8230;</p><div><hr></div><h2>How is reliability being solved in production today?</h2><p>One of the interesting patterns we have observed on our platform is the difference in workloads between individual and scaled-up commercial use cases at Featherless.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x7Cd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x7Cd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 424w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 848w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 1272w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x7Cd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png" width="724" height="291.8873626373626" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:587,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:1984523,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x7Cd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 424w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 848w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 1272w, https://substackcdn.com/image/fetch/$s_!x7Cd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d092e0c-11b1-4153-a9d2-5dcd00be8e32_4116x1660.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Featherless.ai model usage chart for individual users, by finetune and by model class</figcaption></figure></div><p>For example, the chart above shows how our individual users are running thousands of fine-tuned models, with a bias, as expected, towards top models like DeepSeek R1 or LLaMA3 70B.</p><p>But this changes dramatically when we look at the models run by commercial users in production at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yEin!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yEin!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 424w, https://substackcdn.com/image/fetch/$s_!yEin!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 848w, https://substackcdn.com/image/fetch/$s_!yEin!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 1272w, https://substackcdn.com/image/fetch/$s_!yEin!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yEin!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png" width="470" height="470" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:470,&quot;bytes&quot;:2675503,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yEin!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 424w, https://substackcdn.com/image/fetch/$s_!yEin!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 848w, https://substackcdn.com/image/fetch/$s_!yEin!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 1272w, https://substackcdn.com/image/fetch/$s_!yEin!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6947f66-40d5-41c4-8f45-9bc87f8522c6_3844x3844.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Commercial scaled-up production workload</figcaption></figure></div><p>The graph is not a mistake. One model accounts for the vast majority of the production workload we see at scale (by request or token count). 
Is the 1-year-old mistral Nemo-12b (including both base and finetuned)</p><ul><li><p>Not the latest model</p></li><li><p>Not the biggest model</p></li><li><p>That is good enough, for predictable prompt engineering or finetuning</p></li></ul><p>This is consistent with AI systems and agents in production at scale, where either or both of the following solutions are used.</p><ul><li><p><strong>AI Engineering:</strong> large problems are scoped, and broken into smaller tasks, solvable via prompt engineering</p></li><li><p><strong>Finetuning:</strong> specialized domain-specific model is finetuned for specific tasks</p></li></ul><p>In both cases, significant engineering effort is required </p><ul><li><p>to design the AI agent, and its workflow, with prompt engineering</p></li><li><p>to calibrate the dataset for finetuning, requiring dedicated specialized talent. Due to the limitations of <strong>catastrophic forgetting</strong></p></li></ul><p>The later, is typically reserved for larger teams, due to the difficulty in scaling the resources required to &#8220;get it right&#8221;. Due to the highly trial-and-error nature of the process involved in both tasks.</p><blockquote><p><strong>Catastrophic forgetting: Is the challenge faced by all existing model, where existing knowledge and capabilities is lost. While adding in new knowledge.</strong></p></blockquote><div class="pullquote"><p>For reliability at scale, it&#8217;s less about model size &#8230;<br>And more about, designing the task, and breaking it into reliable parts.<br><br>Prompt Engineering for initial results, and PMF (Product Market Fit), <br>Finetuning for longer-term reliability.</p></div><h2>How RWKV memory module can solve this</h2><p>If all AI models are already &#8220;capable&#8221; enough to potentially handle the vast majority of commercial tasks. 
Especially when they are finetuned for high reliability.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8UNC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8UNC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 424w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 848w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 1272w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8UNC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png" width="402" height="445.49508196721314" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:610,&quot;resizeWidth&quot;:402,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RWKV-V4 language modeling architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RWKV-V4 language modeling architecture" title="RWKV-V4 language modeling architecture" srcset="https://substackcdn.com/image/fetch/$s_!8UNC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 424w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 848w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 1272w, https://substackcdn.com/image/fetch/$s_!8UNC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25d14e7-30f3-4b0a-aa9a-31b3a7613f85_610x676.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A recurrent model, with memory as part of its core design, dramatically simplifies the process of &#8220;finetuning&#8221;, in particular around catastrophic forgetting.</p><p>For example, it is possible to &#8220;memory tune&#8221; the memory module state (Time Mix, in the diagram), which by default is left blank in our base models. </p><p>This will not induce any <strong>catastrophic forgetting</strong>, as none of the initial model weights are modified. It is a process we have been testing and implementing for a few pilot customers.</p><p>Recurrent models are also, by their nature and design, trained to handle memories in a way much closer to how we humans do. 
This makes them naturally more scalable for solving the reliable memory problem that transformers face.</p><p>It also brings multiple additional benefits, such as 100x lower inference cost (due to its linear scaling nature).</p><div class="pullquote"><p>But more importantly, it&#8217;s not about what it can do today.<br>But what we can build tomorrow, iterating on this technology path.</p></div><div><hr></div><h2>We are by no means perfect today (or ever), <br>which is why we iterate more than anyone else</h2><p>The RWKV team is arguably the only team today that has been consistently making step-function improvements to our architecture every half a year.</p><p>This is best seen in our model performance chart across the past 2 years, from version 4 to version 7.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I995!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I995!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 424w, https://substackcdn.com/image/fetch/$s_!I995!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 848w, https://substackcdn.com/image/fetch/$s_!I995!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 1272w, 
https://substackcdn.com/image/fetch/$s_!I995!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I995!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png" width="576" height="407.4725274725275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1030,&quot;width&quot;:1456,&quot;resizeWidth&quot;:576,&quot;bytes&quot;:333254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I995!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 424w, https://substackcdn.com/image/fetch/$s_!I995!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 848w, 
https://substackcdn.com/image/fetch/$s_!I995!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!I995!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74af0ec4-4d11-4578-b7c5-9e67bb015e3a_1702x1204.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A process that we intend to accelerate, moving forward.</p><div><hr></div><h2>How to achieve Personalized AI </h2><p>With the 
appropriate resource support, we expect to iterate into a reliable version of personalized AI within the next year or two.</p><p>We define this as being able to &#8220;Memory Tune&#8221; a model for production use cases without <strong>catastrophic forgetting</strong>, with high accuracy in narrow tasks, using 100 million tokens or less.</p><p>Quickly, and without the need for highly specialized professionals.</p><p>This is not considered &#8220;AGI&#8221;, because it will be constrained to what it was &#8220;designed to be trained on&#8221;, as it would not be continually learning new knowledge.</p><p>However, this is not an issue for most commercial use cases, as it would become the workhorse that drives the AI agent adoption cycle.</p><div><hr></div><h2>And make it &#8594; Personalized AGI</h2><p>Once personalized AI is mastered, we can focus on the next step: automating the process by which the model collects and reflects on its &#8220;day-to-day&#8221; interactions, for further memory tuning &#8220;in the night&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N0Kr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N0Kr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 424w, https://substackcdn.com/image/fetch/$s_!N0Kr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 848w, 
https://substackcdn.com/image/fetch/$s_!N0Kr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 1272w, https://substackcdn.com/image/fetch/$s_!N0Kr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N0Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png" width="1456" height="865" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:865,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:852389,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159708830?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N0Kr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 424w, 
https://substackcdn.com/image/fetch/$s_!N0Kr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 848w, https://substackcdn.com/image/fetch/$s_!N0Kr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 1272w, https://substackcdn.com/image/fetch/$s_!N0Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9366943b-0f04-4780-88d5-4ab55c6fe42c_4116x2446.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>For the most part, this would be done incrementally, instead of in one giant leap.</p><p>This is because it is an extension of the Personalized AI memory tuning process. </p><p>The main differentiator, however, is that we would now allow the model to decide which new knowledge and information from its day-to-day experiences to &#8220;learn from&#8221;.</p><p>However, we do not expect this automated tuning process to be infinitely stable, due to potential memory loss past a certain scale, be it a billion or a trillion tokens.</p><p>What this translates to is a lifespan for a continuous Personal AGI agent: one that can span days, then weeks, months, and eventually years with each iterative improvement.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="4159" height="2773" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2773,&quot;width&quot;:4159,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;trees under cloudy sky during sunset&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="trees under cloudy sky during sunset" title="trees under cloudy sky during sunset" srcset="https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1470252649378-9c29740c9fa8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0fHxzdW5yaXNlfGVufDB8fHx8MTc0Mjc3NjMzNXww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" 
y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="true">Dawid Zawi&#322;a</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h1>Why all of this is inevitable</h1><p>It is a binary outcome: can recurrent models be iterated on and scaled (not just by parameter count) to be more reliable and have better memory?</p><ul><li><p>We have provided evidence that the answer is yes, on a progressive trend over 2 years, despite the limited resources invested in the team.</p></li><li><p>We have provided evidence that recurrent models can scale up to match the capability of some of the largest open transformer models.</p><p></p></li></ul><p>Assuming the above is true:</p><ul><li><p>The high-level roadmap is here, in public.</p></li><li><p>Better and more capable open transformer models will arrive, <br>which will speed up our iteration process. <br>(little to no training from scratch required)<br></p></li><li><p>We have enough resources to keep scaling our inference platform, <br>while slowly iterating on this roadmap with a handful of researchers.</p></li><li><p>Even if we at Featherless were to suddenly disappear, there are enough resources and momentum within the open model space to let future teams slowly, eventually, follow this same roadmap.</p><p></p></li></ul><p>Overall, if Featherless AI is properly supported in this journey to seize the opportunity window, we expect the following timeline:</p><ul><li><p>&lt; 2 Years for personalized AI</p></li><li><p>&lt; 4 Years for personalized AGI</p><p></p></li></ul><p>Left on its own, with a Moore&#8217;s-law level of improvement in computing, we expect the hardware required to iterate on this roadmap to enter the &#8220;personal computing&#8221; space in 4 years. 
Once that is reached, we expect a rapid innovation cycle that leads to the same result, be it within or outside the USA.</p><div class="pullquote"><p>So my question is: do you want to support us in making this happen in 2-4 years,<br>and seize the opportunity with us? Or would you like to miss out on it when it inevitably comes?</p></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This model was originally published as Qwerky-72B. However, due to confusion with another similarly named company/model, we have been asked to avoid using the Qwerky name, so we have renamed our models to QRWKV-72B.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[🪿QRWKV-72B and 32B : Training large attention free models, with only 8 GPU's]]></title><description><![CDATA[&#8252;&#65039; Attention is NOT all you need &#8252;&#65039;]]></description><link>https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large</link><guid isPermaLink="false">https://substack.recursal.ai/p/qwerky-72b-and-32b-training-large</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Mon, 24 Mar 2025 17:30:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iyqS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iyqS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!iyqS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 424w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 848w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 1272w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iyqS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png" width="1456" height="941" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:941,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!iyqS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 424w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 848w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 1272w, https://substackcdn.com/image/fetch/$s_!iyqS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F525f7c7c-725d-4174-b2f7-7d3c6ccc3c04_2743x1773.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We are proud to announce the updated QRWKV-72B<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> and 32B models.</p><p>Both models are available on Hugging Face and featherless.ai:</p><ul><li><p>32B | <a href="https://huggingface.co/featherless-ai/Qwerky-QwQ-32B">Hugging Face Link</a> | <a href="https://featherless.ai/models/featherless-ai/Qwerky-QwQ-32B/readme">Featherless AI Link</a></p></li><li><p>72B | <a href="https://featherless.ai/models/featherless-ai/Qwerky-72B/readme">Hugging Face Link</a> | <a href="https://featherless.ai/models/featherless-ai/Qwerky-72B/readme">Featherless AI Link</a></p></li></ul><p>QRWKV-72B is the largest model to date that is not based on the transformer attention architecture. 
</p><p>The models surpass existing transformer models on several benchmarks, while following close behind on others.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vJPd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vJPd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 424w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 848w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vJPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png" width="724" height="188.5132075471698" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:3975,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:783967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159379897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f937f8b-4b3b-4895-a276-033b71099d9b_4378x1438.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vJPd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 424w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 848w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!vJPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e250c60-f42f-48e4-b56a-bb69981a0277_3975x1035.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>This builds on our previous experiments in converting the QRWKV6, where we converted the previous Qwen 2.5 32B model to RWKV. 
We also build on the previous 72B preview.</p><p>This time, we applied the same conversion to the Qwen-QwQ-32B and Qwen-72B models respectively.</p><p>But let&#8217;s take a step back and look at what this means &#8230;</p><div><hr></div><h1>We now have a model far surpassing <br>GPT-3.5 Turbo, without QKV attention</h1><p>While slowly closing in on GPT-4o-mini</p><p>With lower inference cost, a smaller parameter count, and better performance.</p><div class="pullquote"><p>In 2024, when we proposed scaling up RWKV to replace attention,<br>many believed transformer attention was the <em><strong>only</strong></em> viable path <br>to GPT-3.5 or better intelligence. Today this has been disproven.</p></div><h2>We need no super cluster - only a single server.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6tPh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6tPh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6tPh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6tPh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!6tPh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6tPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg" width="1260" height="709" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:709,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6tPh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6tPh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6tPh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!6tPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d6e658-6567-4250-adc4-962b71a16ee6_1260x709.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Because we keep most of the feed-forward network layers the same,<br>we can perform the conversion (barely) within a single server of 8 MI300 GPUs.</p><p>This requires the full 192GB VRAM allocation per GPU.</p><div><hr></div><h1>How the conversion is done: A summary</h1><p>More details will be revealed in an upcoming paper. 
The core idea is similar to the previous <a href="https://substack.recursal.ai/p/q-rwkv-6-32b-instruct-preview">QRWKV6 conversion</a> , but this time we apply it to the Qwen-72B and QwQ-32B models</p><p>At a high level, you take an existing transformer model</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4UqB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4UqB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 424w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 848w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4UqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png" width="1456" height="702" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:340672,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159379897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4UqB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 424w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 848w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!4UqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc19f66e6-aa7d-4626-a7a6-30afaed08cfa_2330x1124.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Freeze all the weights, delete the attention layer, replace it with RWKV, and train it through multiple stages</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AD0z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AD0z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 424w, 
https://substackcdn.com/image/fetch/$s_!AD0z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 848w, https://substackcdn.com/image/fetch/$s_!AD0z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!AD0z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AD0z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png" width="1456" height="698" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:404633,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159379897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!AD0z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 424w, https://substackcdn.com/image/fetch/$s_!AD0z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 848w, https://substackcdn.com/image/fetch/$s_!AD0z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!AD0z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88462dde-6010-46b2-a038-4cca803cee36_2328x1116.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>All of this is done while using the original model as a &#8220;teacher model&#8221;, referencing its logits.</p><p>More specifically, the stages are as follows:</p><ul><li><p>Train each RWKV layer individually, referencing the corresponding teacher block</p></li><li><p>Train the RWKV layers within the whole model, training on teacher logits</p><ul><li><p>At this point the model is &#8220;usable&#8221;, but still has much room to improve</p></li></ul></li><li><p>Train all the layers (both FFN and RWKV) on teacher logits</p></li><li><p>Train all the layers with a longer context length</p></li></ul><p>Unfortunately, due to VRAM limitations, our training was limited to an 8k context length. However, we view this as a resource constraint, not a method constraint.</p><div><hr></div><h1>Implication:<br>AI knowledge is not in attention, but in the FFN</h1><p>Because the converted layers were trained on only 200-500M tokens, we do not believe that the newly trained RWKV layers are sufficiently trained to account for &#8220;knowledge/intelligence&#8221; at this level.</p><p>In other words, the vast majority of an AI model&#8217;s knowledge is not in the attention, but in the matrix-multiplication FFN (Feed-Forward Network) layers.</p><p>It would be more accurate to view attention mechanisms, whether transformer-based or RWKV, as a means of guiding the model to focus on &#8220;what the model thinks&#8221; about in the FFN layers.</p><div><hr></div><h1>Benefits: Ideal for large-scale applications</h1><p>Additionally, with the shift towards inference-time compute, linear architectures represent a dramatic reduction in both compute and VRAM cost. 
This allows us to scale to hundreds or even thousands of requests per GPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r0u4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r0u4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 424w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 848w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r0u4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png" width="472" height="359.83516483516485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1456,&quot;resizeWidth&quot;:472,&quot;bytes&quot;:235521,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159379897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r0u4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 424w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 848w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!r0u4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10c34ad9-606d-4790-a1de-8748b9117965_1500x1144.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h1>We can now rapidly iterate on new RWKV architectures at &lt;100B scale</h1><p>By dramatically reducing the compute required to scale and test a new RWKV attention architecture, down to a small number of GPUs, 
we will be able to test, iterate on, and validate new architecture changes faster, turning experiments that previously took weeks (or even months) into days.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WeB2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WeB2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 424w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 848w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WeB2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png" width="530" height="374.93131868131866" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1030,&quot;width&quot;:1456,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:333254,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://substack.recursal.ai/i/159379897?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WeB2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 424w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 848w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!WeB2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54099f83-aee3-4afa-9c50-3d4556ea6ac3_1702x1204.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Historically, the RWKV group has been averaging 4 major versions across 2 years. 
With improvements to both the accuracy and the memory of the model architecture at every step.</p><p>This is a trend we plan to accelerate moving forward.</p><p>It is part of our roadmap to Personalized AI and eventually Personalized AGI, which you can read more about in the following article &#8230;</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:159708830,&quot;url&quot;:&quot;https://substack.recursal.ai/p/our-roadmap-to-personalized-ai-and&quot;,&quot;publication_id&quot;:2073186,&quot;publication_name&quot;:&quot;Featherless AI - recursive dev blog&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!RY89!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png&quot;,&quot;title&quot;:&quot;&#128739;&#65039; Our roadmap to Personalized AI and AGI&quot;,&quot;truncated_body_text&quot;:&quot;If you want to find out more about the latest QRWKV model, that makes all of this possible, it is recommended to read this first:&quot;,&quot;date&quot;:&quot;2025-03-24T17:40:14.797Z&quot;,&quot;like_count&quot;:4,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:99170118,&quot;name&quot;:&quot;Eugene Cheah&quot;,&quot;handle&quot;:&quot;techtalkcto&quot;,&quot;previous_name&quot;:&quot;PicoCreator&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8dcb57e-6203-4be3-ae29-03732db5c5f7_460x460.jpeg&quot;,&quot;bio&quot;:&quot;Builds Attention-Free Transformer AI models (http://wiki.rwkv.com) from scratch, CEO @ featherless.ai (prv recursal.ai) - Also known for k8s infra &amp; UI testing tools, webapps, and GPU.js, Hot-takes/Views are my 
own&quot;,&quot;profile_set_up_at&quot;:&quot;2022-07-17T02:17:11.664Z&quot;,&quot;reader_installed_at&quot;:null,&quot;twitter_screen_name&quot;:&quot;picocreator&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationId&quot;:1004639,&quot;primaryPublicationName&quot;:&quot;Tech Talk CTO&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://substack.tech-talk-cto.com&quot;,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://substack.tech-talk-cto.com/subscribe?&quot;}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://substack.recursal.ai/p/our-roadmap-to-personalized-ai-and?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!RY89!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F655b233b-f955-4a8e-b220-6e4f392736ef_160x160.png" loading="lazy"><span class="embedded-post-publication-name">Featherless AI - recursive dev blog</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">&#128739;&#65039; Our roadmap to Personalized AI and AGI</div></div><div class="embedded-post-body">If you want to find out more about the latest QRWKV model, that makes all of this possible, it is recommended to read this first&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 4 likes &#183; Eugene Cheah</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This model 
was originally published as Qwerky-72B. However, due to confusion with a similarly named company/model, we were asked to avoid using the Qwerky name, so we have renamed it to QRWKV-72B.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[QRWKV6 and a charm of finches]]></title><description><![CDATA[And how QRWKV6 stands out among our various RWKV6 experiments]]></description><link>https://substack.recursal.ai/p/qrwkv6-and-a-charm-of-finches</link><guid isPermaLink="false">https://substack.recursal.ai/p/qrwkv6-and-a-charm-of-finches</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 11 Dec 2024 10:59:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g60H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g60H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g60H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 424w, https://substackcdn.com/image/fetch/$s_!g60H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 848w, 
https://substackcdn.com/image/fetch/$s_!g60H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!g60H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g60H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png" width="1456" height="874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14930437,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g60H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 424w, https://substackcdn.com/image/fetch/$s_!g60H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 848w, 
https://substackcdn.com/image/fetch/$s_!g60H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!g60H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf7b2317-9d3e-4cad-82f2-89a283866771_3840x2304.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Happy December NeurIPS,</p><p>We are proud to announce a triple release of model weights: a charm of finches.</p><h2>Q-RWKV-6 
32B Instruct Preview</h2><p>Our latest frontier model.</p><p>A variant of RWKV-6, converted from an existing Qwen 32B model.</p><p>This is our strongest linear model to date, beating all previous RWKV, State Space and Liquid AI models on key English benchmarks and evals.</p><p>Excitingly, this unlocks the option of converting existing transformer models to the more efficient RWKV linear architecture.</p><p>Its limitation, however, is that it inherits its knowledge, training, and tokenizer from the parent model, which in this case is limited to approximately 30 languages (compared to 100+ languages for RWKV).</p><p>See more info: <a href="https://substack.recursal.ai/p/q-rwkv-6-32b-instruct-preview">Announcement article</a><br>Try the model on: <a href="https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1">Featherless.ai inference</a></p><h2>RWKV-6 Finch MoE 37B</h2><p>Our first RWKV MoE model, built on RWKV-6, with 11B active parameters out of 37B total. It is currently one of the strongest multi-lingual models.</p><p>See more info: <a href="https://substack.recursal.ai/p/flock-of-finches-rwkv-6-mixture-of">Announcement article</a></p><h2>RWKV-6 Finch 7B World 3</h2><p>An overall multi-lingual upgrade of our v6 7B base models, and a major step up from our previous 7B models for multi-lingual and mixed use cases.</p><p>This was developed and released under the RWKV foundation. 
With various contributors from EleutherAI and the RWKV open source group.</p><p>See more info: <a href="https://blog.rwkv.com/p/rwkv-6-finch-7b-world-3-now-with">Announcement article</a></p>]]></content:encoded></item><item><title><![CDATA[QRWKV6 32B Instruct Preview]]></title><description><![CDATA[The strongest, and largest RWKV model variant to date: QRWKV6 32B Instruct Preview]]></description><link>https://substack.recursal.ai/p/q-rwkv-6-32b-instruct-preview</link><guid isPermaLink="false">https://substack.recursal.ai/p/q-rwkv-6-32b-instruct-preview</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 11 Dec 2024 10:51:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_aR2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_aR2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_aR2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 424w, https://substackcdn.com/image/fetch/$s_!_aR2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 848w, 
https://substackcdn.com/image/fetch/$s_!_aR2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!_aR2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_aR2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png" width="1456" height="874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12945638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_aR2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 424w, https://substackcdn.com/image/fetch/$s_!_aR2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 848w, 
https://substackcdn.com/image/fetch/$s_!_aR2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!_aR2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb979eaed-e3c9-4e44-8b68-e226d2ed88ce_3840x2304.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The strongest linear model to date, beating out all previous RWKV, State Space and Liquid AI models, smashing all previous key 
English benchmarks and evals.</p><p>You can find this model on both:</p><ul><li><p>Hugging Face: <br><a href="https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1">https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1</a></p></li><li><p>Featherless.ai: <a href="https://featherless.ai/models/recursal/QRWKV6-32B-Instruct-Preview-v0.1">https://featherless.ai/models/recursal/QRWKV6-32B-Instruct-Preview-v0.1</a></p></li></ul><p>Note: as an instruct preview, this model is not considered final.</p><p>The model was trained by converting the weights of the Qwen 32B Instruct model into a customized QRWKV6 architecture. We successfully replaced the existing transformer attention heads with RWKV-V6 attention heads through a groundbreaking new conversion training process.</p><p>This unique training process was developed by the team at Recursal AI, in collaboration with the RWKV and EleutherAI open source communities.</p><h3><strong>Benchmarks</strong></h3><p>We compared QRWKV6 against existing open-weights models, both transformer-based and linear-architecture-based.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fC9I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fC9I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 424w, https://substackcdn.com/image/fetch/$s_!fC9I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 848w, 
https://substackcdn.com/image/fetch/$s_!fC9I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 1272w, https://substackcdn.com/image/fetch/$s_!fC9I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fC9I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png" width="1456" height="308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:216374,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fC9I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 424w, https://substackcdn.com/image/fetch/$s_!fC9I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 848w, 
https://substackcdn.com/image/fetch/$s_!fC9I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 1272w, https://substackcdn.com/image/fetch/$s_!fC9I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0051c55b-b57a-430b-a0b2-6f1ec9ec129f_2212x468.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Overall, what is most exciting is that the converted QRWKV6 model performs similarly to its original 32B model.</p><h3><strong>GPU Sponsor: <a href="https://tensorwave.com">TensorWave</a></strong></h3><p>The conversion process for QRWKV6 was done on 16 AMD MI300X GPUs kindly donated by TensorWave. Each MI300X comes with a whopping 192GB of VRAM, while offering compute performance comparable to an H100.</p><p>This allowed us to reduce the minimum number of nodes required for our training process, and simplify our overall training and conversion process.</p><p>The conversion process took about 8 hours.</p><h3><strong>The Exciting</strong></h3><p>Linear models hold the promise of substantially lower compute cost at scale, delivering over 1000x better compute efficiency at inference, especially at large context lengths. This is a key multiplier for both O1-style inference-time thinking and making AI more accessible for the world.</p><p>This technique also scales to larger transformer-based models, a conversion we have since started.</p><h3><strong>The Good</strong></h3><p>The benefit of this process is that we are able to convert any previously trained QKV attention-based model, such as Qwen and LLaMA based models, into a variant of RWKV. 
Without needing to retrain the model from scratch.</p><p>This allows us to quickly test and prove out the significantly more efficient RWKV linear attention mechanism at a larger scale and on a much smaller budget, validating the architecture design and scalability of RWKV.</p><p>Once again proving that QKV attention is not all you need. <br>( Someone ping @jefrankle and @srush_nlp )</p><h3><strong>The Bad</strong></h3><p>The disadvantage of this process is that the model&#8217;s inherent knowledge and dataset training are based on its &#8220;parent&#8221; model. This means that, unlike previous RWKV models trained on 100+ languages, the QRWKV model is limited to the approximately 30 languages supported by the Qwen line of models.</p><p>Additionally, instead of RWKV-based channel mix and feedforward network layers, we retain the &#8220;parent&#8221; model&#8217;s feedforward network architecture design. This means the model is incompatible with existing RWKV inference code.</p><p>Separately, due to our compute budget, we were only able to run the conversion process at up to 16k context length. 
While the model does exhibit stability beyond that context length, it may need additional training to accurately support longer context lengths.</p><h3><strong>Future Followups</strong></h3><p>A Q-RWKV-6 72B Instruct model is currently being trained.</p><p>Additionally, with the RWKV-7 architecture being finalized soon, we intend to repeat the process and provide a full lineup of:</p><ul><li><p>Q-RWKV-7 32B</p></li><li><p>LLaMA-RWKV-7 70B</p></li></ul><p>We intend to provide more details on the conversion process, along with our paper, after the subsequent model release.</p><div><hr></div><h3><strong>References</strong></h3><ul><li><p><a href="https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1">Model weights and code here</a></p></li></ul><h3>Acknowledgements</h3><ul><li><p>Special thanks to TensorWave and AMD for sponsoring the MI300X training</p></li><li><p>EleutherAI for support and guidance, especially on benchmarks and publishing research papers about the RWKV architecture</p></li><li><p>Linux Foundation AI &amp; Data group for supporting and hosting the RWKV project</p></li><li><p>Recursal AI for its commitment to providing resources and development for the RWKV ecosystem - you can use their <a href="https://featherless.ai/">featherless.ai</a> platform to easily run RWKV and compare it to other language models</p></li></ul><p>And of course a huge thank you to the many developers around the world working hard to improve the RWKV ecosystem and provide environmentally friendly open source AI for all.</p>]]></content:encoded></item><item><title><![CDATA[Flock of Finches: RWKV-6 Mixture of Experts]]></title><description><![CDATA[The largest RWKV MoE model yet!]]></description><link>https://substack.recursal.ai/p/flock-of-finches-rwkv-6-mixture-of</link><guid isPermaLink="false">https://substack.recursal.ai/p/flock-of-finches-rwkv-6-mixture-of</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 11 Dec 2024 
07:38:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7WCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7WCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7WCL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7WCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:152444,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7WCL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7WCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75968f5b-a14e-411c-949d-1b24dedad3e4_1024x1024.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;re excited to release the latest addition to the RWKV family of models: Flock of Finches 37B-A11B v0.1!</p><p>This is an experimental model that uses 11 billion active parameters. Despite having been trained on only 109 billion tokens, our new flock roughly matches our recently released Finch 14B model on common benchmark evaluation scores. You can find the model and code on <a href="https://huggingface.co/recursal/Finch-MoE-37B-A11B-v0.1-HF">Hugging Face here</a>, or try it on the <a href="https://featherless.ai/models/recursal/Finch-MoE-37B-A11B-v0.1-HF">Featherless AI platform here</a>.</p><p>We leveraged an efficient Sparse Mixture of Experts (MoE) method to supply a higher total parameter count while activating only a fraction of those parameters for any given token. This saves time and uses less compute during both training and inference. 
As with most architectural choices, there is a tradeoff: the increased efficiency comes in exchange for higher VRAM usage.</p><p>From our perspective, the ability to inexpensively train and run a more capable model seems very much worth that cost.</p><h1><strong>GPU Sponsor: <a href="https://tensorwave.com/">TensorWave</a></strong></h1><p>We trained Flock of Finches on 16 AMD MI300X GPUs kindly donated by <a href="https://tensorwave.com/">TensorWave</a>, over a period of nearly four weeks. Each MI300X comes with a whopping 192GB of VRAM, which easily accommodated the added VRAM requirements we had for MoE.</p><p>This allowed us to use our limited time efficiently, finding the best hyper-parameters and training instead of spending days or weeks developing software workarounds.</p><h1><strong>MoE Overview</strong></h1><p>A large part of the knowledge and intelligence of LLMs comes from a component known as the Feed Forward Network (FFN), sometimes called the Channel Mixer. We added a flock of eight new FFN &#8220;experts&#8221; to a Finch 7B checkpoint that had been trained on around 2 trillion tokens, then continued training it for only 109 billion more.</p><p>The original Finch FFN in Flock of Finches is always evaluated as usual, acting like the leader of the flock, and we call it the &#8220;shared expert&#8221;. Alongside this shared expert, one additional expert from the flock is chosen for each token, and the results are added together.</p><p>This forms the mathematical equivalent of a double-width, dynamically chosen FFN. 
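</p><p>As a rough illustration of that equivalence, here is a minimal pure-Python sketch. It is not the Flock of Finches code: the squared-ReLU activation, the argmax router, the tiny dimensions, and every function and variable name are assumptions made for this example, and the real model&#8217;s routing and expert internals may differ.</p>

```python
import math
import random


def matvec(x, w):
    """Multiply vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def ffn(x, w_in, w_out):
    """Two-layer feed-forward block. Squared ReLU is an assumed activation."""
    hidden = [max(h, 0.0) ** 2 for h in matvec(x, w_in)]
    return matvec(hidden, w_out)


def moe_layer(x, shared, experts, router):
    """Always run the shared expert, then add exactly one routed expert."""
    scores = matvec(x, router)  # one score per routed expert (hypothetical router)
    chosen = max(range(len(experts)), key=lambda e: scores[e])
    shared_out = ffn(x, *shared)
    expert_out = ffn(x, *experts[chosen])
    return [a + b for a, b in zip(shared_out, expert_out)], chosen


def double_width_ffn(x, shared, expert):
    """One FFN whose hidden layer is the shared and chosen experts side by side."""
    w_in = [row_s + row_e for row_s, row_e in zip(shared[0], expert[0])]  # join columns
    w_out = shared[1] + expert[1]  # stack rows
    return ffn(x, w_in, w_out)


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for readability only; eight routed experts as in the post.
rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 8
shared = (rand_matrix(rng, d_model, d_hidden), rand_matrix(rng, d_hidden, d_model))
experts = [
    (rand_matrix(rng, d_model, d_hidden), rand_matrix(rng, d_hidden, d_model))
    for _ in range(n_experts)
]
router = rand_matrix(rng, d_model, n_experts)
token = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

out, chosen = moe_layer(token, shared, experts, router)
wide = double_width_ffn(token, shared, experts[chosen])
matches = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(out, wide)
)
print(chosen, matches)
```

<p>Running it prints which expert was picked for the sample token and whether the &#8220;shared plus one expert&#8221; sum agrees with the single double-width FFN to within floating-point tolerance. Only two of the nine FFNs are evaluated for that token, which is where the compute saving comes from.</p><p>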
The shared expert contributes the shared intelligence learned during the original 2 trillion tokens of training, while the new experts in the flock selectively contribute new information depending on context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KMuQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KMuQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 424w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 848w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 1272w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KMuQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png" width="1150" height="629" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:629,&quot;width&quot;:1150,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111650,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KMuQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 424w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 848w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 1272w, https://substackcdn.com/image/fetch/$s_!KMuQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe93e3a95-3fa9-40bf-b2bd-43d03e637059_1150x629.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1><strong>MoE Shared Expert</strong></h1><p>A few choices we made were unusual, and make Flock of Finches a bit different from other MoE architectures you may have encountered in the wild. One such choice was to use a shared expert and add eight fresh experts, instead of replacing the original FFN with eight cloned copies and continuing training from there. </p><p>We found that this setup learned much faster, even when accounting for the extra width and therefore computation it adds. We also discovered that with this setup we were able to use an extremely high initial learning rate for the new experts, eventually annealing it down to the original model&#8217;s learning rate as training progressed.</p><h1><strong>MoE Hash Routing</strong></h1><p>Another unusual choice we made was to use hash routing instead of a trained top-k gated router. 
We chose this partly for simplicity and speed, but also because it gives us a naturally even token-to-expert routing distribution, which we hope will improve inference efficiency. Hash routing is extremely simple: we add a prime number to the token index fed into the model, then take that result modulo eight as the index of the expert to which that token is sent for processing. Many other MoE models instead use a learned gating function, which is trained rather than fixed in advance.</p><p>One final, very RWKV-specific quirk was our use of token-shift with these new experts. Ordinarily, RWKV does a unique kind of 1D convolution as part of its FFN called token-shift, which mixes parts of the current and prior token together. This allows the model to perform some kinds of operations in a single layer that a traditional transformer would require two layers to accomplish. We tried various ways of applying token-shift to our new experts, and in the end we found that the most efficient way was to perform the same shift on the input that goes to both the shared and new experts. The gate applied to FFN outputs is also generated from a single token-shift and applied uniformly to the combined output.</p><h1><strong>Benchmark</strong></h1><p>We evaluated Flock of Finches across a range of common industry-standard benchmarks using EleutherAI&#8217;s lm-eval-harness. While some benchmark scores were higher and some lower, the model performed at roughly the same level as our recently released Finch 14B model. 
This is an interesting result for us, as this model has significantly fewer active parameters (11B versus 14B), and those parameters are more concentrated in the Feed Forward Network portion of the model.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8OeU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8OeU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 424w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 848w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 1272w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8OeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png" width="497" height="183" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:183,&quot;width&quot;:497,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8OeU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 424w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 848w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 1272w, https://substackcdn.com/image/fetch/$s_!8OeU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc362eb-34df-4887-9ac2-b72c0efa12be_497x183.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h1><strong>The Takeaway</strong></h1><p>Flock of Finches 37B-A11B features a new Mixture of Experts RWKV-6 architecture with 11 billion active parameters and 37 billion parameters total. It&#8217;s the largest RWKV MoE model yet, but it&#8217;s just our first step combining MoE with the RWKV architecture. 
We&#8217;re excited to expand the use of MoE to the Time Mixer portion of RWKV, and to try more complex MoE ideas like employing expert parameter sharing across breadth and depth, and combining a larger number of narrower experts.</p><p>We hope you&#8217;ll give Flock of Finches a try, and see how the RWKV ecosystem is growing with new, more powerful models.</p><h1><strong>References</strong></h1><ul><li><p>Weights &amp; Code: <a href="https://huggingface.co/recursal/Finch-MoE-37B-A11B-v0.1-HF">https://huggingface.co/recursal/Finch-MoE-37B-A11B-v0.1-HF</a></p></li></ul><h1><strong>Acknowledgements</strong></h1><ul><li><p>Special thanks to TensorWave and AMD for sponsoring the Flock of Finches MI300X training run</p></li><li><p>Recursal AI for its commitment to providing resources and development for the RWKV ecosystem - you can use their <a href="https://featherless.ai">featherless.ai</a> platform to easily run RWKV and compare it to other language models</p></li><li><p>EleutherAI for support and guidance, especially on benchmarks and publishing research papers about the RWKV architecture</p></li><li><p>Linux Foundation AI &amp; Data group for supporting and hosting the RWKV project</p></li></ul><p>And of course a huge thank you to the many developers around the world working hard to improve the RWKV ecosystem and provide environmentally friendly open source AI for all.</p>]]></content:encoded></item><item><title><![CDATA[minmodmon: A quickstart to local RWKV]]></title><description><![CDATA[In April we launched our RWKV-based model, EagleX v2. EagleX goes toe-to-toe with modern transformers on performance, while being much cheaper to run, and with an infinite context limit. 
The most common question I have personally seen about EagleX since then however has been, "How do I run it?".]]></description><link>https://substack.recursal.ai/p/minmodmon-a-quickstart-to-local-rwkv</link><guid isPermaLink="false">https://substack.recursal.ai/p/minmodmon-a-quickstart-to-local-rwkv</guid><dc:creator><![CDATA[Layl Bongers]]></dc:creator><pubDate>Wed, 14 Aug 2024 13:01:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rqLR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rqLR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rqLR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!rqLR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!rqLR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rqLR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rqLR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png" width="1280" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab64c050-2157-436d-b345-c7520fe63c73_1280x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151215,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rqLR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!rqLR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!rqLR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rqLR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab64c050-2157-436d-b345-c7520fe63c73_1280x640.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In April we launched our RWKV-based model, <a href="https://substack.recursal.ai/cp/143699561">EagleX v2</a>. EagleX goes toe-to-toe with modern transformers on performance, while being much cheaper to run, and with an infinite context limit. 
The most common question I have personally seen about EagleX since then, however, has been: "How do I run it?"</p><p><a href="https://github.com/recursal/minmodmon">Minmodmon</a> is a small self-contained tool that lets you easily and quickly run RWKV-based models on Windows, on your GPU, <strong>locally</strong>. No dependencies required! Just <a href="https://github.com/recursal/minmodmon/releases">download the latest release ZIP</a> and run it!</p><h1>AI for everyone</h1><p>Our number one goal at Recursal is to make sure <em>everyone</em> receives the benefits of AI. We don't want you to have to be a technology expert to start running EagleX. There are already plenty of ways to run RWKV, but these tend to be aimed at more expert users.</p><p>Minmodmon was made to be used by anyone wanting to try out AI models, regardless of computer skill. It runs on Windows, requires no separate installs (no Python, pip, etc.), and needs no command-line expertise.</p><p>However, by design minmodmon is very limited. 
For a more feature-complete setup, try the excellent <a href="https://github.com/Ai00-X/ai00_server">ai00_server</a> project, which is built on the same <a href="https://github.com/cryscan/web-rwkv">web-rwkv</a> library that minmodmon uses.</p><h1>What you can do with it</h1><p>Minmodmon is for use with <strong>other</strong> applications. You can't talk to a model directly through its web interface, but it integrates with common standards.</p><p>In particular, I recommend trying out minmodmon with <a href="https://docs.sillytavern.app/">SillyTavern</a>. SillyTavern is an amazing AI chat application that lets you load in AI personas to talk with locally, completely free and open source.</p><p>This is still an early release. If you encounter any issues, head over to <a href="https://github.com/recursal/minmodmon/issues">our issue tracker</a> and report them to us!</p><h1>What's next</h1><p>This release is just one step in our plans. We want both local and remote AI to be a seamless experience, putting you in control of what you use and where your data goes. The user experience still leaves much to be desired, but we have big plans in the works.</p><p>If you do not have the powerful GPU needed to run local models, or just want better performance and larger models, we also recently launched a privacy-focused remote AI service, <a href="https://featherless.ai/">Featherless</a>.</p>
]]></content:encoded></item><item><title><![CDATA[🦅 EagleX v1 : Soaring past LLaMA 7B 2T in both English and Multi-lang evals (RWKV-v5)]]></title><description><![CDATA[A linear transformer has just crossed the gold standard in transformer models, LLaMA 7B, with fewer tokens trained, in both English and multi-lingual evals. A historical first.]]></description><link>https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b</link><guid isPermaLink="false">https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Sat, 16 Mar 2024 08:33:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nmho!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nmho!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nmho!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 424w, 
https://substackcdn.com/image/fetch/$s_!nmho!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 848w, https://substackcdn.com/image/fetch/$s_!nmho!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!nmho!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nmho!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png" width="1200" height="820.8791208791209" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:996,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:5727069,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nmho!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 424w, 
https://substackcdn.com/image/fetch/$s_!nmho!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 848w, https://substackcdn.com/image/fetch/$s_!nmho!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 1272w, https://substackcdn.com/image/fetch/$s_!nmho!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An eagle, flying past a llama</figcaption></figure></div><blockquote><p>If you are fine-tuning, we recommend waiting for the full EagleX 2T model coming out later this month instead, unless you are doing so for research purposes. <br><br>This model is released for research purposes, as it represents the major checkpoint that surpasses LLaMA2 7B, as part of our current training to 2T tokens and beyond.</p></blockquote><h1>EagleX 1.7T - in short</h1><p>EagleX 1.7T is an early research release of our 7.52B parameter model training that:</p><ul><li><p>Is part of a larger 2T model training</p></li><li><p>Is built on the <a href="https://wiki.rwkv.com">RWKV-v5 architecture</a><br>(a linear transformer with 10-100x+ lower inference cost)</p></li><li><p><a href="https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers">Is a continuation based on the original Eagle 7B model</a></p></li><li><p><a href="https://blog.rwkv.com/p/the-worlds-greenest-ai-model-rwkvs">Ranks as the world&#8217;s greenest 7B model (per token)</a></p></li><li><p>Is trained on 1.7 trillion tokens across 100+ languages</p></li><li><p>Outperforms all 7B class models in multi-lingual benchmarks</p></li><li><p>Passes LLaMA2 (2T) in multiple English evals, approaches Mistral (&gt;2T?)</p></li><li><p><a href="https://www.isattentionallyouneed.com/">All while being an &#8220;Attention-Free Transformer&#8221;</a></p></li></ul><p>We are releasing RWKV-v5 EagleX 1.7T, <a href="https://blog.rwkv.com/p/rwkv-joins-the-linux-foundation-as">licensed under Apache 2.0</a>, which can be used personally or commercially without restrictions. 
</p><ul><li><p><a href="https://huggingface.co/recursal/EagleX_1-7T">Download from Hugging Face</a></p></li><li><p>Try it online today on </p><ul><li><p><a href="https://huggingface.co/spaces/recursal/EagleX-7B-1.7T-Gradio-Demo">our Hugging Face demo</a></p></li><li><p><a href="https://recursal.ai/">our new cloud platform</a></p></li></ul></li><li><p>Use our reference <a href="https://pypi.org/project/rwkv/">pip inference package</a>, or any other community inference options (<a href="https://github.com/josStorer/RWKV-Runner">Desktop App</a>, <a href="https://github.com/saharNooby/rwkv.cpp">RWKV.cpp</a>, <a href="https://wiki.rwkv.com/basic/play.html">etc.</a>), and use it anywhere (even locally)</p></li><li><p><a href="https://github.com/RWKV/RWKV-infctx-trainer">Fine-tune using our Infctx trainer</a></p></li><li><p><a href="https://github.com/huggingface/transformers/pull/26963">[Pending PR] Get support merged into Hugging Face Transformers!</a></p></li><li><p><a href="https://docs.google.com/spreadsheets/d/1PFELH3u8yQlr-bGs9D5lBYXCXqSFZw2O0vfW084jbgI/edit?usp=sharing">All eval data can be found in the Google Sheet here</a></p></li></ul><h1>What does it mean to fly past LLaMA 7B?</h1><p>It is definitely a very big claim to say that, training from scratch, you have caught up with and passed the &#8220;Gold Standard&#8221; of the 7B weight class, which nearly every other major open access model is built on (allegedly even Mistral). Even more so given that this is done with a comparatively lower dataset token count of 1.7 trillion tokens (vs. 
2 trillion tokens).</p><h1>Going big on eval data</h1><p>As this is an entirely different model, trained from scratch, there will be evals that we win and evals that we lose. We are fully transparent about both in showing how we are ahead of LLaMA 7B on average.</p><p>Instead of simply cherry-picking 14 different evals which we won and calling it a day with a victory, we ran ALL the benchmarks we could in EleutherAI&#8217;s <a href="https://github.com/EleutherAI/lm-evaluation-harness"><code>lm-eval-harness</code></a>, at commit <code>f78e2da</code>, with the following limitations:</p><ul><li><p>Each eval has to complete in under 30 minutes on 8x4090 (we were running lots of evals)</p><ul><li><p>This rules out some of the more expensive long chain-of-thought evals</p></li></ul></li><li><p>We excluded all the personality / alignment evals</p></li><li><p>Each eval has to be executable across a wide variety of models, via lm-eval-harness</p></li><li><p>All evals are 0-shot (no 5 shot-ing an MCQ question)</p></li><li><p>We limited comparison to other models within the 7B weight class</p></li></ul><p>This resulted in running 60+ major eval groups, which generated over 1,000 data points per model. 
The data point count was so high that we had to drop standard error deviations just to ensure the raw CSV file could be loaded in macOS Numbers.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mRg-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mRg-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 424w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 848w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 1272w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mRg-!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png" width="1200" height="187.0879120879121" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:227,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:240303,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mRg-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 424w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 848w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 1272w, https://substackcdn.com/image/fetch/$s_!mRg-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2550c6fd-9cb6-4cf2-a379-279dc4c1e2f4_3612x564.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">What it takes to fit 184 English eval data points onto the screen.</figcaption></figure></div><p>Whew, that&#8217;s a crazy number of data points to digest. 
Let me break it down into more digestible parts:</p><ul><li><p>English perplexity</p></li><li><p>Multi-lingual performance</p></li><li><p>Focus on 21 English evals</p></li><li><p>183 English evals</p></li></ul><p>All data shown here is available in the Google Sheet over here:</p><blockquote><p>We included explanations of what several of the evals mean, which you can keep in mind when reading future eval results (demystifying what those numbers mean!)</p></blockquote><div><hr></div><h1>Improved English Perplexity</h1><p>We start with the basics: perplexity. This is computed from the loss against the test dataset (lower score = better), i.e. how good the model is at next-token prediction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oxcX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oxcX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 424w, https://substackcdn.com/image/fetch/$s_!oxcX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 848w, https://substackcdn.com/image/fetch/$s_!oxcX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 1272w, 
https://substackcdn.com/image/fetch/$s_!oxcX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oxcX!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png" width="1200" height="426.9230769230769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:518,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:318875,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oxcX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 424w, https://substackcdn.com/image/fetch/$s_!oxcX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 848w, https://substackcdn.com/image/fetch/$s_!oxcX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 1272w, 
https://substackcdn.com/image/fetch/$s_!oxcX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484d5b6b-1e4f-41ba-902a-2577020e6e87_2512x893.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In general, with the perplexity improvements, the EagleX model outperforms LLaMA2-7b, ranking between Falcon/LLaMA2-7b and Mistral.</p><div class="pullquote"><p><strong>Why do experts care about perplexity?</strong><br>Evals in general can be very subjective and opinion-driven, and commonly give mixed results. Perplexity, in a way, gives most experts a TL;DR summary to start with.</p></div><h1>Leading Multi-lang Perplexity &amp; evals</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tlOH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tlOH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 424w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 848w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 1272w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tlOH!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png" width="1200" height="281.86813186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:342,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:269234,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tlOH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 424w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 848w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 1272w, https://substackcdn.com/image/fetch/$s_!tlOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c7a3a64-3ea4-424b-9245-4b5a84608b71_2648x622.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mE6U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mE6U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 424w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 848w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 1272w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mE6U!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png" width="1200" height="418.68131868131866" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:508,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:340434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!mE6U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 424w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 848w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 1272w, https://substackcdn.com/image/fetch/$s_!mE6U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52a6d947-e984-4e09-beff-6437cb55b87a_2559x893.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>EagleX maintains the lead for best-in-class multi-lingual performance, with the incremental improvements we&#8217;re making to the Eagle line of models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FE7f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FE7f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 424w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 848w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 1272w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FE7f!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png" width="1200" height="358.5164835164835" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9170f5fa-4616-4507-89ec-122e50178116_1739x519.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:435,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:101919,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FE7f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 424w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 848w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 1272w, https://substackcdn.com/image/fetch/$s_!FE7f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9170f5fa-4616-4507-89ec-122e50178116_1739x519.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Most of the tasks here are common sense reasoning tests in a wide variety of formats, across <a href="https://blog.rwkv.com/i/141130059/multi-lingual-performance-details">23 of the world&#8217;s most widely used languages.</a></p><p>For the remaining languages, we urge the community to test and judge for themselves; the model was trained on over 100 languages. Over time, we would want more languages to be added to the evals.</p><div class="pullquote"><p><strong>Why is multi-lingual perf important? </strong><br>The goal of the RWKV project &amp; the Eagle line of models is to build <strong>inclusive</strong> AI for everyone, regardless of their language. 
Our mission is to build AI models made not just for English, but also for the 83% of the world&#8217;s population using a non-English language every day.</p></div><h1>21 English Evals</h1><p>Nevertheless, English is still important. We reduced the evals down to 21 of arguably the most popular English evals, such as Lambada, Glue, Swag, Winogrande, TruthfulQA, and MMLU:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jgJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jgJp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 424w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 848w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 1272w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jgJp!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png" width="1200" height="199.45054945054946" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:242,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:319914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jgJp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 424w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 848w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 1272w, https://substackcdn.com/image/fetch/$s_!jgJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0331cd9-82f2-46eb-8544-c07cb8b08bb3_2817x469.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Narrowing it down to the 4 models that most of us actually care about - LLaMA, Mistral, EagleX and Eagle-7b - the new EagleX model outperforms LLaMA-2-7b on average across the 21 evals, and lags not far behind Mistral.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!vSm3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vSm3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 424w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 848w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 1272w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vSm3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png" width="1119" height="407" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:1119,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68054,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vSm3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 424w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 848w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 1272w, https://substackcdn.com/image/fetch/$s_!vSm3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f87f388-2250-4dc9-aa60-6c5e5bdc86e6_1119x407.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Keep in mind that the average shown is across all 21 evals</figcaption></figure></div>
<div><hr></div><h4><strong>The Good</strong></h4><p>Now, let&#8217;s look at where our model is blowing the rest of the models out of the water.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CLIM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CLIM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 424w, 
https://substackcdn.com/image/fetch/$s_!CLIM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 848w, https://substackcdn.com/image/fetch/$s_!CLIM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 1272w, https://substackcdn.com/image/fetch/$s_!CLIM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CLIM!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png" width="1200" height="322.25274725274727" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:391,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:311417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CLIM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 424w, 
https://substackcdn.com/image/fetch/$s_!CLIM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 848w, https://substackcdn.com/image/fetch/$s_!CLIM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 1272w, https://substackcdn.com/image/fetch/$s_!CLIM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F478ded26-6adf-4b2d-a245-e96216b6a598_2478x666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>First, the big standout is the first 6 evals (sciq, glue, anli, mmnli, swag), where even our smaller 1.7T-token model beats the Mistral model trained on 2T++ tokens. These tasks focus on either simple context-based Q&amp;A with common-sense reasoning, or deductive logic. EagleX also performs better than LLaMA-2-7b in the winogrande and wnli evals, which also involve contextual common-sense reasoning. This suggests that the EagleX model would be applicable in RAG use cases, which are mainly contextual Q&amp;A, with the right prompt engineering.</p><p>Finally, for truthfulqa: while EagleX outperforms LLaMA, in my opinion the scores are still indicative of how vulnerable all models are to learning common human misconceptions from the web, seeing how bad they are across all models.<br>(to be fair, this is hard for most humans as well)</p><blockquote><p>PS: The jump for glue/mnli was high enough that we needed to check the dataset specifically for contamination, and we were not able to find any. This jump is currently being attributed to multiple training datasets, along with data-augmented / machine-rewritten instruct datasets following a similar structure.</p></blockquote><div class="pullquote"><p>Strong common-sense reasoning over context <br>is highly applicable to multiple RAG use cases</p></div><h4><strong>The Mixed</strong></h4><p>Next: the eval sets with mixed results. Here, we have very similar evals with 2 major variants. The results between EagleX and LLaMA are close enough that it&#8217;s hard to say which of the two models is clearly better on these evals. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P0Sa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P0Sa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 424w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 848w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 1272w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P0Sa!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png" width="1200" height="403.02197802197804" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:177411,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P0Sa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 424w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 848w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 1272w, https://substackcdn.com/image/fetch/$s_!P0Sa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae663f4a-2522-4615-891b-44a018f02f1e_1651x554.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What&#8217;s interesting is that even though logiqa can be seen as a form of &#8220;common sense&#8221; reasoning test, the EagleX model scored much lower on it than on the 6 evals (sciq, glue, anli, mmnli, swag). 
This could mean that while the model is better at reasoning over a given context, it lacks the depth of knowledge of other models trained on more tokens.</p><div><hr></div><h4><strong>The &#8220;Not too bad&#8221; and the &#8220;Really Bad&#8221;</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p9de!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p9de!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 424w, https://substackcdn.com/image/fetch/$s_!p9de!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 848w, https://substackcdn.com/image/fetch/$s_!p9de!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 1272w, https://substackcdn.com/image/fetch/$s_!p9de!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p9de!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png" width="1200" height="338.7362637362637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:411,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:306187,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p9de!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 424w, https://substackcdn.com/image/fetch/$s_!p9de!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 848w, https://substackcdn.com/image/fetch/$s_!p9de!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 1272w, https://substackcdn.com/image/fetch/$s_!p9de!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c8533a9-e578-45c7-9b4b-7cb081d14483_2555x721.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These are the evals where EagleX performs worse than both Mistral and LLaMA. However, for the evals that we&#8217;ve lost to LLaMA, it&#8217;s by a narrow margin, and we&#8217;ll be keeping track of these as we train past 2T tokens.</p><p>Let&#8217;s look at what went really badly: Math.</p><p>The results for the arithmetic evals sank like a rock, even compared to our original Eagle model.</p><p>What went wrong?</p><p>We dug through the dataset we used for training, and realized we had left out the entire math dataset (along with a few others) due to an error. Oops. </p><p>This emphasizes the importance of maintaining the dataset composition over the training run. 
We&#8217;re adding math back for future runs.</p><blockquote><p>We expect the overall math score to rise back up as training continues. Realistically though, IMO, no one should be depending on a 7B model for math (just saying)</p></blockquote><div><hr></div><h2>183 English Evals</h2><p>We do not simply want to cherry-pick 9 or 21 evals and claim victory over LLaMA, or even Mistral. So, let&#8217;s zoom out, and look at it holistically across 183 English evals.</p><p><a href="https://docs.google.com/spreadsheets/d/1PFELH3u8yQlr-bGs9D5lBYXCXqSFZw2O0vfW084jbgI/edit?usp=sharing">You can view the full results here</a></p><p>Although using the overall average across all the evals does bias the results towards larger eval sets (due to double counting, e.g. the overall mmlu score plus many individual mmlu tests), it does not change the ranking among the EagleX, Mistral, LLaMA and original Eagle models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nScm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nScm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 424w, https://substackcdn.com/image/fetch/$s_!nScm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 848w, 
https://substackcdn.com/image/fetch/$s_!nScm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 1272w, https://substackcdn.com/image/fetch/$s_!nScm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nScm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png" width="767" height="566" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:566,&quot;width&quot;:767,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nScm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 424w, https://substackcdn.com/image/fetch/$s_!nScm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 848w, 
https://substackcdn.com/image/fetch/$s_!nScm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 1272w, https://substackcdn.com/image/fetch/$s_!nScm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c7691a5-4e2d-473a-9c54-7e837b1ffb44_767x566.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>However, these results are extremely useful for smaller insights. For example:</p><div class="captioned-image-container"><figure><a 
class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2noV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2noV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 424w, https://substackcdn.com/image/fetch/$s_!2noV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 848w, https://substackcdn.com/image/fetch/$s_!2noV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 1272w, https://substackcdn.com/image/fetch/$s_!2noV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2noV!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png" width="1200" height="168.95604395604394" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:205,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:69101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2noV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 424w, https://substackcdn.com/image/fetch/$s_!2noV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 848w, https://substackcdn.com/image/fetch/$s_!2noV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 1272w, https://substackcdn.com/image/fetch/$s_!2noV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea7f07c-9224-4f58-9a5a-c8d8637332f0_1658x234.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The EagleX model lost to LLaMA2 when it comes to US history, but won in world history. 
This makes sense, given the broader approach we took to building the dataset from a more inclusive, more global view, instead of a US-centric one.</p><p>The detailed insights will be used by our dataset team to iterate and improve on our future datasets.</p><div class="pullquote"><p>How the model answers is a reflection of the dataset experiences it has learnt<br>How many resources the model consumes is a reflection of its architecture</p></div><h1>Perhaps a good dataset + Scalable architecture: is all you need?</h1><p>One of the biggest changes we made was to the dataset for the current 1T tokens, which now uses a cleaner, filtered set of data, with <strong>careful consideration to ensure permissibly licensed content sources are used</strong>.</p><p>There are also huge implications in the fact that the model crossed the LLaMA2 line earlier than the planned schedule: either the architecture is more efficient in training, or the improvements in dataset quality have a large impact on model performance.</p><p>The following is a summary of the dataset used. Its public release will be made available next month, after the current 2T training is completed.</p><pre><code>## 15% Code 

Contains code/programming related topics
- the-stack
- codeparrot
- devopedia
- mdn

## 15% Multi lang

Generally multi-lang webtext
- sea-lion (Singapore)
- madlad
- culturax
- multi lang wiki

## The giant soup

Creative content
- fandom (only sites with permissive licenses, and low spam)
- scp-foundation

Wikipedia
- Various Permissively licensed wikis.
- wikipedia

Papers:
- Mainly arxiv (Permissive Licenses) and pes2o

Books:
All the books contained in our training sets are public domain books.
- gutenberg
- standardebooks

Webtext
- webtext
- refinedweb (Note: This chunk made the model worse; we recommend against using refinedweb in future training runs)
- slimpajama
- europarl
- eurlex
- stackexchange

Various
- aya (multilang convo)
- some system prompt, instruct
- long list of sub 100B training datasets on HF
- rewritten text !!! (splicing in, to replicate the rewritten web paper)</code></pre><div><hr></div><h1>Why does the architecture matter?</h1><p>We are over 100x more scalable than the transformer architecture.<br>The transformer became the most prominent architecture in AI not because it was the best, but because it was the first to successfully scale to billions of parameters in training.</p><p>Till today</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gCLY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gCLY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 424w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 848w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 1272w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gCLY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png" width="616" 
height="463" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:463,&quot;width&quot;:616,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gCLY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 424w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 848w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 1272w, https://substackcdn.com/image/fetch/$s_!gCLY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c33b708-8c96-4179-8393-8274e86e7fcf_616x463.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">CUDA computational cost for RWKV-based architecture vs transformer models - that quadratic-vs-linear difference really scales!</figcaption></figure></div><div><hr></div><h1>The Milestone</h1><p>Overall, the release of this model marks an important milestone and transition for many of us, both in the commercial team at Recursal AI and in the open source team in the RWKV group.</p><ul><li><p>It&#8217;s the first major training run done by the Recursal AI team, in partnership with AWS as our main compute provider</p></li><li><p>This model is being released under Apache 2 licensing</p></li><li><p>The fully trained 2T model will be released under the RWKV group, under the Linux Foundation</p></li><li><p>The first Non-Transformer Architecture to pass LLaMA2 in evals</p></li><li><p>The strongest Linear Transformer to date</p></li><li><p>Proof you can have both strong multi-lingual and English performance</p></li></ul><div><hr></div><h1>What&#8217;s next?</h1><p>Similar to the 
original Eagle 7B announcement, the following are the revised goals for the model training</p><ul><li><p>[April 2024] Completion of the 2T Eagle 7B models</p></li><li><p>[March-May 2024] Training of our v6 &#8220;Finch&#8221; line of models</p></li><li><p>[June 2024] v6 MoE model, for GPT 3.5 class performance</p></li></ul><blockquote><p>Disclaimer: All dates are approximate, and are heavily subject to compute availability from our sponsors/compute providers/investors</p></blockquote><h1>Want more?</h1><p>You can find out more about the RWKV open source project at</p><ul><li><p>Wiki: <a href="https://wiki.rwkv.com/">https://wiki.rwkv.com/</a></p></li><li><p>Discord: <a href="https://discord.gg/bDSBUMeFpc">https://discord.gg/bDSBUMeFpc</a></p></li></ul><p>If you&#8217;d like to try the model today, you can do so on our platform at <a href="https://recursal.ai">recursal.ai</a> - the best place to host, run, and create finetunes of the Eagle line of RWKV models.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://substack.recursal.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Recursal AI development blog! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Launching Eagle 7B - into our public demo, and open router (till March 2024)]]></title><description><![CDATA[Bringing the world's strongest multi-lingual model to the world]]></description><link>https://substack.recursal.ai/p/launching-eagle-7b-into-our-public</link><guid isPermaLink="false">https://substack.recursal.ai/p/launching-eagle-7b-into-our-public</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Tue, 30 Jan 2024 01:07:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b5e6e8c0-3c3c-4c55-860e-3ce2855b4d0c_2832x1878.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_zPg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_zPg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 424w, https://substackcdn.com/image/fetch/$s_!_zPg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 848w, 
https://substackcdn.com/image/fetch/$s_!_zPg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 1272w, https://substackcdn.com/image/fetch/$s_!_zPg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_zPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png" width="1045" height="1622" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1622,&quot;width&quot;:1045,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1937700,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_zPg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 424w, https://substackcdn.com/image/fetch/$s_!_zPg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 848w, 
https://substackcdn.com/image/fetch/$s_!_zPg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 1272w, https://substackcdn.com/image/fetch/$s_!_zPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6cd492e6-1516-4a76-9c37-82c0a344bc4a_1045x1622.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The Eagle-7B model launch has been a great success!<br>More details can be found here: <a 
href="https://substack.recursal.ai/cp/141146731">https://substack.recursal.ai/cp/141146731</a></p><p>As we work behind the scenes on our cloud platform launch, we have decided to avoid an eagle-and-egg situation and provide our latest 7B model for free on both our chat demo and OpenRouter endpoints:</p><ul><li><p><a href="https://openrouter.ai/models/recursal/eagle-7b">https://openrouter.ai/models/recursal/eagle-7b</a></p></li><li><p><a href="https://rwkv-demo-api.recursal.ai/">https://rwkv-demo-api.recursal.ai/</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vGXS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vGXS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 424w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 848w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 1272w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!vGXS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png" width="1456" height="966" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:966,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:466101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vGXS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 424w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 848w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 1272w, https://substackcdn.com/image/fetch/$s_!vGXS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74ba324-1390-4e96-ad84-5e2cad625652_2832x1878.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These models will be provided for free, with rate limits until the launch of our cloud platform in March - stay tuned!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://substack.recursal.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Recursal AI development blog! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Public RWKV 3B Models via OpenRouter]]></title><description><![CDATA[Our free RWKV 3B models are now accessible via OpenRouter]]></description><link>https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter</link><guid isPermaLink="false">https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Tue, 12 Dec 2023 20:36:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!d_Fa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In collaboration with OpenRouter.ai and the RWKV team here at Recursal.AI, we are proud to announce the public release of RWKV 3B models on OpenRouter:</p><ul><li><p><a href="https://openrouter.ai/models/rwkv/rwkv-5-world-3b">RWKV v5 world 3B</a></p></li><li><p><a href="https://openrouter.ai/models/recursal/rwkv-5-3b-ai-town">RWKV v5 3B AI Town</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d_Fa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!d_Fa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 424w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 848w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d_Fa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png" width="1307" height="1044" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1044,&quot;width&quot;:1307,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:242463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!d_Fa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 424w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 848w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!d_Fa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5d760c1-b117-4fe9-a254-adf51d434b5d_1307x1044.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is in line with our previous two public API releases, for RWKV World and AI Town respectively:</p><ul><li><p><a href="https://substack.recursal.ai/p/public-rwkv-v5-3b-models">Public RWKV v5 3B OpenAI endpoint</a></p></li><li><p><a href="https://substack.recursal.ai/p/dedicated-ai-town-server-is-up">Public AI town 3B OpenAI endpoint</a></p></li></ul><p>All of this is to make it easier for you to switch existing experiments over for testing.</p><div><hr></div><p>We hope that, by giving such public access, we get to see more interesting applications built on RWKV. Ping @picocreator on Twitter or the RWKV Discord to show us what you built with it &#128521;</p><blockquote><p>Recursal.AI intends to keep the RWKV 3B models open to the public, with some reasonable anti-abuse / rate limits, till the end of 2024, as we explore means of optimizing our inference infrastructure at scale under load for our commercial cloud platform.<br><br>A dedicated RWKV inference cloud service will be out soon.<br><br>Disclaimer: This API service is provided as is, without any warranties or guarantees.</p></blockquote>]]></content:encoded></item><item><title><![CDATA[Public RWKV v5 3B Models]]></title><description><![CDATA[Now you can play with RWKV v5 3B models, anytime, anywhere]]></description><link>https://substack.recursal.ai/p/public-rwkv-v5-3b-models</link><guid isPermaLink="false">https://substack.recursal.ai/p/public-rwkv-v5-3b-models</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Thu, 07 Dec 2023 00:25:49 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!jyfe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Want to play around with True Open Source AI models, for free?</p><p>Recursal.ai is now launching a free public preview of our OpenAI-compatible chat / completion API for RWKV models:</p><div class="pullquote"><p><a href="https://rwkv-demo-api.recursal.ai">https://rwkv-demo-api.recursal.ai</a></p></div><p>This API currently supports the following OpenAI formats:</p><ul><li><p>chat completion endpoint (with aitown 3B model)</p></li><li><p>completion endpoint</p></li><li><p>rate limited to 10 requests per 10 seconds</p></li></ul><p>With the following two models, and a third coming soon:</p><ul><li><p><a href="https://huggingface.co/BlinkDL/rwkv-5-world/tree/main">RWKV-World-v5-3B</a></p></li><li><p><a href="https://substack.recursal.ai/p/dedicated-ai-town-server-is-up">recursal-aitown-3B</a></p></li><li><p>recursal-pygmalion-chat-3B (coming soon)</p></li></ul><p>All other API requests will be rerouted to the OpenAI endpoint, which requires an OpenAI authentication key.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jyfe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jyfe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 424w, 
https://substackcdn.com/image/fetch/$s_!jyfe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 848w, https://substackcdn.com/image/fetch/$s_!jyfe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 1272w, https://substackcdn.com/image/fetch/$s_!jyfe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jyfe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png" width="904" height="770" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62915592-2308-40c9-ad16-35fb17b5c008_904x770.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:904,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85983,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jyfe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 424w, 
https://substackcdn.com/image/fetch/$s_!jyfe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 848w, https://substackcdn.com/image/fetch/$s_!jyfe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 1272w, https://substackcdn.com/image/fetch/$s_!jyfe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62915592-2308-40c9-ad16-35fb17b5c008_904x770.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Enjoy playing with the RWKV 3B world model</p><blockquote><p>We intend to keep this server open to the public, with some sensible anti-abuse / rate limits, throughout the year, as we explore means of optimizing our inference infrastructure at scale under load for our commercial cloud platform.<br><br>A dedicated RWKV inference cloud service will be out soon.<br><br>Disclaimer: This API service is provided as is, without any warranties or guarantees.</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://substack.recursal.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Recursal AI development blog! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Dedicated AI town server is up!]]></title><description><![CDATA[Help push this to over 1k townsfolk]]></description><link>https://substack.recursal.ai/p/dedicated-ai-town-server-is-up</link><guid isPermaLink="false">https://substack.recursal.ai/p/dedicated-ai-town-server-is-up</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 06 Dec 2023 21:14:37 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!rIZB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Want to run AI town models at scale, without a crazy OpenAI bill?</p><p>Now you can - just by pointing to our dedicated AI town servers at </p><div class="pullquote"><p>https://aitown-demo-api.recursal.ai</p></div><p>Enter the URL without quotes or a trailing slash in the settings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rIZB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rIZB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 424w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 848w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 1272w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rIZB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp" width="1456" height="678" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:678,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19762,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rIZB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 424w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 848w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 1272w, https://substackcdn.com/image/fetch/$s_!rIZB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb45010f-84e7-4de1-9ef1-3841a46cbf63_1764x822.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This API currently supports the following OpenAI formats:</p><ul><li><p>chat completion endpoint (with aitown 3B model)</p></li><li><p>completion endpoint</p></li><li><p>embedding endpoint (passthrough to OpenAI, using your OpenAI key)</p></li></ul><p>You will still need an OPENAI_API_KEY for the embeddings. 
AI town models at scales beyond 100+++</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wOqg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wOqg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 424w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 848w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 1272w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wOqg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png" width="1456" height="808" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4582376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wOqg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 424w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 848w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 1272w, https://substackcdn.com/image/fetch/$s_!wOqg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac4fbfa-64d5-4ba1-9e48-868f950ae3d8_3322x1844.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Alternatively, to run it locally, you can use the GitHub repo here:</p><p><a href="https://github.com/recursal/ai-town-rwkv-proxy">https://github.com/recursal/ai-town-rwkv-proxy</a><br></p><blockquote><p>We intend to keep this server open to the public throughout the year, with some sensible anti-abuse / rate limits, as we explore ways to optimize our inference infrastructure at scale and under load for our commercial cloud platform.<br><br>A dedicated RWKV inference cloud service will be out soon.<br><br>Disclaimer: This API service is provided as is, without any warranties or guarantees.</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://substack.recursal.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Recursal AI 
development blog! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Recursal.AI is hiring! (For Q1 2024)]]></title><description><![CDATA[We are building, and growing a team, to provide serverless RWKV inference, tuning, and distillation of RWKV models. At a fraction of OpenAI cost. Drastically lowering your AI bill.]]></description><link>https://substack.recursal.ai/p/recursalai-is-hiring-q4-2023</link><guid isPermaLink="false">https://substack.recursal.ai/p/recursalai-is-hiring-q4-2023</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Fri, 01 Dec 2023 00:03:00 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="6000" height="4000" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4000,&quot;width&quot;:6000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Greyscale photo of a neon signboard, that reads : Do What You Love&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Greyscale photo of a neon signboard, that reads : Do What You Love" title="Greyscale photo of a neon signboard, that reads : Do What You Love" 
srcset="https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1529717730488-7a2492983b2c?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw0MHx8am9ic3xlbnwwfHx8fDE3MDEzODg2ODF8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@ninjason">Jason Leung</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Harrison &amp; Eugene (Picocreator) from the RWKV community (+ a few others) are teaming up to build the cloud and enterprise serverless platform for RWKV.</p><ul><li><p>Supports OpenAI-compatible inference endpoints</p></li><li><p>Finetuning of RWKV models</p></li><li><p><a href="https://substack.recursal.ai/p/run-over-120-npcs-in-a-tiny-ai-town">Fully automated distillation, and offloading of existing OpenAI workloads</a> </p><ul><li><p>Lower your existing OpenAI cost, without compromising performance</p></li></ul></li><li><p>More still under wraps</p></li></ul><p>TLDR: Everything you love about OpenAI/Claude, RWKV edition, and more</p><p>We have raised our seed round, with a large percentage of it set aside specifically to support the RWKV open source model training and development process (~500k for this alone).</p><p></p><h3><strong>What does this mean for the open source RWKV community and group?</strong></h3><p>Mostly positives: more resources will be raised and allocated for the RWKV project. </p><p>There are no license changes to the open source model and code. It will remain Apache 2. 
Locked safely within the Linux Foundation</p><p>On a more immediate basis, this means:</p><ul><li><p>more GPUs for running RWKV-X experiments and dev work</p></li><li><p>salaries for open source dev hires, fully focused on just building on RWKV</p></li><li><p>funding to support RWKV auxiliary activities (from events to paying for Gmail)</p></li></ul><p>Additionally, like other Linux Foundation projects, we expect that over time more companies will commercialize RWKV and work together on the common open source offerings, with the core RWKV project remaining neutral under the Linux Foundation.</p><p>In addition, the vast majority of Recursal AI models and dataset work (besides customer-specific fine-tunes, etc.) will remain under the open source Apache 2 license. </p><div class="pullquote"><p>Recursal AI's goal is to be the cloud provider of choice for RWKV users. </p></div><p>This means we have to make sure it's cheaper and more efficient to use us than to DIY inference on your own with a cloud provider. As such, keeping RWKV open source is aligned with our interests, as the growth of the open source models also means increased usage of our platform by commercial / enterprise customers. 
</p><p>This is a win-win structure that has worked for other open source software.</p><p>Major examples of this OSS and commercial split in practice:</p><ul><li><p>Many open source databases, with multiple providers</p></li><li><p><a href="https://www.selenium.dev/support/">Selenium browser automation project</a></p></li><li><p>RISC-V</p></li></ul><div><hr></div><h1>Open Job Roles (as of 30th Nov 2023)</h1><p>For all roles, the following generally applies:</p><ul><li><p>It's full time and remote</p></li><li><p>Includes standard leave, vacation, and insurance packages (depending on country)</p></li><li><p>ESOP for recursal.ai (details to be finalized closer to April 2024)</p></li></ul><p></p><h2>AI / Software Dev - For RWKV open source projects</h2><ul><li><p>Work directly with BlinkDL, PicoCreator and HarrisonV on improving the RWKV group's collection of open source projects and demos. This covers software development for:</p><ul><li><p>RWKV group inference / training projects</p></li><li><p>Other RWKV-related open source projects</p></li><li><p>Standardizing and developing benchmarks</p></li><li><p>The RWKV-X architecture itself</p></li><li><p>Public demos</p></li></ul></li><li><p>Highly preferred to be an existing active member of the RWKV community</p></li><li><p>Coordinate with the Recursal team and RWKV OSS team on devrel and marketing matters.</p></li></ul><h2>DevOps &amp; System Administrator</h2><ul><li><p>Help coordinate our growing racks of GPUs and dev servers</p><ul><li><p>Including racking and stacking physical servers</p></li></ul></li><li><p>Proficient in:</p><ul><li><p>pfSense networking</p></li><li><p>Linux administration</p></li><li><p>Docker container build process</p></li><li><p>GitHub &amp; GitLab for CI/CD build pipelines</p></li></ul></li><li><p>Coordinate with the Recursal and RWKV OSS teams on ensuring dev and production environment uptime, and the CI/CD build processes that support them</p></li></ul><p></p><h2>Senior Frontend / Fullstack 
developer</h2><ul><li><p>Build the frontend for the Recursal AI platform</p></li><li><p>Work on the user-facing API services</p></li><li><p>Work with the rest of the Recursal team in developing the platform and its various public demos</p></li><li><p>Be a strong UX / DX advocate; our team is too heavily biased towards the infra/model side of things</p></li><li><p>Have strong attention to detail for the user-facing UI clients.</p></li><li><p>Coordinate user feedback with the rest of the team</p></li></ul><p></p><h2>Senior Backend / Fullstack developer</h2><ul><li><p>Build and scale the backend for the Recursal AI platform</p></li><li><p>Work on the user-facing API services and internal backend services</p></li><li><p>Work on internal UI for administration / etc, and CLI tooling</p></li><li><p>Have strong attention to detail for the development of the API platform, its scaling design, performance, and unit tests</p></li><li><p>Coordinate user feedback (from our CLI / API users) with the rest of the team</p></li><li><p>Work with the rest of the Recursal team in developing the platform and its various public demos</p></li></ul><p></p><div><hr></div><p>If any of the roles excites you, ping Picocreator on the RWKV Discord.<br>Or drop an email with your resume to hr @ recursal.ai </p><p>Do not apply if it does not excite you to be building a platform for &#8230;<br>truly Open Source AI</p>]]></content:encoded></item><item><title><![CDATA[🐣 RWKV v5 1.5B - Achieves SOTA multi-lingual performance]]></title><description><![CDATA[The best AI model in the smol <2B param weight class has arrived]]></description><link>https://substack.recursal.ai/p/rwkv-v5-15b-achieves-sota-multi-lingual</link><guid isPermaLink="false">https://substack.recursal.ai/p/rwkv-v5-15b-achieves-sota-multi-lingual</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Mon, 13 Nov 2023 07:40:08 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!8ayR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8ayR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png" data-component-name="Image2ToDOM"><div class="image2-inset image2-full-screen"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ayR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 424w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 848w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 1272w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ayR!,w_5760,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;full&quot;,&quot;height&quot;:307,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:246268,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-fullscreen" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8ayR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 424w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 848w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 1272w, https://substackcdn.com/image/fetch/$s_!8ayR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fa983dc-765a-46c6-8b3e-0063807b7610_1747x368.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>RWKV v5 1.5B achieves SOTA status with</p><ul><li><p>Industry-leading multi-lingual performance (across xLBD, xSC, xWG, xCOPA benchmarks) by significant margins, against all existing models</p></li><li><p>Comparable performance to falcon-rw-1b in English-based benchmarks </p><ul><li><p>We win out in LAMBADA, StoryCloze16, 
arc_challenge, arc_easy, headQA_en, openbookQA, sciq, COPA </p></li><li><p>but loses out very slightly on PIQA, HellaSwag, WinoGrande, ReCoRD, COPA</p></li></ul></li></ul><p>For nearly all use cases under the 2B param model class, RWKV v5 is now either the best model for multi-lingual use, or tied for first place with falcon-rw-1b.</p><p>This makes it a strong default model of choice within its weight class.</p><p>It is a pattern we intend to repeat in the 3B, 7B, and 14B weight classes. We expect the 3B model to be out by the first week of December.</p><div><hr></div><p>You can access the model today via the following options:</p><ul><li><p>Public Demo: <a href="https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio">https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio</a></p></li><li><p>Recursal AI Chat Demo: <a href="https://1b5-demo.recursal.ai/">https://1b5-demo.recursal.ai/</a></p></li><li><p>Model Download: <a href="https://huggingface.co/BlinkDL/rwkv-5-world/tree/main">https://huggingface.co/BlinkDL/rwkv-5-world/tree/main</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[🏘️ Run over 120+ NPCs, in a tiny AI town with RWKV]]></title><description><![CDATA[Small, tiny models are all you need for NPC chat]]></description><link>https://substack.recursal.ai/p/run-over-120-npcs-in-a-tiny-ai-town</link><guid isPermaLink="false">https://substack.recursal.ai/p/run-over-120-npcs-in-a-tiny-ai-town</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Mon, 13 Nov 2023 01:34:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aKuc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you needed proof that AI models will play a major role in the future of gaming, look no further.</p><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aKuc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aKuc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 424w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 848w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 1272w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aKuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png" width="443" height="483.3406408094435" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:647,&quot;width&quot;:593,&quot;resizeWidth&quot;:443,&quot;bytes&quot;:331473,&quot;alt&quot;:&quot;https://twitter.com/martin_casado/status/1723237353278636257?s=20&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="https://twitter.com/martin_casado/status/1723237353278636257?s=20" title="https://twitter.com/martin_casado/status/1723237353278636257?s=20" srcset="https://substackcdn.com/image/fetch/$s_!aKuc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 424w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 848w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 1272w, https://substackcdn.com/image/fetch/$s_!aKuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7b5e3-17da-4629-9d3f-9149ed929658_593x647.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Original tweet at: <a href="https://twitter.com/martin_casado/status/1723237353278636257">https://twitter.com/martin_casado/status/1723237353278636257</a></figcaption></figure></div><blockquote><p>GitHub repo is here: https://github.com/recursal/ai-town-rwkv-proxy</p></blockquote><p>Working together with the AI Town team at a16z, we have fine-tuned tiny, highly efficient RWKV 3B and 1.5B models for AI Town use cases. 
All running locally on a MacBook Pro.</p><div id="youtube2-mPHjk0NTc6A" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;mPHjk0NTc6A&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/mPHjk0NTc6A?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>This has exciting potential for the gaming industry at large, where we can simulate a large number of NPCs with believable character chatter.</p><p>However, what was more exciting to the team here at Recursal was the process behind it, as its impact extends beyond gaming.</p><p>The process can be replicated on any existing AI agent deployment, while taking advantage of RWKV's lower cost.</p><p>The resulting model can run at less than 1/10th the cost of existing GPT-3.5 pricing (with potential to go lower).</p><div><hr></div><h2><strong>Automated distillation with Recursal</strong></h2><p>The AI Town model above was distilled from the original AI agents' usage of OpenAI, with a process that can be fully automated.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S87R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S87R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 424w, 
https://substackcdn.com/image/fetch/$s_!S87R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 848w, https://substackcdn.com/image/fetch/$s_!S87R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 1272w, https://substackcdn.com/image/fetch/$s_!S87R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S87R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png" width="1456" height="990" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:990,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:167835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S87R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 424w, 
https://substackcdn.com/image/fetch/$s_!S87R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 848w, https://substackcdn.com/image/fetch/$s_!S87R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 1272w, https://substackcdn.com/image/fetch/$s_!S87R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5e0f518-c710-4d42-9f7b-4094fa320263_1593x1083.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Our team didn&#8217;t need to build the dataset required for the fine-tune by hand. All we did was collect the required data by using a proxy between the AI agents and the OpenAI backend.</p><p>Using the data collected, we then fine-tuned a model and slowly began offloading requests to our optimized RWKV model.</p><p>While at the beginning the RWKV model may not be able to cover all use cases, over time, as the dataset is built up, the model is incrementally fine-tuned to cover a wider array of capabilities.</p><p>Effectively, this allows a drastic reduction in OpenAI GPT-3.5 / GPT-4 bills, with no performance compromises.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IjRy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IjRy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 424w, https://substackcdn.com/image/fetch/$s_!IjRy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 848w, https://substackcdn.com/image/fetch/$s_!IjRy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IjRy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IjRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png" width="932" height="591" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9d9e59b-09de-413f-90d3-71d026490277_932x591.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:932,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IjRy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 424w, https://substackcdn.com/image/fetch/$s_!IjRy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 848w, https://substackcdn.com/image/fetch/$s_!IjRy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IjRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9d9e59b-09de-413f-90d3-71d026490277_932x591.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This process is not new in itself, and is commonly known as AI distillation.</p><p>However, what is new is the usage of:</p><ul><li><p>Automated processes to simplify distillation and dataset cleaning.</p></li><li><p>Smaller, more efficient RWKV models in the process; previous industry attempts at distilling LLaMA2 70B models have proven not to be price competitive.</p></li><li><p>A smaller model in the router, to decide which requests are routed to the RWKV model and which go to the OpenAI platform.</p></li></ul><p>All of this will be the focus of our upcoming Recursal AI platform launch, with our closed beta kicking off by mid-December.</p><div><hr></div><p>If the potential for drastic cost savings on your existing AI workload excites you, you can sign up for our closed beta pilot using the following form.</p><p><a href="https://docs.google.com/forms/d/e/1FAIpQLSekNp_npm7unSmlfWsUsGs3aaBrplgKE8sLiHLoyeJaqvj5bQ/viewform">https://docs.google.com/forms/d/e/1FAIpQLSekNp_npm7unSmlfWsUsGs3aaBrplgKE8sLiHLoyeJaqvj5bQ/viewform</a></p>]]></content:encoded></item><item><title><![CDATA[RWKV joins the Linux Foundation - As the first AI model under the Generative AI Commons]]></title><description><![CDATA[Putting the "Open Source" into "Open AI"]]></description><link>https://substack.recursal.ai/p/rwkv-joins-the-linux-foundation-as</link><guid isPermaLink="false">https://substack.recursal.ai/p/rwkv-joins-the-linux-foundation-as</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 08 Nov 2023 22:04:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OC5T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>RWKV is the world's first open source AI model to join the Linux Foundation.</p><p>This ensures that RWKV continues to grow as a true OSS model (just the Apache 2 license): by the community, for the world.</p><p>Thanks <a href="https://twitter.com/LFAIDataFdn">@LFAIDataFdn</a> for welcoming us on board at the OSS Summit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!OC5T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OC5T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OC5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg" width="728" height="604.3934426229508" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:709,&quot;width&quot;:854,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:108485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OC5T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OC5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F343fc76a-501c-44b6-8c37-c8a2b55a549e_854x709.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>More information can be found in the official press briefing: <br><a href="https://lfaidata.foundation/blog/2023/09/21/lf-ai-data-launches-generative-ai-commons/">https://lfaidata.foundation/blog/2023/09/21/lf-ai-data-launches-generative-ai-commons/</a></p><blockquote><p>This is a repost of a past event, prior to the setup of this blog.</p></blockquote>]]></content:encoded></item><item><title><![CDATA[🌳 The World's Greenest AI Model: RWKV's Pioneering Sustainability]]></title><description><![CDATA[10-100x lower inference cost = lower carbon footprint]]></description><link>https://substack.recursal.ai/p/the-worlds-greenest-ai-model-rwkvs</link><guid isPermaLink="false">https://substack.recursal.ai/p/the-worlds-greenest-ai-model-rwkvs</guid><dc:creator><![CDATA[Eugene Cheah]]></dc:creator><pubDate>Wed, 08 Nov 2023 18:09:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g1il!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With the rapidly growing usage of AI models worldwide and the threat of global warming, the need for a greener AI model to reduce our carbon footprint is more important than ever.</p><p>In that regard, we are proud to say that <a href="https://wiki.rwkv.com/">RWKV</a> (the model our team works on) has been <a href="https://ml.energy/leaderboard/">independently benchmarked</a> as the world's greenest and most energy-efficient AI model/architecture, on a per-token output basis, among models of the same parameter size (7B parameters).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g1il!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g1il!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 424w, https://substackcdn.com/image/fetch/$s_!g1il!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 848w, https://substackcdn.com/image/fetch/$s_!g1il!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!g1il!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!g1il!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png" width="1456" height="1052" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:221426,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g1il!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 424w, https://substackcdn.com/image/fetch/$s_!g1il!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 848w, https://substackcdn.com/image/fetch/$s_!g1il!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!g1il!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F725e1f16-a515-44e8-98ba-53847ccc427b_1592x1150.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Benchmarks were done for the 7B weight class at: <a href="https://ml.energy/leaderboard/">https://ml.energy/leaderboard/</a></figcaption></figure></div><p>The energy efficiency of the RWKV architecture derives from the 10-100x better compute efficiency of our linear transformer architecture versus the quadratic scaling of standard transformer architectures. 
A benefit we expect to scale better as our models get larger.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4zNW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4zNW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 424w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 848w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 1272w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4zNW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png" width="616" height="463" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:463,&quot;width&quot;:616,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74477,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4zNW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 424w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 848w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 1272w, https://substackcdn.com/image/fetch/$s_!4zNW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb3228e-dda4-4e14-a90a-8d5377b10bd7_616x463.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Graph taken from the <a href="https://arxiv.org/abs/2305.13048">EleutherAI RWKV paper</a>: 5x cheaper compute at 1k tokens, 10x cheaper compute at 2k tokens, 100x+ cheaper beyond 20k tokens</figcaption></figure></div><p>Combined with how RWKV models scale similarly to transformers in evals against other models trained on the same dataset, the benefits of scaling more energy-efficient architectures like RWKV will be significant for our industry as a whole.</p><p>And we, the team at Recursal AI, are excited to help spearhead this charge towards more energy-efficient models with RWKV, and potentially other future models/architectures as our industry develops.</p><blockquote><p>Towards a future with more, not fewer, alternatives in AI, <br>with the various unique benefits each architecture will bring us.</p></blockquote>]]></content:encoded></item></channel></rss>