
In this video I give a quick overview of Mistral AI's latest model release. What makes this release compelling is that it is a mixture-of-experts model, the architecture rumored to be behind OpenAI's GPT-4 and ChatGPT. I demonstrate a number of different platforms that host Mixtral 8x7B where you can try out inference.

Links:
- https://twitter.com/MistralAI
- https://www.researchgate.net/publication/331200491/figure/fig1/AS:728014346272777@1550583554684/Mixture-of-experts-MoE-architecture.ppm
- https://sdk.vercel.ai/
- https://app.fireworks.ai/models/fireworks/mixtral-8x7b-fw-chat
- https://replicate.com/nateraw/mixtral-8x7b-32kseqlen?prediction=nk5qhclblcabfaxyhltfl3h2ia
- https://openrouter.ai/models/fireworks/mixtral-8x7b
- https://labs.perplexity.ai/

Connect and Support
I'm the developer behind Developers Digest. If you find my work helpful or enjoy what I do, consider supporting me:
- Patreon: patreon.com/DevelopersDigest
- Buy Me A Coffee: buymeacoffee.com/developersdigest
- Website: developersdigest.tech
- GitHub: github.com/developersdigest
- Twitter: twitter.com/dev__digest
---
type: transcript
date: 2023-12-11
youtube_id: diMGVabULoU
---

# Transcript: Mistral AI's New 8X7B Sparse Mixture-of-Experts (SMoE) Model in 5 Minutes

In this video I'm going to be talking to you about Mistral AI's mixture-of-experts model, as well as a number of different services where you can go ahead and play around with this new model and model architecture if you're interested. So Mistral AI is a company that came onto the scene largely at the end of September, when they released, via their tweet here, a torrent link to their Mistral 7B model. This is an incredibly powerful base model, and a lot of people have fine-tuned and built models on top of it. The great thing with this model is that, because it's a 7B parameter model, you can run it on something like a standard MacBook, or even an iPhone 15 in some cases.

Now, their latest release has a lot of people excited, and you can see from this tweet that there are over 3 million views on it. It's a pretty impressive approach they've taken for how they release models. They don't do it like any other company: they just throw up a torrent release, they don't even provide documentation, and a lot of people are left trying to solve the puzzle. So every time they have a release, it's suddenly all hands on deck in the open source community, trying to figure out how to fine-tune it, how to even work with it. It's pretty exciting, and it's a good way to gain some virality, in my opinion.

Now, about mixture of experts, just to touch on this for a moment: I've seen a number of different explanations of this, and I've seen some wrong explanations. I'm going to take a stab at it and try my best to explain my understanding. Essentially, when you put an input into something like ChatGPT, what's rumored to be happening behind the scenes of ChatGPT and GPT-4 is a mixture-of-experts network. The rumor with GPT-4 is that
it's a 16-way, 220-billion-parameter mixture model. What's happening is, when you put in an input, it's going to send that message across the different experts in the network, as well as to a gating network function. That gating network is essentially the evaluator: it takes the output from all the experts that received the message, evaluates and assigns a weight to the output each one generated, and from those different weightings produces the final response. So the gating network could essentially say, okay, this expert has given a great response for this message, so it's going to be weighted higher, whereas an expert further down the line might be completely ignored if it doesn't generate a good response. That's sort of a high-level overview of how it works.

Now, if you're interested in playing around with this, there are a number of different services you can take a look at. I'd encourage you to check out sdk.vercel.ai. The nice thing there is that you can queue up a number of different models alongside this new mixture model, so you can say something like "write a short story" and see all the responses in tandem, side by side. The evaluations for the new mixture model are still sort of up in the air, but some open source contributors have tested it, and it looks to be comparable on a lot of metrics to something like GPT-3.5, which is pretty impressive for the size, and considering that this is now an open source model. In terms of other resources where you can play around with this: from what I understand, Vercel actually uses the Fireworks AI API under the hood, so if you want to interact with an API directly, you can make an account on fireworks.ai and play around with that. There's also a version of this on Replicate if you want to check that out, another one on OpenRouter (I'm going to put all the links in the description of the video), and finally there's one on Perplexity Labs.

Now, the one thing to note with all of these is that not all of them are chat or instruct models. Some of them give the very raw output, where it's going to output all the different responses of the experts, and that's just something to be mindful of. A lot of people are still trying to wrap their heads around how to implement something like mixture of experts, so I expect there are going to be even more options across the board. But if you run into errors and issues trying any of these services, just be mindful that this is something that was released only a couple of days ago. That's pretty much it for this video. If you found it useful, please like, comment, share, and subscribe, and otherwise, until the next one.
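The gating idea described above can be sketched in a few lines of Python. This is a toy illustration of top-k gating as used in sparse mixture-of-experts models (Mixtral reportedly routes each token to 2 of 8 experts); note that in the usual sparse MoE formulation the gate scores the *input* and selects experts before they run, rather than grading their outputs afterward. The `experts` and `gates` callables here are hypothetical stand-ins for real networks, not any library's API.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gates, top_k=2):
    """Route input x through the top_k highest-scoring experts.

    experts: list of callables, toy stand-ins for expert networks
    gates:   one score-producing callable per expert (toy gating network)
    """
    scores = [g(x) for g in gates]
    probs = softmax(scores)
    # keep only the top_k experts; the rest are skipped entirely,
    # which is what makes the model "sparse"
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in ranked)
    # final output is the renormalized gate-weighted sum of the
    # selected experts' outputs
    return sum(probs[i] / total * experts[i](x) for i in ranked)
```

With 8 toy experts and `top_k=2`, only the two experts with the highest gate scores contribute to the output, weighted by their renormalized gate probabilities; the other six are never evaluated, which is why a sparse MoE can have many parameters while keeping per-token compute low.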