
In this video, I introduce the beta release of Grok 2 and Grok 2 Mini. I discuss the new models available on the X platform and their impressive performance, including their ranking on the Chatbot Arena and comparisons to other models like GPT-4 Turbo and Claude 3.5 Sonnet. I also touch on Grok's focus on factuality, evaluation scores, and the exciting partnership with Black Forest Labs for image generation. Learn how to access these models, even without an X Premium account, and explore their potential for real-time information retrieval and multimodal functionality.

00:00 Introduction to Grok 2 Beta Release
00:13 Grok 2 and Grok 2 Mini Overview
00:48 Performance and Comparisons
00:55 Chatbot Arena and Elo Scores
02:00 AI Tutor System and Factuality Focus
03:06 Accessing Grok on X Premium
03:46 Black Forest Labs Partnership
04:24 Enterprise API and Infrastructure
04:39 How to Access Grok and Image Generation
05:36 Conclusion and Final Thoughts
---
type: transcript
date: 2024-08-14
youtube_id: yMWzDFoTC8Y
---

# Transcript: Grok-2: xAI's New Frontier Model

The beta release of Grok 2 is now available. I'm going to go over the blog post, and by the end of the video I'm going to point you to where you can use this model even if you don't have access to an X Premium account.

Right now there are two models available: Grok 2 and Grok 2 mini. Both are being released to Grok users on the X platform. Grok 2 mini is a small but capable sibling of Grok 2, and an early version of Grok 2 has been tested on the LMSYS leaderboard under the name sus-column-r. Given some of the answers people were getting on Twitter, a lot of people actually thought this was a model from OpenAI, given how impressive it was. What's interesting here is that this is just the smaller version of the Grok 2 model, and it outperforms even Claude 3.5 Sonnet as well as GPT-4 Turbo.

It outperforms them on the Chatbot Arena. The Chatbot Arena is a place where you can go to compare LLMs: you put in a query, something like "hello world", and it gives you two responses side by side, then you choose the response you prefer. In a longer example it might be a little more nuanced; you might give it a coding problem, or maybe a riddle, or something to that effect. Generally speaking, this is just a very raw metric of what users prefer.

If we compare this to some of the other models, this is the overall Elo score on the Chatbot Arena, which is how all of these different models rank on the leaderboard. Right now it's in fourth place on the arena, but mind you, this is just the small version. The other thing to note is that the ChatGPT-4o latest release came out just hours before the release of sus-column-r. And if you want to get into the particular win rate for the
different models that it's compared to, you can see that against Claude 3.5 Sonnet, which a lot of people think is the frontier model, it outperforms 54% of the time, and so on.

In terms of the AI tutor preference for factuality, this is definitely going to be an edge that Grok and the xAI team potentially have, given that they have access to all of the data available on Twitter. One key area the xAI team has been working on, and this isn't to say other AI labs aren't working on it as well, is this focus on factuality. They have an AI tutor system where they present the different answers to a tutor, and then they evaluate the model's capability in two key areas: whether it's following instructions, and whether it's providing accurate and factual information.

In terms of the MMLU score, we're at 87.5% on Grok 2 and 86.2% on Grok 2 mini, respectively. For HumanEval, the coding benchmark, Grok 2 sits at 88.4% and Grok 2 mini sits at 85.7%. When you compare those, that's fourth place and sixth place respectively in that coding evaluation.

There are a few other pieces with Grok that make it particularly interesting. You can access it on X Premium right now, and the benefit of using it on X is that it also gives you the context of all of Twitter, so you're going to be able to access real-time information about new things that are happening. It's almost akin to Perplexity and that sort of retrieval-augmented generation, where context gets pulled from different sources and then ultimately fed into the LLM. There is multimodality with the model, but this is something that is coming soon. In terms of the chat experience, if you try it out, it's a really nice, intuitive interface. You have all the Markdown rendering you typically see in chatbots like ChatGPT; it's overall a really nice experience.

The other thing that is cool with this announcement is that they've partnered
with Black Forest Labs. Black Forest Labs makes a leading image generation model, and some people even rank it higher than something like Midjourney. The thing that's really interesting with Black Forest Labs is that this is an open source model, so there are a ton of different fine-tunes out there of people generating images of themselves in different contexts with the new FLUX.1 model. The cool thing is they're also working on a video generation model, and given the results on FLUX.1, I wouldn't be surprised if video generation does ultimately come to X.

Another thing in the announcement is that there is an enterprise API coming. There are no details in terms of pricing or anything like that. They also mention a little bit about the infrastructure: it's going to be built on a bespoke tech stack that allows multi-region inference deployment for low latency across the world.

In terms of next steps, to access Grok you can just head over to x.com and click the Grok tab, and you will be able to interact with it. Say you want to generate a photo: let's say "generate a photo of a man in New York City at night". This is that Black Forest Labs image generation, and there you go, you have a man in New York City. It's really impressive. I'd really encourage you to play around with this. There are a ton of great examples already on X of different things people are trying, and it doesn't seem to be as censored or as filtered as some of the other image generation tools out there right now.

If you don't have access to x.com, you can access this through the LM Arena website, where you'll be able to go over to direct chat, search for sus-column-r, and interact with the model just like that. If you found this video useful, please like, comment, share, and subscribe. Otherwise, until the next one.
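
The transcript mentions both an overall Elo score and a head-to-head win rate, and the two are linked by the standard Elo logistic curve. The sketch below is only an illustration: the function names are hypothetical, it takes the 54% figure quoted in the video at face value, it ignores ties, and it assumes the textbook 400-point scale rather than whatever fitting procedure the leaderboard actually uses.

```python
import math


def expected_win_rate(rating_gap: float) -> float:
    """Expected win probability for a model rated `rating_gap` points higher,
    using the textbook Elo logistic curve (400-point scale, assumed here)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


def rating_gap_from_win_rate(win_rate: float) -> float:
    """Invert the Elo curve: the rating gap implied by a head-to-head win rate."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # 54% is the win rate quoted in the video; everything else is illustrative.
    gap = rating_gap_from_win_rate(0.54)
    print(f"A 54% win rate implies a gap of roughly {gap:.0f} Elo points")
    print(f"Round trip: {expected_win_rate(gap):.2f}")
```

Under these assumptions, a 54% win rate works out to a gap of roughly 28 rating points, which is why models that look close on the leaderboard can still show a visible preference in pairwise votes.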