
Learn the fundamentals of becoming an AI engineer on Scrimba: https://v2.scrimba.com/the-ai-engineer-path-c02v?via=developersdigest

Links:
- https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message
- https://lmarena.ai/
- https://deepmind.google/technologies/project-mariner/
- https://deepmind.google/technologies/project-astra/
- https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/
- https://x.com/sundarpichai/status/1866868228141597034/photo/1
- https://x.com/sundarpichai/status/1866869247013539843
- https://aistudio.google.com/live
- https://aistudio.google.com/
---
type: transcript
date: 2024-12-12
youtube_id: a9w0HfkBMfk
---

# Transcript: Gemini 2.0 Flash in 10 Minutes

"What do you see on the screen here?" "You share a screen that shows the Google AI Studio web page, specifically the stream realtime section. It also shows a small video feed at the bottom right of the screen and a red circle with a microphone in the lower left, which indicates the audio is being recorded."

Gemini 2.0 is now here: introducing Gemini 2.0, their new AI model for the agentic era. Just today, Google announced a handful of things centered around their new Gemini 2.0 model. In this video I'll quickly go over the blog post and some of the key pieces they mention within it, and then I'll show you how to get started using Gemini today.

The first thing to note is that this is their first model built natively multimodal. There are new advances in multimodality, like being able to natively generate images as well as produce audio output. They also describe native tool use, which will enable you to build AI agents that bring us closer to their vision of a universal assistant. Starting today, they're releasing their Gemini 2.0 Flash experimental model, and it's going to be available to all Gemini users.

In terms of the technical details for Gemini 2.0 Flash: Flash is built on the success of 1.5 Flash, which is their most popular model for developers. This model has similar response times; notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks at twice the speed. 2.0 Flash also comes with new capabilities. In addition to supporting multimodal inputs like images, videos, and audio, 2.0 Flash now supports multimodal outputs like natively generated images mixed with text and steerable text-to-speech (TTS) for multilingual audio. Finally, like I mentioned above, it also has native tool use, like Google Search and code execution, as well as third-party user-defined functions.

If we take a look at some of the benchmarks, we can see that this model outperforms even Gemini 1.5 Pro across a ton of different areas, in some cases by a pretty considerable margin. Considering that this model is twice as fast as 1.5 Pro, it goes without saying that this is quite impressive.

Within the blog post they describe that they really envision this model as a bit of a harbinger for creating AI agents, with its ability to follow complex instructions, as well as planning, compositional function calling, native tool use, and improved latency. Instead of stitching together a bunch of different models and a bunch of different services, what's nice with the Gemini 2.0 Flash model is that we're getting to a place where you'll be able to send in all of your different inputs, whether text, video, audio, or images, and this one model is ultimately capable of generating voice so it can respond back to you, as well as generating images along with the text responses we're used to.

Google has a number of different internal projects, and they are releasing some of these today, where you can sign up for a waitlist to become a trusted tester. One of them is Project Mariner. What's interesting with Project Mariner is that it's similar to a tool popularized by MultiOn: think of it as a Chrome extension that allows you to control your browser, so you can give it natural-language tasks and it will go and research or perform the actions needed for whatever you're asking. This is one of today's announcements, but it is behind a waitlist; you do have to sign up to access it.

Another one is Project Astra. This was demonstrated during one of their events last year. Project Astra gives you the ability to stream in audio as well as video and get responses back in real time as voice as well as text, as shown in the video. You can imagine a bunch of different use cases for that, right? Holding up your phone and asking what particular things are, maybe asking a question about a piece of homework, or whatever it might be. While Project Astra's official release isn't available today, I'm going to show you a place where you can try something very similar within their AI Studio.

Another announcement built on Gemini 2.0 is Jules, which is agents for developers. As they describe it, it's an experimental AI-powered code agent that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan, and execute it, all under the developer's direction and supervision. This effort is part of the long-term goal of building AI assistants that are helpful in all domains, including coding. They have a quick little demonstration here. This is similar to some of the other tools we've seen out there; Devin is one that was popularized, where the tool ties into your code base, you interact with it in natural language, and it performs some of the functions you typically would. This really looks like a human-in-the-loop coding assistant: you can give it instructions on a particular task and it will go ahead and update a particular piece of code, similar to things we've seen in Cursor Agent, Cursor Composer, or a tool like Devin. It will be interesting once this is more generally available.

This video is brought to you by Scrimba, the innovative coding platform that brings interactive learning to life. Dive into a variety of courses, from AI engineering to front-end, Python, UI design, and much more. Scrimba's game-changing feature is their unique "scrim" screencast format, which lets you pause the lesson anytime and start directly editing the teacher's code. Their curriculum is built in collaboration with industry leaders including Mozilla, MDN, Hugging Face, and LangChain, and includes building applications with OpenAI, Claude, and Mistral models, and it guides you on deploying projects to platforms like Cloudflare. While AI tools can assist with coding, a solid grasp of the fundamentals is essential for achieving real experience. Scrimba offers something for everyone, from complete beginners to advanced developers, and about 80% of Scrimba's content is completely free. Sign up for a free account today using my link below and enjoy an extra 20% discount on their Pro plans when you're ready to upgrade.

And then finally, they have some really fun examples of using it for games: playing games and actually interacting with the model, asking questions like "what should I do within the game now?" or "what approach should I take given what's on the screen?"

Another thing to note with this model: on LMArena, which is a place where individuals go and pick their preferred response between two different models, it scores third across the board. It even ranks higher than o1-preview as well as o1-mini on the LMArena leaderboard. Effectively, this is a benchmark where two responses come back to users and users select which response they prefer. This is a practical way to rank these models, because you can see what users ultimately want as an output. The fact that it outperformed even these brand-new reasoning models from OpenAI is quite a feat, and impressive especially given that Gemini 2.0 Flash is, generally speaking, their smaller model. There's also presumably going to be a Pro model coming down the pike as well.

There's also a blog post specifically for developers. A lot of it covers topics I've already touched on or that I'll demonstrate in just a moment. I'll link all of this in the description of the video if you're interested.

In terms of where you can access the model right now: it's available within AI Studio, Vertex AI, as well as the Gemini web app. In my opinion, the best place to try it out right now, with a ton of really cool features, is aistudio.google.com. Within here there is the typical chat interface, where you can use it as if it were something like ChatGPT, but the really cool thing is the new stream realtime API. Within this you can share your screen, you can show Gemini your webcam, or you can talk to the model, which I'll demonstrate in just a sec.

Another thing in here is the starter apps, where you don't even need to pull down a GitHub repo or anything; you can just try out the model to see its capabilities. Here you can see the 2D bounding boxes: we can see a wall, we can see a shadow, and we also see origami. They also have examples for points as well as 3D bounding boxes. The final example is a starter app that is really a demonstration of that native tool capability. Here we see the recommended place come back very quickly; we have the location and caption within the app, and then we also have the city shown within the map. "Tell me about somewhere rich with ancient history." We click it, we get the recommended place, and it quickly renders that visual component on the screen.

Now, the coolest feature of this is arguably the multimodal live API with Gemini 2.0. Let's just demonstrate this. I'll go ahead and share my screen, and I'll turn up my speakers, hopefully you can hear this, and I'll say: "What do you see on the screen here?" "You share a screen that shows the Google AI Studio web page, specifically the stream realtime section. It also shows a small video feed at the bottom right of the screen and a red circle with a microphone in the lower left, which indicates the audio is being recorded." You can see how something like this could be really interesting, right? Let's say you have your code editor up and you want to ask questions about the code; it could act like an assistant to you by streaming in that video. And this is just a quick demonstration of having the audio as well as the text streamed back, with all of these different multimodal inputs.

But otherwise, that's it for this video. I'll link everything in the description of the video. If you found this video useful, please like, comment, share, and subscribe. Otherwise, until the next one.
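The native tool use discussed in the transcript can be tried from Python. The following is a hedged sketch, not a verified implementation: the model name `gemini-2.0-flash-exp` and the `"code_execution"` tool identifier are assumptions based on the announcement and on how the `google-generativeai` SDK exposed built-in tools for the 1.5 models; check the current docs before relying on it.

```python
import os

# Hedged sketch: calling Gemini 2.0 Flash with a built-in tool via the
# google-generativeai SDK. The model name and tool string are assumptions
# drawn from the launch announcement; verify against the current API docs.

def build_request(prompt: str) -> dict:
    """Bundle the assumed model name, prompt, and built-in tool choice."""
    return {
        "model": "gemini-2.0-flash-exp",  # experimental model from the launch
        "contents": prompt,
        "tools": "code_execution",  # Google Search grounding works similarly
    }

# Only hit the network when an API key is actually configured.
if __name__ == "__main__" and os.environ.get("GOOGLE_API_KEY"):
    # Requires: pip install google-generativeai
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    req = build_request("Use code to compute the 20th Fibonacci number.")
    model = genai.GenerativeModel(req["model"], tools=req["tools"])
    print(model.generate_content(req["contents"]).text)
```

The request-building step is kept separate from the SDK call so the shape of the request is visible even without an API key.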
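The 2D bounding-box starter app works because the model returns box coordinates as plain text that the app rescales onto the image. A minimal sketch of that rescaling, assuming the format the starter apps appear to use, `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 range (an assumption worth verifying against the docs):

```python
# Hedged sketch: converting Gemini-style normalized bounding boxes to pixel
# coordinates. The [ymin, xmin, ymax, xmax] order and the 0-1000 scale are
# assumptions based on the AI Studio starter apps, not a confirmed spec.

def to_pixel_box(box, img_width, img_height):
    """Rescale a [ymin, xmin, ymax, xmax] box from 0-1000 to (x1, y1, x2, y2) pixels."""
    ymin, xmin, ymax, xmax = box
    return (
        int(xmin / 1000 * img_width),
        int(ymin / 1000 * img_height),
        int(xmax / 1000 * img_width),
        int(ymax / 1000 * img_height),
    )

# Example: a detection covering the central half of a 1920x1080 frame.
labels = {"origami": [250, 250, 750, 750]}
pixel_boxes = {name: to_pixel_box(b, 1920, 1080) for name, b in labels.items()}
# pixel_boxes["origami"] == (480, 270, 1440, 810)
```

Normalized coordinates let one response work for any display size, which is why the starter app can draw boxes without knowing the original image resolution ahead of time.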
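The LMArena leaderboard described above turns those "which response is better?" clicks into ratings. A minimal sketch of the idea using a generic Elo update (LMArena itself fits a Bradley-Terry model, so this is a simplification, not their exact method):

```python
# Minimal Elo-style rating over pairwise preference votes, illustrating how
# a leaderboard like LMArena's can be derived from user clicks. LMArena
# actually fits a Bradley-Terry model; this generic Elo update is a sketch.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32) -> None:
    """Shift both ratings toward the observed outcome by at most k points."""
    e = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e)
    ratings[loser] -= k * (1 - e)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
for vote in ["model_a", "model_a", "model_b", "model_a"]:
    loser = "model_b" if vote == "model_a" else "model_a"
    update(ratings, vote, loser)
# model_a wins 3 of 4 votes, so it ends up rated above model_b.
```

Note the update is zero-sum: points gained by the winner equal points lost by the loser, so the total rating mass stays constant as votes accumulate.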
Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.