
In this video I show you the new multimodal model by Meta AI that allows you to input speech or text and get a host of different outputs, from speech-to-speech translation to speech-to-text translation and more.

Links:

- https://ai.meta.com/blog/seamless-m4t/
- https://seamless.metademolab.com/demo
- https://github.com/facebookresearch/seamless_communication
- https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf
- https://huggingface.co/spaces/facebook/seamless_m4t
---
type: transcript
date: 2023-08-24
youtube_id: kGhTHtRhouA
---

# Transcript: Meta AI's SeamlessM4T Multimodal AI Model for Speech and Text Translations

In this video I'm going to be showing you the powerful new foundational multimodal model for speech translation by Meta. SeamlessM4T is a new model that they just released that allows you to input both speech and text and get a variety of different outputs back, so you can get speech-to-speech translation, speech-to-text translation, etc.

I'm just going to quickly demonstrate this right off the bat for you, and I'll say: "This is a demonstration for my YouTube channel." Now, this is a playground you can access for free on Meta's website, here from the blog post. If I simply click the languages that I'd like to have it translate to (you can click up to three), within just a few seconds you'll see all of the different outputs that the model is capable of.

The first thing you'll see here is the automatic speech recognition: it tells you the actual phrase that you said, and it detects the language for you. It gives you the text version in the languages that you're looking to translate to, and then, what is very neat, it also gives you the speech translation for the languages that you selected. So if I just demonstrate these here. [translated audio plays]

As you might imagine, there's a host of different applications where this could be helpful, from travel apps to just daily life. You can imagine if there's a language barrier somewhere and you incorporate this into an app idea that you had; the possibilities are pretty endless with something like this.

The other nice thing with this model is that you are able to access it on GitHub. There is a paper that's about a hundred pages long, but what I'd also encourage you to check out is their Hugging Face Space, which is very similar to the interface that I just showed you, but it also gives you the ability to duplicate the Space with just a few clicks on Hugging Face if you're looking to run this and have your own private version of it. It's a little bit more expensive to run this model, so if you're looking to do this, I'd encourage you to have fleshed out an idea before you duplicate the Space and start getting billed for something like this. But it's definitely a very interesting model to start exploring.

Now, similar to the web interface that I showed you here, I did notice that when you try the interface on Hugging Face, depending on the traffic and the queue, it can take a little bit of time. I found that during business hours it was a little bit slower, so if it is awfully slow, I'd encourage you to just hop over to this interface to play around with it if you're testing for reliability or just the general capability of the thing.

We wanted to keep this one short, and that's pretty much it for this one. If you found this video useful, please like, comment, share, and subscribe, and otherwise, until the next one.
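
For anyone turning the demo into an app idea, the outputs described in the video can be sketched as a small lookup from input and output modality to task name. This is only an illustrative sketch, not Meta's API: the abbreviations (S2ST, S2TT, T2ST, T2TT, ASR) follow the naming used in the SeamlessM4T paper, while `task_for`, `plan_requests`, `MAX_TARGETS`, and the example language codes are hypothetical names chosen here. The limit of three target languages mirrors the playground shown in the video and is an assumption about the demo, not about the model.

```python
# Task abbreviations as named in the SeamlessM4T paper:
# S2ST = speech-to-speech, S2TT = speech-to-text,
# T2ST = text-to-speech,   T2TT = text-to-text translation.
TASKS = {
    ("speech", "speech"): "S2ST",
    ("speech", "text"): "S2TT",
    ("text", "speech"): "T2ST",
    ("text", "text"): "T2TT",
}

# Assumption: mirrors the "up to three" languages selectable in the demo.
MAX_TARGETS = 3


def task_for(input_kind: str, output_kind: str) -> str:
    """Return the task abbreviation for an input/output modality pair."""
    try:
        return TASKS[(input_kind, output_kind)]
    except KeyError:
        raise ValueError(
            f"unsupported pair: {input_kind} -> {output_kind}"
        ) from None


def plan_requests(input_kind: str, target_langs: list[str]) -> list[tuple]:
    """Hypothetical helper: list the (task, language) pairs a demo-like
    app would need to request for one input."""
    if not 1 <= len(target_langs) <= MAX_TARGETS:
        raise ValueError(f"choose between 1 and {MAX_TARGETS} target languages")
    plan = []
    if input_kind == "speech":
        # The demo first shows a transcript of what was said.
        plan.append(("ASR", None))
    for lang in target_langs:
        plan.append((task_for(input_kind, "text"), lang))    # translated text
        plan.append((task_for(input_kind, "speech"), lang))  # translated audio
    return plan


if __name__ == "__main__":
    # Language codes here are illustrative placeholders.
    for step in plan_requests("speech", ["fra", "spa"]):
        print(step)
```

A plan like this only describes what to ask for; the actual inference call would go through the code in the GitHub repository or the Hugging Face Space linked above.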