
Check out my latest script in the Developer's Digest GitHub repository, where we explore the power of AI and LangChain. We're leveraging Google's Vertex AI to perform similarity searches, finding matches based on images or text, and the script uses a FAISS vector store for fast and accurate results.

Repository: https://github.com/developersdigest/Multimodal_Embeddings_Langchain_Vertex_AI
Developer's Digest GitHub: https://github.com/developersdigest/

Setup is simple: just clone the repo and install the required dependencies. Make sure you have "faiss-node" and "langchain" in your package.json, and remember to set the "type" field to "module". This script is a great tool for anyone interested in AI, from beginners to seasoned developers, offering practical experience with similarity searches on Google's Vertex AI.

Relevant links for more information:

- Google Cloud Console: https://console.cloud.google.com/
- Google Model Garden: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/5
- Vertex AI: https://cloud.google.com/vertex-ai
- Google Cloud SDK: https://cloud.google.com/sdk/docs/install
- Langchain Docs: https://js.langchain.com/docs/modules/data_connection/experimental/multimodal_embeddings/google_vertex_ai
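As a rough sketch, a minimal package.json for this setup might look like the following. The package name and version ranges here are illustrative placeholders, not taken from the repository; only the "type" field and the two dependencies are from the post.

```json
{
  "name": "multimodal-embeddings-demo",
  "type": "module",
  "dependencies": {
    "faiss-node": "^0.2.0",
    "langchain": "^0.0.117"
  }
}
```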
---
type: transcript
date: 2023-07-26
youtube_id: cxxEsCYt-C0
---

# Transcript: Multimodal Embedding with Langchain in Node.js with Vertex AI

In this video I'm going to show you how to set up multimodal embeddings with LangChain in Node.js. Multimodal embeddings is a new model that was recently released by Google on their Vertex AI platform, currently in preview. Just yesterday a wrapper for it was also released within LangChain, which I'm leveraging to make this even simpler. The thing with the multimodal embeddings model is that you can include a combination of text and images: you can query for images based on text, text based on images, images based on images, and what have you. That gives you a whole lot of flexibility, and it also gives you that nice ability to have it all contained within one vector database or vector location.

The first thing you'll need is a Google Cloud account. If you don't have one, you'll likely be able to get some free credits off the bat, and it's very simple to sign up — just go through the flow. Once you've signed up, head over to your console, click the drop-down, create a new project, and once it's created make sure to hop over to your dashboard and enable all the recommended APIs for Vertex AI. Once you've done that, I encourage you to go to the setup link (I'll put it in the description) and follow those setup steps; it took me a couple of attempts to make sure everything was working correctly. Essentially, you download the gcloud CLI for your operating system, and once it's downloaded you run the login command from your terminal. A browser window will pop up, you log in to Google, and you'll see a success message in both the browser and the terminal. Once you have that all set up, you can pop back into VS Code, and I'll show you everything you need to do from scratch.

Essentially, what you'll do off the bat is run `npm init -y`, which creates the package.json here. You won't have a handful of these fields yet, but first I want you to add `"type": "module"`, because we're going to be using imports. Then I want you to npm install two things, which I'll briefly touch on. LangChain is likely self-explanatory, but FAISS (faiss-node) is Facebook AI Similarity Search. This is what we're going to use as our vector database, and in this example I'm saving it directly to our machine locally, so we're not reaching out to any third-party vector database — just to keep it simple. So you can just `npm i langchain faiss-node`, just like that.

Once you have that, you can `mkdir output`, and while you're at it, `mkdir images`. The images folder is what you'll actually embed. In this example I'm using a handful of images — a couple of pictures of Steve Jobs, a parrot, Apple products, a few different animals — just to be able to run some tests with the model itself. Once you've filled out the images folder, you can `touch index.js`, and you'll have a JavaScript file to work through. I have comments here just to keep things clear and concise as I go through everything bit by bit.

The first thing we're going to do is import the required modules: fs and path to interact with our local file system, and then three modules from LangChain — the experimental Vertex AI multimodal embeddings, the FAISS vector store, and the Document module. Then we simply initialize the Vertex AI multimodal embeddings and set up a path for where our vector store will be saved. This will create a folder called `vector_store`, with a `.index` file and a `.json` once everything has been saved.

From there, we have a few helper functions, just to keep this clean and so you can grab these pieces as you see fit — they're essentially helpers you could export and use if you're setting up something similar to what I'm showing here. The first one just clears a directory. I'm saving the result images to the output folder so you have a different place to look at the outputs that isn't just the console. Because we encode the images in base64 when we create and save the vectors, we can reach for them and save them back out as files as we see fit while running through this.

The next two functions are addImage and addText. They're very similar, with some slight differences. In addImage, we read the image path passed in as the argument, embed that image via Google's endpoint, encode it in base64, and add a little bit of metadata — an ID, the media type, and the path. Then we add it to our vector store and log it out. The text function is essentially the same thing, with a different media type and a slightly different method: for images we call embedImageQuery, and for text we call embedQuery, but otherwise it does the same thing.

The next two functions are similar to the first two: one for image similarity search and one for text similarity search. Within these, we first clear the output folder each time they're run, so the output only ever contains the most recently generated files, and again we log things out so it's super clear what's happening. We read the file path, and we have to embed the image that we want to query our vector database with. Say you have a picture of a dog that you want to run against your database: you first embed it, and once you've embedded it you can run similaritySearchVectorWithScore. This returns the data we're looking for as well as the similarity score. I'll likely mention this again, but with this score, the lower the better — or maybe a better way to put it: the lower the score, the more similar the two files are, meaning a higher relatedness between them. We log that out, and we also have a little helper function that's essentially a fancy console.log to show all the different examples. The text search is just like the image one, except we pass in text, embed it, and look for results. The fancy printResult function we don't need to worry about too much; it just gives us a nice console output, with slightly different information depending on whether the result is an image or a piece of text.

Then we have a nice quality-of-life check: if the vector store directory already exists, the script only runs the query and embeds the new items. So if you have, say, 100 images, it's not going to re-embed the whole list — only the subsequent images or pieces of text you're adding before querying the vector database.

From there, this is where we actually add all of our files. We specify a number of images — not all of them, because I want to save some for when we actually query the vector database — and we also have a little bit of generated text to throw in. Then we loop through all the images, adding them one by one; once the images are done, we loop through all the pieces of text and add those to our vector store one by one. We wait for that function to complete, then perform our similarity searches (which I'll go through in a bit more detail), and finally save out our vector store.

Once you have that, go ahead and save, run `node index.js`, and you'll see it start going through our array. It's embedding "dog", and depending on the size of the image it will take longer, as you might expect. It goes through dog, cat, parrot, iPhone, looping through all of these, and then it outputs all the different queries we have here — but I'm going to go through them one by one. With the similarity search, for both image and text, we pass the path and the number of results we want. If I clear the console and run this again — in this example I'm running a query with dog2, embedding it and comparing it against everything in the vector store — you see we get a similarity score of 0.75 between these two things. If I pull up the stored image next to the one we passed in, you can see it arguably got the most related image to what was passed in.

Now if I perform another similarity search and pass in a picture of Steve Jobs, which I'll pull up here — in this example I asked for three results — the interesting thing is that the first result was a different picture of him, then a picture of AirPods, then a picture of an iPhone. I found this interesting, and arguably this is what I would find most similar, at least among the images. Remember, we also have dog, cat, and parrot in there, and it reached for these few things from that initial image instead of the others — I thought that was pretty neat. Similarly, if we pass in our photo of an Apple Store, we get the same results as above but in a different order: AirPods and iPhone first, and Steve Jobs last. Arguably, that is also the most similar ordering.

One thing I noticed when setting this up is what happens when you have both text and images. Let me run an example: if you have text embedded, the results you get are, more often than not, more likely to include those text results among the items it considers most similar. If I bump this up and ask for the 10 most similar things for "dogs": we only embedded three pieces of text, and you see those are all at the top, while the dog image itself is only the fourth result. This is still very new, but I found there is a bias where, if you're embedding text and then querying with text hoping to get an image back, it does seem to weight text responses higher. If I run the Apple example, similarly, all the text results come first — so we get "Apple Inc.", but we also get the text about dogs being domesticated animals, when arguably we'd want one of the other Apple products. It's interesting to think about how this works under the hood, and I'm curious if anyone has had better luck embedding both images and text and then trying to get images from text — that's the one thing I found not to work as well.

One more thing you could try: let's say we delete our vector store, delete the part that adds the text, and add only images. Clear the console and run it again, so it re-embeds all those images from scratch, and then ask at the end for a similarity search of just "Steve Jobs". To reiterate, there are only images now, so we won't get a text response. It'll be interesting to see which image is returned based on the text alone — and you see here we get our Steve Jobs image. Now if I try "dogs" with only the images embedded and ask for, say, three results, we get dog, Steve, parrot. So it gets a little strange, but it did return the dog image when I passed in that "dogs" query.

Hopefully you found this useful. It's still very new — experimental within LangChain and still in preview within Google — but I thought I'd show you this early so you can get your hands on it and play around, if this is something you're interested in. As always, if you found this video useful, please like, comment, share, and subscribe — and otherwise, until the next one.
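The scoring convention described above — lower score means more similar — can be illustrated without any cloud calls. The sketch below is a minimal in-memory stand-in for the FAISS store, assuming plain squared-L2 distance as the score; `TinyVectorStore` and its methods are hypothetical illustrations, not the LangChain or faiss-node API, and the 3-dimensional toy vectors stand in for Vertex AI's real embedding vectors.

```javascript
// Minimal in-memory stand-in for a FAISS index: stores (vector, metadata)
// pairs and returns the k nearest entries by squared L2 distance.
// Lower score = more similar, matching the convention in the video.
class TinyVectorStore {
  constructor() {
    this.entries = [];
  }

  add(vector, metadata) {
    this.entries.push({ vector, metadata });
  }

  // Squared L2 distance between two equal-length vectors.
  static l2(a, b) {
    return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
  }

  // Returns [metadata, score] pairs, most similar (lowest score) first.
  similaritySearchWithScore(query, k) {
    return this.entries
      .map((e) => [e.metadata, TinyVectorStore.l2(query, e.vector)])
      .sort((x, y) => x[1] - y[1])
      .slice(0, k);
  }
}

// Toy "embeddings" — in reality these would come from
// embedImageQuery / embedQuery against Vertex AI.
const store = new TinyVectorStore();
store.add([1, 0, 0], { id: "dog.jpg", mediaType: "image" });
store.add([0.9, 0.1, 0], { id: "dog2.jpg", mediaType: "image" });
store.add([0, 1, 0], { id: "parrot.jpg", mediaType: "image" });

// Query with a vector close to the dog images: the two dog entries
// come back first, with the smaller distance ranked ahead.
const results = store.similaritySearchWithScore([1, 0.05, 0], 2);
for (const [meta, score] of results) {
  console.log(meta.id, score);
}
```

The same "lower is better" reading applies to the real scores in the video: the 0.75 for the dog2 query is a distance, so smaller values indicate higher relatedness.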