
Exploring PHI-4: Microsoft's New 14 Billion Parameter Open Source Language Model

Learn The Fundamentals Of Becoming An AI Engineer On Scrimba: https://v2.scrimba.com/the-ai-engineer-path-c02v?via=developersdigest

In this video, I'll dive into PHI-4, an impressive new open source 14 billion parameter language model released by Microsoft. Available on Hugging Face, this MIT licensed model rivals Llama 3.3 70B and Qwen 2.5 72B on the MMLU benchmark. I'll walk you through the model card, some key points from the technical report, and show you how to set up and run PHI-4 on your machine or in your IDE. Highlights include details on its architecture, training data, and performance, as well as practical tips for using it locally. Don't miss the comparison with models like GPT-4o on coding benchmarks and other diverse applications. Stay tuned for more insights and let me know your thoughts in the comments!

Technical Report: https://arxiv.org/pdf/2412.08905
HF: https://huggingface.co/microsoft/phi-4
Ollama: https://ollama.com/
Continue: https://www.continue.dev/

00:00 Introduction to PHI-4: A New Open Source Language Model
00:36 Model Overview and Technical Specifications
02:04 Setting Up and Running PHI-4 Locally
03:21 Using PHI-4 with VS Code and Continue
04:23 Performance and Technical Report Insights
05:33 Conclusion and Future Prospects
---
type: transcript
date: 2025-01-09
youtube_id: H85F0vib85Y
---

# Transcript: Microsoft's PHI-4 14B in 5 Minutes

We have another impressive small open source language model: Phi-4 was just released from Microsoft. This is a 14 billion parameter, MIT-licensed model that you can access on Hugging Face right now, and what's really impressive is that even though it's a 14 billion parameter model, on MMLU it ranks up there with Llama 3.3 70B as well as Qwen 2.5 72B. So in this video I'll quickly go over the model card, touch on a couple of pieces from the technical report, and then show you how to set it up and run it on your machine as well as within your IDE, if you'd like to try it out.

First, let's read through the description. As stated there, Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data filtered from public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small, capable models were trained with data focused on high quality and advanced reasoning. Phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.

In terms of the architecture, like I mentioned, it's a 14 billion parameter dense model. It only supports text inputs and is optimized for a chat format. The context length is 16K tokens of input. It was trained on 1,920 H100 GPUs over 21 days, the training data was just shy of 10 trillion tokens, and the knowledge cutoff date is June 2024 (and earlier for some publicly available data).

Now, what's interesting is that its original release date was December 12th, but that was really overshadowed by things like OpenAI's twelve days of shipping, where they'd release a new feature each weekday, which sort of correlated with a lot of releases that we saw from Gemini. So this really fell through the cracks. I have to say, honestly, I didn't even know it came out until there was a bit of action on X today and I noticed some discussion about it.

In terms of getting started, you can pull down Ollama, which is a great option for running these models locally, but you can also deploy Ollama to cloud services as well; I have some videos on that if you're interested. If you don't have Ollama, it's really easy to set up: just download it for Mac, Linux, or Windows. Once it's installed, you can go ahead and run `ollama run phi4`. If it's the first time you're pulling down the model, it will take a little bit of time; I think Phi-4 is about 10 GB of data.

In terms of my machine and the response times you're seeing: I have an M3 MacBook Pro with an M3 Pro chip and 18 GB of memory. I don't have a ton of memory, and my computer definitely isn't optimized to run models locally. I know there are a ton of people out there with really jacked-up machines with way more memory; I definitely don't have a machine like that. I'm GPU-poor, so to speak; I don't have any Nvidia hardware or anything like that. With that being said, the response times I got are definitely really impressive, especially if I was in a pinch: if I didn't have internet, or if I was in a scenario where maybe I wasn't paying for a service like Cursor, or had run out of tokens and didn't want to renew. This is a really great option for having this model locally.

The other thing I wanted to point out that works really well with Ollama is Continue. Continue is a great tool you can pull down as a VS Code extension; I believe you can also use it in Cursor or even Windsurf, given that they are forks of VS Code. With Cmd+L you can open up a chat panel on the left-hand side and ask a question. So if I say "generate an Express server that says hello world," I can go ahead and submit that, and the great thing with this is that it's a much more ergonomic feel, especially for coding: you'll see all the different pieces you need along the way, you can copy the commands you need to paste into your terminal, and for the generated code itself you can click a button and it will port that code directly into the file you're working in (you'll see an "insert at cursor" button), or you can apply it, similar to the apply feature in tools like Cursor. That's just another way you can leverage Phi-4 if you're interested in using it in a coding context within an IDE.

Next, I just wanted to quickly touch on the technical report. What's really impressive is that this model actually outperforms even GPT-4o on GPQA as well as MATH, by about six points on each respectively. On coding benchmarks like HumanEval, it scores 82.6, whereas Llama 3.3 70B Instruct scores just 78.9 in comparison, and Qwen 2.5 scores 80.4. So it's still about 8 points shy of GPT-4o, but for a model that's just 14 billion parameters this is really impressive, and a really great option for a model of this size. There is a ton of information in the technical report; it's about 36 pages if you're interested in diving into all the specifics of how this model was trained and created. I'll link all of this in the description of the video.

I look forward to the future Phi models that hopefully will be coming out soon. Kudos to the team at Microsoft for this new release; I love to see these small local models that more and more people will be able to use. Let me know your thoughts in the comments of the video, but otherwise, if you found this video useful, please like, comment, share, and subscribe. Until the next one!
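For the Continue workflow mentioned above, the extension needs to be pointed at the local Ollama model before Cmd+L will use it. A hedged sketch of the relevant entry in Continue's `config.json` (the exact schema has changed across Continue versions, so treat this as illustrative rather than definitive):

```json
{
  "models": [
    {
      "title": "Phi-4 (local)",
      "provider": "ollama",
      "model": "phi4"
    }
  ]
}
```

With an entry like this in place, Phi-4 should appear in Continue's model dropdown alongside any cloud models you have configured.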
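Beyond the interactive `ollama run phi4` session described above, Ollama also exposes a local REST API (by default on port 11434), so you can drive the model from your own scripts. Here is a minimal sketch in Python; `build_payload` and `generate` are my own helper names, while the endpoint and the `model`/`prompt`/`stream` fields come from Ollama's `/api/generate` API. It assumes Ollama is running and the `phi4` model has already been pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes the Ollama server is running)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "phi4") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for one complete response object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "phi4") -> str:
    """Send a prompt to the locally running model and return its text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (needs Ollama running and `ollama run phi4` done at least once):
# print(generate("Generate an Express server that says hello world"))
```

The first call after pulling the model can be slow while Ollama loads the weights into memory; subsequent calls reuse the loaded model.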