Fine-tune a language model with Unsloth
Unsloth is a drop-in optimization for LoRA fine-tuning that delivers 2-5x speedups and 60% less VRAM. Same code, faster runs.
Prerequisites
- NVIDIA GPU with 12GB+ VRAM
- Python 3.10+
- An instruction-tuning dataset
Step-by-Step
1. Install Unsloth

   Unsloth ships a single install command that pulls in compatible torch and bitsandbytes versions.

   ```shell
   pip install 'unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git'
   ```
2. Load with FastLanguageModel

   FastLanguageModel patches transformers under the hood; the API mirrors Hugging Face's.

   ```python
   from unsloth import FastLanguageModel

   model, tok = FastLanguageModel.from_pretrained(
       'unsloth/llama-3-8b-bnb-4bit',
       max_seq_length=2048,
       load_in_4bit=True,
   )
   ```
3. Add LoRA adapters

   get_peft_model wires up LoRA with sane defaults for the model.

   ```python
   model = FastLanguageModel.get_peft_model(
       model,
       r=16,
       target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
       lora_alpha=16,
       use_gradient_checkpointing='unsloth',
   )
   ```
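To get a feel for why `r=16` is cheap, here is a back-of-the-envelope parameter count. LoRA replaces a full d×k weight update with two low-rank factors of shape d×r and r×k; the 4096×4096 projection size below is illustrative, not taken from the model config.

```python
# LoRA adds two small factors B (d x r) and A (r x k) instead of a full d x k update.
# The square projection size here is an illustrative assumption.
d, k, r = 4096, 4096, 16

full_update_params = d * k       # a full-rank delta-W
lora_params = d * r + r * k      # the low-rank pair B and A

print(full_update_params)                          # 16777216
print(lora_params)                                 # 131072
print(f"{lora_params / full_update_params:.2%}")   # 0.78%
```

Under one percent of the trainable parameters per adapted projection, which is why LoRA runs comfortably in 4-bit on a 12GB card.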
4. Train with SFTTrainer

   Same TRL trainer as vanilla LoRA; Unsloth slots in transparently.

   ```python
   from trl import SFTTrainer, SFTConfig

   trainer = SFTTrainer(
       model=model,
       tokenizer=tok,
       train_dataset=ds,
       args=SFTConfig(
           output_dir='out',
           max_steps=200,
           learning_rate=2e-4,
           bf16=True,
       ),
   )
   trainer.train()
   ```
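The `ds` passed to the trainer needs one text string per example. A minimal sketch of formatting Alpaca-style records into prompt strings — the field names and template below are illustrative assumptions, not a format Unsloth prescribes:

```python
# Illustrative Alpaca-style records; your dataset's field names may differ.
records = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"},
]

# Hypothetical prompt template; SFTTrainer just needs a single text field per row.
def to_text(rec):
    prompt = f"### Instruction:\n{rec['instruction']}\n"
    if rec["input"]:
        prompt += f"### Input:\n{rec['input']}\n"
    prompt += f"### Response:\n{rec['output']}"
    return {"text": prompt}

rows = [to_text(r) for r in records]
print(rows[0]["text"].splitlines()[0])  # ### Instruction:
```

With the `datasets` library you would wrap this as `Dataset.from_list(records).map(to_text)` before handing it to SFTTrainer.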
5. Save in GGUF for llama.cpp

   Unsloth's save_pretrained_gguf converts directly to the format Ollama and llama.cpp consume.

   ```python
   model.save_pretrained_gguf('out-gguf', tok, quantization_method='q4_k_m')
   ```
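As a rough sanity check on the resulting file size: q4_k_m stores weights at roughly 4–5 bits each (an approximation — the exact figure depends on the mix of quant types across tensors), so an 8B-parameter model lands near 5 GB:

```python
params = 8e9           # Llama-3-8B parameter count, approximately
bits_per_weight = 4.8  # rough assumed figure for q4_k_m; varies by tensor mix

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~4.8 GB
```

If the file you get is wildly smaller or larger than this, check that the quantization method you requested actually took effect.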
6. Run locally

   Drop the GGUF into Ollama with a Modelfile and you have a chat-ready model.

   ```shell
   echo 'FROM ./out-gguf/model.Q4_K_M.gguf' > Modelfile
   ollama create my-model -f Modelfile
   ollama run my-model
   ```
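The one-line Modelfile works, but Ollama's Modelfile format also accepts sampling parameters and a chat template. A sketch under assumptions — the template should mirror whatever prompt format you fine-tuned on, and the Alpaca-style one here is only an example:

```
FROM ./out-gguf/model.Q4_K_M.gguf

# Conservative sampling defaults; tune to taste.
PARAMETER temperature 0.7
PARAMETER stop "### Instruction:"

# Go-style template; should match the prompt format used during fine-tuning.
TEMPLATE """### Instruction:
{{ .Prompt }}
### Response:
"""
```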
Common Pitfalls
- Mixing Unsloth's patched model with non-patched HF utilities can cause silent miscompiles.
- Skipping use_gradient_checkpointing='unsloth' loses half the speedup.
- Overly aggressive GGUF quantization (e.g. q2_k) destroys quality.
What's Next
- Try Unsloth's continued pretraining mode for domain adaptation.
- Push to the HuggingFace Hub with model.push_to_hub_gguf.
