Fine-tune a language model with Unsloth
Unsloth is a drop-in optimization for LoRA fine-tuning that delivers 2-5x speedups and 60% less VRAM. Same code, faster runs.
Prerequisites
- NVIDIA GPU with 12GB+ VRAM
- Python 3.10+
- An instruction-tuning dataset
Step-by-Step
1. Install Unsloth

   Unsloth ships a single install command that pulls in compatible torch and bitsandbytes versions.

   ```shell
   pip install 'unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git'
   ```
2. Load with FastLanguageModel

   FastLanguageModel patches transformers under the hood; the API mirrors Hugging Face's.

   ```python
   from unsloth import FastLanguageModel

   model, tok = FastLanguageModel.from_pretrained(
       'unsloth/llama-3-8b-bnb-4bit',
       max_seq_length=2048,
       load_in_4bit=True,
   )
   ```
3. Add LoRA adapters

   get_peft_model wires up LoRA with sane defaults for the model.

   ```python
   model = FastLanguageModel.get_peft_model(
       model,
       r=16,
       target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
       lora_alpha=16,
       use_gradient_checkpointing='unsloth',
   )
   ```
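To get a feel for why `r=16` is cheap, here is a back-of-the-envelope parameter count. LoRA replaces a full d×k weight update with two low-rank factors of shape d×r and r×k; the 4096×4096 projection size below is illustrative, not taken from the model config.

```python
# LoRA adds two small factors B (d x r) and A (r x k) instead of a full d x k update.
# The square projection size here is an illustrative assumption.
d, k, r = 4096, 4096, 16

full_update_params = d * k       # a full-rank delta-W
lora_params = d * r + r * k      # the low-rank pair B and A

print(full_update_params)                          # 16777216
print(lora_params)                                 # 131072
print(f"{lora_params / full_update_params:.2%}")   # 0.78%
```

Under one percent of the trainable parameters per adapted projection, which is why LoRA runs comfortably in 4-bit on a 12GB card.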
4. Train with SFTTrainer

   Same TRL trainer as vanilla LoRA; Unsloth slots in transparently.

   ```python
   from trl import SFTTrainer, SFTConfig

   trainer = SFTTrainer(
       model=model,
       tokenizer=tok,
       train_dataset=ds,
       args=SFTConfig(
           output_dir='out',
           max_steps=200,
           learning_rate=2e-4,
           bf16=True,
       ),
   )
   trainer.train()
   ```
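The `ds` passed to the trainer needs one text string per example. A minimal sketch of formatting Alpaca-style records into prompt strings — the field names and template below are illustrative assumptions, not a format Unsloth prescribes:

```python
# Illustrative Alpaca-style records; your dataset's field names may differ.
records = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"},
]

# Hypothetical prompt template; SFTTrainer just needs a single text field per row.
def to_text(rec):
    prompt = f"### Instruction:\n{rec['instruction']}\n"
    if rec["input"]:
        prompt += f"### Input:\n{rec['input']}\n"
    prompt += f"### Response:\n{rec['output']}"
    return {"text": prompt}

rows = [to_text(r) for r in records]
print(rows[0]["text"].splitlines()[0])  # ### Instruction:
```

With the `datasets` library you would wrap this as `Dataset.from_list(records).map(to_text)` before handing it to SFTTrainer.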
5. Save in GGUF for llama.cpp

   Unsloth's save_pretrained_gguf converts directly to the format Ollama and llama.cpp consume.

   ```python
   model.save_pretrained_gguf('out-gguf', tok, quantization_method='q4_k_m')
   ```
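As a rough sanity check on the resulting file size: q4_k_m stores weights at roughly 4–5 bits each (an approximation — the exact figure depends on the mix of quant types across tensors), so an 8B-parameter model lands near 5 GB:

```python
params = 8e9           # Llama-3-8B parameter count, approximately
bits_per_weight = 4.8  # rough assumed figure for q4_k_m; varies by tensor mix

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~4.8 GB
```

If the file you get is wildly smaller or larger than this, check that the quantization method you requested actually took effect.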
6. Run locally

   Drop the GGUF into Ollama with a Modelfile and you have a chat-ready model.

   ```shell
   echo 'FROM ./out-gguf/model.Q4_K_M.gguf' > Modelfile
   ollama create my-model -f Modelfile
   ollama run my-model
   ```
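The one-line Modelfile works, but Ollama's Modelfile format also accepts sampling parameters and a chat template. A sketch under assumptions — the template should mirror whatever prompt format you fine-tuned on, and the Alpaca-style one here is only an example:

```
FROM ./out-gguf/model.Q4_K_M.gguf

# Conservative sampling defaults; tune to taste.
PARAMETER temperature 0.7
PARAMETER stop "### Instruction:"

# Go-style template; should match the prompt format used during fine-tuning.
TEMPLATE """### Instruction:
{{ .Prompt }}
### Response:
"""
```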
Common Pitfalls
- Mixing Unsloth's patched model with non-patched HF utilities can cause silent miscompiles.
- Skipping use_gradient_checkpointing='unsloth' loses half the speedup.
- Overly aggressive GGUF quantization (e.g. q2_k) destroys quality.
What's Next
- Try Unsloth's continued pretraining mode for domain adaptation.
- Push to the HuggingFace Hub with model.push_to_hub_gguf.
