Fine-tune a language model with LoRA
LoRA (Low-Rank Adaptation) fine-tunes a model by training tiny adapter matrices while the full weight matrices stay frozen. You get most of full fine-tuning's quality at a fraction of the compute.
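Concretely, for a frozen weight matrix W, LoRA learns two small matrices B and A and uses W + (alpha/r) * (B @ A) as the effective weight. A rough sketch of the parameter math for a single Llama-3-8B attention projection (shapes chosen for illustration; counts cover only that one matrix, not the whole model):

```python
import torch

d, r = 4096, 16                       # hidden size of one projection; LoRA rank
W = torch.randn(d, d)                 # frozen base weight
A = torch.randn(r, d)                 # trainable, r x d
B = torch.zeros(d, r)                 # trainable, d x r (zero-init so training starts at W)
alpha = 32

W_adapted = W + (alpha / r) * (B @ A)  # effective weight at inference time

full = W.numel()                      # ~16.8M params in the full matrix
lora = A.numel() + B.numel()          # ~131K params in the adapter
print(f'{lora / full:.2%} of the full matrix is trainable')  # ~0.78%
```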
Prerequisites
- GPU with 24 GB+ VRAM (e.g. A100, RTX 4090), or a rented cloud GPU such as Colab Pro
- Python 3.10+
- A JSONL dataset of instruction-response pairs
Step-by-Step
1. Install the stack
PEFT provides the LoRA implementation. transformers + accelerate handle the training loop. bitsandbytes enables 4-bit quantization.
```bash
pip install transformers peft accelerate bitsandbytes datasets trl
```
2. Load the model in 4-bit
Quantizing the base model to 4-bit frees VRAM for activations, gradients, and the adapter's optimizer state.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Meta-Llama-3-8B',
    quantization_config=bnb,
    device_map='auto',  # place the quantized weights on the GPU
)
tok = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')
tok.pad_token = tok.eos_token  # Llama 3 ships without a pad token; training needs one
```
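As a quick sanity check that quantization actually took effect, transformers can report the footprint of the loaded weights (rough numbers, weights only; activations are not counted):

```python
# Expect roughly 5-6 GB for an 8B model in 4-bit, versus ~16 GB in bf16.
print(f'{model.get_memory_footprint() / 1e9:.1f} GB')
```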
3. Configure LoRA
The rank r controls adapter capacity. Start at 16 and increase it if quality plateaus.
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # standard prep for training on a 4-bit base

cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, cfg)
model.print_trainable_parameters()
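Targeting only q_proj and v_proj is the cheapest setup. If quality plateaus, a common next step (sketched here with the usual Llama-style module names, which are an assumption) is to raise r and cover all attention and MLP projections:

```python
# Bigger adapter: more capacity, more VRAM, slower training.
cfg_wide = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'up_proj', 'down_proj'],
    lora_dropout=0.05,
    task_type='CAUSAL_LM',
)
```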
4. Format the dataset
Use the model's chat template. Mismatched formatting is the #1 cause of useless fine-tunes.
```python
from datasets import load_dataset

ds = load_dataset('json', data_files='train.jsonl')

def fmt(ex):
    return {'text': tok.apply_chat_template(ex['messages'], tokenize=False)}

ds = ds.map(fmt)
```
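Before training, eyeball one rendered example to catch template mismatches early. This assumes each JSONL record carries a 'messages' list in the usual role/content form:

```python
# Assumed record shape:
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
print(ds['train'][0]['text'])  # should show the model's special tokens, e.g. <|start_header_id|>
```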
5. Train with SFTTrainer
TRL's SFTTrainer wraps the training loop. Three epochs at a learning rate of 2e-4 is a reasonable default; scale the batch size to your VRAM (a gradient-accumulation variant follows the code).
```python
from trl import SFTTrainer, SFTConfig

train_cfg = SFTConfig(
    output_dir='out',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    bf16=True,
    dataset_text_field='text',  # recent TRL expects this on SFTConfig rather than SFTTrainer
)
trainer = SFTTrainer(model=model, args=train_cfg, train_dataset=ds['train'])
trainer.train()
```
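If a batch of 4 overflows your VRAM, gradient accumulation trades throughput for memory while keeping a healthy effective batch size. A sketch; the accumulation factor of 16 is an illustrative assumption:

```python
# Effective batch = per_device_train_batch_size * gradient_accumulation_steps = 16 here.
low_mem_cfg = SFTConfig(
    output_dir='out',
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # illustrative; tune to your VRAM
    learning_rate=2e-4,
    bf16=True,
    dataset_text_field='text',
)
```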
6. Merge and serve
Merge the adapter into the base model for standalone inference, or keep it separate so you can swap adapters at load time.
```python
# Note: merging into a 4-bit base can lose precision; many recipes reload the base in bf16 first.
model = model.merge_and_unload()  # folds the LoRA weights into the base
model.save_pretrained('llama3-finetuned')
tok.save_pretrained('llama3-finetuned')  # ship the tokenizer alongside the weights
```
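If you skip the merge, load the saved adapter on top of a fresh base model instead. A sketch, assuming the adapter was saved to a hypothetical 'llama3-adapter' directory (e.g. via trainer.save_model('llama3-adapter')):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Meta-Llama-3-8B',
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, 'llama3-adapter')  # hypothetical adapter dir
```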
Common Pitfalls
- Wrong chat template silently destroys quality. Print one example before training (step 4 shows the one-liner).
- A tiny dataset (<500 examples) overfits within the first epoch; hold out a validation split to catch it (see the sketch below).
- Saving without merging produces a tiny adapter that needs the base model at load time.
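For the overfitting pitfall, a sketch of a validation split on top of the step 5 setup (the 5% split and the seed are arbitrary assumptions):

```python
# Continues from step 5. Rising eval loss across epochs flags overfitting.
split = ds['train'].train_test_split(test_size=0.05, seed=42)

eval_cfg = SFTConfig(
    output_dir='out',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    bf16=True,
    dataset_text_field='text',
    eval_strategy='epoch',  # named evaluation_strategy on older transformers releases
)
trainer = SFTTrainer(
    model=model,
    args=eval_cfg,
    train_dataset=split['train'],
    eval_dataset=split['test'],
)
trainer.train()
```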
What's Next
- Run evals (MMLU, HellaSwag, your task-specific benchmark) before and after fine-tuning.
- Try DPO or ORPO for preference tuning on top of SFT.
