You do not need an RTX 4090 or a cloud GPU bill to fine-tune a language model in 2026. Google Colab’s free tier provides a T4 GPU for sessions up to 12 hours. Kaggle’s free tier gives you 30 GPU hours per week on a T4 or P100 with no session time limit. Both are sufficient for LoRA fine-tuning on Llama 3.2 3B, Mistral 7B, or Qwen2.5 7B on datasets under 10,000 examples. The only cost is patience.
Analysis Briefing
- Topic: Free LoRA Fine-Tuning on Colab and Kaggle Notebooks
- Analyst: Mike D (@MrComputerScience)
- Context: A research sprint initiated by Gemini 2.0 Flash
- Source: Pithy Cyborg | Pithy Security
- Key Question: Can free Colab and Kaggle GPUs actually fine-tune a real model, or just toy examples?
What Free GPU Tiers Can Actually Fine-Tune in 2026
Colab’s free T4 has 16GB of VRAM. Kaggle’s free tier alternates between a T4 (16GB) and a P100 (16GB) depending on availability. Both are enough to fine-tune models up to 7B parameters using LoRA with 4-bit quantization via the bitsandbytes library and the peft library from Hugging Face.
The practical ceiling on free GPU tiers is a 7B model with QLoRA (quantized LoRA) fine-tuning on a dataset of 1,000 to 10,000 examples for 1 to 3 epochs. That is a real fine-tuning run that produces a model meaningfully different from the base. It is not a toy. A customer support bot fine-tuned on your own conversation data, a code assistant specialized on your codebase’s style, or a domain-specific Q&A model trained on internal documentation are all achievable in a free Kaggle session.
The models that fit comfortably in 16GB VRAM with QLoRA: Llama 3.2 3B (fits with headroom for larger batches), Mistral 7B Instruct (fits with batch size 4 at 4-bit), and Qwen2.5 7B (fits with batch size 2 to 4 at 4-bit). Llama 3.1 8B fits but leaves little headroom. Anything above 8B parameters requires gradient checkpointing and very small batch sizes, which slows training enough that free tier time limits become a real constraint.
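A rough back-of-envelope calculation shows why a 7B model fits in 16GB with QLoRA. The per-component figures below are illustrative approximations (not measured numbers), and the helper function is hypothetical, but the shape of the budget is right: the 4-bit base weights dominate far less than people expect.

```python
def qlora_vram_estimate_gb(n_params_b, lora_params_m=4.2, batch_tokens=2048):
    """Illustrative VRAM estimate for a QLoRA run (rough approximations):
    - base weights at 4-bit NF4: ~0.55 bytes/param incl. quantization overhead
    - LoRA adapter in fp16 plus AdamW optimizer states: ~10 bytes/param
    - activations: very roughly 1 MB/token at 7B scale with checkpointing
    """
    base = n_params_b * 1e9 * 0.55 / 1e9        # quantized base weights, GB
    adapter = lora_params_m * 1e6 * 10 / 1e9    # adapter + optimizer states, GB
    activations = batch_tokens * 1e6 / 1e9      # activations, GB (very rough)
    return base + adapter + activations

print(f"7B QLoRA: ~{qlora_vram_estimate_gb(7):.1f} GB")
print(f"3B QLoRA: ~{qlora_vram_estimate_gb(3):.1f} GB")
```

Even with generous padding for CUDA context and fragmentation, a 7B QLoRA run stays comfortably inside a 16GB T4, which matches the batch sizes quoted above.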
Colab sessions disconnect after 12 hours of continuous use, and sooner if the browser tab sits idle. Save checkpoints to Google Drive every 500 steps to avoid losing progress on longer runs.
The Minimal QLoRA Fine-Tuning Setup That Actually Works
The standard free-tier fine-tuning stack in 2026 is Hugging Face Transformers plus PEFT plus bitsandbytes plus TRL’s SFTTrainer. All four are free and open-source. Install them in any Colab or Kaggle notebook with one pip command:
```bash
pip install -q transformers peft bitsandbytes trl datasets accelerate
```
A minimal working fine-tuning script for Llama 3.2 3B on a custom dataset:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
import torch

model_id = "meta-llama/Llama-3.2-3B-Instruct"

# 4-bit quantization config. T4 and P100 GPUs do not support bfloat16,
# so use float16 as the compute dtype on free-tier hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    r=16,            # LoRA rank: higher = more parameters = more capacity
    lora_alpha=32,   # Scaling factor: typically 2x rank
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typical output: trainable params: 4,194,304 || all params: 3,217,227,776 || 0.13%

# Load your dataset (must have a "text" column with formatted prompts)
dataset = load_dataset("json", data_files="your_dataset.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./lora_output",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        save_steps=500,
        logging_steps=50,
        fp16=True,
    ),
)
trainer.train()
model.save_pretrained("./lora_adapter")
```
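The script above expects `your_dataset.jsonl` with one JSON object per line and a `"text"` column. A minimal sketch of building that file, with hypothetical example data: in practice the examples come from your own support logs or documentation, and the formatting should use the base model's chat template via `tokenizer.apply_chat_template` rather than the simplified f-string shown here.

```python
import json

# Hypothetical raw examples; substitute your own human-written pairs.
examples = [
    {"question": "How do I reset my password?",
     "answer": "Go to Settings > Account > Reset Password."},
    {"question": "Where can I download my invoice?",
     "answer": "Invoices are under Billing > History."},
]

# SFTTrainer reads a single "text" column by default. The f-string below
# is a simplified stand-in for the model's real chat template.
with open("your_dataset.jsonl", "w") as f:
    for ex in examples:
        text = f"User: {ex['question']}\nAssistant: {ex['answer']}"
        f.write(json.dumps({"text": text}) + "\n")

# Quick sanity check that the file parses as line-delimited JSON
with open("your_dataset.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(len(rows), "examples written")
```

A few thousand rows in exactly this shape is all the data plumbing the fine-tuning run needs.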
The LoRA adapter saved at the end is a small file, typically 10 to 50MB, rather than the full model's multiple gigabytes. Load it on top of the base model locally with Ollama or llama.cpp after training is complete. One real failure mode here is dataset quality, sometimes described as LoRA fine-tuning voice collapse: training on AI-generated data rather than genuine human examples produces a model that homogenizes toward generic AI output instead of specializing toward your target behavior. Use real examples from your actual use case, not synthetic data generated by another LLM.
Saving Progress and Deploying Your Fine-Tuned Adapter
Colab and Kaggle sessions are ephemeral. Everything in the runtime disappears when the session ends. Saving your work requires pushing to external storage during training, not after.
Mount Google Drive in Colab with `from google.colab import drive; drive.mount('/content/drive')` and set `output_dir` in your `SFTConfig` to a path inside the mounted drive. Every checkpoint saved during training writes directly to your Google Drive. A disconnected session loses at most the progress since the last checkpoint, not the entire run.
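After a reconnect, you resume from those Drive checkpoints rather than restarting. A small sketch (the helper function and the Drive path are hypothetical; `trainer.train(resume_from_checkpoint=...)` is the standard Transformers mechanism) that locates the newest `checkpoint-N` directory:

```python
import os
import re

def latest_checkpoint(output_dir):
    """Return the path of the newest checkpoint-N directory, or None."""
    if not os.path.isdir(output_dir):
        return None
    ckpts = [d for d in os.listdir(output_dir)
             if re.fullmatch(r"checkpoint-\d+", d)
             and os.path.isdir(os.path.join(output_dir, d))]
    if not ckpts:
        return None
    newest = max(ckpts, key=lambda d: int(d.split("-")[1]))
    return os.path.join(output_dir, newest)

# In the reconnected Colab session, after rebuilding model and trainer:
#   ckpt = latest_checkpoint("/content/drive/MyDrive/lora_output")
#   trainer.train(resume_from_checkpoint=ckpt)
```

Passing `None` simply starts training from scratch, so the same cell works on both the first run and every resume.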
On Kaggle, use the Dataset output feature. Go to your notebook’s output section, select your adapter directory, and save it as a Kaggle Dataset. The saved dataset persists across sessions and can be loaded into future notebooks as an input dataset. This is the correct persistence mechanism for Kaggle.
Once you have the LoRA adapter, loading it locally requires only the base model and the peft library:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "./lora_adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
```
To use the fine-tuned model with Ollama, merge the adapter into the base model weights and export to GGUF format using llama.cpp's conversion script. The merged model loads identically to any other Ollama model and runs locally with no ongoing cloud cost.
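Once the merged weights (produced with PEFT's `merge_and_unload()`) are converted to a GGUF file, a minimal Ollama Modelfile registers it as a local model. The filename and parameter value below are hypothetical placeholders:

```
FROM ./llama-3.2-3b-finetuned.gguf

# Conservative sampling for a support-bot style model (illustrative value)
PARAMETER temperature 0.7
```

Then `ollama create my-finetune -f Modelfile` followed by `ollama run my-finetune` serves the fine-tuned model locally.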
What This Means For You
- Use Kaggle over Colab for fine-tuning runs that take more than 4 hours. Kaggle's 30-hour weekly GPU quota resets reliably, sessions have no idle disconnect, and P100 availability sometimes gives you faster training than Colab's T4.
- Save checkpoints to Google Drive or Kaggle Dataset output every 500 steps, not at the end of training. Colab disconnects without warning and losing three hours of compute to a missing checkpoint is entirely preventable.
- Set LoRA rank to 16 as a starting point, not 4 or 64. Rank 4 often underfits on datasets with genuine behavioral complexity. Rank 64 consumes enough VRAM to cause OOM errors on free tier hardware. Rank 16 with alpha 32 is the stable default before you have evidence to adjust.
- Build your fine-tuning dataset from real examples, not AI-generated ones. Synthetic training data produces a model that converges toward generic AI output rather than your specific target behavior, which is the opposite of what fine-tuning is for.
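The rank trade-off in the bullets above is easy to quantify: a LoRA adapter adds `rank * (d_in + d_out)` parameters per adapted weight matrix, so adapter size and its VRAM cost grow linearly with rank. The dimensions below are illustrative, roughly matching the attention projections of a 3B-scale model with grouped-query attention (hidden size 3072, 28 layers), not exact figures for any specific checkpoint:

```python
def lora_param_count(rank, shapes, n_layers):
    """Parameters added by LoRA: rank * (d_in + d_out) per adapted matrix."""
    return n_layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)

# Illustrative shapes for q_proj and v_proj in a 3B-scale model
# (grouped-query attention shrinks v_proj's output dimension).
shapes = [(3072, 3072), (3072, 1024)]

for r in (4, 16, 64):
    n = lora_param_count(r, shapes, n_layers=28)
    print(f"rank {r:>2}: {n:,} trainable params")
```

Quadrupling the rank quadruples the trainable parameters, which is why rank 64 can tip a 7B run into OOM territory while rank 16 stays a comfortable fraction of a percent of the base model.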
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
