Fine-tuning a model can feel like a lot of jargon at first: epochs, batch size, training loss, validation loss, and then LoRA, QLoRA... but trust me, every single piece of this puzzle fits together.
It's really easy if you read this blog thoroughly and slowly.
Why Fine-Tune?
Imagine you’ve got a massive base model (let’s say 100GB). It’s great at answering all kinds of questions, but it’s not perfect for your specific needs. Maybe you want it to write legal contracts, help with medical diagnoses, or draft marketing emails. You fine-tune the model by teaching it using data from your domain so it can specialize in your task.
But here’s the problem: if you fine-tune the entire 100GB model, you create a new version of the model for every task. If 10 clients want to fine-tune the same model, that’s 10 different 100GB versions — 1 terabyte of storage! Serving and managing these massive models is inefficient, expensive, and impractical.
This is where LoRA and QLoRA come in.
LoRA (Low-Rank Adaptation): Updating Only What Matters
LoRA stands for Low-Rank Adaptation of Large Language Models, and it’s a clever way to fine-tune models without touching the massive base model weights. Here’s how it works:
Base Model Stays Frozen: During fine-tuning, LoRA freezes the base model’s original weights. You don’t update the entire 100GB model.
Adaptation Layer: Instead, it adds a tiny adaptation layer (extra parameters) to specific parts of the model. These layers capture the changes needed for your specific task.
Lightweight Updates: During training, you only update the weights in the adaptation layers — a small fraction of the total model. This means the storage required for your fine-tuned model is just a few MBs instead of 100GB.
Think of it as snapping a lens onto a high-end camera. The base camera (model) stays the same, but the lens (adapter) customizes it for the picture you want to take (your task).
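To make this concrete, here's a minimal PyTorch sketch of the idea (my own toy illustration, not the actual PEFT library implementation): the original weight matrix stays frozen, and only two small low-rank matrices are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small, trainable low-rank update (toy sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the original weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # The adapter: only these tiny matrices get trained
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path + low-rank adaptation path
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")  # ~65K trainable out of ~16.8M
```

In this toy layer, only about 65K of the roughly 16.8 million parameters are trainable, which is why a saved LoRA adapter weighs megabytes instead of gigabytes.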
QLoRA: Making It Even Leaner
QLoRA (Quantized Low-Rank Adaptation) takes LoRA and makes it even more efficient by compressing the base model before fine-tuning. It does this using quantization — reducing the precision of the base model’s weights (e.g., from 16 bits to 4 bits). This:
Reduces memory usage.
Speeds up training and inference.
Still retains near-full performance.
So with QLoRA:
You start with a smaller, quantized version of the base model.
Then apply LoRA’s lightweight adapters for fine-tuning.
It’s like shrinking the size of the camera while keeping its quality intact before snapping on the lens. The result? Fine-tuning becomes much cheaper and more scalable.
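If you want to see roughly what this looks like in practice, here's a sketch using the Hugging Face transformers, bitsandbytes, and peft libraries (an assumed setup on my part). The model name and the target_modules list are placeholders that depend on which base model you pick.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit: the "quantized" part of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the math in higher precision
)
model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",                      # placeholder: any causal LM checkpoint
    quantization_config=bnb_config,
)

# Attach LoRA adapters on top of the frozen, quantized base
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # which layers get adapters (model-dependent)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only a tiny fraction is trainable
```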
How Do We Fine-Tune the Model?
To fine-tune a model, we feed it our specific data and help it learn the patterns relevant to our task. But teaching a model isn’t random — it’s a structured process, and that’s where the following parameters come in.
1. Epochs: Reading the Book Again and Again
Fine-tuning isn’t something you do in one shot. The number of epochs is how many times the model passes through the entire dataset during training. Think of the dataset as a book. Reading it once might not be enough to fully understand it. The first time, you get the gist. The second time, you catch things you missed. By the third read, you start noticing patterns.
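Here's a toy PyTorch sketch of "reading the book three times": the same (made-up) dataset is passed through the model once per epoch.

```python
import torch
from torch import nn

# Toy setup: a tiny model and a tiny dataset, purely for illustration
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
X, y = torch.randn(100, 10), torch.randn(100, 1)

num_epochs = 3                              # "read the book" three times
for epoch in range(num_epochs):
    predictions = model(X)                  # one full pass over the whole dataset
    loss = loss_fn(predictions, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")
```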
2. Batch Size: Reading in Chunks
Now, you don’t read the entire book in one sitting, right? You read a few pages at a time, pause, and reflect. That’s what batch size is — how many samples the model processes at a time before it stops to adjust (update its weights).
Why not read the whole book at once? It’s too overwhelming and computationally expensive.
Why not read just one page at a time? It’s inefficient and takes forever.
Batch size is about finding the sweet spot: large enough to be efficient, small enough to capture details. Smaller batches give the model more frequent updates (granularity), while larger batches cover more data at once.
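In PyTorch, batch size is just a setting on the data loader. A toy sketch: the same 100-sample dataset chopped into chunks of 16 versus chunks of 4, which changes how many weight updates the model gets per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(100, 10), torch.randn(100, 1)
dataset = TensorDataset(X, y)

# Larger batches: fewer, smoother updates per epoch
loader = DataLoader(dataset, batch_size=16, shuffle=True)
print(len(loader))        # 7 batches -> 7 weight updates per epoch

# Smaller batches: more frequent, noisier updates per epoch
small_loader = DataLoader(dataset, batch_size=4, shuffle=True)
print(len(small_loader))  # 25 batches -> 25 weight updates per epoch
```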
3. Training Loss: Am I Learning What’s in the Book?
Every time the model processes a batch, we check how far its predictions are from the correct answers. That gap is the training loss:
Low training loss: The model is doing well on the data it’s directly learning from.
High training loss: The model is struggling to learn the material.
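Concretely, the loss is just a number computed by comparing the model's prediction to the right answer. A toy sketch with cross-entropy loss (made-up numbers): a confident correct prediction gives a low loss, a confused one gives a high loss.

```python
import torch
import torch.nn.functional as F

# Two model outputs (logits) for the same 3-class question; the correct answer is class 0
confident_and_right = torch.tensor([[4.0, 0.5, 0.5]])
confused            = torch.tensor([[0.4, 0.3, 0.3]])
target = torch.tensor([0])

print(F.cross_entropy(confident_and_right, target).item())  # low loss, ~0.06
print(F.cross_entropy(confused, target).item())             # high loss, ~1.03
```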
4. Validation Loss: Can I Answer Questions Beyond the Book?
After reading the book (training), someone asks you related questions that weren’t directly in the text. This measures whether you’ve generalized what you learned. Validation loss tells us how well the model can handle new, unseen data that’s related to the training data.
Low validation loss: The model has truly understood the patterns and can apply them to new tasks.
High validation loss: The model is either overfitting (memorizing the training data) or underfitting (not learning enough).
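A toy sketch of the same idea in code: validation loss is measured on data the model never trained on, with gradients switched off, because here we are only measuring, not learning.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

# Held-out data the model never sees during training
X_val, y_val = torch.randn(20, 10), torch.randn(20, 1)

model.eval()                       # evaluation mode, no learning happens here
with torch.no_grad():              # no gradients, no weight updates
    val_loss = loss_fn(model(X_val), y_val)
print(f"validation loss: {val_loss.item():.4f}")
```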
The Goal: A Balanced Model
So, what’s the end goal? We want a model that’s:
Good at generalizing (base model): The base model provides the foundation — a general-purpose tool that knows a bit about everything.
Better at specializing (fine-tuning): Fine-tuning teaches the model to focus on what matters for your task, without losing its ability to generalize.
In the next blog, I'll include graphs that will help you understand this in a more intuitive way. Until then, here are a couple of questions to think about:
We don't want the model to lose its intelligence. The base model brings general intelligence, and fine-tuning brings the ability to specialize in a given task. We want both, so how do we measure the sweet spot?
Imagine you're building a chatbot. You want your chatbot (the fine-tuned model) to be intelligent and adaptive. At which layer will that intelligence be applied: the prompt level or the fine-tuning level?
If you liked the blog and read this far, hit like and share! Happy learning.