
Supervised Fine-Tuning of Google's Gemini 1.5 LLM for Enhanced Summarization: A Practical Guide
The goal of fine-tuning is to further improve the performance of the model for your specific task. Fine-tuning works by providing the model with a training dataset containing many examples of the task. For niche tasks, you can get significant improvements in model performance by tuning the model on a modest number of examples. This kind of model tuning is sometimes referred to as supervised fine-tuning, to distinguish it from other kinds of fine-tuning.
Google’s Gemini 1.5 offers powerful capabilities for various natural language processing tasks. This example demonstrates how to fine-tune a Gemini 1.5 model for abstractive text summarization using the Wikilingua English dataset. The process involves preparing a dataset in the correct JSONL format, uploading it to Google Cloud Storage, fine-tuning the model using the Vertex AI SDK, and evaluating performance using ROUGE scores.
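As a concrete illustration of the data-preparation step, the sketch below converts (document, summary) pairs into the JSONL chat format that Vertex AI supervised tuning for Gemini expects at the time of writing. The prompt wording, output file name, and the `examples` iterable are assumptions for illustration, not part of the original notebook; check the current Vertex AI tuning dataset schema before running.

```python
import json

def to_tuning_record(document: str, summary: str) -> dict:
    # One supervised tuning example: a user turn with the article and a model
    # turn with the reference summary (Vertex AI chat-style tuning schema).
    return {
        "contents": [
            {"role": "user", "parts": [{"text": f"Summarize the following article:\n\n{document}"}]},
            {"role": "model", "parts": [{"text": summary}]},
        ]
    }

# `examples` is a hypothetical iterable of {"document": ..., "summary": ...} dicts
# built from the Wikilingua English split.
with open("wikilingua_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(to_tuning_record(ex["document"], ex["summary"])) + "\n")
```

Each line of the resulting file is one training example; this is the file that later gets uploaded to a Google Cloud Storage bucket and passed to the tuning job.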
Use cases for fine-tuning LLMs like Gemini extend beyond simple summarization, including improving explainability and tailoring output to specific formats (e.g., JSON, templates for structured documents).
Why?
LLM model tuning refers to the process of adapting a large language model (LLM) to perform better on a specific task or dataset. Instead of training a new model from scratch, which is computationally expensive and requires massive amounts of data, tuning leverages the pre-existing knowledge and capabilities of a pre-trained LLM. This makes it a more efficient and practical approach for many applications.
There are several techniques for LLM model tuning, including:
- Fine-tuning: This is the most common method, where a pre-trained LLM’s weights are adjusted using a smaller, task-specific dataset. This allows the model to specialize in the target task while retaining much of its general knowledge. The amount of fine-tuning can range from adjusting only a small portion of the model’s parameters (e.g., adapter modules) to adjusting all the parameters.
- Prompt engineering/Prompt tuning: This approach involves carefully crafting input prompts to guide the LLM’s behavior without modifying its internal weights. This can be surprisingly effective for certain tasks, especially when data is limited. Variations include creating prompt templates or learning optimal prompts through techniques like gradient descent.
- Parameter-efficient fine-tuning (PEFT): These techniques aim to minimize the number of parameters that need to be updated during tuning. This reduces computational cost and memory requirements. Examples include adapter methods, prefix-tuning, and LoRA (Low-Rank Adaptation); a LoRA sketch follows this list.
- Instruction tuning: This focuses on training the LLM to follow instructions given in the prompt. The dataset consists of instruction-output pairs, allowing the model to better understand and respond to different types of instructions.
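To make the PEFT idea concrete, here is a minimal sketch using the open-source Hugging Face peft and transformers libraries. The model name and hyperparameters are illustrative only; this is separate from the managed Vertex AI tuning used later in this guide, which hides these details.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap an open-weights model with LoRA adapters so that only small low-rank
# matrices are trained, instead of the full set of model weights.
base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")  # example model

lora = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```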
Model Tuning Concept
Fine-tuning lets you get more out of the models available through the API by providing:
- Higher quality results than prompting
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests
At a high level, fine-tuning involves the following steps (a Vertex AI SDK sketch follows this list):
- Prepare and upload training data
- Train a new fine-tuned model
- Evaluate results and go back to step 1 if needed
- Use your fine-tuned model
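These steps map onto the Vertex AI SDK roughly as follows. This is a minimal sketch assuming the supervised tuning (sft) API as documented at the time of writing; the project ID, bucket paths, model version, and display name are placeholders.

```python
import time
import vertexai
from vertexai.tuning import sft
from vertexai.generative_models import GenerativeModel

# Hypothetical project and bucket; the JSONL files follow the format shown earlier.
vertexai.init(project="my-project", location="us-central1")

job = sft.train(
    source_model="gemini-1.5-flash-002",   # base model to tune (check supported versions)
    train_dataset="gs://my-bucket/wikilingua_train.jsonl",
    validation_dataset="gs://my-bucket/wikilingua_val.jsonl",
    epochs=4,
    tuned_model_display_name="wikilingua-summarizer",
)

# Poll until the managed tuning job finishes.
while not job.has_ended:
    time.sleep(60)
    job.refresh()

# Query the tuned model through the endpoint the job created.
tuned_model = GenerativeModel(job.tuned_model_endpoint_name)
print(tuned_model.generate_content("Summarize the following article:\n\n...").text)
```

The train_dataset argument is the GCS URI of the JSONL file prepared earlier, which is why uploading the data to a bucket comes before starting the tuning job.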
Example — Notebook to compare results from a prompt vs a fine-tuned response
Key steps:
- Upload the dataset to a GCS bucket [Step 3]
- Test the model pre-tuning [Step 5]
- Evaluate precision [Step 6] of the step 5 output using the ROUGE scorer and the ROUGE-L metric, comparing ground truth vs model output (see the ROUGE sketch after this list)
- Fine-tune the model on the dataset uploaded in step 3 [Step 7]
- Test the model post-tuning [Step 8]
- Evaluate precision [Step 9] of the step 8 output using the ROUGE scorer and the ROUGE-L metric, comparing ground truth vs model output
- Summarise metrics between pre-tuning and fine-tuned model outputs.
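Steps 6 and 9 both score the model output against the ground-truth summary with ROUGE-L. Below is a minimal sketch of that comparison using the rouge-score package; the candidate and reference strings are taken from the example in the Summary section that follows.

```python
from rouge_score import rouge_scorer

# Compare a model-generated summary against the ground-truth reference using
# ROUGE-L; precision is the figure summarised in Step 9 of the notebook.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

reference = ("Squeeze a line of lotion onto the tops of both forearms and the backs "
             "of your hands. Place your arms behind your back. Move your arms in a "
             "windshield wiper motion.")
candidate = ("Squeeze a line of lotion onto the top of each forearm. Place your "
             "forearms behind your back. Rub your forearms up and down your back.")

scores = scorer.score(reference, candidate)   # signature: score(target, prediction)
print(scores["rougeL"].precision, scores["rougeL"].recall, scores["rougeL"].fmeasure)
```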
Summary
The example notebook clearly shows the difference between the summaries generated pre- and post-tuning: the tuned summary is more in line with the ground-truth format. (Note: pre- and post-tuning outputs may vary based on the chosen parameters.)
- Pre-tuning result:
This article describes a method for applying lotion to your own back using your forearms. The technique involves squeezing lotion in a line along your forearms, bending your elbows, and rubbing your arms against your back in a windshield wiper motion. This method may not be suitable for individuals with shoulder pain or limited flexibility.
- Post-tuning result:
Squeeze a line of lotion onto the top of each forearm. Place your forearms behind your back. Rub your forearms up and down your back
- Ground Truth:
Squeeze a line of lotion onto the tops of both forearms and the backs of your hands. Place your arms behind your back. Move your arms in a windshield wiper motion.
The evaluation of the prompt-only vs fine-tuned responses is illustrated in Step 9:
Model tuning improved rougeL_precision by 82.75% (results may differ between tuning iterations).
Prompt Tuning vs LLM Model Tuning
Not all LLM models support supervised model tuning; in those cases, prompt engineering is the only option.
Pros
- Customization: Allows for extensive customization, enabling the model to generate responses tailored to specific domains or styles.
- Improved Accuracy: By training on a specialized dataset, the model can produce more accurate and relevant responses.
- Adaptability: Fine-tuned models can better handle niche topics or recent information not covered in the original training.
Cons
- Cost: Fine-tuning requires significant computational resources, making it more expensive than prompting.
- Technical Skills: This approach necessitates a deeper understanding of machine learning and language model architectures.
- Data Requirements: Effective fine-tuning requires a substantial and well-curated dataset, which can be challenging to compile.
Credit: https://medium.com/@myscale/prompt-engineering-vs-finetuning-vs-rag-cfae761c6d06
Cost
Is there a charge for fine-tuning the models?
Model tuning is free (at the time of publishing this article), but inference on tuned models is charged at the same rate as the base models.
References
https://github.com/rubans/ml/blob/main/playground/fine_tuning_llm.ipynb