
Supervised Fine-Tuning of Google's Gemini 1.5 LLM for Enhanced Summarization: A Practical Guide
The goal of fine-tuning is to further improve the performance of the model for your specific task. Fine-tuning works by providing the model with a training dataset containing many examples of the task. For niche tasks, you can get significant improvements in model performance by tuning the model on a modest number of examples. This kind of model tuning is sometimes referred to as supervised fine-tuning, to distinguish it from other kinds of fine-tuning.
Google’s Gemini 1.5 offers powerful capabilities for various natural language processing tasks. This example demonstrates how to fine-tune a Gemini 1.5 model for abstractive text summarization using the Wikilingua English dataset. The process involves preparing a dataset in the correct JSONL format, uploading it to Google Cloud Storage, fine-tuning the model using the Vertex AI SDK, and evaluating performance using ROUGE scores.
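As a concrete illustration of the data-preparation step, the sketch below converts (document, summary) pairs into the JSONL chat format that Vertex AI supervised tuning for Gemini expects at the time of writing. The prompt wording, output file name, and the `examples` iterable are assumptions for illustration, not part of the original notebook; check the current Vertex AI tuning dataset schema before running.

```python
import json

def to_tuning_record(document: str, summary: str) -> dict:
    # One supervised tuning example: a user turn with the article and a model
    # turn with the reference summary (Vertex AI chat-style tuning schema).
    return {
        "contents": [
            {"role": "user", "parts": [{"text": f"Summarize the following article:\n\n{document}"}]},
            {"role": "model", "parts": [{"text": summary}]},
        ]
    }

# `examples` is a hypothetical iterable of {"document": ..., "summary": ...} dicts
# built from the Wikilingua English split.
with open("wikilingua_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(to_tuning_record(ex["document"], ex["summary"])) + "\n")
```

Each line of the resulting file is one training example; this is the file that later gets uploaded to a Google Cloud Storage bucket and passed to the tuning job.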
Use cases for fine-tuning LLMs like Gemini extend beyond simple summarization, including improving explainability and tailoring output to specific formats (e.g., JSON, templates for structured documents).
Why?
LLM model tuning refers to the process of adapting a large language model (LLM) to perform better on a specific task or dataset. Instead of training a new model from scratch, which is computationally expensive and requires massive amounts of data, tuning leverages the pre-existing knowledge and capabilities of a pre-trained LLM. This makes it a more efficient and practical approach for many applications.
There are several techniques for LLM model tuning, including:
- Fine-tuning: This is the most common method, where a pre-trained LLM’s weights are adjusted using a smaller, task-specific dataset. This allows the model to specialize in the target task while retaining much of its general knowledge. The amount of fine-tuning can range from adjusting only a small portion of the model’s parameters (e.g., adapter modules) to adjusting all the parameters.
- Prompt engineering/Prompt tuning: This approach involves carefully crafting input prompts to guide the LLM’s behavior without modifying its internal weights. This can be surprisingly effective for certain tasks, especially when data is limited. Variations include creating prompt templates or learning optimal prompts through techniques like gradient descent.
- Parameter-efficient fine-tuning (PEFT): These techniques aim to minimize the number of parameters that need to be updated during tuning. This reduces computational cost and memory requirements. Examples include adapter methods, prefix-tuning, and LoRA (Low-Rank Adaptation); a LoRA sketch follows this list.
- Instruction tuning: This focuses on training the LLM to follow instructions given in the prompt. The dataset consists of instruction-output pairs, allowing the model to better understand and respond to different types of instructions.
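To make the PEFT idea concrete, here is a minimal sketch using the open-source Hugging Face peft and transformers libraries. The model name and hyperparameters are illustrative only; this is separate from the managed Vertex AI tuning used later in this guide, which hides these details.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap an open-weights model with LoRA adapters so that only small low-rank
# matrices are trained, instead of the full set of model weights.
base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")  # example model

lora = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```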
Model Tuning Concept
Fine-tuning lets you get more out of the models available through the API by providing:
- Higher quality results than prompting
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests
At a high level, fine-tuning involves the following steps (a Vertex AI SDK sketch follows this list):
- Prepare and upload training data
- Train a new fine-tuned model
- Evaluate results and go back to step 1 if needed
- Use your fine-tuned model
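These steps map onto the Vertex AI SDK roughly as follows. This is a minimal sketch assuming the supervised tuning (sft) API as documented at the time of writing; the project ID, bucket paths, model version, and display name are placeholders.

```python
import time
import vertexai
from vertexai.tuning import sft
from vertexai.generative_models import GenerativeModel

# Hypothetical project and bucket; the JSONL files follow the format shown earlier.
vertexai.init(project="my-project", location="us-central1")

job = sft.train(
    source_model="gemini-1.5-flash-002",   # base model to tune (check supported versions)
    train_dataset="gs://my-bucket/wikilingua_train.jsonl",
    validation_dataset="gs://my-bucket/wikilingua_val.jsonl",
    epochs=4,
    tuned_model_display_name="wikilingua-summarizer",
)

# Poll until the managed tuning job finishes.
while not job.has_ended:
    time.sleep(60)
    job.refresh()

# Query the tuned model through the endpoint the job created.
tuned_model = GenerativeModel(job.tuned_model_endpoint_name)
print(tuned_model.generate_content("Summarize the following article:\n\n...").text)
```

The train_dataset argument is the GCS URI of the JSONL file prepared earlier, which is why uploading the data to a bucket comes before starting the tuning job.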
Example — Notebook to compare results from a prompt vs a fine-tuned response
Key steps:
- Upload the dataset to a GCS bucket [Step 3]
- Test the model pre-tuning [Step 5]
- Evaluate precision [Step 6] of the step 5 output using the ROUGE scorer and the ROUGE-L metric, comparing ground truth vs model output (see the ROUGE sketch after this list)
- Fine-tune the model on the dataset uploaded in step 3 [Step 7]
- Test the model post-tuning [Step 8]
- Evaluate precision [Step 9] of the step 8 output using the ROUGE scorer and the ROUGE-L metric, comparing ground truth vs model output
- Summarise metrics between pre-tuning and fine-tuned model outputs.
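Steps 6 and 9 both score the model output against the ground-truth summary with ROUGE-L. Below is a minimal sketch of that comparison using the rouge-score package; the candidate and reference strings are taken from the example in the Summary section that follows.

```python
from rouge_score import rouge_scorer

# Compare a model-generated summary against the ground-truth reference using
# ROUGE-L; precision is the figure summarised in Step 9 of the notebook.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

reference = ("Squeeze a line of lotion onto the tops of both forearms and the backs "
             "of your hands. Place your arms behind your back. Move your arms in a "
             "windshield wiper motion.")
candidate = ("Squeeze a line of lotion onto the top of each forearm. Place your "
             "forearms behind your back. Rub your forearms up and down your back.")

scores = scorer.score(reference, candidate)   # signature: score(target, prediction)
print(scores["rougeL"].precision, scores["rougeL"].recall, scores["rougeL"].fmeasure)
```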
Summary
The example notebook clearly shows the difference between the summaries generated pre- and post-tuning: the tuned summary is more in line with the ground-truth format. (Note: pre- and post-tuning outputs may vary based on the chosen parameters.)
- Pre-tuning result:
This article describes a method for applying lotion to your own back using your forearms. The technique involves squeezing lotion in a line along your forearms, bending your elbows, and rubbing your arms against your back in a windshield wiper motion. This method may not be suitable for individuals with shoulder pain or limited flexibility.
- Post-tuning result:
Squeeze a line of lotion onto the top of each forearm. Place your forearms behind your back. Rub your forearms up and down your back
- Ground Truth:
Squeeze a line of lotion onto the tops of both forearms and the backs of your hands. Place your arms behind your back. Move your arms in a windshield wiper motion.
The evaluation of the prompt-only vs fine-tuned responses is illustrated in Step 9:
Model tuning improved rougeL_precision by 82.75% (results may differ between tuning iterations).
Prompt Tuning vs LLM Model Tuning
Not all LLM models support supervised model tuning; in those cases, prompt engineering is the only option.
Pros
- Customization: Allows for extensive customization, enabling the model to generate responses tailored to specific domains or styles.
- Improved Accuracy: By training on a specialized dataset, the model can produce more accurate and relevant responses.
- Adaptability: Fine-tuned models can better handle niche topics or recent information not covered in the original training.
Cons
- Cost: Fine-tuning requires significant computational resources, making it more expensive than prompting.
- Technical Skills: This approach necessitates a deeper understanding of machine learning and language model architectures.
- Data Requirements: Effective fine-tuning requires a substantial and well-curated dataset, which can be challenging to compile.
Credit: https://medium.com/@myscale/prompt-engineering-vs-finetuning-vs-rag-cfae761c6d06
Cost
Is there a charge for fine-tuning the models?
Model tuning is free (at the time of publishing this article), but inference on tuned models is charged at the same rate as the base models.
References
https://github.com/rubans/ml/blob/main/playground/fine_tuning_llm.ipynb