Eray.
Back to Projects

Stable Diffusion LoRA Fine-Tuning

Fine-tuning Stable Diffusion with LoRA technique to generate personalized images

•Source Code
Machine LearningStable DiffusionLoRADeep LearningPythonHugging Face
Technologies:PyTorchDiffusersPEFTHugging FaceGoogle Colab

Stable Diffusion LoRA Fine-Tuning

This project uses the LoRA (Low-Rank Adaptation) technique to fine-tune Stable Diffusion v1.5 model with a custom dataset. The model has been trained on dog images and can generate personalized images using the "sks dog" trigger word.


Quick Start

You can start using the trained model right away:

from diffusers import DiffusionPipeline
import torch
 
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
 
pipe.load_lora_weights("erdoganeray/finetune-demo")
 
image = pipe("a photo of sks dog wearing sunglasses").images[0]
image.save("output.png")

Hugging Face Model: erdoganeray/finetune-demo


Features

  • Efficient Training: Only 0.5-1% parameter training with LoRA
  • Small Model Size: ~3MB LoRA adapter (instead of ~4GB full model)
  • Easy to Use: Step-by-step training with Jupyter notebook
  • Hugging Face Compatible: Model can be directly uploaded to Hub
  • Detailed Monitoring: Loss graphs and checkpoint mechanism
  • Various Test Prompts: 20 different styles and scenarios

What is LoRA?

LoRA (Low-Rank Adaptation) is a technique that allows us to fine-tune large models efficiently. Instead of updating all model weights:

  1. Freezes the original model weights
  2. Adds small "adapter" matrices to specific layers
  3. Trains only these adapter matrices

Advantages:

  • ~99% less trainable parameters
  • Much faster training
  • Very small model files (~3MB vs ~4GB)
  • Can be easily combined with other LoRAs

Model Architecture

Base Components

ComponentModel
Base ModelStable Diffusion v1.5
Text EncoderCLIP ViT-L/14
VAEAutoencoderKL
UNet2D Conditional UNet

LoRA Configuration

LoraConfig(
    r=32,                    # Rank (capacity)
    lora_alpha=64,           # Scaling factor (rank * 2)
    lora_dropout=0.05,       # Regularization
    target_modules=[         # Attention layers
        "to_k", "to_q", 
        "to_v", "to_out.0"
    ]
)

Trainable Parameters: ~3.1M / 860M (0.36%)


Training Details

Hyperparameters

ParameterValueDescription
Epochs300Total training cycles
Learning Rate5e-5 → 5e-6Cosine annealing
OptimizerAdamWWith weight decay
Batch Size11 random image per epoch
Gradient Clipping1.0For stability
Image Size512×512Standard SD size
AugmentationYesFlip, ColorJitter

Data Processing

Augmentation Pipeline:

  • Random Horizontal Flip (p=0.5)
  • Color Jitter (brightness, contrast, saturation: 0.1)
  • Normalization (mean=0.5, std=0.5)

Dataset: diffusers/dog-example (~5 dog images)

Training Dataset Examples

The model was trained on a small dataset of dog images:

Training Image 1Training Image 2

Results

Loss Metrics

MetricValue
Final Loss0.1804
Average Loss0.0876
Min Loss~0.05

Observations:

  • Loss values decreased steadily
  • No signs of overfitting
  • 300 epochs provided sufficient convergence

Benchmark

MetricValue
Training Time~45 minutes (T4 GPU)
Model Size3.2 MB (LoRA)
Inference Time~5 seconds/image (50 steps)
VRAM Usage~8 GB (fp16)

Training Loss Graph

Training Loss

Usage Examples

Method 1: Using from Hugging Face (Recommended)

from diffusers import DiffusionPipeline
import torch
 
# Load pipeline and add LoRA adapter
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
 
pipe.load_lora_weights("erdoganeray/finetune-demo")
 
# Generate image
image = pipe(
    "a photo of sks dog wearing sunglasses",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
 
image.save("output.png")

Method 2: Alternative with PEFT

from diffusers import StableDiffusionPipeline
from peft import PeftModel
import torch
 
# Load base pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
 
# Load LoRA adapter from Hugging Face
pipe.unet = PeftModel.from_pretrained(
    pipe.unet,
    "erdoganeray/finetune-demo"
)
 
# Generate image
image = pipe("sks dog in a cyberpunk city, neon lights").images[0]
image.save("output.png")

Advanced Parameters

image = pipe(
    prompt="sks dog in a cyberpunk city, neon lights",
    negative_prompt="blurry, bad quality",
    num_inference_steps=50,      # More = better quality
    guidance_scale=7.5,          # Prompt adherence
    height=512,
    width=512,
).images[0]

Batch Generation

prompts = [
    "sks dog on a beach",
    "sks dog in winter",
    "sks dog as superhero"
]
 
for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]
    image.save(f"output_{i}.png")

Generated Image Examples

The model successfully generates images in various styles. Here is the gallery of generated outputs:

Generated Images Gallery

Sample Outputs

Dog in bucketsks dog in a bucket
Dog in gardensks dog in a garden
Astronaut dogsks dog astronaut
Van Gogh stylesks dog Van Gogh style

Different Environments:

  • Astronaut dog in space
  • Dog on tropical beach
  • Dog by fireplace
  • Dog playing in snow

Art Styles:

  • Van Gogh style oil painting
  • Watercolor painting
  • Renaissance style
  • Pixel art 8-bit

Costumes & Accessories:

  • Dog wearing golden crown
  • Sunglasses and leather jacket
  • Superhero costume
  • Wizard hat

Fantasy Scenarios:

  • Cyberpunk character
  • Magical forest
  • Astronaut on the Moon
  • Underwater scene

Links


License

This project is licensed under MIT License. The Stable Diffusion v1.5 model is under CreativeML OpenRAIL-M license.