Save $5,000 on hardware by running a Stable Diffusion setup on Google Cloud virtual machines for free

Unless you have lived under a rock for the past two years, you have probably heard about generative models. They are the new hype in the AI community, used to generate images, videos, music, text, and more.

This year, I explored different generative models for image creation, and I want to share what I learned with you. In this guide, you will discover how to set up a virtual machine with hardware that would be very expensive to have at home. You will also learn how to configure this machine and install the software you need to run it. By the way, Google Cloud offers a $300 trial for three months, so you can try it out for free.

$300 is enough for almost endless generation of images, provided that you stop the virtual machine between sessions.

Then I will show you how to install the necessary extensions, which will give you much more control over what you are creating.
Note: This guide covers a fairly advanced setup and will give you a lot of control over the image creation and training process.
However, if you are a beginner, I would recommend starting by running Stable Diffusion in Google Colab.

What is possible with Stable Diffusion
ControlNet-guided generated image of a dragon.
Prompt: (red dragon, black wings, purple eyes, big teeth, black tongue) crawling on the mountain, soft light, cinematic light, masterpiece

Step 1: Create a VM in GCP

First, you need to create a Google Cloud account.

At the time of writing, Google gives $300 in credits for three months to newly registered users.

Step 1.1: Create a VM

After creating an account and landing on the main page, head over to Compute Engine->VM instances->CREATE INSTANCE.

Create a VM with the following parameters:

  • Region and zone: (Note: different regions and zones have different GPUs available, so select one that has the GPU you want)
  • Machine configuration: GPUs
  • GPU: L4 (roughly equivalent to an RTX 4090, which costs about $2,000 on its own)
  • Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
  • Boot disk: Deep Learning VM with CUDA 11.8 M110 (Note: you will be automatically offered to switch to this image)
  • Firewall: Allow HTTP traffic
  • Firewall: Allow HTTPS traffic
  • Size (GB): 100+

Click Create and wait for the VM to be created.
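
If you prefer the command line, the same VM can be created with the gcloud CLI. This is a minimal sketch: the instance name, zone, and image family are assumptions, so adjust them to the choices you made in the console and to what your zone actually offers.

# Sketch: create the VM from the CLI. The accelerator must be available
# for this machine type in your zone; change the flags if gcloud rejects
# the combination.
gcloud compute instances create sd-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-l4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=common-cu118 \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=100GB \
    --tags=http-server,https-server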

It might happen that the region/zone you selected doesn't have the resources you requested. In that case, you will see a notification (a red circle) in the top right corner.

Step 1.2: Request quotas

Now that you have created a VM and can see it in the list, you will need to request quotas. Google Cloud requires you to request quotas for using certain resources, such as expensive GPUs. To do that, press the request quotas button next to your VM and follow the instructions.

Request quotas notification screenshot

Step 1.3: Firewall settings

Now you need to configure the firewall settings so that only you will be able to access the GUI.

  1. Head over to VPC network->Firewall->CREATE FIREWALL RULE
  2. Enter the following parameters:
    • Target tags: http-server; https-server; sd-web-ui;
    • Protocols and ports: Specified protocols and ports: TCP: 7860-7869
    • Source IPv4 ranges: your own IP address (ask Google "what is my IP")
  3. Enter a name and description and press create.
  4. Go back to VM instances and click on your VM. Press edit, and under Network tags, enter the tags we used in the previous step: http-server; https-server; sd-web-ui; then press save.

Now your VM is accessible from the internet only by you.
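
The same rule and tags can also be applied from the gcloud CLI. A minimal sketch, assuming the VM is called sd-vm in us-central1-a; replace YOUR_IP with the address Google showed you.

# Create the firewall rule that only lets your IP reach the web UI ports.
gcloud compute firewall-rules create sd-web-ui \
    --allow=tcp:7860-7869 \
    --source-ranges=YOUR_IP/32 \
    --target-tags=sd-web-ui

# Attach the network tags to the VM.
gcloud compute instances add-tags sd-vm \
    --zone=us-central1-a \
    --tags=http-server,https-server,sd-web-ui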

Step 3: Install software

Now it is time to install all the necessary software.
Start your VM and wait until it is ready. Then press SSH and you will be taken to the terminal of your VM.
When asked to install the NVIDIA drivers, say no.

Step 3.1: Install prerequisites

First, you need to install the necessary libraries through the shell. They will be required later for things like installing GUIs, building Python, etc.

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install gdebi-core git-lfs
sudo apt-get install tk-dev
sudo apt-get -y install libgdbm-dev libsqlite3-dev libssl-dev zlib1g-dev
sudo apt-get -y install liblzma-dev lzma libbz2-dev libffi-dev
sudo apt-get build-dep python
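
Note: apt-get build-dep only works if deb-src entries are enabled in your APT sources. If the last command fails, this sketch enables them (the sources file path may differ depending on the image):

# Enable deb-src entries, then refresh the package index.
sudo sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
sudo apt-get update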

Install Python 3.10.10 from source.

export PYTHON_VERSION=3.10.10
export PYTHON_MAJOR=3

curl -O https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz
tar -xvzf Python-${PYTHON_VERSION}.tgz
cd Python-${PYTHON_VERSION}

# configure the build; it will be installed under /opt/python
./configure \
    --prefix=/opt/python/${PYTHON_VERSION} \
    --enable-shared \
    --enable-optimizations \
    --enable-ipv6 \
    LDFLAGS=-Wl,-rpath=/opt/python/${PYTHON_VERSION}/lib,--disable-new-dtags

make
sudo make install

curl -O https://bootstrap.pypa.io/get-pip.py
sudo /opt/python/${PYTHON_VERSION}/bin/python${PYTHON_MAJOR} get-pip.py

/opt/python/${PYTHON_VERSION}/bin/python${PYTHON_MAJOR} --version

Then add the path and a python alias to your .profile and reload it:

cd ..
echo 'export PATH=/opt/python/3.10.10/bin/:$PATH' >> ~/.profile
echo 'alias python=/opt/python/3.10.10/bin/python3.10' >> ~/.profile
source ~/.profile

Install the NVIDIA drivers. The Deep Learning VM image ships with an installer script:

sudo /opt/deeplearning/install-driver.sh

# Alternatively, download and run Google's standalone installer:
curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py \
  --output install_gpu_driver.py
sudo python3 install_gpu_driver.py
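
After the installer finishes, it is worth verifying that the driver sees the GPU:

# You should see the L4 listed along with the driver and CUDA versions.
nvidia-smi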

Step 3.2: Install GUIs

Stable Diffusion has many GUIs for different purposes:

  • AUTOMATIC1111 - for image creation and model training
  • Kohya - best for Lora training
  • ComfyUI - a not-so-comfy, confusing tool for what can already be done with AUTOMATIC1111
  • and more ...

We are going to install AUTOMATIC1111 and Kohya. First, install the AUTOMATIC1111 Stable Diffusion GUI.

wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
chmod +x webui.sh
./webui.sh

Then stop the process with CTRL+C and type:

nano stable-diffusion-webui/webui-user.sh

and uncomment the following lines:

  • # export ACCELERATE="True"
  • # export NO_TCMALLOC="True"

Then add a library path to the same file:

LD_LIBRARY_PATH=/usr/local/lib
export LD_LIBRARY_PATH
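
For reference, the relevant part of webui-user.sh should then look roughly like this (a sketch, assuming the LD_LIBRARY_PATH lines go in the same file):

# Excerpt of stable-diffusion-webui/webui-user.sh after editing
export ACCELERATE="True"
export NO_TCMALLOC="True"

LD_LIBRARY_PATH=/usr/local/lib
export LD_LIBRARY_PATH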

Then install the Kohya GUI.

git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
chmod +x ./setup.sh
./setup.sh

After the installation is done, press CTRL+C to stop the process. Then go to the kohya_ss directory and run the following commands:

source venv/bin/activate
accelerate config

Then you will need to set up the acceleration parameters. Answer as shown in the picture.

Acceleration setup choices - screenshot
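
In case the screenshot is hard to read, a typical single-GPU answer set looks like this (the exact question wording varies between accelerate versions, so treat this as an assumption):

In which compute environment are you running?       ->  This machine
Which type of machine are you using?                ->  No distributed training
Do you want to run your training on CPU only?       ->  NO
Do you wish to optimize your script with torch dynamo?  ->  NO
Do you want to use DeepSpeed?                       ->  NO
What GPU(s) (by id) should be used for training?    ->  all
Do you wish to use FP16 or BF16 (mixed precision)?  ->  fp16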

Restart the machine to avoid errors and to apply the configuration. You have now installed both GUIs that you will need to create images and train models, and you can launch them.

To launch AUTOMATIC1111, run the following from the stable-diffusion-webui directory; for Kohya, run its command from the kohya_ss directory:

# AUTOMATIC1111
./webui.sh --listen --enable-insecure-extension-access --xformers

# Kohya
./gui.sh --headless --listen 0.0.0.0
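
Optionally, if you want a GUI to keep running after you close the SSH window, a standard nohup invocation works (a sketch; the log file name is arbitrary):

# Run AUTOMATIC1111 in the background and keep it alive after logout.
nohup ./webui.sh --listen --enable-insecure-extension-access --xformers > webui.log 2>&1 &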

Now you can access the GUIs from your browser. To do that, go to your VMs, copy the "External IP", and enter it into the browser as x.x.x.x:7860, where x.x.x.x is the external IP.
Note: you need to add the port number 7860, because that is the port the GUIs listen on.
If you open both GUIs, the first one will listen on port 7860 and the second one on 7861, in the order you launched them.
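
You can also fetch the external IP from the CLI (the VM name and zone are placeholders):

# Print the external IP of the VM.
gcloud compute instances describe sd-vm --zone=us-central1-a \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'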

Step 4: Install extensions

Now you need to install extensions for AUTOMATIC1111.
To do that:

  1. Launch the AUTOMATIC1111 GUI
  2. Go to the Extensions tab
  3. Press "Available"
  4. In "Extension index URL" leave the existing URL
  5. Press "Load from"
  6. Install following extensions and restart UI through Settings->Reload UI:
    • stable-diffusion-webui-images-browser - The most important tool. Keeps the history of your images with prompts
    • sd-civitai-browser-plus - lets you get models, Loras and textual inversions easier
    • sd-webui-controlnet - This tool will allow you to guide the image shapes and compositions.
      To install it, go to the ControlNet repo and follow the instructions.
      To download controlnet models, go to the "stable-diffusion-webui/extensions/sd-webui-controlnet/models" and run:
      git clone https://huggingface.co/lllyasviel/ControlNet-v1-1
      mv ControlNet-v1-1/*.pth .
      sudo rm -r ControlNet-v1-1
      which will download all the necessary models needed for ControlNet.
    • sd-webui-photopea-embed - a built-in Photoshop-like tool
    • sd-webui-segment-anything - a tool that segments an image into pieces, so you can replace individual pieces
    • sd_dreambooth_extension - Allows you to retrain the model with your own images

If you can't find one of the given tools, google it. You will either find a URL to enter in step 4, or a GitHub repository that you need to clone into <stable-diffusion-webui project path>/extensions through SSH.
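
For example, a manual install over SSH looks like this (the repository URL is illustrative; substitute the one you found):

# Clone an extension directly into the extensions directory, then reload the UI.
cd ~/stable-diffusion-webui/extensions
git clone https://github.com/AlUlkesh/stable-diffusion-webui-images-browser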

Step 5: Launch GUIs and create

As a first step, go to https://civitai.com/. This is a website that hosts models from independent artists. You can filter the models by Stable Diffusion version 1.5 and by the model type you want. To use those models, you will need to download them into the correct directories. You will have multiple options, but for beginners, I would recommend starting with the following model types:

  • Checkpoints (base models) - the main models, which are the biggest in size and have the biggest effect on the images. For example, a realistic photo model, without any other models, will only produce photos. These models go in the "~/stable-diffusion-webui/models/Stable-diffusion" directory and have ".ckpt" or ".safetensors" extensions (see the directory sketch after this list).
  • Loras - smaller models that influence the base models. They are mostly used to train characters. For example, if you use a realistic photo model and a Mario character Lora, you will get a realistic photo of Mario. Loras go in the "~/stable-diffusion-webui/models/Lora" directory and have the ".safetensors" extension.
  • Textual inversions - small embeddings that work similarly to Loras, but have a less significant effect on the image. You can use a combination of them to fine-tune your image.
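
As a quick orientation, these are the directories each model type goes into (the embeddings path for textual inversions is the standard AUTOMATIC1111 location):

# Checkpoints (.ckpt / .safetensors)
ls ~/stable-diffusion-webui/models/Stable-diffusion
# Loras (.safetensors)
ls ~/stable-diffusion-webui/models/Lora
# Textual inversions / embeddings
ls ~/stable-diffusion-webui/embeddings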

Launch AUTOMATIC1111 and you will see the following view:

To create an image, you will need to write a prompt and a negative prompt to describe what you want and don't want to see in the image.
For this example, we will create a prompt for an old man smoking a cigarette. Then we will take a pose from another photo and apply it to ours.

Download models

The base model offered by Stable Diffusion is not great. It lacks consistency and variety, so we will download a better one from civitai.com.

For this example, I will use this model and this Lora; however, feel free to experiment with your own models.
To download a model, click on the model you like, right-click the Download button, copy the link, and note the numeric id at the end of the link.
Then open your SSH terminal and go to the directory where the model belongs:

# Go to the base model directory
cd ~/stable-diffusion-webui/models/Stable-diffusion

# And download the model using
# wget https://civitai.com/api/download/models/{modelVersionId} --content-disposition
# Where {modelVersionId} is the number id of the model you want to download
wget https://civitai.com/api/download/models/50908 --content-disposition

Do the same for the Lora model:

# Go to the Lora directory
cd ~/stable-diffusion-webui/models/Lora

# Download our Lora
wget https://civitai.com/api/download/models/5687 --content-disposition

Create a prompt

Now that you have downloaded the models, start your AUTOMATIC1111 GUI, or if it is already running, go to Settings->Reload UI.
Then, in the top left corner, select the Stable Diffusion checkpoint (base model) "diffusionBrushEverythingSFWNSFWAll_v10.safetensors".
Then go to the txt2img tab and enter a prompt that includes our Lora "<lora:smoking_ok:1>", where 1 is the degree to which you want your image to be affected by the Lora:

a portrait of an old man, smoking cigarette (smoking:1.2),
beautiful painting with highly detailed face by greg rutkowski and magali villanueve,  <lora:smoking_ok:1>
And a negative prompt:
young man, blurry, ugly
Tip: If you want to learn how to write prompts and find the best prompts for specific models, look at the example images on civitai.com and copy their prompts.
Then, down in the settings, set the "Batch count" to 4 to get a bigger variation of images.
Then press "Generate" and wait for the images to be generated.
Congratulations! You have created your first 4 images.

4 images, generated by me

ControlNet pose

Now we will add a pose to our image.
To do that, leave all the settings from the previous step the same. Then go to the ControlNet tab and enable it.
Then save this image.

To be used to copy pose by ControlNet

Then go to Single image tab and select this image there.
Lastly, select "Depth" as "Control Type" and press generate.
Result of all the techniques

P.S. You can preview the pose by clicking on the little bomb icon between the preprocessor and the model.

Useful links

  • Video explaining how to train your own Lora using Kohya ss
  • Tool that allows you to prepare images for training a Lora.
  • Tool that allows you to create poses for ControlNet.
  • Video on how to fix hands in images, since hands are the weakest part of Stable Diffusion.
  • Video on more complex compositions for images.
  • Video on how to use Photopea (the Photoshop-like tool listed in the extensions list)