Host a Local AI

Introduction

If you are reading this guide you probably already know the benefits of running a locally hosted AI. I want to keep this concise, so I will not cover that topic here. This guide will focus on running Open WebUI with bundled Ollama support. We will do this via Docker running on an Ubuntu 24.04 server. Depending on your setup, some steps may vary. This guide generally follows this one: https://docs.openwebui.com/getting-started/

You need to be running Docker, and if you want to follow along exactly you will also need Portainer. I have guides on both

Docker NVIDIA Toolkit

If you plan to use a GPU you will need to do a few extra steps. First, ensure you have the latest NVIDIA drivers installed.

Install the Docker NVIDIA Toolkit by first adding the applicable sources

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update APT

sudo apt update

Install package

sudo apt install -y nvidia-container-toolkit

Add the NVIDIA runtime for Docker

sudo nvidia-ctk runtime configure --runtime=docker

Restart Docker (NOTE: will stop all your containers)

sudo systemctl restart docker

You are now ready to use your NVIDIA GPU with Open WebUI

Open WebUI Container

Now open up portainer and create a stack file called open-webui, then paste the following in the web editor for GPU support

GPU

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ollama:/root/.ollama
      - open-webui:/app/backend/data
    runtime: nvidia

volumes:
  ollama:
  open-webui:

CPU

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - ollama:/root/.ollama
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:

The initial deployment can take a while, the image is really large, so give it some time before getting too worried. If you do not see an error pop up it is probably fine. Once it is deployed you can either access it at http://localhost:3000 or if it is running on another server, at http://SERVER_IP:3000.

Getting Started with Open WebUI

You will be prompted to create an account, this creates an account locally, so no worries about your data being sent externally. After that, it is time to load up a model! Head over to https://ollama.com/library and click the link for llama3.2 (or whatever is the latest model while you are doing this). On the left, select if you want the 1b or 3b model, then look for the name of the model (underlined in red) on the right. Copy it

llama32

Now go back to Open WebUI, at the top right click your avatar, then click “Admin Panel”. Then click the “Settings” tab.

settings

Select “Models”. Then in the input for entering the model tag, add the llama model

pull-model

Then on the far right click the download button to pull the model in.

pull-percent

Now click “New Chat” at the top left, and you should be able to select a model now

select-model

Select the llama model and ask it a question. It should respond (it could take some time depending on your system. If it is taking too long, try the 1b model)

how-assist

If you want to see how much the model is using your GPUs you can install nvtop:

GPU Usage

sudo apt install nvtop

And then run

nvtop

Now ask it a more complex question to see the GPUs get some load.

gpu-load

I am running two 4090s so with a 3b model the responses are almost instant. And that’s it! Go ahead and try some more models, you can download a bunch and easily switch between them (or even run 2 at the same time). There is a lot more you can do with Open WebUI so take a look at the docs. Some interesting things I plan to do videos on is how to integrate SearXNG and image generation. So subscribe to my YouTube channel and give the video a like if you want to see more content like this.

Introduction​

Docker NVIDIA Toolkit​

Open WebUI Container​

GPU​

CPU​

Getting Started with Open WebUI​

GPU Usage​

Introduction

Docker NVIDIA Toolkit

Open WebUI Container

GPU

CPU

Getting Started with Open WebUI

GPU Usage