Frigate: Kubernetes edition

This is a follow up to my ‘Frigate From Scratch’ post. Most of that post is still relevant today but I have since moved almost all of my workloads into Kubernetes.

This post is intended to document some of the issues I hit while trying to get Frigate running in my k8s cluster. This is not every issue I hit, just the ones that are not directly related to my specific environment/setup/workflows.

I have essentially taken the relevant parts of my notes and lightly edited them into a more organized and structured format.

Overview

There is a helm chart, but Frigate - at least outside of the container - is not a particularly complex deployment, so there's no upside to using helm and just a ton of downside.

</rant>

You can still use the helm chart as a reference for the various ports and volume mounts that need to be configured:

❯ helm repo add blakeshome https://blakeblackshear.github.io/blakeshome-charts/
"blakeshome" has been added to your repositories
# use --set to enable things like startup probes and coral support to get a better reference manifest
❯ helm template blakeshome blakeshome/frigate --set probes.startup.enabled=true --set coral.enabled=true --set coral.hostPath=/dev/apex_0 > reference.yaml
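
The rendered output is one big multi-document YAML file. If you happen to have the Go version of yq installed (entirely optional, it just makes skimming easier), you can pull out a single object to use as a starting point:

# Grab just the Deployment from the rendered chart (yq v4 syntax)
❯ yq 'select(.kind == "Deployment")' reference.yaml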

Pain Points

Below are two pain points I hit while getting Frigate running in k8s that are also likely to be relevant to others. Hopefully the notes here will save you some time!

Coral pass through

Frigate now supports several different hardware acceleration options for object detection. I am still using the PCIe Coral EdgeTPU as the performance is excellent although that will likely change in the future as other options become available.

To prepare for a future where I might have multiple viable hardware acceleration options available in my k8s cluster, I have elected to use NFD (Node Feature Discovery) to label nodes that have Coral TPUs installed. This is not particularly difficult to do but it is also unnecessary if you only have a single node that will run Frigate; regular taint/toleration or nodeSelector approaches will work just fine.

Regardless, the Coral drivers need to be installed on the node(s) that will host the Coral device(s).
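
Before touching the drivers, it is worth a quick sanity check that the host can actually see the TPU(s) on the PCIe bus. The 1ac1:089a vendor/device pair is the same one the NFD rule further down keys off of; the exact bus address and description in the output will vary by system:

# One line per installed Coral; address/description will differ per host
❯ lspci -nn -d 1ac1:089a
03:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]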

The good news is that the process of manually patching and compiling the drivers has been made less tedious by the community! So long as you are on a modern-ish Debian based system, the jnicolson/gasket-builder repo has automated the process of installing the DKMS drivers for the Coral TPU.

You don’t have to:

  • track down a few unmerged patches and apply them
  • figure out how to build the drivers against your current kernel
  • manually load the modules
  • create and load udev rules
  • … etc

You will still need kernel headers and basic build tools installed on the host, but beyond that it's really just a matter of wget ...; dpkg -i ... and a reboot for good measure.
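
In case it saves someone a search, the whole process boils down to roughly the sketch below. The linux-headers/build-essential/dkms packages cover the "kernel headers and basic build tools" requirement, and the .deb filename is a placeholder for whatever the latest gasket-builder release is actually called:

# kernel headers + build tooling so DKMS can build the module against the running kernel
❯ apt-get install --yes linux-headers-$(uname -r) build-essential dkms

# grab the pre-built package from the gasket-builder releases page (placeholder filename)
❯ wget https://github.com/jnicolson/gasket-builder/releases/download/<release>/gasket-dkms_<version>_all.deb
❯ dpkg -i gasket-dkms_<version>_all.deb

# reboot so the module and udev rules are picked up cleanly
❯ reboot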

Once the drivers are installed, you should see the devices show up under /dev/apex_*:

root@panopticon01:/mnt/frigate# ls -lah /dev/apex_*
crw-rw---- 1 root apex 120, 0 Dec 30 10:01 /dev/apex_0
crw-rw---- 1 root apex 120, 1 Dec 30 10:01 /dev/apex_1

Assuming the devices are detected, you can then pass them through to the container as regular hostPath volumes. Unfortunately, there is no k8s equivalent of the --device flag and my attempts to use a tailored securityContext to grant the necessary permissions to access the devices failed:

securityContext:
  privileged: false
  capabilities:
    add:
      # DMA access
      - IPC_LOCK
      # Required for PCIe memory access
      - SYS_RAWIO
      # Permit more things if needed
      - CAP_SYS_ADMIN
  # Allow anything that wants to escalate privileges to do so...
  allowPrivilegeEscalation: true

Kubernetes seems to prefer that hardware vendors create/ship a Device Plugin to safely/cleanly facilitate accessing devices from inside pods but there is only a very experimental one for the Coral TPU at this time.

All of that is to say: you will need to run the container in privileged mode for the device access to work correctly.

apiVersion: apps/v1
kind: Deployment
# <...>
spec:
    template:
        spec:
            containers:
                - image: ghcr.io/blakeblackshear/frigate:0.16.3
                  name: frigate
                  securityContext:
                      # Needed for the Frigate UI to show CPU/TPU metrics (usage/temp...etc)
                      capabilities:
                          add:
                              - CAP_PERFMON
                      # See note above
                      privileged: true
                  volumeMounts:
                      # Covered in the IPv6 section below
                      - mountPath: /usr/local/nginx/templates/listen.gotmpl
                        name: nginx-templates
                        readOnly: true
                        subPath: listen.gotmpl
                      - mountPath: /dev/apex_0
                        name: coral-apex-0
                      - mountPath: /dev/apex_1
                        name: coral-apex-1
                    # <...>
            # Covered in the NFD section below
            nodeSelector:
                feature.node.kubernetes.io/has-pcie-tpu: "true"
            volumes:
                - configMap:
                      name: frigate-nginx-templates
                  name: nginx-templates
                - hostPath:
                      path: /dev/apex_0
                  name: coral-apex-0
                - hostPath:
                      path: /dev/apex_1
                  name: coral-apex-1
                # <...>

NFD

Node Feature Discovery is a straightforward project that runs a worker daemonset on every node in your cluster and scans the hardware for features. Depending on what it finds, it applies labels to the node that can then be used for scheduling.
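
Deploying NFD itself is simple; the upstream project publishes kustomize overlays, so no helm is needed here either. The version ref below is just an example, pin it to whatever the current release is:

❯ kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.17.2"

With the stock manifests the worker reads its configuration from a ConfigMap, so that is the natural home for the core/sources snippet further down.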

Over Engineered
As noted above, NFD is not strictly necessary if you only have a single node that will run Frigate; just use a regular nodeSelector or taint/toleration approach instead.

I have some other workloads that can benefit from NFD so I went ahead and set it up cluster-wide. Here is the relevant snippet from my NFD configuration that detects Coral TPUs and applies a label accordingly:

core:
    # The worker is cheap but it defaults to running every 60s which is excessive for a cluster that
    #    is all bare metal and not likely to have hardware changes often.
    sleepInterval: 300s

    # all -> make sure `pci` features are included
    # Disable CPU features to reduce excessive labels
    featureSources:
        - all
        - "-cpu"

    labelSources:
        - all
        - "-cpu"

sources:
    pci:
        # You only need `0880` for Coral but I have some other hardware that I use
        deviceClassWhitelist:
            # Network
            - "02"
            # Display
            - "03"
            # Base class for System peripherals
            - "08"
            # Specific class for Coral
            - "0880"
            # Accelerator
            - "12"

    custom:
        - name: "has-coral-tpu"
          labels:
              "feature.node.kubernetes.io/has-pcie-tpu": "true"
          matchFeatures:
              - feature: pci.device
                matchExpressions:
                    vendor: { op: In, value: ["1ac1"] }
                    device: { op: In, value: ["089a"] }

Once NFD is deployed and running, each kind: Node in your cluster that has a Coral TPU installed should end up with the label:

❯ kubectl get nodes -l feature.node.kubernetes.io/has-pcie-tpu -o name
node/panopticon01

Combine that with the nodeSelector as shown in the snippet above and Frigate will only be scheduled on nodes that have Coral TPUs installed.

IPv6

It’s 2026 and IPv6 should just work by default everywhere but alas, Frigate is still a bit behind the times in this regard.

The docs are very sparse, but they do hint at needing to modify a file inside the container.

Above, I am providing my own copy of /usr/local/nginx/templates/listen.gotmpl via a ConfigMap volume mount to override the default one that ships with the container:

kind: ConfigMap
# <...>
data:
    listen.gotmpl: |
        listen 5000;

        {{ if not .enabled }}
        # intended for external traffic, protected by auth
        listen [::]:8971 ipv6only=off;
        {{ else }}
        # intended for external traffic, protected by auth
        listen [::]:8971 ipv6only=off ssl;

        # intended for internal traffic, not protected by auth
        listen [::]:5000 ipv6only=off;

        {{ end }}

The above is more or less taken from this GH issue but modified slightly:

  • The GH issue uses [::1] for some reason, which only binds to the loopback interface and does not work for me. I changed it to just [::] to bind to all interfaces, which is what I want for my use case.
  • The ACME/TLS stuff is not relevant to me so I removed that; I just handle TLS termination elsewhere in my stack.
  • There’s still a ton of stuff inside the container that is IPv4 only but that’s a problem for another day / does not violate my self-imposed “IPv6 first, ideally exclusively” requirement.
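
A quick way to confirm the override took effect is to curl the pod directly over IPv6. The app=frigate label and the address below are placeholders for whatever your deployment actually uses:

# dual-stack clusters list both address families here
❯ kubectl get pod -l app=frigate -o jsonpath='{.items[0].status.podIPs}'

# any HTTP status back proves the [::]:8971 listener is answering over v6
#   (the address is a made-up placeholder - substitute the pod's IPv6 address)
❯ curl -g -6 -s -o /dev/null -w '%{http_code}\n' "http://[fd00:10:244::1234]:8971/"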

Conclusion

Frigate puts all the complexity into the container, so it's a pretty straightforward deployment in k8s. Things get a bit trickier when you start adding hardware acceleration into the mix, but it's still manageable with a bit of effort. For reasons that are beyond me, Frigate still does not have first-class IPv6 support, so that requires a bit of manual tinkering as well.