OPA & Gatekeeper: Policy as Code

By SumGuy 11 min read

The Moment You Need a Bouncer

Your cluster started with just you. You deployed whatever you wanted, ran containers however felt right, and life was blissful chaos. Then the team grew. Then the standards committee got involved. Then someone asked: “Wait, why does our database pod have hostPath access to the entire filesystem?” and suddenly it’s 3 AM and you’re explaining why that wasn’t a good idea.

Welcome to policy enforcement.

Kubernetes gives you incredible power — maybe too much. Any user with pod creation rights can request privileged mode, mount the host filesystem, pull images from anywhere, or spike resource usage to tank the cluster. You could solve this with yelling and Slack messages. Or you could use Open Policy Agent (OPA) and Gatekeeper to automate it. Think of it as hiring a bouncer: policies are the rules, Gatekeeper is the velvet rope, and Rego is the clipboard the bouncer reads from.

This article covers what OPA and Gatekeeper are, how they work together, real-world constraints for a homelab Kubernetes cluster, and why you’d pick them over alternatives like Kyverno.


What Is OPA (Open Policy Agent)?

OPA is a general-purpose policy engine. It’s not Kubernetes-specific — you can use it to enforce policies on Terraform plans, REST APIs, Envoy proxies, or even static files. The magic is in Rego, a declarative language designed for policy logic.

At its core, OPA answers one question: given an input (a JSON document) and your policies, what is the decision? For admission-style policies that decision boils down to allow or deny, usually with a list of violation messages, and your rules define what counts as a violation.

A simple Rego rule looks like:

simple.rego
package main

# Deny if the input is invalid
deny[msg] {
    not input.name
    msg := "name field is required"
}

If input.name is missing, the policy returns a denial with a message. OPA doesn’t care about the structure — it’s all JSON to OPA. That’s why it’s so flexible. Kubernetes just happens to be a really good use case.


Gatekeeper: OPA for Kubernetes Admission Control

Gatekeeper is the Kubernetes controller that plugs OPA into admission control as a validating admission webhook. The API server calls it after a request has been authenticated and authorized but before the object is persisted, so it sees requests for the resource kinds your constraints match, whether they come from kubectl, a CI pipeline, or another controller.

When you apply a manifest, the API server forwards the request to Gatekeeper’s webhook, which evaluates it with its embedded OPA engine and tells the API server to allow or deny it. It’s transparent: users don’t interact with OPA directly; they just get denied if their manifest violates a policy.

Two key Gatekeeper concepts:

  1. ConstraintTemplate — defines a reusable policy (the bouncer’s manual)
  2. Constraint — applies that policy to specific resources (the bouncer at a specific venue)

A ConstraintTemplate contains Rego code and describes what inputs it validates. A Constraint says “enforce this template on all Pods in the default namespace” or “enforce it on all Deployments everywhere.”


Installing Gatekeeper

If you’re running a decent Kubernetes cluster (even a homelab one), installation is straightforward:

Terminal window
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml

This deploys the Gatekeeper controller, the webhook server, and RBAC rules. It lives in a gatekeeper-system namespace and watches your cluster from there.

To verify it’s running:

Terminal window
kubectl get pods -n gatekeeper-system
kubectl get validatingwebhookconfigurations | grep gatekeeper

You’ll see something like gatekeeper-validating-webhook-configuration. That’s the hook that intercepts API requests.


ConstraintTemplate and Constraint: The Two-Part Enforcement

Here’s where the abstraction pays off. A ConstraintTemplate is the policy definition — it’s YAML + Rego code. A Constraint is a lightweight YAML file that references a ConstraintTemplate and says which resources to enforce it on.

Why split it? Because you might want the same policy rule (e.g., “no privileged pods”) applied in different ways (all namespaces vs. just production, pods only vs. all workloads, etc.). The template is the logic; the constraint is the binding.

Example 1: Disallow Privileged Pods

template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sBlockPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockprivileged

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged container %v not allowed", [container.name])
        }

This ConstraintTemplate does three things: it registers a new constraint kind (K8sBlockPrivileged), targets Gatekeeper’s admission hook, and carries the Rego that decides what counts as a violation. The template name must be the lowercase form of the kind, and Gatekeeper expects the rule to be called violation and to return an object with a msg field. The policy takes no parameters, so the template needs no validation schema.

The Rego logic is straightforward: iterate through all containers (input.review.object.spec.containers[_]), and if any has privileged == true, report a violation with a message.

Now apply this template:

Terminal window
kubectl apply -f template.yaml

The template alone doesn’t enforce anything; it just makes the policy available. To actually enforce it, create a Constraint:

constraint.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
  name: block-privileged-pods
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]

This Constraint says: “Apply the K8sBlockPrivileged policy to all Pods, except those in kube-system and gatekeeper-system.”

Try to create a privileged pod:

Terminal window
kubectl run test-pod --image=nginx --privileged

You’ll get:

Error from server (Forbidden): admission webhook "validation.gatekeeper.sh" denied the request: [block-privileged-pods] Privileged container test-pod not allowed

It worked. The bouncer saw a rule violation and sent the request away.


Real-World Constraints for a Homelab

Let’s build a few practical policies you’d want in a real cluster.

Constraint 2: Require Resource Limits

Uncontrolled pods can starve your cluster. Require every container to declare CPU and memory requests/limits:

policy.rego
package k8srequiredresources

# Gatekeeper templates report problems through a rule named "violation"
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not container.resources.limits
    msg := sprintf("Container %v must define resource limits", [container.name])
}

violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not container.resources.requests
    msg := sprintf("Container %v must define resource requests", [container.name])
}

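Create the ConstraintTemplate (same structure as before, just swap in this Rego), then create a Constraint that applies it to Pods. Because the rule reads spec.containers, it only fits the Pod shape. Pods created by a Deployment are still checked when the controller creates them, but validating the Deployment object itself would need a rule that reads spec.template.spec.containers.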
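Spelled out, the template and its Constraint might look like the sketch below. The kind K8sRequiredResources, the constraint name, and the excluded namespaces are illustrative choices rather than anything Gatekeeper ships with.

```yaml
# Sketch: ConstraintTemplate wrapping the resource rules above.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresources          # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResources    # illustrative kind name
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredresources

        # Gatekeeper collects results from a rule named "violation"
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits
          msg := sprintf("Container %v must define resource limits", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.requests
          msg := sprintf("Container %v must define resource requests", [container.name])
        }
---
# Sketch: Constraint binding the template to Pods.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: require-resources             # illustrative name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
```

Apply the template first and the Constraint second. Gatekeeper has to register the new kind before the API server will accept a Constraint of that kind, so a single kubectl apply of both documents can fail on the first attempt.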

Constraint 3: Require Specific Labels

For billing, ownership, or cost-center tracking, enforce that certain labels are present:

label-policy.rego
package k8srequiredlabels

required_labels := ["owner", "cost-center", "app"]

# Gatekeeper templates report problems through a rule named "violation"
violation[{"msg": msg}] {
    missing_label := required_labels[_]
    not input.review.object.metadata.labels[missing_label]
    msg := sprintf("Label %v is required", [missing_label])
}

This iterates through your required labels and denies if any are missing. Users can’t deploy anything without tagging it properly.
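Hard-coding the label list in Rego means editing the template every time the list changes. A common alternative is to pass the list in as a Constraint parameter, which the rule then reads from input.parameters. The sketch below assumes a parameter called labels and an illustrative constraint name.

```yaml
# Sketch: the label list moves out of Rego and into the Constraint.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:              # schema for the Constraint's spec.parameters
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("Label %v is required", [required])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-ownership-labels      # illustrative name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    labels: ["owner", "cost-center", "app"]
```

With this split, a second Constraint of the same kind can require a different set of labels on, say, Namespaces, without touching the Rego.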

Constraint 4: Restrict Image Registries

Block images from untrusted registries. Only allow images from your private registry or a curated list:

registry-policy.rego
package k8sallowedregistries

allowed_registries := [
    "docker.io/library/",
    "ghcr.io/",
    "registry.internal:5000/"
]

# Gatekeeper templates report problems through a rule named "violation"
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    image := container.image
    not startswith_allowed(image)
    msg := sprintf("Image %v from untrusted registry. Allowed: %v", [image, allowed_registries])
}

# True when the image starts with at least one allowed prefix
startswith_allowed(image) {
    prefix := allowed_registries[_]
    startswith(image, prefix)
}

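Now when someone tries to deploy quay.io/sketchy-project/backdoor:latest, Gatekeeper stops it, because quay.io is not on the list. Two caveats follow from the prefix match. First, it compares against the image string exactly as written in the manifest, so a short name like nginx is rejected until it is spelled out as docker.io/library/nginx. Second, the ghcr.io/ prefix admits every project hosted there; narrow it to your own organization’s path if that is too broad.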
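As a ConstraintTemplate, the same check could take the registry list as a parameter instead of hard-coding it. This is a sketch: the kind K8sAllowedRegistries, the registries parameter, and the constraint name are names chosen for illustration.

```yaml
# Sketch: allowed registries supplied per Constraint.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedregistries
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRegistries    # illustrative kind name
      validation:
        openAPIV3Schema:
          type: object
          properties:
            registries:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedregistries

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not allowed(container.image)
          msg := sprintf("Image %v is not from an allowed registry: %v", [container.image, input.parameters.registries])
        }

        # True when the image starts with at least one allowed prefix
        allowed(image) {
          startswith(image, input.parameters.registries[_])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRegistries
metadata:
  name: trusted-registries-only       # illustrative name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    registries:
      - "docker.io/library/"
      - "ghcr.io/"
      - "registry.internal:5000/"
```

Keeping the trailing slash on each prefix matters: without it, "registry.internal:5000" would also match a look-alike host such as registry.internal:5000.example.com.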

Constraint 5: Block hostPath Volumes

HostPath volumes bypass container isolation. In a multi-tenant or security-conscious cluster, ban them:

hostpath-policy.rego
package k8snohostpath

# Gatekeeper templates report problems through a rule named "violation"
violation[{"msg": msg}] {
    volume := input.review.object.spec.volumes[_]
    volume.hostPath
    msg := sprintf("hostPath volume %v not allowed", [volume.name])
}
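To check that a rule like this fires, it helps to keep a deliberately bad manifest around. The Pod below is a hypothetical test fixture (the names and the busybox image are arbitrary) that mounts the host's root filesystem, which is exactly what the policy is meant to reject.

```yaml
# Test fixture: a Pod that should be rejected by a hostPath ban.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-test                 # hypothetical name
spec:
  containers:
    - name: shell
      image: docker.io/library/busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: host-root
          mountPath: /host
  volumes:
    - name: host-root
      hostPath:                       # the field the rule looks for
        path: /
```

With the constraint active, applying this should be denied with a message naming the host-root volume. Running kubectl apply with --dry-run=server should exercise the admission webhook without creating anything, which makes it a convenient way to test, though it is worth confirming that behaviour on your own cluster.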

Testing Policies: The Dryrun Approach

Before enforcing a policy cluster-wide, run it in dry-run mode first. Gatekeeper can evaluate policies without blocking requests; violations are only recorded.

In your Constraint, add:

spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]

In dryrun mode, violating resources are still created, and Gatekeeper’s audit records the violations in the Constraint’s status. Roll out this way for a week, check the reported violations, tune your policy, then switch to blocking with deny (which is also the default when enforcementAction is omitted):

spec:
  enforcementAction: deny

This dryrun pattern prevents surprise outages. You’ll catch badly written policies or overly broad constraints before users hit them in production.
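Between dryrun and deny there is a third setting, warn, which lets the request through but returns the violation message to the user as a warning. The sketch below assumes a template defining the kind K8sNoHostPath is already installed; the constraint name and the staging namespace are hypothetical.

```yaml
# Sketch: surface violations to users without blocking them.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sNoHostPath                   # assumed: a template defining this kind exists
metadata:
  name: warn-on-hostpath              # illustrative name
spec:
  enforcementAction: warn             # allowed values: deny (default), dryrun, warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["staging"]           # hypothetical namespace
```

This can be a useful middle step in a rollout: people deploying see the message in their kubectl output and get a chance to fix manifests before the constraint starts rejecting them.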


Understanding Rego: The Policy Language

Rego is declarative — you describe facts and rules, not steps. Here’s what you need to know:

Packages organize rules:

package k8srules

Rules define policy logic. In a Gatekeeper template, a violation rule blocks a request:

violation[{"msg": msg}] {
    condition
    msg := "reason"
}

Input is the admission request being evaluated. In Gatekeeper, the resource itself is input.review.object, and the Constraint’s parameters arrive as input.parameters:

input.review.object.spec.containers[_]    # iterate containers
input.review.object.metadata.labels       # access labels
input.review.object.metadata.namespace    # namespace

Iteration uses [_] to iterate over arrays:

violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    container.securityContext.privileged == true
    msg := "no privileged containers"
}

This evaluates the rule body once per container and reports a violation for each one that is privileged.
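One gap in iterating over spec.containers alone is that init containers and ephemeral containers are never checked. A common Rego pattern is a helper rule that gathers every container list into one set, so the main rule iterates once. The sketch below uses hypothetical names (K8sNoPrivileged, all_containers).

```yaml
# Sketch: cover containers, initContainers, and ephemeralContainers.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8snoprivileged               # hypothetical; lowercase form of the kind
spec:
  crd:
    spec:
      names:
        kind: K8sNoPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8snoprivileged

        violation[{"msg": msg}] {
          container := all_containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged container %v not allowed", [container.name])
        }

        # Partial set rule: each body adds members, so the result is the union.
        all_containers[c] {
          c := input.review.object.spec.containers[_]
        }

        all_containers[c] {
          c := input.review.object.spec.initContainers[_]
        }

        all_containers[c] {
          c := input.review.object.spec.ephemeralContainers[_]
        }
```

If a Pod has no initContainers field, that rule body simply contributes nothing; undefined paths in Rego do not raise errors, they just fail to match.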

Built-in functions like startswith(), contains(), sprintf(), and regex.match() let you do string operations:

violation[{"msg": msg}] {
    image := input.review.object.spec.containers[_].image
    regex.match(`^docker\.io/`, image)    # matches images that start with docker.io/
    msg := "no Docker Hub images"
}

The Rego docs (rego.dev) are excellent. You don’t need to memorize everything — most policies are just iterations and condition checks.


Mutation: Assigning Defaults

Beyond enforcement, Gatekeeper can mutate resources — rewrite them to meet policy standards. For example, automatically add resource limits or assign namespace-based labels.

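Mutations are not defined in ConstraintTemplates, and Rego cannot perform mutations in Gatekeeper. Instead, Gatekeeper provides dedicated mutation CRDs: Assign (set a field in the resource), AssignMetadata (add labels or annotations), ModifySet (add or remove items in a list), and AssignImage (rewrite parts of an image reference).

An Assign that injects a default memory limit looks like this: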

assign-limits.yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: assign-default-memory-limit
spec:
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
  location: "spec.containers[name:*].resources.limits.memory"
  parameters:
    assign:
      value: "256Mi"

As written, this sets the memory limit to 256Mi on every container in matching Pods, including containers that already declare one. To fill in only missing values, restrict the mutation with a pathTests condition under parameters. Gatekeeper applies mutations through a mutating admission webhook, before validation runs and before the object is persisted.

This is powerful but also risky — a bad mutation can silently change resource configurations. Test mutations in a staging cluster and audit the results before rolling out to production.
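For labels and annotations there is a gentler option, AssignMetadata, which only adds metadata that is missing and leaves existing values alone. A minimal sketch, with a hypothetical mutator name and default value:

```yaml
# Sketch: give every namespaced Pod an owner label if it has none.
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: default-owner-label           # illustrative name
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  location: "metadata.labels.owner"
  parameters:
    assign:
      value: "unassigned"             # hypothetical default value
```

Pairing a mutation like this with a required-labels constraint is a design choice to weigh carefully: a default makes every Pod pass validation, so the label stops telling you who actually owns the workload.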


OPA Beyond Kubernetes

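OPA’s real superpower is that it’s not Kubernetes-specific. You can use OPA for Terraform plans and other static configuration files (checked in CI with Conftest), REST API authorization, and Envoy proxies.

This means your policy rules can be consistent across your entire infrastructure. The logic behind “no privileged containers” carries over from Gatekeeper to Conftest with small changes (the rule name and the input paths differ, since Gatekeeper wraps the resource in input.review.object). One language, many contexts.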


Kyverno: The Friendly Cousin

If Gatekeeper feels complex, meet Kyverno — a Kubernetes-native policy engine that uses YAML instead of Rego.

A Kyverno ClusterPolicy looks like:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Audit
  rules:
    - name: check-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
                  requests:
                    memory: "?*"
                    cpu: "?*"

This is easier to read if you’re not comfortable with Rego. The downside: Kyverno is less expressive. Complex logic (regex matches, fuzzy iteration, conditional mutations) is harder in pure YAML.

OPA wins if: You need expressive policy logic, want to reuse rules across platforms, or plan to use Conftest for static checks.

Kyverno wins if: Your policies are simple, you prefer YAML, and you don’t want to learn Rego.

For a homelab, either works. For large organizations, OPA’s flexibility usually wins.


Monitoring Policy Violations

Gatekeeper reports violations in two places. Requests rejected at admission are returned to the user and appear in the API server’s audit log if audit logging is enabled. Resources that already exist are checked by Gatekeeper’s periodic audit, which you can follow in the audit controller’s logs:

Terminal window
kubectl logs -n gatekeeper-system deployment/gatekeeper-audit

This shows violations found by the audit scan, including those from constraints in dryrun mode. To see all constraints and their status:

Terminal window
kubectl get constraints
kubectl describe constraint block-privileged-pods

The status section shows violations. In a mature cluster, you might have a dashboard (Prometheus + Grafana) scraping Gatekeeper’s /metrics endpoint to alert on new violations.
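If Prometheus is already scraping Gatekeeper, an alerting rule along these lines is one way to get notified. Treat it as a sketch: it assumes the audit controller exposes a gatekeeper_violations gauge with an enforcement_action label, so check the metric names on your own /metrics endpoint before relying on it. The alert name, threshold, and duration are arbitrary.

```yaml
# Sketch: Prometheus rule file alerting on audited violations.
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperDenyViolations        # illustrative alert name
        # Assumed metric and label; verify against your Gatekeeper version.
        expr: sum(gatekeeper_violations{enforcement_action="deny"}) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Gatekeeper audit found resources violating deny constraints"
```

Violations of deny constraints found by audit usually mean the resource predates the constraint, so this alert doubles as a cleanup backlog.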


The Reality of Policy as Code

Here’s the honest part: policy enforcement is cultural before it’s technical. The best policy engine fails if your team doesn’t understand or agree with the policies. Before rolling out Gatekeeper:

  1. Document your policies — Why does every pod need resource limits? Because uncontrolled pods starve the cluster. Tell your team.
  2. Start with audit mode — Enforce in production only after your team has seen real violations and understands the rules.
  3. Provide escape hatches — Allow policy exceptions in specific namespaces (e.g., kube-system) or require an owner comment for overrides.
  4. Iterate — Your policies will change as your cluster matures. Gatekeeper makes that easy.

The bouncer metaphor only works if everyone agrees the bouncer’s rules are fair.
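For the namespace escape hatch, Gatekeeper has a cluster-wide option in addition to excludedNamespaces on individual Constraints: a Config resource that exempts namespaces from its processes entirely. A sketch, with the namespace list as an example to adapt:

```yaml
# Sketch: exempt system namespaces from all Gatekeeper processing.
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config                        # Gatekeeper reads the Config with this name
  namespace: gatekeeper-system
spec:
  match:
    - excludedNamespaces: ["kube-system", "gatekeeper-system"]
      processes: ["*"]                # every Gatekeeper process, audit and webhook included
```

Keep this list short and reviewed. Every namespace on it is a place where none of your policies apply, which is exactly where a misconfigured workload would go unnoticed.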


Wrapping Up

OPA and Gatekeeper turn policy enforcement from a hope-for-the-best exercise into a codified guardrail. You define what’s allowed, the system enforces it, and violators get immediate feedback instead of surprises in production.

For a homelab Kubernetes cluster, starting with three constraints makes sense:

  1. No privileged pods (security)
  2. Resource limits required (stability)
  3. Specific labels required (operations)

From there, add constraints as pain points emerge. Your 3 AM self will thank you.

Now go forth and let OPA be the bouncer your cluster deserves.

