I want to be upfront about something: I am not a Kubernetes expert. I am a developer who decided one day that I wanted to run KubeAid, a full GitOps management suite for Kubernetes, on my local Windows machine using WSL. What followed was one of the most humbling, frustrating, and ultimately satisfying technical experiences I've had in a while.
This is the real story. Not the clean version. The one with the errors and the moments where I genuinely thought it was impossible.
Why KubeAid?
I first heard about KubeAid at an event, where Akshay Sir pointed me to obmondo/KubeAid on GitHub: a Kubernetes management suite built around GitOps principles. It bundles ArgoCD, Sealed Secrets, Prometheus monitoring, and a CLI tool that provisions everything automatically. The idea of having a fully declarative, Git-driven Kubernetes setup on my local machine was too good to pass up.
The plan seemed simple enough: install the CLI, generate a config, run a bootstrap command. Easy, right?
The Setup
My environment going into this:
OS: Windows 11 with WSL2 (Ubuntu)
Docker: Docker Desktop 4.57.0 with WSL integration enabled
Goal: Run KubeAid locally with ArgoCD accessible in the browser
Step 1: Installing the CLI
This part actually went fine. The KubeAid team provides a one-liner install script:
curl -fsSL https://raw.githubusercontent.com/Obmondo/kubeaid-cli/main/install.sh | sh
Great start. I also installed kubectl and verified Docker was working inside WSL. Then I generated the config:
kubeaid-cli config generate local
This created outputs/configs/general.yaml and secrets.yaml. I edited them to point at my forked kubeaid-config repo on GitHub, set up my SSH keys, and felt confident.
Step 2: The Validation Wall
I ran the bootstrap command and immediately hit a wall. A very large wall:
Key: 'GeneralConfig.Cloud.AWS.Region' - Field validation for 'Region' failed on 'notblank'
Key: 'GeneralConfig.Cloud.Azure.TenantID' - Field validation for 'TenantID' failed on 'notblank'
Key: 'GeneralConfig.Cloud.Hetzner.HCloud.Zone' - Field validation for 'Zone' failed on 'notblank'
...30 more errors...
The CLI was demanding AWS, Azure, and Hetzner credentials, even though I was deploying locally. This is a bug in the CLI: it validates all cloud provider fields regardless of which provider you're actually using. I spent a long time trying to fill dummy values into the config to bypass the validator.
The validator also expected inline private key content (not just file paths) for Azure and Hetzner SSH key pairs, fields that the generated template doesn't even show. It took a while to reverse-engineer the exact nested YAML structure the Go struct expected.
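To make that concrete: in Go, struct tags dictate the exact YAML shape the parser accepts, so one wrong nesting level means a field silently stays blank. Here is a minimal sketch of the idea; the type and field names are illustrative guesses, not KubeAid's actual config types.
// Illustrative only: NOT KubeAid's real config types.
package main

import (
    "fmt"

    "gopkg.in/yaml.v3"
)

type SSHKeyPair struct {
    PublicKeyFilePath string `yaml:"publicKeyFilePath"`
    PrivateKey        string `yaml:"privateKey"` // inline key content, not a file path
}

type HetznerConfig struct {
    SSHKeyPair SSHKeyPair `yaml:"sshKeyPair"`
}

func main() {
    raw := []byte(`
sshKeyPair:
  publicKeyFilePath: /home/user/.ssh/id_ed25519.pub
  privateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----
`)
    var cfg HetznerConfig
    if err := yaml.Unmarshal(raw, &cfg); err != nil {
        panic(err)
    }
    // A misplaced nesting level would leave PrivateKey blank, which is
    // exactly what trips a 'notblank' validator downstream.
    fmt.Printf("%+v\n", cfg)
}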
Step 3: Going to the Source Code
After hitting the same validation errors in a loop for too long, I decided to fix the problem properly. I cloned the CLI source and found the culprit:
// pkg/config/parser/validate.go, line 55
err = validator.Struct(config.ParsedGeneralConfig)
This line validates the entire config struct, including all cloud providers, before it even checks which provider you're using. The provider-specific validation happens later and correctly skips irrelevant providers. But the global struct validation runs first and fails.
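The behavior is easy to reproduce with go-playground/validator, the library those error messages come from. The structs below are stripped-down stand-ins, not KubeAid's real config types:
// Stand-in structs to demonstrate the validation bug pattern.
package main

import (
    "fmt"

    "github.com/go-playground/validator/v10"
    "github.com/go-playground/validator/v10/non-standard/validators"
)

type AWSConfig struct {
    Region string `validate:"notblank"`
}

type GeneralConfig struct {
    CloudProvider string
    AWS           AWSConfig
}

func main() {
    v := validator.New()
    if err := v.RegisterValidation("notblank", validators.NotBlank); err != nil {
        panic(err)
    }

    // Deploying locally, so the AWS section is intentionally left empty...
    cfg := GeneralConfig{CloudProvider: "local"}

    // ...but Struct() walks every nested field unconditionally, so the
    // blank AWS.Region fails 'notblank' even though AWS is unused:
    // Key: 'GeneralConfig.AWS.Region' Error:Field validation for
    // 'Region' failed on the 'notblank' tag
    if err := v.Struct(cfg); err != nil {
        fmt.Println(err)
    }
}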
The fix was a single line, commenting out the global validation:
// err = validator.Struct(config.ParsedGeneralConfig)
err = nil
Then I rebuilt the CLI binary from source:
go build \
-ldflags "-X github.com/.../version.Version=v0.20.0" \
-o kubeaid-cli-patched \
./cmd/kubeaid-cli/
Step 4: The Docker API Mismatch
New binary, new error.
The CLI was compiled with Docker client library v28, which speaks API version 1.54. But Docker Desktop on my machine only supported up to 1.52. I patched the Docker client initialization to pin the API version:
dockerCLI, err := client.NewClientWithOpts(
    client.WithHostFromEnv(),
    client.WithAPIVersionNegotiation(),
    client.WithVersion("1.52"), // added: a manually pinned version overrides negotiation
)
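Before picking a version to pin, it's worth asking the daemon what it actually supports. A quick standalone check with the Docker Go SDK (assuming the default socket or DOCKER_HOST is reachable):
// Prints the highest API version the local Docker daemon accepts.
package main

import (
    "context"
    "fmt"

    "github.com/docker/docker/client"
)

func main() {
    cli, err := client.NewClientWithOpts(
        client.FromEnv,
        client.WithAPIVersionNegotiation(),
    )
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    // Ping reports the daemon's maximum supported API version.
    ping, err := cli.Ping(context.Background())
    if err != nil {
        panic(err)
    }
    fmt.Println("daemon max API version:", ping.APIVersion)
}
Whatever you pin must be at or below the number this prints.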
Step 5: The ServiceMonitor Problem
After fixing the Docker API issue, the bootstrap finally started doing real work. K3D spun up, the cluster came online, and then the ArgoCD install fell over.
The ArgoCD Helm chart in KubeAid has ServiceMonitor resources enabled by default, and ServiceMonitors require the Prometheus Operator CRDs to be installed first. But I was skipping monitoring setup, so those CRDs never existed. I hit the same error many times while trying different approaches to fix it.
The core issue: the bootstrap logic runs entirely inside a Docker container (ghcr.io/obmondo/kubeaid-core), so patching local Go source files doesn't affect it unless you rebuild the container image. Once I figured that out, the fix was to:
1. Patch the ArgoCD Helm install call in the source to disable ServiceMonitors via Helm values
2. Rebuild the container image with our patches
3. Retag it so the CLI uses our local image instead of pulling from the registry
# Patch argo.go to pass --set values disabling ServiceMonitors
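# (Exact value paths depend on the chart version KubeAid pins; in the
# upstream argo-cd chart the likely candidates are:
#   controller.metrics.serviceMonitor.enabled=false
#   server.metrics.serviceMonitor.enabled=false
#   repoServer.metrics.serviceMonitor.enabled=false)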
# Then rebuild the container image
docker build \
--no-cache \
--build-arg VERSION=v0.20.0 \
-t ghcr.io/obmondo/kubeaid-core:v0.20.0-local .
# Rebuild CLI to use our custom tag
go build \
-ldflags "-X .../version.Version=v0.20.0-local" \
-o kubeaid-cli-patched ./cmd/kubeaid-cli/
Step 6: It Finally Worked
After all of that, I ran the bootstrap one more time:
./kubeaid-cli-patched devenv create \
--skip-pr-workflow \
--configs-directory /home/param/outputs/configs
And watched the logs scroll by. K3D cluster created. Sealed secrets installed. ArgoCD Helm chart installing...
ArgoCD installed successfully. Cluster is ready.
Step 7: Connecting from WSL
One last WSL-specific hurdle: the kubeconfig used host.docker.internal as the cluster endpoint, which doesn't resolve in WSL. I found the exposed port and pointed kubectl directly at localhost:
CLUSTER_PORT=$(docker port k3d-kubeaid-bootstrapper-serverlb | grep 6443 | awk -F: '{print $2}')
kubectl config set-cluster k3d-kubeaid-bootstrapper \
--server=https://localhost:$CLUSTER_PORT
Then verified everything was running:
kubectl get pods -A --insecure-skip-tls-verify
# Output (abridged):
# NAMESPACE        NAME                        READY   STATUS
# sealed-secrets   sealed-secrets-controller   1/1     Running
# default          argocd-server               1/1     Running
# default          argocd-redis                1/1     Running
# kube-system      coredns                     1/1     Running
ArgoCD is Live
Port-forwarded the ArgoCD service and opened it in my browser:
kubectl port-forward svc/argocd-server -n default 8081:443 --insecure-skip-tls-verify
Navigated to https://localhost:8081, and there it was. The ArgoCD UI. Running on my local machine. Connected to a real Kubernetes cluster. Ready to deploy applications via GitOps.
What I Learned
1. Read the source code early
When a tool behaves unexpectedly, the source code is always the truth. I wasted hours trying config variations before just reading the validator code and finding the single line causing all my problems.
2. WSL has real networking quirks
host.docker.internal is a Docker Desktop convenience name: it resolves inside containers, but not from the WSL distro itself. Always check the exposed ports and connect via localhost instead.
3. Container images are the real runtime
KubeAid CLI proxies most of its work to a Docker container. Patching the Go source doesn't do anything unless you also rebuild the container image. This caught me off guard and cost a lot of time.
4. Persistence beats knowledge
I didn't know Go. I didn't know the KubeAid internals. But I could read error messages, search the source, form hypotheses, and test them. That loop of read, hypothesize, test is the only thing that got me through.
"The tools aren't broken. They just weren't tested for your exact situation. Figure out why and fix it."
Final Setup Summary
GitOps: ArgoCD v2.6 connected to github.com/param20h/kubeaid-config
Secrets: Sealed Secrets controller running
Access: https://localhost:8081
OS: Windows 11 / WSL2 Ubuntu / Docker Desktop
If you're trying to do the same thing and hitting the same validation errors: you're not alone, and it is fixable. The KubeAid project is genuinely impressive once it's running. GitOps on local Kubernetes is a great way to learn how production deployments actually work.
Now if you'll excuse me, I have applications to deploy. Next up: LinuxAid.