K8S Cluster with Flannel CNI using KinD
This article provides the steps to create a K8S cluster using KinD and set it up with the Flannel CNI.
In an earlier blog, Kubernetes in Docker, I walked through the steps required to build a K8S cluster on top of Docker using containers as worker nodes. In this article, I will show how to build a 3-node K8S cluster with 1 master and 2 worker nodes, and configure the networking for the cluster using the Flannel CNI. I will cover a deep dive into the inner workings of the Flannel network in another article.
Three node K8S cluster – KinD configuration
Let's build a 3-node K8S cluster with 1 master node and 2 worker nodes. To do this, create a file kind-config.yaml and add the below configuration to it:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
KinD uses kindnetd, built around standard CNI plugins (ptp, host-local, …) and simple netlink routes, as the default networking implementation for the cluster. We need to disable it in order to install our custom Flannel CNI networking. We do this by adding the disableDefaultCNI parameter to the configuration:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
networking:
  # the default CNI will not be installed
  disableDefaultCNI: true
This used to be all that was needed. However, the way the kubernetes-cni package is built has recently changed, and the latest changes require us to do a bit more.
The problem
A bit of backstory
Earlier, the kubernetes-cni package built from the https://github.com/containernetworking/plugins repo included the Flannel CNI plugin in addition to all the other basic CNI plugins such as loopback, vlan, macvlan, ptp, host-device, etc., and the kubeadm package depended on it. Kubelet also depends on the existence of the CNI "loopback" plugin to correctly configure the "lo" interface in containers.
Building and maintaining the CNI plugins was a big pain for the sig-release team, and they decided to stop building the kubernetes-cni package. What this means is that CNI implementations need to take care of installing all the required plugins themselves. The discussion around this can be found here.
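For context on where these binaries live: the container runtime's CRI layer resolves CNI plugins from a binary directory and a configuration directory. On KinD's containerd-based nodes this corresponds roughly to the standard containerd CRI settings sketched below (the paths shown are the conventional defaults, included here purely for illustration):
[plugins."io.containerd.grpc.v1.cri".cni]
  # directory scanned for CNI plugin binaries such as flannel and bridge
  bin_dir = "/opt/cni/bin"
  # directory scanned for CNI network configuration files
  conf_dir = "/etc/cni/net.d"
This is why the rest of this article focuses on getting the right binaries into /opt/cni/bin.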
The solution
The latest Flannel deployment file takes care of installing the flannel CNI plugin, but the cluster still fails to come up because it cannot find the bridge plugin.
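To understand why bridge matters: the Flannel manifest writes a CNI configuration like the one below (taken from cni-conf.json in kube-flannel.yml at the time of writing; the exact contents may differ across Flannel versions) to /etc/cni/net.d/ on every node. The flannel entry is a meta-plugin that delegates the actual pod interface creation, and its default delegate is the bridge plugin, which is why that binary must be present in /opt/cni/bin.
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
With that background, let's see the failure in action.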
$ kind create cluster --name testcluster --config kind-config.yaml
Creating cluster "testcluster" ...
 ✓ Ensuring node image (kindest/node:v1.24.0) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-testcluster"
You can now use your cluster with:
kubectl cluster-info --context kind-testcluster
Thanks for using kind! 😊
Set kubectl context by following the instructions displayed above:
$ kubectl cluster-info --context kind-testcluster
Kubernetes control plane is running at https://127.0.0.1:44587
CoreDNS is running at https://127.0.0.1:44587/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Check the pod status
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-c9vr2 0/1 Pending 0 73m
kube-system coredns-6d4b75cb6d-v5gsq 0/1 Pending 0 73m
kube-system etcd-testcluster-control-plane 1/1 Running 0 73m
kube-system kube-apiserver-testcluster-control-plane 1/1 Running 0 73m
kube-system kube-controller-manager-testcluster-control-plane 1/1 Running 0 73m
kube-system kube-proxy-f2jqr 1/1 Running 0 73m
kube-system kube-proxy-mwmlh 1/1 Running 0 73m
kube-system kube-proxy-ndltg 1/1 Running 0 73m
kube-system kube-scheduler-testcluster-control-plane 1/1 Running 0 73m
local-path-storage local-path-provisioner-9cd9bd544-z786n 0/1 Pending 0 73m
As can be seen, the coredns and local-path-provisioner pods are stuck in the Pending state. This is expected, as no CNI has been configured yet.
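You can also see the effect at the node level; without a CNI, the nodes never become Ready. The output below is illustrative of what you would typically see:
$ kubectl get nodes
NAME                        STATUS     ROLES           AGE   VERSION
testcluster-control-plane   NotReady   control-plane   73m   v1.24.0
testcluster-worker          NotReady   <none>          73m   v1.24.0
testcluster-worker2         NotReady   <none>          73m   v1.24.0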
Deploy Flannel CNI
The latest Flannel CNI deployment yaml can be obtained from the flannel git repo. By default, the Flannel deployment uses VXLAN as the backend mechanism:
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
You can change it to one of the supported backend mechanisms based on your requirements (see the host-gw example after this list):
- vxlan
- host-gw
- udp
- alivpc
- alloc
- awsvpc
- gce
- tencentcloud vpc
- ipip
- ipsec
- wireguard
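For example, to switch to host-gw, which programs plain routing table entries instead of encapsulating traffic and works when all nodes share an L2 segment (as KinD nodes on the same Docker network do), change the Type field in net-conf.json before applying the manifest:
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }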
Let's apply the deployment file:
$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Now if we check the pod status, we see that the coredns pods are stuck in the ContainerCreating state:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-c9vr2 0/1 ContainerCreating 0 133m
kube-system coredns-6d4b75cb6d-v5gsq 0/1 ContainerCreating 0 133m
kube-system etcd-testcluster-control-plane 1/1 Running 0 133m
kube-system kube-apiserver-testcluster-control-plane 1/1 Running 0 133m
kube-system kube-controller-manager-testcluster-control-plane 1/1 Running 0 133m
kube-system kube-flannel-ds-c9q4k 1/1 Running 0 2m32s
kube-system kube-flannel-ds-j2gv9 1/1 Running 0 2m32s
kube-system kube-flannel-ds-wgbgb 1/1 Running 0 2m32s
kube-system kube-proxy-f2jqr 1/1 Running 0 133m
kube-system kube-proxy-mwmlh 1/1 Running 0 133m
kube-system kube-proxy-ndltg 1/1 Running 0 133m
kube-system kube-scheduler-testcluster-control-plane 1/1 Running 0 133m
local-path-storage local-path-provisioner-9cd9bd544-z786n 0/1 ContainerCreating 0 133m
A snippet of the error on the coredns pod:
$ kubectl describe pod coredns-6d4b75cb6d-c9vr2 -n kube-system
Name: coredns-6d4b75cb6d-c9vr2
Namespace: kube-system
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m34s (x26 over 132m) default-scheduler 0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
Normal Scheduled 4m24s default-scheduler Successfully assigned kube-system/coredns-6d4b75cb6d-c9vr2 to testcluster-worker
...
...
...
Warning FailedCreatePodSandBox 2m39s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cde73425ddab3522e243e810b75fac3cda51724a8f1f3c45f4a58c6df05bb613": plugin type="flannel" failed (add): failed to delegate add: failed to find plugin "bridge" in path [/opt/cni/bin]
Warning FailedCreatePodSandBox 7s (x12 over 2m28s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fc18a7232cce32804a88edface3219f4d7dcaa6ae4cd3d2e6e268b7f4c30b801": plugin type="flannel" failed (add): failed to delegate add: failed to find plugin "bridge" in path [/opt/cni/bin]
So we are seeing an error saying that the bridge plugin is not available in the /opt/cni/bin folder. Let's verify this on one of the worker nodes:
root@testcluster-worker:/# ls /opt/cni/bin
flannel host-local loopback portmap ptp
The bridge plugin is indeed missing. There are various ways to solve this problem (one quick alternative is sketched below), but I will follow the technique mentioned in this blog, where we build containernetworking/plugins locally and mount the binaries onto our nodes.
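As an aside, since KinD nodes are just Docker containers, a quicker (if less reproducible) alternative is to copy the missing binary straight into each running node. This is a sketch that assumes you already have a locally built bridge binary in ./bin:
$ docker cp ./bin/bridge testcluster-control-plane:/opt/cni/bin/bridge
$ docker cp ./bin/bridge testcluster-worker:/opt/cni/bin/bridge
$ docker cp ./bin/bridge testcluster-worker2:/opt/cni/bin/bridge
The mount-based approach below survives cluster re-creation, though, so that is what we will use.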
Check out and build containernetworking/plugins
$ git clone https://github.com/containernetworking/plugins.git
Once cloned, execute the ./build_linux.sh script from the repository's root directory:
plugins$ ./build_linux.sh
Building plugins
bandwidth
firewall
portmap
sbr
tuning
vrf
bridge
host-device
ipvlan
loopback
macvlan
ptp
vlan
dhcp
host-local
static
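The compiled binaries land in the bin/ directory of the repository. The listing below assumes the repo was cloned to /home/plugins, which matches the mount paths used in the next step:
plugins$ ls bin
bandwidth  bridge  dhcp  firewall  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf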
Now we need to mount this folder onto our cluster nodes. To do this, let's delete the cluster and recreate it.
Delete the cluster
$ kind delete cluster --name testcluster
Deleting cluster "testcluster" ...
Update kind-config.yaml with the CNI plugin mount:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /home/plugins/bin
    containerPath: /opt/cni/bin
- role: worker
  extraMounts:
  - hostPath: /home/plugins/bin
    containerPath: /opt/cni/bin
- role: worker
  extraMounts:
  - hostPath: /home/plugins/bin
    containerPath: /opt/cni/bin
networking:
  # the default CNI will not be installed
  disableDefaultCNI: true
Recreate the cluster
$ kind create cluster --name testcluster --config kind-config.yaml
Creating cluster "testcluster" ...
 ✓ Ensuring node image (kindest/node:v1.24.0) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-testcluster"
You can now use your cluster with:
kubectl cluster-info --context kind-testcluster
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
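Before deploying Flannel, you can confirm that the mount took effect by listing the plugin directory on one of the nodes (KinD nodes are Docker containers, so docker exec works; output is illustrative):
$ docker exec testcluster-worker ls /opt/cni/bin
bandwidth  bridge  dhcp  firewall  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf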
Deploy the Flannel CNI
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Now let's look at the status of the pods:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6d4b75cb6d-57dff 1/1 Running 0 3m50s
kube-system coredns-6d4b75cb6d-n44n7 1/1 Running 0 3m50s
kube-system etcd-testcluster-control-plane 1/1 Running 0 4m2s
kube-system kube-apiserver-testcluster-control-plane 1/1 Running 0 4m2s
kube-system kube-controller-manager-testcluster-control-plane 1/1 Running 0 4m2s
kube-system kube-flannel-ds-8fk48 1/1 Running 0 81s
kube-system kube-flannel-ds-jzshc 1/1 Running 0 81s
kube-system kube-flannel-ds-tklrh 1/1 Running 0 81s
kube-system kube-proxy-2qvln 1/1 Running 0 3m50s
kube-system kube-proxy-9zbgs 1/1 Running 0 3m46s
kube-system kube-proxy-glgf5 1/1 Running 0 3m33s
kube-system kube-scheduler-testcluster-control-plane 1/1 Running 0 4m2s
local-path-storage local-path-provisioner-9cd9bd544-kfnlk 1/1 Running 0 3m50s
Great! The setup is up and running.
Test Flannel Network
Now let's deploy two pods and ping between them to ensure that our Flannel CNI has been configured properly and is working as expected.
Deploy two busybox containers, one on each worker node:
$ kubectl create deployment nwtest --image busybox --replicas 2 -- sleep infinity
deployment.apps/nwtest created
Check the IP addresses of the pods:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nwtest-65c4d6df6c-dcnrj 1/1 Running 0 3m4s 10.244.1.2 testcluster-worker <none> <none>
nwtest-65c4d6df6c-l9l6r 1/1 Running 0 3m4s 10.244.2.5 testcluster-worker2 <none> <none>
Let's ping 10.244.2.5 from 10.244.1.2:
$ kubectl exec -it nwtest-65c4d6df6c-dcnrj -- ping -c 3 10.244.2.5
PING 10.244.2.5 (10.244.2.5): 56 data bytes
64 bytes from 10.244.2.5: seq=0 ttl=62 time=0.331 ms
64 bytes from 10.244.2.5: seq=1 ttl=62 time=0.201 ms
64 bytes from 10.244.2.5: seq=2 ttl=62 time=0.175 ms
--- 10.244.2.5 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.175/0.235/0.331 ms
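As an optional extra check, verify that cluster DNS resolves across the Flannel network. This assumes the busybox image's built-in nslookup; 10.96.0.10 is the default kube-dns ClusterIP on a kubeadm-based cluster such as KinD, and the output below is illustrative:
$ kubectl exec -it nwtest-65c4d6df6c-dcnrj -- nslookup kubernetes.default.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      kubernetes.default.svc.cluster.local
Address:   10.96.0.1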
References
- Flannel removed from containernetworking/plugins
- https://github.com/kubernetes-sigs/kind/issues/1340
- https://github.com/kubernetes-sigs/kind/commit/281a20c36c91da3347bc514512df10229d053c50
- https://groups.google.com/g/kubernetes-sig-release/c/yhf7hAqJEN0
- https://github.com/containernetworking/plugins
- Flannel CNI Plugin
- https://medium.com/swlh/customise-your-kind-clusters-networking-layer-1249e7916100