Don’t delete cluster-admin clusterrole and clusterrolebinding…. uggh, too late…

Andy Anderson
5 min read · Mar 4, 2024

Have you ever accidentally deleted a file or thrown out an important document that got lumped in with a pile of unwanted paperwork? We’ve all been there. I have more than my share of haphazard moments where I deleted a file or credential while working with source control or cleaning up after a long night of debugging.

Kubernetes is very forgiving when you delete an object, unless you no longer have permission to recreate it. In this particular instance I accidentally deleted the cluster-admin clusterrole and clusterrolebinding on one of my OpenShift clusters. How did I do this? It wasn’t intentional, I can tell you that much. In fact, I didn’t even know how or when I had deleted the objects until a bit later on.

What did I do wrong?

The first indication that something was wrong came when I logged into the OpenShift cluster from the command line and received the dreaded warning that the “cluster-admin” role is missing. You will also see, in the OpenShift UI, that you no longer have access to rolebindings.
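
If your account can still read RBAC objects, you can confirm what is actually missing by querying for the objects directly. A quick check (NotFound means the objects are gone; Forbidden means you have already lost the permission to even look):

# Check whether the default admin RBAC objects still exist
oc get clusterrole cluster-admin
oc get clusterrolebinding cluster-admin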

At the time, I didn’t think this was caused directly by an action that I took. As it turns out, I had been experimenting with KubeStellar’s integration with Open Cluster Management. I was trying to synchronize KubeRay’s clusterroles and clusterrolebindings:

apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
  name: kuberay-placement
spec:
  wantSingletonReportedState: true
  clusterSelectors:
  - matchLabels:
      location-group: edge
  downsync:
  - apiGroup: ""
    resources: [ "namespaces" ]
    objectNames: [ "kuberay-operator" ]
  - apiGroup: apiextensions.k8s.io
    resources: [ customresourcedefinitions ]
    namespaceSelectors: []
    objectNames: [ "servicemonitors.monitoring.coreos.com", "rayclusters.ray.io", "rayjobs.ray.io", "rayservices.ray.io" ]
  - apiGroup: rbac.authorization.k8s.io
    resources: [ clusterroles, clusterrolebindings ]
    objectNames: [ "*" ]
  - apiGroup: rbac.authorization.k8s.io
    resources: [ roles, rolebindings ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: apps
    resources: [ deployments ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: ""
    resources: [ services ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: ""
    resources: [ serviceaccounts ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]

Note the lines above for [ clusterroles, clusterrolebindings ]. You will see that under objectNames I put “*”. That is the little mistake that caused a multitude of problems. For cluster-scoped objects (namespaces, clusterroles, clusterrolebindings, customresourcedefinitions, etc.) you should never wildcard objectNames when selecting what to downsync. The downsyncing itself isn’t the problem; it’s the deletion or update that happens when you decide to remove KubeStellar or the Workload Description Space (WDS) associated with the objects you synced. I had no intention of synchronizing anything beyond KubeRay’s deployment, but the net effect of my selection created synchronization records for every clusterrole and clusterrolebinding in the cluster.
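
If you want to see exactly what a BindingPolicy matched before removing KubeStellar or a WDS, recent KubeStellar releases generate a Binding object in the WDS, with the same name as the policy, that lists every matched workload object. A minimal sketch, assuming the conventional ‘wds1’ context name from the KubeStellar getting-started setup:

# The generated Binding lists every object the policy selected for downsync
kubectl --context wds1 get binding kuberay-placement -o yaml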

How do you avoid this?

Simply replace the “*” selector for objectNames with specific object names. Here is what I did:

apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
  name: kuberay-placement
spec:
  wantSingletonReportedState: true
  clusterSelectors:
  - matchLabels:
      location-group: edge
  downsync:
  - apiGroup: ""
    resources: [ "namespaces" ]
    objectNames: [ "kuberay-operator" ]
  - apiGroup: apiextensions.k8s.io
    resources: [ customresourcedefinitions ]
    namespaceSelectors: []
    objectNames: [ "servicemonitors.monitoring.coreos.com", "rayclusters.ray.io", "rayjobs.ray.io", "rayservices.ray.io" ]
  - apiGroup: rbac.authorization.k8s.io
    resources: [ clusterroles, clusterrolebindings ]
    objectNames:
    - kuberay-operator
    - rayjob-editor-role
    - rayjob-viewer-role
    - rayservice-editor-role
    - rayservice-viewer-role
    - kuberay-operator
  - apiGroup: rbac.authorization.k8s.io
    resources: [ roles, rolebindings ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: apps
    resources: [ deployments ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: ""
    resources: [ services ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]
  - apiGroup: ""
    resources: [ serviceaccounts ]
    objectNames: [ "*" ]
    namespaces: [ "kuberay-operator" ]

You see that I now have a list of specific object names for clusterroles and clusterrolebindings that will be synchronized — nothing more and nothing less.
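
Applying the corrected policy is an ordinary apply against the WDS. A sketch, assuming the policy is saved as kuberay-placement.yaml and the WDS context is named wds1:

# Apply the corrected BindingPolicy to the Workload Description Space
kubectl --context wds1 apply -f kuberay-placement.yaml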

What can you do if you're reading this article after the fact?

Stop everything you’re doing and read carefully. Your cluster-admin clusterrole and clusterrolebinding look very much like these object definitions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:masters

If you’re suddenly missing these objects on your OpenShift cluster, you’re in the right place to learn how to recover them.

Start by taking stock of any service accounts that exist in all of your cluster’s namespaces. You’re looking specifically for a service account whose secret holds a token that can be used with ‘oc login’ to get administrative access to the cluster. In my case, I found a service account for Open Cluster Management in the ‘open-cluster-management-agent’ namespace. I know that the Open Cluster Management work agent is responsible for creating, reading, updating, and deleting (CRUD) objects on any cluster where it is installed, which is ideal for our purposes here.
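
A rough way to hunt for candidates is to list service accounts and their token secrets across every namespace. A sketch (your namespaces and secret names will differ):

# List service accounts in every namespace
oc get serviceaccounts --all-namespaces

# List only the secrets that hold service account tokens
oc get secrets --all-namespaces --field-selector type=kubernetes.io/service-account-token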

I scrolled down in the OpenShift UI (you can also ‘get’ this using kubectl) to the secret associated with that service account and retrieved the value of the “token” key. Note that if you retrieve this key with kubectl, it is base64 encoded and you will need to decode it before it can be used by ‘oc login’.
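
Something like the following pulls the token out of the secret and decodes it in one step; the secret name here is a placeholder for whatever you found in the previous step:

# Extract and base64-decode the service account token (secret name is a placeholder)
oc get secret <token-secret-name> -n open-cluster-management-agent \
  -o jsonpath='{.data.token}' | base64 -d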

Once you have the token from the secret belonging to a service account with CRUD access to your cluster, use it with ‘oc login’ to regain command-line access. The next step is to recreate the cluster-admin clusterrole and clusterrolebinding from the definitions above. And that’s it. You should now see the cluster-admin clusterrole and clusterrolebinding again, and you should be able to log in to your cluster and use your administrative permissions as before.
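
Putting it together, the recovery amounts to logging in with that token and re-applying the definitions shown earlier. A sketch, with placeholder values for the API server URL and file name:

# Log in with the recovered service account token
oc login --token=<decoded-token> --server=https://api.<cluster-domain>:6443

# Recreate cluster-admin from the ClusterRole/ClusterRoleBinding YAML above,
# saved locally as cluster-admin-rbac.yaml
oc apply -f cluster-admin-rbac.yaml

# Verify the objects are back
oc get clusterrole cluster-admin
oc get clusterrolebinding cluster-admin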

I hope this helps someone else recover from this type of error. I couldn’t find any documentation and was stuck for a brief period of time without a way to recover. Thanks for stopping by!

Written by Andy Anderson

IBM Research - KubeStellar, DevOps, Technology Adoption, and Kubernetes. Views are my own.
