Introduction to Kubernetes API - the way to understand the concept of Kubernetes Operators.

Michał Świerczewski
14 min read · May 23, 2019


To understand the concept of Kubernetes Operators, you first have to understand the Kubernetes API itself. I want to show you where you can find it in your cluster, how it works (without going too deep) and how you can extend it. I will also explain what Custom Resources, Custom Resource Definitions and API Aggregation are, and finally what the advantage of using Operators is. Note that this is not a guide to operator development tools (like Operator SDK).

What is the Kubernetes API Server?

Think of it as a gateway to your Kubernetes cluster. It implements a RESTful API over HTTP, which means that every API request is an HTTP request. It also validates those requests and, if necessary, updates the corresponding objects in a database (like etcd). When we create an object, for example with the kubectl command-line interface, the request goes to the API Server. The API Server stores the state in etcd, which is external to the API Server, and the other control plane components work through the API Server to ensure the object exists. Keep in mind that the API Server is the only Kubernetes component that connects to etcd; all other components must go through the API Server to work with the cluster state. Because the state lives in etcd, the API Server itself is stateless.

To remember:
- implements a RESTful API over HTTP
- stateless
- responsible for the authentication and authorization mechanism
- implements a watch mechanism (similar to etcd) for clients to watch for changes
- can be replicated to handle request load and for fault tolerance
- declarative management of Kubernetes objects (Kubernetes APIs are declarative in nature)
- API extensions can be integrated with Kubernetes native authentication
- every request to Kubernetes passes through authorization, typically based on the RBAC model

The primary use of the API is serving client HTTP requests, but first the user must know how to make such a request. To try it out, run `kubectl proxy`: it exposes the API Server on `localhost:8001` and authenticates to the cluster on your behalf, so local requests need no credentials. Then point a browser or curl at, for example, `localhost:8001/api/v1`.
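The same request can be made from code. The sketch below is only an illustration: it assumes `kubectl proxy` is listening on its default address, and the `SAMPLE` document is a hand-written, trimmed stand-in for the real `/api/v1` response, so the parsing can be tried without a cluster.

```python
import json
import urllib.request

PROXY = "http://localhost:8001"  # default address of `kubectl proxy`


def fetch_json(path, base=PROXY):
    """GET a path through the local proxy and decode the JSON body."""
    with urllib.request.urlopen(base + path) as resp:
        return json.load(resp)


def resource_names(api_resource_list):
    """Return the sorted resource names from an APIResourceList document."""
    return sorted(r["name"] for r in api_resource_list.get("resources", []))


# Hand-written stand-in for the /api/v1 response (assumption, heavily trimmed).
SAMPLE = {
    "kind": "APIResourceList",
    "groupVersion": "v1",
    "resources": [
        {"name": "pods", "namespaced": True, "kind": "Pod"},
        {"name": "namespaces", "namespaced": False, "kind": "Namespace"},
    ],
}

if __name__ == "__main__":
    print(resource_names(SAMPLE))
    # With `kubectl proxy` running you could instead call:
    # print(resource_names(fetch_json("/api/v1")))
```

`fetch_json` is deliberately not called by default, because it needs a running proxy.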

Here is a basic request flow:
API request >> Authentication (Has this user proven their identity?) >> Access Control/Authorization (Is this user allowed to perform this action?) >> Admission control (Does this request look good?) >> Process the request
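The flow can be read as a chain of checks where the first failing stage stops the request. The following is a toy model, not how the API Server is implemented: the `Request` type, the token table, the permission set and the admission rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    token: str
    verb: str
    resource: str
    body: dict = field(default_factory=dict)


# Hypothetical credentials and permissions, for illustration only.
TOKENS = {"alice": "s3cret"}
PERMISSIONS = {("alice", "create", "pods")}


def authenticate(req):
    """Has this user proven their identity?"""
    return req.user in TOKENS and TOKENS[req.user] == req.token


def authorize(req):
    """Is this user allowed to perform this action?"""
    return (req.user, req.verb, req.resource) in PERMISSIONS


def admit(req):
    """Does this request look good? Toy rule: the object must have a name."""
    return bool(req.body.get("metadata", {}).get("name"))


def handle(req):
    """Run the stages in order and stop at the first one that fails."""
    stages = (
        ("authentication", authenticate),
        ("authorization", authorize),
        ("admission", admit),
    )
    for stage, check in stages:
        if not check(req):
            return f"rejected at {stage}"
    return "processed"
```

The ordering matters: authorization is never consulted for a caller who has not authenticated, and admission only sees requests that are already allowed.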

Where does the Kubernetes API Server run?

There are two main kinds of machines in a cluster.
The Kubernetes Master provides the cluster’s control plane. Control plane nodes take care of routine tasks to ensure that your cluster maintains your configuration. In addition to the database (etcd), three processes run on these nodes: kube-scheduler, kube-controller-manager and kube-apiserver, which is the front end of the control plane. Together these components make global decisions about the cluster, such as scheduling, and detect and respond to cluster events.
A Kubernetes Node (non-master node) runs two processes, kubelet and kube-proxy. In short, these nodes maintain running pods, provide the Kubernetes runtime environment, run storage and networking drivers, and run ingress controllers where required.

Kubernetes objects and API components

To work with Kubernetes it’s good to know what Kubernetes objects are.

In official Kubernetes documentation you can find that:

Kubernetes Objects are persistent entities in the Kubernetes system. Kubernetes uses these entities to represent the state of your cluster. Specifically, they can describe:

- What containerized applications are running (and on which nodes)
- The resources available to those applications
- The policies around how those applications behave, such as restart policies, upgrades, and fault-tolerance

A Kubernetes object is a “record of intent” - once you create the object, the Kubernetes system will constantly work to ensure that object exists. By creating an object, you’re effectively telling the Kubernetes system what you want your cluster’s workload to look like; this is your cluster’s desired state. To work with Kubernetes objects–whether to create, modify, or delete them–you’ll need to use the Kubernetes API.

The documentation also explains that objects in the Kubernetes API represent the state of your system: deployed containerized applications and workloads, their associated network and disk resources, and other information about what your cluster is doing.

We can split objects into basic ones:
- Pod
- Service
- Volume
- Namespace

and more high-level objects built on the logic of the basic ones:
- ReplicaSet
- Deployment
- StatefulSet
- Job

Objects of the latter type are also called controllers, because Kubernetes uses them to create and manage the desired state of pods for us.

To make this concrete, you can think of an object as a combination of `group + version + type (kind)`, for example `/api + v1 + Pod`, where:

API group:
is specified in a REST path and in the apiVersion field of a serialized object.
- the core (also called legacy) group is at REST path /api, for example /api/v1, and is not specified as part of the apiVersion field, for example apiVersion: v1
- the named groups are at REST path /apis/, for example /apis/$GROUP_NAME/$VERSION, and use apiVersion: $GROUP_NAME/$VERSION, for example apiVersion: batch/v1
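The mapping between the apiVersion field and the REST path is mechanical, which a few lines of Python can show. The function name `api_prefix` is made up for this sketch.

```python
def api_prefix(api_version):
    """Map an apiVersion value to the REST path prefix it is served under."""
    if "/" in api_version:
        # Named group: apiVersion is "$GROUP_NAME/$VERSION".
        group, version = api_version.split("/", 1)
        return f"/apis/{group}/{version}"
    # Core (legacy) group: apiVersion carries only the version.
    return f"/api/{api_version}"


if __name__ == "__main__":
    for value in ("v1", "batch/v1"):
        print(value, "->", api_prefix(value))
```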

API Versioning:
- alpha, e.g., v1alpha1: the API may be buggy and unstable, is disabled by default and is not meant for production clusters
- beta, e.g., v1beta1: the API may have bugs but is generally stable, may change incompatibly between releases, and may be enabled in production clusters
- stable, e.g., v1: the API is generally available, keeps backward compatibility and is suitable for production
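Because the stability level is encoded in the version name, it can be read back with a small pattern match. This is a sketch; `stability` is a hypothetical helper name and the regular expression only covers the naming scheme described above.

```python
import re

# v<major>, optionally followed by alpha<n> or beta<n>.
_VERSION = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")


def stability(api_version):
    """Return "alpha", "beta" or "stable" for a version or group/version string."""
    version = api_version.rsplit("/", 1)[-1]  # drop the group, if any
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return match.group(2) or "stable"
```

A check like this can be handy when scanning manifests for APIs that may change between releases.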

To print the supported API versions you can use kubectl:

$ kubectl api-versions

Most of the time we write an object spec and send it to the API Server via kubectl; Kubernetes then tries to fulfill that desired state and updates the object status. But what are the spec and the status?
Kubernetes objects include two nested objects that drive the object’s configuration: the object spec and the object status.
The object spec, which you must provide in your template/manifest file (written in YAML or JSON), describes the desired state of the Kubernetes object.
The object status is the actual/current state, which Kubernetes updates once you apply the spec (if there are no errors).
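The split between spec and status is easy to see on a concrete object. The dictionary below is an illustrative, heavily trimmed Deployment as it might come back from the API (the name and numbers are made up), and `missing_replicas` is a hypothetical helper that compares the two halves.

```python
# Illustrative object: "spec" is what we asked for, "status" is what exists.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {"replicas": 3},        # desired state, written by the user
    "status": {"readyReplicas": 1}, # current state, written by Kubernetes
}


def missing_replicas(obj):
    """How many replicas are still needed to reach the desired state."""
    desired = obj["spec"].get("replicas", 1)
    ready = obj.get("status", {}).get("readyReplicas", 0)
    return max(desired - ready, 0)


if __name__ == "__main__":
    print(missing_replicas(deployment))
```

A freshly created object usually has no status yet, which is why the helper treats a missing status as zero ready replicas.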

After you have created these manifest files, you can push them to the Kubernetes API either with kubectl or by calling the Kubernetes API directly from your own apps, based on one of the client libraries.

Here you can find the list of available client libraries.

Now I hope you better understand this YAML-formatted spec fragment:

When you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes, thus making the cluster’s actual state match the desired state. It’s important to remember that the Control Plane component in which the control loops run is the Controller Manager (CM). Control loops (known as controllers, like the Deployment Controller or the ReplicaSet Controller) observe the state of the cluster through the Kubernetes API; when a controller is notified of a change, it makes the necessary changes to move the current state towards the desired state.

Simple flow:

1. kubectl sends an HTTP request (deploy a pod with nginx) to the API Server
2. API Server validates the request and persists it to etcd
3. etcd acknowledges the write to the API Server
4. [Controller-manager] Deployment controller observes the state of the cluster through the API Server
- new desired state described in a Deployment object `Deployment / metadata.name: nginx / spec.replicas: 1`
- creates a new ReplicaSet object `ReplicaSet / metadata.name: nginx-XYZ`, whose controller creates the new Pod
5. Scheduler, watching the API Server, sees a Pod that has no Node assigned
- object `Pod / metadata.name: nginx-XYZ-XXX / spec.nodeName: <null>`
6. Scheduler decides which Node to run the pod on and returns that to the API Server
- run pod on some Node, `Pod / metadata.name: nginx-XYZ-XXX / spec.nodeName: NodeX`
7. API Server persists it to etcd
8. etcd acknowledges the write to the API Server
9. Kubelet on the corresponding node, NodeX, watching the API Server, sees the Pod assigned to it
10. Kubelet talks to the container runtime engine, like Docker
- get image and run container
- update the pod status to the API Server, `Pod / metadata.name: nginx-XYZ-XXX / spec.nodeName: NodeX / status: Pending`
11. Kubelet updates the pod status (Pending, ContainerCreating, Running etc.) to the API Server
12. API Server persists the new state in etcd, `Pod / metadata.name: nginx-XYZ-XXX / spec.nodeName: NodeX / status: Running`
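The components in this flow learn about changes through the API Server's watch mechanism. The toy model below tries to capture that idea in memory: the `ApiServer` class, its flat pod dictionaries and the two callbacks are simplifications invented for this sketch, and a plain dict stands in for etcd.

```python
class ApiServer:
    """Tiny in-memory stand-in: stores objects and notifies watchers."""

    def __init__(self):
        self.store = {}     # stands in for etcd
        self.watchers = []  # callbacks registered by other components

    def watch(self, callback):
        self.watchers.append(callback)

    def apply(self, kind, name, obj):
        key = (kind, name)
        event = "MODIFIED" if key in self.store else "ADDED"
        self.store[key] = obj
        for callback in list(self.watchers):
            callback(event, kind, name, obj)


def make_scheduler(api, node="NodeX"):
    def on_event(event, kind, name, obj):
        # React only to pods that have no node assigned yet.
        if kind == "Pod" and obj.get("nodeName") is None:
            api.apply(kind, name, {**obj, "nodeName": node})
    return on_event


def make_kubelet(api, node="NodeX"):
    def on_event(event, kind, name, obj):
        # React only to pods bound to this node that are not running yet.
        if kind == "Pod" and obj.get("nodeName") == node and obj.get("phase") != "Running":
            api.apply(kind, name, {**obj, "phase": "Running"})
    return on_event


def run_demo():
    api = ApiServer()
    events = []
    api.watch(lambda event, kind, name, obj: events.append(event))
    api.watch(make_scheduler(api))
    api.watch(make_kubelet(api))
    api.apply("Pod", "nginx-XYZ-XXX", {"nodeName": None})
    return api.store[("Pod", "nginx-XYZ-XXX")], events
```

Nobody calls the scheduler or the kubelet directly here: each one reacts to an event, writes its result back through `apply`, and that write becomes the next event.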

Ways to extend Kubernetes API

Sometimes, you would like to add entirely new API resource types to your cluster. In Kubernetes you are allowed to take an existing cluster, with a collection of built-in API types, like Pods, Services, and Deployments, and add new types that look and feel exactly as if they had been built in.
At the highest level, you can think of this method as a kind of extension i.e. adding new API objects to the Kubernetes API server that look as if they have been compiled into Kubernetes.

Important keywords:

resource, an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind. For example, the built-in pods resource contains a collection of Pod objects. Recall that an object can be described as group + version + kind; the URL under which objects of that kind are served is the resource. Taking pods as an example, you list v1 Pod objects at the `/api/v1/pods` resource and get a single one at `/api/v1/namespaces/<namespace>/pods/<pod>`.
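Putting group, version, namespace and name together gives the full resource URL. The helper below is a sketch with a made-up name, `resource_path`; it only builds strings and does not talk to a cluster.

```python
def resource_path(api_version, resource, namespace=None, name=None):
    """Build the REST path of a resource collection or of a single object."""
    if "/" in api_version:
        parts = [f"/apis/{api_version}"]  # named group
    else:
        parts = [f"/api/{api_version}"]   # core group
    if namespace:
        parts += ["namespaces", namespace]
    parts.append(resource)
    if name:
        parts.append(name)
    return "/".join(parts)


if __name__ == "__main__":
    print(resource_path("v1", "pods"))
    print(resource_path("v1", "pods", namespace="default", name="nginx"))
    print(resource_path("batch/v1", "jobs", namespace="default"))
```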

custom resource (CR), an extension of the Kubernetes API that adds new kinds of objects to it, or lets you introduce your own API into a project or a cluster. Once a custom resource is installed, users can create and access its objects using kubectl, just as they do for built-in resources like Pods.

custom controller, the state of a Kubernetes cluster is fully defined by the state of the resources it contains. The controller responsible for a specific resource continually works to make the state of your Kubernetes cluster match the state declared in that resource, so when a resource changes, its controller changes the cluster to reflect it. Its role can be split into:
- observe/watch (X pods are running)
- analyze/check the difference (N pods should be running instead of X)
- act/update (start the missing Pods)
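The three steps translate almost directly into a loop. This is a minimal sketch under simple assumptions: the "cluster" is a dictionary holding a set of pod names, one pod is started or stopped per round, and all function names are made up.

```python
import itertools

_ids = itertools.count(1)  # source of fresh pod names


def observe(cluster):
    """Observe: which pods are running right now?"""
    return sorted(cluster["pods"])


def analyze(desired, observed):
    """Analyze: positive means pods are missing, negative means too many."""
    return desired - len(observed)


def act(cluster, diff):
    """Act: move one step towards the desired state."""
    if diff > 0:
        cluster["pods"].add(f"pod-{next(_ids)}")
    elif diff < 0:
        cluster["pods"].remove(max(cluster["pods"]))


def control_loop(cluster, desired, max_rounds=10):
    """Repeat observe/analyze/act until the states match."""
    for rounds in range(max_rounds):
        diff = analyze(desired, observe(cluster))
        if diff == 0:
            return rounds
        act(cluster, diff)
    raise RuntimeError("did not converge")
```

A real controller never exits like this; it keeps watching, so that a pod dying later triggers the same three steps again.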

Kubernetes provides two ways to add custom resources to your cluster:

Custom Resource Definition (CRD), an API resource that allows you to define custom resources. Defining a CRD object creates a new custom resource with a name and schema that you specify. The Kubernetes API serves and handles the storage of your custom resource. This frees you from writing your own API server to handle the custom resource, but the generic nature of the implementation means you have less flexibility than with API server aggregation. So:
- no additional service to run; CRs are handled by API Server,
- CRDs allow users to create new types of resources without adding another API Server. Regardless of how they are installed, the new resources are referred to as Custom Resources to distinguish them from built-in Kubernetes resources like Pods,
- deploying a CRD into the cluster causes the Kubernetes API server to begin serving the specified custom resource,
- with a CRD in place, users gain access to a significant subset of Kubernetes API functionality, such as CRUD, RBAC, lifecycle hooks, and garbage collection.
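To give an idea of what such a definition contains, here is a sketch that builds a CRD manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. It assumes the `apiextensions.k8s.io/v1` form of the CRD API; the group `example.com`, the kind `Backup` and the `size` field are hypothetical.

```python
import json


def crd_manifest(group, kind, plural, version="v1", scope="Namespaced"):
    """Build a minimal CustomResourceDefinition manifest."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": scope,
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": version,
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        # Hypothetical field of the custom resource.
                        "properties": {"size": {"type": "integer"}},
                    }},
                }},
            }],
        },
    }


def served_path(crd):
    """REST path under which the API Server serves the new resource."""
    spec = crd["spec"]
    return f"/apis/{spec['group']}/{spec['versions'][0]['name']}/{spec['names']['plural']}"


if __name__ == "__main__":
    crd = crd_manifest("example.com", "Backup", "backups")
    print(json.dumps(crd, indent=2))
    print(served_path(crd))
```

Once a definition like this is applied, objects of the new kind are stored and served by the main API Server with no extra service to run.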

Here you can find description how to create your own CRD and corresponding CR with details.


API Aggregation (AA), usually, each resource in the Kubernetes API requires code that handles REST requests and manages persistent storage of objects. The main Kubernetes API server handles built-in resources like pods and services, and can also handle custom resources in a generic way through CRDs.
The aggregation layer allows you to provide specialized implementations for your custom resources by writing and deploying your own standalone API server. The main API server delegates requests to you for the custom resources that you handle, making them available to all of its clients.
- If you just want to add a resource to your Kubernetes cluster, then consider using a Custom Resource Definition. CRDs require less coding and rebasing.
- If you want to build an Extension API server, consider using apiserver-builder (kubebuilder, by contrast, targets CRD-based APIs).

Some info about aggregation layer itself:
- configuring the aggregation layer allows the Kubernetes API Server to be extended with additional APIs, which are not part of the core Kubernetes APIs
- it enables you to create/build Kubernetes-style APIs within a cluster. These APIs can be user-generated/created, for example through kubebuilder
- alternatively, these extensions could come as third-party solutions such as Service Catalog, a service that enables you to provision cloud services from within native Kubernetes tooling

Some keywords related to AA:
aggregated APIs, subordinate API Servers that sit behind the primary API Server, which acts as a proxy
extension-apiserver, an API Server written using the kube-apiserver library. It provides a way of defining custom APIs that are deeply integrated with core Kubernetes API machinery. Compared to CRDs, it offers more features and options for dealing with performance, policy and customization
delegated authN/Z, the mechanism whereby API calls to the extension-apiserver are authenticated and authorized by the core Kubernetes API Server (Kubernetes master)
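The proxying role of the primary API Server can be pictured as a lookup on the group and version in the request path. In a real cluster the registrations are APIService objects; in this sketch they are a plain dictionary, and the group and backend address are hypothetical.

```python
# (group, version) -> backend that serves it; hypothetical registration.
REGISTRATIONS = {
    ("metrics.example.com", "v1beta1"): "https://metrics-api.kube-system.svc",
}


def route(path, registrations=REGISTRATIONS):
    """Return the backend for a request path, or "local" for built-in APIs."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "apis":
        backend = registrations.get((parts[1], parts[2]))
        if backend:
            return backend  # delegated to the extension API server
    return "local"          # handled by the main API Server itself
```

Clients keep talking to a single address, and only the primary API Server knows which requests are handed over.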

When you register CRDs or Aggregated APIs, the new API group/version is added to the output of `kubectl api-versions`, and the new resources show up in `kubectl api-resources`:

# get all
$ kubectl api-resources
NAME       SHORTNAMES   APIGROUP   NAMESPACED   KIND
events     ev                      true         Event
...
# get all + verbs (create, update, get etc.)
$ kubectl api-resources -o wide
...
# filter by a specific API group
# e.g., the API group for Job is batch, so apiVersion: batch/v1 / kind: Job, in comparison to Pod apiVersion: v1 / kind: Pod
$ kubectl api-resources --api-group=batch
NAME       SHORTNAMES   APIGROUP   NAMESPACED   KIND
cronjobs   cj           batch      true         CronJob
jobs                    batch      true         Job
...
# get more info about a particular object
$ kubectl explain jobs
KIND:     Job
VERSION:  batch/v1
DESCRIPTION:
     Job represents the configuration of a single job.
...

Additional step before Operators

How do processes running inside a Pod access the API?
At first, you might ask yourself why a process running in the context of a Pod would need API access at all.

A Kubernetes cluster is a state machine made up of a collection of controllers.
Each of these controllers is responsible for reconciling the state of the user-specified resources, and to do that it has to talk to the API. When you create a new CRD, the Kubernetes API Server reacts by creating a new RESTful resource path, which is scoped either to the entire cluster or to a single namespace. To grant access to it you can use cluster role aggregation, which allows the insertion of custom policy rules into the existing cluster roles. This integrates the new resource into the cluster’s RBAC policy as if it were a built-in resource. The identity under which a Pod’s processes make these API calls is provided by the ServiceAccount resource.
You can think of ServiceAccounts as namespaced user accounts for the processes running in Pods.
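In practice a process in a Pod finds the API Server through two environment variables and authenticates with the ServiceAccount token mounted into the container. The sketch below builds such a request with the standard library only; the function names are made up, and `send` is not called here because it only works inside a cluster.

```python
import json
import os
import ssl
import urllib.request

# Default mount point of the ServiceAccount credentials inside a Pod.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def in_cluster_request(path, sa_dir=SA_DIR, env=os.environ):
    """Build an authenticated request to the API Server from inside a Pod."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    with open(os.path.join(sa_dir, "token")) as f:
        token = f.read().strip()
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(req, sa_dir=SA_DIR):
    """Send the request, trusting the cluster CA mounted next to the token."""
    ctx = ssl.create_default_context(cafile=os.path.join(sa_dir, "ca.crt"))
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)
```

Whether a call then succeeds depends on the RBAC rules bound to that ServiceAccount, which is exactly what the example app in the next paragraphs manages.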

Based on Kubernetes built-in objects and one of the client libraries (client-go), I want to show you a simple app (a kind of controller) driven by ServiceAccount events. My point is not to show how to write a controller correctly or to demonstrate Go best practices, but rather to show how you can take the Kubernetes knowledge gathered so far and use it in your code.
Here you can find related repo.

Use case:
The controller watches ServiceAccount (SA) events in a specific namespace (the default namespace by default). If a cluster operator creates an SA with the suffix `-dev` in the default namespace, for example `mytestsa-dev`, a developer who uses this SA gets full access to the already existing namespace `namespace-dev` dedicated to such SAs. This is possible because under the hood the controller sets up RBAC to allow access to this namespace. If you want to limit the access, you can change that in the function `setupRbac`, for example from full access to only list and watch on pods.
If the SA is removed from the default namespace, the RBAC objects are removed too.

If you look into the code, this part:

is very similar to YAML definition of Role object:

So it’s pretty easy to understand at least this line of code : )
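For readers without the repository at hand, the idea behind the use case can be sketched independently of the app. This is not the repo's Go code: `rbac_for` is a hypothetical function, written in Python, that derives a Role and a RoleBinding from a ServiceAccount name. The object names follow the `-r` and `-rb` pattern visible in the log output below, and the wildcard rule is one way to express full access.

```python
RBAC_API = "rbac.authorization.k8s.io"


def rbac_for(sa_name, sa_namespace="default", suffix="-dev"):
    """Return (role, binding) manifests for a matching SA, or None otherwise."""
    if not sa_name.endswith(suffix):
        return None
    target_ns = "namespace" + suffix  # e.g. "namespace-dev"
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": sa_name + "-r", "namespace": target_ns},
        # Full access; narrow the verbs and resources to limit it.
        "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": sa_name + "-rb", "namespace": target_ns},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": sa_name,
            "namespace": sa_namespace,
        }],
        "roleRef": {
            "apiGroup": RBAC_API,
            "kind": "Role",
            "name": role["metadata"]["name"],
        },
    }
    return role, binding
```

To limit the access to, say, listing and watching pods, the rule would become `{"apiGroups": [""], "resources": ["pods"], "verbs": ["list", "watch"]}`.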

# [console 1] run app
$ ./serviceaccount
...
# [console 2] create ServiceAccount
$ kubectl create serviceaccount test-dev
serviceaccount/test-dev created
# [console 1] check
2019/05/21 09:20:33 Event added, name: test-dev, secrets: []
2019/05/21 09:20:33 Event modified, name: test-dev, secrets: [{ test-dev-token-tbswz }] in namespace: default, age: 2019-05-21 09:20:17 +0200 CEST
2019/05/21 09:20:33 Role: test-dev-r, created in namespace: namespace-dev
2019/05/21 09:20:34 Role binding: test-dev-rb, created in namespace: namespace-dev
2019/05/21 09:20:34 Service account added to default namespace.
# [console 2] delete ServiceAccount
$ kubectl delete serviceaccount test-dev
serviceaccount "test-dev" deleted
# [console 1] check
2019/05/21 09:25:03 Event deleted, name: test-dev in namespace: default
2019/05/21 09:25:03 Role: test-dev-r, deleted from namespace: namespace-dev
2019/05/21 09:25:04 Role binding: test-dev-rb, deleted from namespace: namespace-dev

Another reason why I showed you this is that if you do it this way, you have to take care of the app on your own, for example in case of failures.

Operators

The Operator concept was introduced by CoreOS:

An Operator is an application-specific controller that extends the Kubernetes API to create, configure and manage instances of complex STATEFUL applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts, but also includes domain or application-specific knowledge to automate common tasks better managed by computers.

We use Operators because managing stateful applications, like databases, caches and monitoring systems, is a big challenge, especially at massive scale. These systems require human operational knowledge to correctly scale, upgrade and reconfigure while at the same time protecting against data loss and unavailability.

I would add that an operator is a customized controller implemented with a CRD, running in pods and managed through the Kubernetes API.
It follows the same pattern as the built-in controllers (i.e. watch, diff, act).
We talk about operators when we have the controller pattern + an API extension + a single-app focus. The application itself is deployed using built-in objects like Deployment, ReplicaSet etc. For me the difference between a controller and an operator is that a controller (a core/built-in Kubernetes component) watches and reacts to native Kubernetes objects, whereas an operator (an extension to Kubernetes) is dedicated to a specific app, like `coreos/prometheus-operator`.

What for?
When you scale your stateful application (e.g. an etcd cluster), you have to create a new DNS entry for the new member and tell the cluster about it, so you have to handle internal configuration, communication with the clustering mechanism, DNS etc. All of that can be handled automatically by your operator, and your only task would be to scale your app.
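What sets an operator apart from a generic controller is this kind of application knowledge. The sketch below turns a scale request into an ordered list of steps; the step names, their order and the member naming scheme are assumptions made for illustration and are not taken from any real etcd operator.

```python
def plan_scale(members, desired_size, cluster_name="etcd"):
    """Return the ordered steps needed to reach the desired cluster size."""
    members = list(members)
    steps = []
    index = len(members)
    while len(members) < desired_size:
        name = f"{cluster_name}-{index}"
        # Assumed order: make the member resolvable, announce it, start it.
        steps.append(("create-dns-entry", name))
        steps.append(("register-member", name))
        steps.append(("start-pod", name))
        members.append(name)
        index += 1
    while len(members) > desired_size:
        name = members.pop()
        # Assumed order: leave the cluster first, then clean up.
        steps.append(("deregister-member", name))
        steps.append(("stop-pod", name))
        steps.append(("remove-dns-entry", name))
    return steps
```

From the user's point of view only `desired_size` changes; the operator works out and executes everything in between.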

Advantages of operators:
- handling updates from one version to another
- handling failure recovery if it’s needed, scaling the application up and down depending on use cases
- without Operators, many applications need intervention to deploy, scale, reconfigure, upgrade, or recover from faults

Creating such an operator is a challenging task for a developer, but part of it (the logic of your app still has to be written by you) can be covered by operator development tools:
- operator-sdk
- kubebuilder
- metacontroller
You can also browse the list of already existing operators, which you can find here (awesome-operators).

I hope this article was helpful to you. Thank you for reading!
