Introduction

Consider a scenario where your application uses a MySQL pod in which data is added, updated, and deleted as the situation demands. By default, when the MySQL pod is restarted, all of that data is gone, because Kubernetes does not provide data persistence out of the box. You have to configure persistence explicitly for each application that needs to keep its data between pod restarts.

To persist data reliably, you need:

Storage that does not depend on the pod lifecycle
Storage that is available on all nodes
Highly available storage, i.e. storage that can survive a cluster crash

Kubernetes lets you define how data is persisted and how it is accessed. It also supports several storage options, such as local storage, cloud storage, and the Network File System (NFS).

In this article, you will learn how to persist data in Kubernetes using abstractions such as persistent volumes, persistent volume claims, and storage classes, and how each of these components is created and used for data persistence. You will also learn how to use config maps and secrets to supply configuration to your applications.

Persistent Volumes - PV

A persistent volume is a cluster resource used to store data, and like other Kubernetes objects it can be created from a YAML file. A persistent volume is an abstraction: it needs actual physical storage, such as a local hard drive, an NFS server, or cloud storage, to hold the data. That storage has to be managed by an administrator, because Kubernetes only provides the interface for storing data and does not manage the storage itself. You can configure multiple storage backends for your cluster, so that one application uses a local disk while another uses an NFS server or cloud storage, and a single application can also use several different backends. The storage backend is configured under the spec section of the persistent volume's configuration file. For example, here is a sample persistent volume that uses NFS as its storage backend:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.0
  nfs:
    path: /dir/path/on/nfs/server
    server: nfs-server-ip-address

An example of a persistent volume backed by a Google Compute Engine persistent disk:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
  labels:
    failure-domain.beta.kubernetes.io/zone: us-central1-a__us-central1-b
spec:
  capacity:
    storage: 400Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk
    fsType: ext4

Because the spec section is specific to each storage type, its fields differ from one backend to another. Kubernetes supports over 25 storage backends for persistent volumes; you can find the full list in the Kubernetes documentation.

Unlike pods, deployments, ReplicaSets, and other namespaced Kubernetes objects, persistent volumes are not namespaced. They are cluster-wide resources that can be accessed from anywhere in the cluster.

Persistent Volume Claims - PVC

A persistent volume claim is a request for storage that a pod can use. A Kubernetes administrator creates a persistent volume backed by local or external storage. A persistent volume claim then requests resources from the persistent volumes created by the administrator, and the pod accesses the local or external storage through that claim. If no persistent volume can satisfy the request, for example because the requested storage exceeds the capacity of every available persistent volume, the claim is not bound and stays in the Pending state.

For example, the PVC below requests 10Gi of storage from a persistent volume in the cluster with the storage class manual:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-name
spec:
  storageClassName: manual
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

So the process of claiming storage from a persistent volume is:

The pod requests a volume through the persistent volume claim
Kubernetes looks for a persistent volume in the cluster that satisfies the claim's requirements (capacity, access modes, storage class)
Once a suitable persistent volume is found, it is bound to the claim, and that persistent volume provides the actual storage
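The matching step above can be sketched in a few lines of Python. This is a simplified, hypothetical model, not the actual Kubernetes binder: the class names, the `to_bytes` helper, and the rule of picking the smallest volume that fits are assumptions for illustration, and the real controller also checks things like volume mode and label selectors.

```python
# Hypothetical, simplified model of matching a PVC to a PV.
# The real Kubernetes binder also considers volume mode, label selectors,
# node affinity and pre-bound volumes; this sketch does not.
from dataclasses import dataclass
from typing import List, Optional

# Binary (power-of-two) suffixes used in Kubernetes resource quantities.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '5Gi' to bytes (binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


@dataclass
class PersistentVolume:
    name: str
    capacity: str
    access_modes: List[str]
    storage_class: str
    bound_to: Optional[str] = None  # name of the claim using this PV


@dataclass
class PersistentVolumeClaim:
    name: str
    request: str
    access_modes: List[str]
    storage_class: str


def find_matching_pv(claim, volumes):
    """Bind and return the smallest unbound PV that satisfies the claim, or None."""
    candidates = [
        pv for pv in volumes
        if pv.bound_to is None
        and pv.storage_class == claim.storage_class
        and set(claim.access_modes) <= set(pv.access_modes)
        and to_bytes(pv.capacity) >= to_bytes(claim.request)
    ]
    if not candidates:
        return None  # no match: the claim stays Pending
    best = min(candidates, key=lambda pv: to_bytes(pv.capacity))
    best.bound_to = claim.name
    return best


volumes = [
    PersistentVolume("pv-small", "5Gi", ["ReadWriteOnce"], "manual"),
    PersistentVolume("pv-large", "20Gi", ["ReadWriteOnce"], "manual"),
]
claim = PersistentVolumeClaim("pvc-name", "10Gi", ["ReadWriteOnce"], "manual")
pv = find_matching_pv(claim, volumes)
print(pv.name)  # pv-large
```

Picking the smallest volume that fits keeps larger volumes free for larger claims; a 10Gi claim skips the 5Gi volume because its capacity is too small.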

Note that a claim must exist in the same namespace as the pod that uses it. Once the claim is bound to a matching persistent volume, the pod references the claim under its volumes section, and the volume is mounted into the container at the specified mountPath:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: pvc-name

Storage Classes

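Storage classes allow Kubernetes to provision persistent volumes dynamically whenever a persistent volume claim requests one, so an administrator does not have to create every persistent volume in advance. Like other objects, a storage class can be created from a YAML configuration file, e.g.: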

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4

The provisioner attribute determines which storage backend is used for the persistent volumes created by this class. Each storage backend has its own provisioner (here, kubernetes.io/aws-ebs for AWS Elastic Block Store), and the parameters section holds options specific to that provisioner.

Storage classes add another level of abstraction over the underlying storage provider and its parameters, which lets persistent volumes be provisioned dynamically as the situation demands. A persistent volume claim requests a storage class by name through its storageClassName field. You can think of the flow for claiming storage as follows:

The pod claims storage via a persistent volume claim
The persistent volume claim requests storage from a storage class
The storage class uses its provisioner to create a persistent volume in the actual storage backend that satisfies the claim's requirements
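The dynamic provisioning flow above can be modeled as a short Python sketch. Everything here is hypothetical: `fake_backend_create` stands in for a call to a real storage API (real provisioners, such as CSI drivers, talk to the backend), and the generated volume names are simplified (Kubernetes derives them from the claim's UID).

```python
# Hypothetical model of dynamic provisioning: a claim names a storage class,
# and the class's provisioner creates a new volume sized to the claim.
import itertools
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StorageClass:
    name: str
    provisioner: str
    parameters: Dict[str, str]


@dataclass
class ProvisionedVolume:
    name: str
    capacity: str
    storage_class: str
    backend_volume_id: str
    claim: str


_ids = itertools.count(1)


def fake_backend_create(params: Dict[str, str], size: str) -> str:
    """Stand-in for a real storage API call (hypothetical; size is ignored)."""
    return f"vol-{next(_ids):04d}-{params.get('type', 'default')}"


# Map each provisioner name to the function that talks to its backend.
PROVISIONERS: Dict[str, Callable[[Dict[str, str], str], str]] = {
    "kubernetes.io/aws-ebs": fake_backend_create,
}


def provision(claim_name, request, class_name, classes):
    """Create a volume for a claim using the named storage class."""
    sc = classes[class_name]
    create = PROVISIONERS[sc.provisioner]
    volume_id = create(sc.parameters, request)
    return ProvisionedVolume(
        name=f"pvc-{claim_name}",  # simplified; real names use the claim UID
        capacity=request,
        storage_class=sc.name,
        backend_volume_id=volume_id,
        claim=claim_name,
    )


classes = {
    "storage-class-name": StorageClass(
        "storage-class-name",
        "kubernetes.io/aws-ebs",
        {"type": "io1", "iopsPerGB": "10", "fsType": "ext4"},
    )
}
pv = provision("mypvc", "100Gi", "storage-class-name", classes)
print(pv.capacity, pv.backend_volume_id)  # 100Gi vol-0001-io1
```

The key design point is that the claim never names a backend directly: it only names a class, and the class's provisioner decides where the storage comes from.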

An example of a persistent volume claim that requests storage from the storage class defined above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: storage-class-name

Config Maps

With config maps, you can store non-confidential data as key-value pairs, such as configuration for your applications. Pods can consume this data as environment variables, as command-line arguments, or as configuration files in a volume.

Note that config maps do not provide encryption or secrecy of data. If you need to store confidential data, use a Kubernetes secret instead.

# Sample config map
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  special.how: very
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-config
  namespace: default
data:
  log_level: INFO

The pod below consumes both config maps as environment variables:

apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: registry.k8s.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: special.how
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: env-config
              key: log_level
  restartPolicy: Never
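Each configMapKeyRef entry above is a lookup: the name selects a config map and the key selects a value inside it. A minimal Python sketch of that lookup, assuming plain dictionaries in place of the real objects (it models only the mapping, not how the kubelet injects variables). With the manifest above, the container's env command would print SPECIAL_LEVEL_KEY=very and LOG_LEVEL=INFO among its other variables.

```python
# Hypothetical sketch of resolving env entries that use configMapKeyRef.
from typing import Dict, List

config_maps: Dict[str, Dict[str, str]] = {
    "special-config": {"special.how": "very"},
    "env-config": {"log_level": "INFO"},
}

env_spec: List[dict] = [
    {"name": "SPECIAL_LEVEL_KEY",
     "valueFrom": {"configMapKeyRef": {"name": "special-config", "key": "special.how"}}},
    {"name": "LOG_LEVEL",
     "valueFrom": {"configMapKeyRef": {"name": "env-config", "key": "log_level"}}},
]


def resolve_env(spec, maps):
    """Return {ENV_VAR: value} by looking up each config map name and key."""
    env = {}
    for entry in spec:
        ref = entry["valueFrom"]["configMapKeyRef"]
        cm = maps.get(ref["name"])
        if cm is None or ref["key"] not in cm:
            # Kubernetes does not start the container when a referenced
            # config map or key is missing (unless the ref is marked optional).
            raise KeyError(f"{ref['name']}/{ref['key']} not found")
        env[entry["name"]] = cm[ref["key"]]
    return env


print(resolve_env(env_spec, config_maps))
# {'SPECIAL_LEVEL_KEY': 'very', 'LOG_LEVEL': 'INFO'}
```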

There are other ways to create and use config maps, such as creating them from files or passing their values as command-line arguments. You can read more about this in the Kubernetes documentation on config maps.

Secrets

With Kubernetes secrets, you can store confidential information such as passwords, tokens, and keys, so you don't have to hardcode sensitive data into your code. Secrets are similar to config maps but are intended for confidential data. There are different kinds of secrets, namely:

Opaque secrets
Service account token secrets
Docker config secrets
Basic authentication secrets
SSH authentication secrets
TLS secrets
Bootstrap token secrets

You can read more about each type of secret in the Kubernetes documentation on secrets.

NB: By default, secrets are stored unencrypted in etcd, the data store behind the API server. Anyone with API access can retrieve or modify a secret, and so can anyone with access to etcd. To prevent this, you can consider:

Enabling encryption at rest for secrets
Configuring role-based access control (RBAC) rules to limit who can create and read secrets

A secret can be mounted as files in a volume on one or more containers, or exposed to a container as environment variables.

An example of a secret:

apiVersion: v1
kind: Secret
metadata:
    name: mysecret
type: Opaque
data:
    username: YWRtaW4=
    password: MWYyZDFlMmU2N2Rm

The values in the data section must be base64-encoded. To encode a string as base64, you can run:

echo -n 'admin' | base64
# YWRtaW4=

echo -n '1f2d1e2e67df' | base64
# MWYyZDFlMmU2N2Rm

Here, admin is the username and 1f2d1e2e67df is the password.
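The same encoding can be done with Python's standard base64 module. The sketch below also decodes a value, which shows why base64 is only an encoding and not encryption: anyone who can read the secret manifest can recover the original values. The helper names encode and decode are illustrative.

```python
# Python equivalent of the `echo -n ... | base64` commands above.
import base64


def encode(value: str) -> str:
    """Base64-encode a UTF-8 string, as required for a secret's data section."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode(value: str) -> str:
    """Reverse the encoding; no key is needed, so this is not encryption."""
    return base64.b64decode(value).decode("utf-8")


print(encode("admin"))         # YWRtaW4=
print(encode("1f2d1e2e67df"))  # MWYyZDFlMmU2N2Rm
print(decode("YWRtaW4="))      # admin
```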

To expose the secret to a pod as environment variables:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myapp
      image: ubuntu
      env:
        - name: USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password
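The manifest above exposes the secret as environment variables. When a secret is mounted as a volume instead, each key in its data section becomes a file under the mount path, and the file's content is the decoded value. The Python sketch below simulates that layout with a temporary directory. The materialize_secret helper and the mount path are hypothetical; the kubelet does this itself, backing secret volumes with tmpfs.

```python
# Hypothetical simulation of a secret volume: one file per key,
# containing the base64-decoded value.
import base64
import os
import tempfile

secret_data = {"username": "YWRtaW4=", "password": "MWYyZDFlMmU2N2Rm"}


def materialize_secret(data, mount_path):
    """Write each secret key as a file under mount_path with its decoded value."""
    os.makedirs(mount_path, exist_ok=True)
    for key, encoded in data.items():
        with open(os.path.join(mount_path, key), "wb") as f:
            f.write(base64.b64decode(encoded))


with tempfile.TemporaryDirectory() as tmp:
    mount = os.path.join(tmp, "etc", "secret-volume")  # hypothetical mountPath
    materialize_secret(secret_data, mount)
    print(sorted(os.listdir(mount)))  # ['password', 'username']
    with open(os.path.join(mount, "username")) as f:
        print(f.read())  # admin
```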

Conclusion

Storage is a core component of most modern applications, so understanding how to configure different storage backends and how to access them is crucial. Kubernetes provides several abstractions, such as persistent volumes, persistent volume claims, and storage classes, for defining how storage is requested and used by your applications, along with config maps and secrets for supplying configuration data. Setting permissions and putting appropriate measures in place to prevent data breaches or intrusion is equally important. You can read more about RBAC and encrypting secrets at rest in the Kubernetes documentation.