Storage in Kubernetes
Introduction
Consider a scenario where your application uses a MySQL pod in which data is added, updated, and deleted as the situation demands. By default, when the MySQL pod restarts, all the data is lost because Kubernetes does not provide data persistence out of the box. You have to explicitly configure persistence for each application that needs to keep data between pod restarts.
You need:
- Storage that does not depend on the pod lifecycle
- Storage that is available on all nodes
- Highly available storage, i.e. storage that can survive cluster crashes
Kubernetes allows you to define how you want to persist data and how to access it. It also provides several persistence options such as local storage, cloud storage, and network file systems (NFS).
In this article, you will learn how to persist data in Kubernetes using abstractions like persistent volumes, persistent volume claims, and storage classes, and how each component is created and used for data persistence. You will also learn how to use config maps and secrets to define configuration for your applications.
Persistent Volumes - PV
A persistent volume is a cluster resource that is used to store data. It can be created using a YAML file. It's an abstraction that needs actual physical storage, such as a local hard drive, an NFS server, or cloud storage, to persist data. Kubernetes only provides the interface for storing data and doesn't manage the storage itself, so the underlying storage has to be set up and maintained by an administrator. You can configure several storage backends for your cluster, so that one application uses local disk storage while another uses an NFS server or cloud storage. It's also possible for one application to use different storage backends. The storage backend is configured under the spec section of the persistent volume configuration file.
This is a sample persistent volume with nfs as the storage backend:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.0
  nfs:
    path: /dir/path/on/nfs/server
    server: nfs-server-ip-address
An example of a persistent volume with a Google Compute Engine persistent disk as the backend:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
  labels:
    failure-domain.beta.kubernetes.io/zone: us-central1-a__us-central1-b
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk
    fsType: ext4
The spec attribute of the YAML configuration differs from one storage type to another because its fields are specific to the backend. Kubernetes supports over 25 storage backends for your persistent volumes. The full list is in the persistent volumes section of the Kubernetes documentation.
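To illustrate how the spec changes with the backend, here is a minimal sketch of a persistent volume backed by a local disk on a single node. The volume name, storage class name, disk path, and node name are hypothetical placeholders, and the sketch assumes the directory already exists on that node.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example            # hypothetical name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # hypothetical class name
  # Backend-specific part: a directory or disk on one node
  local:
    path: /mnt/disks/ssd1           # assumed to exist on the node
  # Local volumes must declare which node holds the data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node      # hypothetical node name
```

Compared with the NFS and GCE examples, only the backend block (local plus nodeAffinity) changes; capacity, access modes, and the reclaim policy are declared the same way.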
Persistent volumes are not namespaced. Unlike namespaced Kubernetes objects such as pods, deployments, and replica sets, a persistent volume is a cluster-wide resource that can be accessed from anywhere in the cluster.
Persistent Volume Claims - PVC
A persistent volume claim is a request for storage on behalf of a pod. A Kubernetes administrator creates a persistent volume that is backed by local or external storage. The persistent volume claim then requests resources from a persistent volume created by the administrator, and the pod accesses the local or external storage through the claim. No volume is bound when the requested storage exceeds the capacity of every available persistent volume; in that case the claim stays in the Pending state.
For example, the PVC below tries to claim 10Gi of storage from a persistent volume in the cluster:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-name
spec:
  storageClassName: manual
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
So the process of claiming storage from a persistent volume is as follows:
- The pod requests a volume through the persistent volume claim
- The persistent volume claim tries to find a persistent volume in the cluster that satisfies its requirements
- The persistent volume represents the actual storage and is bound to the claim only when its capacity, access modes, and storage class satisfy the claim's requirements
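As a sketch of the matching step, the claim below is written so that it could bind to the NFS-backed persistent volume named pv-name shown earlier: it asks for the same storage class (slow), the same access mode, and no more than the 5Gi that volume offers. The claim name is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-for-nfs          # hypothetical name
spec:
  storageClassName: slow     # must match the storage class of the volume
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce          # must be offered by the volume
  resources:
    requests:
      storage: 5Gi           # must not exceed the volume's capacity
```

Raising the request above 5Gi or changing the storage class would leave this claim unbound, assuming no other matching volume exists in the cluster.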
Note that a claim must exist in the same namespace as the pod that uses it. Once the claim is bound to a matching persistent volume, the volume is mounted into the pod:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: pvc-name
Storage Classes
Storage classes allow you to provision persistent volumes dynamically whenever a persistent volume claim requests one. A storage class is also created using a YAML configuration file, for example:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
The provisioner attribute in the storage class configuration determines which volume plugin, and therefore which storage backend, is used to provision persistent volumes. Each storage backend has its own provisioner.
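To show how the provisioner and its parameters vary, here is a minimal sketch of a storage class that uses the AWS EBS CSI driver instead of the in-tree kubernetes.io/aws-ebs plugin. It assumes that driver is installed in the cluster; the class name is a hypothetical placeholder, and the optional fields are included only to show where they go.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-example                     # hypothetical name
provisioner: ebs.csi.aws.com                # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                                 # parameters are specific to the provisioner
reclaimPolicy: Delete                       # what happens to the volume once the claim is deleted
allowVolumeExpansion: true                  # allow claims to be resized later
volumeBindingMode: WaitForFirstConsumer     # provision only once a pod uses the claim
```

Only the provisioner and parameters are tied to the backend; the remaining fields are generic storage class settings.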
Storage classes are another level of abstraction: they hide the underlying storage provider and the parameters for that storage. They can then be used to provision persistent volumes dynamically as the situation demands. A storage class is requested by name from a persistent volume claim. You can think of the flow for claiming storage using the steps below:
- The pod claims storage via a persistent volume claim
- The persistent volume claim requests storage from the storage class
- The storage class uses its provisioner to create, on the actual storage backend, a persistent volume that satisfies the claim's requirements
An example of how a persistent volume claim requests storage from the storage class:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: storage-class-name
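To complete the flow and return to the MySQL scenario from the introduction, the sketch below shows a pod that mounts the dynamically provisioned claim mypvc as the MySQL data directory. The pod name, image tag, and the secret mysql-credentials with its root-password key are hypothetical and would have to exist in the same namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-example                   # hypothetical name
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-credentials   # hypothetical secret
              key: root-password        # hypothetical key
      volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql     # data directory of the mysql image
          # Use a subdirectory so MySQL starts from an empty directory;
          # a freshly formatted ext4 volume usually contains lost+found
          subPath: mysql
  volumes:
    - name: mysql-data
      persistentVolumeClaim:
        claimName: mypvc                # the claim defined above
```

Because the data lives on the claimed volume rather than in the container, it should survive a restart or replacement of the pod.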
Config Maps
With config maps, you can store non-confidential data as key-value pairs that serve as configuration for your applications. Pods can consume a config map as environment variables, as command-line arguments, or as configuration files in a volume.
It is important to note that config maps do not provide encryption or secrecy of data. If you need to store confidential data, use a Kubernetes secret instead.
# Sample config map
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  special.how: very
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-config
  namespace: default
data:
  log_level: INFO
How the config map is consumed in the pod
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: special.how
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: env-config
              key: log_level
  restartPolicy: Never
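A config map can also be consumed as files in a volume. The sketch below mounts the special-config config map from above at /etc/config, where each key becomes a file whose content is the corresponding value. The pod name, image, and mount path are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-pod        # hypothetical name
spec:
  containers:
    - name: test-container
      image: busybox
      # List the files created from the config map keys
      command: [ "/bin/sh", "-c", "ls /etc/config/" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config
  volumes:
    - name: config-volume
      configMap:
        name: special-config     # the config map defined above
  restartPolicy: Never
```

Mounting as a volume suits applications that read their settings from configuration files rather than from environment variables.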
There are several other ways to create a config map, such as creating it from files or passing values via command-line arguments. To read more about this, see the ConfigMap section of the Kubernetes documentation.
Secrets
With Kubernetes secrets, you can store confidential information, which saves you from hardcoding sensitive data into your code. Secrets are similar to config maps but differ in terms of data security. There are different kinds of secrets, namely:
- Opaque secrets
- Service account token secrets
- Docker config secrets
- Basic authentication secrets
- SSH authentication secrets
- TLS secrets
- Bootstrap token secrets
You can read more about each type of secret in the Secrets section of the Kubernetes documentation.
NB: By default, secrets are stored unencrypted in etcd, the API server's underlying data store, so anyone with access to your Kubernetes API or etcd can retrieve or modify them. To reduce this risk, consider:
- Enabling Encryption at Rest for Secrets.
- Configuring Role-based access control (RBAC) rules to limit who can create and access secrets
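As a sketch of the RBAC measure, the Role below grants read access to a single secret, and the RoleBinding gives that role to one service account. The role, binding, and service account names are hypothetical; mysecret refers to the example secret defined later in this section.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader              # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]                # secrets belong to the core API group
    resources: ["secrets"]
    resourceNames: ["mysecret"]    # limit access to this one secret
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-mysecret              # hypothetical name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: myapp-sa                 # hypothetical service account
    namespace: default
roleRef:
  kind: Role
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
```

Granting only get on a named secret, instead of list or watch on all secrets, keeps the permission as narrow as this example needs.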
Secrets can be mounted as files in a volume used by one or more containers. They can also be exposed as environment variables.
An example of a secret
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm
The values in the data section must be base64-encoded. Keep in mind that base64 is an encoding, not encryption. To convert a string to base64, you can run:
echo -n 'admin' | base64
# YWRtaW4=
echo -n '1f2d1e2e67df' | base64
# MWYyZDFlMmU2N2Rm
Where admin is the username and 1f2d1e2e67df is the password.
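If you would rather not encode the values by hand, a secret also accepts plain-text values under stringData, which Kubernetes encodes into data when the object is created. The sketch below is equivalent to the example above apart from its name, which is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysecret-plain     # hypothetical name
type: Opaque
stringData:
  # Plain-text values; stored base64-encoded under data
  username: admin
  password: 1f2d1e2e67df
```

The trade-off is that the manifest now contains the credentials in clear text, so it should not be committed to version control as-is.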
To use the secret as environment variables in a pod
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myapp
      image: ubuntu
      env:
        - name: USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password
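The other option mentioned earlier, using a secret as files in a volume, could look like the sketch below. Each key of mysecret becomes a file under the mount path, containing the decoded value. The pod name, image, and mount path are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-pod          # hypothetical name
spec:
  containers:
    - name: myapp
      image: busybox
      # List the files created from the secret keys (username, password)
      command: [ "/bin/sh", "-c", "ls /etc/credentials" ]
      volumeMounts:
        - name: credentials
          mountPath: /etc/credentials
          readOnly: true
  volumes:
    - name: credentials
      secret:
        secretName: mysecret       # the secret defined above
  restartPolicy: Never
```

Mounting the secret as files keeps the values out of the container's environment, which some applications and security policies prefer.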
Conclusion
Storage is a core component of modern applications, so it is important to understand how to configure different storage backends and how to access them. Kubernetes provides several levels of abstraction, such as persistent volumes, persistent volume claims, storage classes, secrets, and config maps, for defining how data is requested and used in your applications. It is equally important to set permissions and take appropriate measures to prevent data breaches or intrusion. To go further, read about RBAC and encrypting secrets at rest in the Kubernetes documentation.