Kubernetes: Handling Persistent Storage in StatefulSets

Updated: February 1, 2024 By: Guest Contributor

Introduction

Kubernetes, also known as K8s, is a powerful orchestration tool for managing containerized applications at scale. When dealing with stateful applications that require persistent storage, such as databases, Kubernetes offers StatefulSets as a resource tailored for this purpose.

This tutorial delves into the nuances of handling persistent storage in StatefulSets within Kubernetes. We’ll explore the key concepts, steps, and best practices to equip you with the knowledge needed to manage stateful workloads effectively.

Understanding StatefulSets

StatefulSets are the Kubernetes resource of choice when each Pod in a set needs a stable, persistent identity. Unlike the Pods of a Deployment, which are interchangeable, each Pod created by a StatefulSet has a unique identity that it keeps when it is rescheduled.

When you use a StatefulSet, each Pod has a sticky identity which comprises:

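  • Stable hostname, derived from the StatefulSet’s name and the Pod’s ordinal index (for example, web-0)
  • Persistent storage, attached to each Pod through its own PersistentVolumeClaim (PVC)
  • Stable network identity, a per-Pod DNS name provided by the headless Service that the StatefulSet references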

Prerequisites

Before diving into StatefulSets, ensure you have:

  • A working Kubernetes cluster
  • kubectl command-line tool installed and configured

Working with Persistent Storage in StatefulSets

To handle persistent storage in StatefulSets, you’ll work with the following primary components:

  • PersistentVolume (PV): A piece of storage in the cluster, provisioned either manually by an administrator or dynamically through a StorageClass.
  • PersistentVolumeClaim (PVC): A user’s request for storage, which Kubernetes binds to a matching PV.
  • volumeClaimTemplates: A field of the StatefulSet spec from which a PVC is generated automatically for each Pod.
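To make the PV/PVC relationship concrete, here is a sketch of a manually provisioned PV. It assumes a single-node test cluster where a hostPath volume is acceptable; the name example-pv, the class name manual, and the path /mnt/data are hypothetical. A PVC that requests storageClassName: manual, ReadWriteOnce access, and no more than 1Gi could bind to it.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv          # hypothetical name
spec:
  storageClassName: manual  # only claims requesting this class can bind here
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:                 # node-local storage: suitable for testing only
    path: /mnt/data         # hypothetical path on the node
```

In production clusters, PVs are usually created by dynamic provisioning rather than by hand.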

Creating a StatefulSet with Persistent Storage

Here’s an example manifest for a StatefulSet with persistent storage:

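apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  # Headless Service that provides each Pod's stable DNS name; create it separately
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: nginx
        # k8s.gcr.io is deprecated; the same image is published on registry.k8s.io
        image: registry.k8s.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www  # refers to the claim template named "www" below
          mountPath: /usr/share/nginx/html
  # A separate PVC is generated for every Pod from each template below
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 1Gi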

This manifest sets up a simple web application using nginx. Notice the volumeClaimTemplates section, which automates the creation of PVCs for each Pod.
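The manifest’s serviceName field points at a Service named nginx, which the StatefulSet does not create for you. A minimal headless Service for it could look like the following sketch, assuming the app: nginx label and the port name web from the manifest above. Setting clusterIP: None is what makes the Service headless, so DNS resolves each Pod individually (for example, web-0.nginx within the same namespace).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx        # must equal the StatefulSet's spec.serviceName
  labels:
    app: nginx
spec:
  clusterIP: None    # headless: no virtual IP, per-Pod DNS records instead
  selector:
    app: nginx       # selects the Pods created by the StatefulSet
  ports:
  - port: 80
    name: web
```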

Handling Storage with VolumeClaimTemplates

Using volumeClaimTemplates is straightforward:

  1. The metadata.name of each claim template sets the prefix of the generated PVC names.
  2. Each Pod gets its own PVC, named <template name>-<Pod name>. In the example above, Pods web-0, web-1, and web-2 get the claims www-web-0, www-web-1, and www-web-2.
  3. The spec describes the requested storage (access modes, storage class, and size) exactly as it would in a standalone PVC.
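For the web StatefulSet shown earlier, the claim generated for Pod web-0 is roughly equivalent to the PVC below. This is a simplified sketch: the controller also sets additional metadata that is omitted here.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # <template name>-<StatefulSet name>-<ordinal>
  name: www-web-0
spec:
  # Copied from the "www" entry in volumeClaimTemplates
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "standard"
  resources:
    requests:
      storage: 1Gi
```

Because the name is deterministic, a re-created web-0 always mounts www-web-0 rather than a fresh volume.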

When a StatefulSet is scaled up, a new PVC is created for each new Pod. By default, scaling down does not delete the PVCs, so the data is retained for manual recovery, and a Pod re-created by a later scale-up reattaches to its original claim.
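Newer Kubernetes versions let you choose this behavior per StatefulSet with the persistentVolumeClaimRetentionPolicy field (beta and enabled by default since v1.27, so check your cluster version). The following fragment, to be merged into the spec of the web StatefulSet, is a sketch that keeps claims if the StatefulSet is deleted but removes the claims of Pods removed by a scale-down:

```yaml
# Fragment: add under spec: of the web StatefulSet
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Retain   # keep all PVCs if the StatefulSet itself is deleted
  whenScaled: Delete    # delete PVCs of Pods removed by a scale-down
```

Both fields default to Retain, which matches the behavior described above.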

Dynamic Volume Provisioning

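Dynamic provisioning creates a PV automatically whenever a PVC requests storage from a StorageClass. Here’s an example StorageClass for a cluster on AWS with the Amazon EBS CSI driver installed. It uses volumeBindingMode: WaitForFirstConsumer because an EBS volume lives in a single availability zone: delaying provisioning until the Pod is scheduled creates each volume in that Pod’s zone, so there is no need to list zones in the class.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
# Amazon EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner is deprecated
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
# Keep the EBS volume (and its data) after its PV is released
reclaimPolicy: Retain
allowVolumeExpansion: true
# Provision each volume only after its Pod is scheduled, in that Pod's zone
volumeBindingMode: WaitForFirstConsumer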

For the PVCs generated from volumeClaimTemplates to use this class, their storageClassName must match the StorageClass’s metadata.name, which is standard in both manifests.
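If a claim template omits storageClassName, its PVCs fall back to the cluster’s default StorageClass, if one exists. As a sketch, the standard class could be marked as the default with the storageclass.kubernetes.io/is-default-class annotation. Many clusters already ship a default class, so check existing classes with kubectl get storageclass first and mark only one as default.

```yaml
# Fragment: metadata of the "standard" StorageClass
metadata:
  name: standard
  annotations:
    # PVCs that omit storageClassName are assigned this class
    storageclass.kubernetes.io/is-default-class: "true"
```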

Best Practices for StatefulSets

Incorporate these best practices:

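  • Use network-attached storage, such as Amazon EBS or NFS, that can be reattached if a Pod moves to a different node. Note that an EBS volume can only be attached to nodes in its own availability zone.
  • Retained PVCs are not a backup: they do not protect against data corruption, accidental deletion, or loss of the underlying volume, so always have a separate data backup strategy.
  • Use the statefulset.kubernetes.io/pod-name label, which the StatefulSet controller adds to every Pod, as a Service selector when you need a Service that targets one specific Pod.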
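For example, a Service that routes traffic only to web-0 might look like this sketch. The Service name web-0 and the port are assumptions based on the earlier manifest; the label value is the Pod’s name, which the StatefulSet controller sets automatically.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-0          # hypothetical: one Service per Pod you want to expose
spec:
  selector:
    # Label added automatically by the StatefulSet controller to each Pod
    statefulset.kubernetes.io/pod-name: web-0
  ports:
  - port: 80
    targetPort: 80
```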

Conclusion

In this guide, we have discussed how Kubernetes handles persistent storage in StatefulSets. We’ve covered creating and managing storage with volumeClaimTemplates, dynamic provisioning, and best practices. With this knowledge, you can manage stateful applications in Kubernetes with confidence.

For further learning, explore more advanced topics such as StatefulSet update strategies, Pod affinity, and monitoring and logging for stateful applications in a Kubernetes cluster.