Stateful Services on Kubernetes, and How to Use Them


Stateful workloads, such as those that rely on a database, are well supported in Kubernetes. Like Deployments, there are reconciliation loops that wrap Pods for services like Postgres and MongoDB, which are commonly clustered and require consistent, ordinal naming, along with a high degree of automation for provisioning persistent volumes and reattaching Pods to the data in those volumes.

An example I’d like to share is based on the StatefulSet resource, which, like a Deployment, runs a given number of replicas, offers a high level of control over resource consumption, and has full access to the interfaces for storage. Unlike a Deployment, however, it provides consistent naming for those replicas, so in a MongoDB replica set you can anticipate the name of each adjacent node. This makes spinning up a cluster of a known size easy to define in however your manifests are generated and stored (e.g. YAML generated by a CI/CD system or config management tooling).
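To make that concrete, once the StatefulSet defined later in this post is running, its Pods carry stable ordinal names rather than random suffixes; the output below is illustrative:

kubectl get pods -l app=mongo
NAME      READY   STATUS    RESTARTS   AGE
mongo-0   1/1     Running   0          5m
mongo-1   1/1     Running   0          4m
mongo-2   1/1     Running   0          3m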

Let’s say I have a MongoDB-backed service, and my requirement is to flexibly scale the size of the replica set, with a minimum of 3 nodes. Each of these nodes needs a PersistentVolume so that when a Pod is recycled I can reattach its volume. I start by defining my PersistentVolumeClaim, in this case backed by my provider’s storage class:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mongo-db-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-packet-standard
  resources:
    requests:
      storage: 100Gi
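Assuming this manifest is saved as mongo-pvc.yaml (a filename chosen here for illustration), the claim can be created and inspected with:

kubectl apply -f mongo-pvc.yaml
kubectl get pvc mongo-db-data

Once the claim reports a Bound status, the csi-packet-standard storage class has provisioned the backing volume.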

and if using a Deployment, I reference this PersistentVolumeClaim in my Pod spec's volumes, and mount it via volumeMounts:

...
    spec:
      containers:
      - image: mongo
        name: mongo
        volumeMounts:
          - mountPath: /data/db
            name: mongo-db-data-volume
      volumes:
      - name: mongo-db-data-volume
        persistentVolumeClaim:
          claimName: mongo-db-data
          readOnly: false
...

In a StatefulSet, however, I can define this PersistentVolumeClaim inline in my manifest, so it is treated as a template the StatefulSet stamps out for each replica as it scales up, recycles Pods, etc.:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      app: mongo 
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        app: mongo 
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongodb
        image: mongo:3.2
        ports:
        - containerPort: 27017
          name: mongo
        command: ["mongod"]
        args: ["--replSet","rs0","--smallfiles","--oplogSize","128","--storageEngine=mmapv1"]
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: csi-packet-standard
      resources:
        requests:
          storage: 100Gi
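Note that serviceName: "mongo" above refers to a headless Service that gives each Pod its stable DNS entry; the StatefulSet does not create it for you. A minimal definition, assuming the same app: mongo labels used above:

---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  clusterIP: None
  selector:
    app: mongo
  ports:
  - port: 27017
    targetPort: 27017

Setting clusterIP: None is what makes the Service headless, so DNS returns the individual Pod addresses rather than a single virtual IP.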

Where, much as our Deployment referenced a claim by name, here we define a template from which a claim is created for each replica, and our volumeMount requests it by name, in this case mongo-persistent-storage. Each of these Pods is deployed with a stable name, mongo-0 through mongo-2 (mongo-0 through mongo-(N-1) for N replicas), rather than the random Pod suffix you might see with other replication controllers, within the namespace you deployed this StatefulSet to.

As a result, apps in the same namespace consuming this MongoDB service would use a connection string built from each Pod's stable DNS name, in the form pod-name.service-name:

mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017
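One step these manifests don't perform is initiating the replica set itself; the --replSet rs0 flag only names it. A sketch of doing this by hand from the first Pod, with member host names that assume the headless Service shown earlier:

kubectl exec -it mongo-0 -- mongo --eval '
  rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo-0.mongo:27017" },
      { _id: 1, host: "mongo-1.mongo:27017" },
      { _id: 2, host: "mongo-2.mongo:27017" }
    ]
  })'

In practice you may prefer a sidecar container or init Job to handle this automatically as the StatefulSet scales.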

My use case for MongoDB was Rocket.Chat, so I created a Deployment that takes a MongoDB connection string as an environment variable:

---
kind: Service
apiVersion: v1
metadata:
  name: rocketchat-server-service
spec:
  selector:
    app: rocketchat-server
  ports:
  - protocol: TCP
    port: 443
    targetPort: 3000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rocketchat-server-deployment
  labels:
    app: rocketchat-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rocketchat-server
  template:
    metadata:
      labels:
        app: rocketchat-server
    spec:
      containers:
      - name: rocketchat-server
        image: rocketchat/rocket.chat:latest
        env:
          - name: PORT
            value: "3000"
          - name: ROOT_URL
            value: "http://localhost:3000"
          - name: MONGO_URL
            value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/rocketchat"
          - name: MONGO_OPLOG_URL
            value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/local"
        ports:
        - containerPort: 3000

where MONGO_URL and MONGO_OPLOG_URL both refer to this replica set, resolving each member through the DNS entries KubeDNS creates for your StatefulSet-controlled MongoDB Pods; Rocket.Chat then stores users, messages, and logging activity in MongoDB.
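Once both manifests are applied, the LoadBalancer Service exposes Rocket.Chat externally; a quick way to verify (the exact output depends on your provider):

kubectl get svc rocketchat-server-service
kubectl logs deployment/rocketchat-server-deployment

The Service's EXTERNAL-IP column shows the address handed out by the load balancer, and the logs confirm Rocket.Chat connected to the MongoDB replica set.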