Murat Celep

Nov 20, 2020

Backup/Restore with Velero on vSphere

Velero (formerly known as Heptio Ark) is arguably the most popular backup/restore solution for Kubernetes. It was created by Heptio and continues to be actively developed as an open source project. Here is the GitHub project and this is the official website.

In this blog post, we will present different options for backing up and restoring Kubernetes clusters running on vSphere with Velero, using an S3 API based object store.

NOTE: All the files used in this article can be found on https://github.com/mcelep/blog/tree/master/velero-1

You need an S3 API compatible object store to use Velero. If you already have an AWS (Amazon Web Services) account, you can use an S3 bucket from that account. If you just want to ‘kick the tires’, you can deploy an open-source alternative such as MinIO on your Kubernetes cluster.

Besides access to a Kubernetes cluster, you also need the tools used in the commands throughout this post: kubectl, helm and the velero CLI.

Velero is based on a plugin design pattern: depending on which cloud your Kubernetes cluster runs on and which object store you use, the right plugins need to be installed and configured. In Velero terminology, these cloud plugins are called providers. You can see the list of plugins and their supported features here.

We will focus on a few provider combinations in this blog post. The object store will always be S3 based, as that is the most common option, and we will use the AWS plugin for it. For snapshotting volumes, we will talk about the following options:

  1. Restic: Restic is a popular open-source tool, and because it is not tied to a specific storage platform, it gives you some flexibility to migrate data between different cloud providers. It has some limitations too, though, which you can read about here.
  2. vSphere plugin: vSphere has its own volume snapshot plugin: velero-plugin-for-vsphere. This plugin backs up Kubernetes persistent volumes to an S3 bucket.
  3. CSI VolumeSnapshots: Container Storage Interface (CSI) was promoted to GA in the Kubernetes v1.13 release, and features that rely on CSI are being added to Kubernetes. One such feature is Volume Snapshots, which has been in beta since Kubernetes v1.17. In order to use this plugin, you have to make sure that CSI is configured correctly for the storage provider of your Kubernetes cluster, e.g. if you use TKGI (Tanzu Kubernetes Grid Integrated), you can follow the steps explained here. As of November 18, 2020, CSI Volume Snapshots are not supported by vsphere-csi-driver; here is a relevant issue.

In this section, we will walk through a step-by-step tutorial to:

  • Install Velero with Restic enabled
  • Create a test application with a Persistent Volume
  • Create a backup of the application
  • Delete the application
  • Restore the application from the backup

We will install the Velero server-side components into a namespace called velero. Let’s create that namespace first:

kubectl create ns velero

Add your S3 bucket access credentials to the creds.txt file. Replace the placeholders <aws_access_key_id> and <aws_secret_access_key> with actual values and create a new Kubernetes secret from the content of the file:

kubectl -n velero create secret generic cloud-credentials --from-file=cloud=creds.txt
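For reference, the Velero AWS plugin reads credentials in the AWS shared credentials file format. If you are not starting from the creds.txt in the repository, a minimal sketch of how the file could be created looks like this (the `[default]` profile name is an assumption; the placeholders are the same ones mentioned above and must be replaced by hand):

```shell
# Write a credentials file in the AWS shared credentials format.
# The quoted 'EOF' keeps the placeholders literal; replace them afterwards.
cat > creds.txt <<'EOF'
[default]
aws_access_key_id=<aws_access_key_id>
aws_secret_access_key=<aws_secret_access_key>
EOF
```

The secret key name `cloud` in the kubectl command above matters, because that is the file name under which the credentials end up mounted into the Velero pod.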

We need to opt in to the Restic installation in values.yaml with deployRestic: true. Restic also has to run in privileged mode to access the pod volumes through a hostPath mount, which is controlled by the following parameters: restic.podVolumePath and restic.privileged.
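The actual values.yaml used in this post lives in the repository linked above. As a rough sketch only, a values file for this chart generation might contain something like the following. The file name `values-sketch.yaml`, the plugin image tag and the storage location name `aws` are assumptions made for illustration (the name is inferred from the `--storage-location aws` flag used later), so compare against the real file before using it:

```shell
# Write an illustrative values file; NOT the author's values.yaml.
cat > values-sketch.yaml <<'EOF'
configuration:
  provider: aws
  backupStorageLocation:
    name: aws                 # assumed, matches --storage-location aws
credentials:
  useSecret: true
  existingSecret: cloud-credentials   # secret created with kubectl above
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.1.0   # tag is an assumption
    volumeMounts:
      - mountPath: /target
        name: plugins
deployRestic: true
EOF
```

The bucket, region, prefix and the two restic parameters are left out of the sketch because they are passed with --set on the command line.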

To configure the S3 bucket, we use the configuration.backupStorageLocation.bucket and configuration.backupStorageLocation.config.region parameters. Note that we also use a parameter called configuration.backupStorageLocation.prefix. This parameter comes in handy if the same S3 bucket is used for multiple clusters. The prefix keeps cluster-specific content apart, so it makes sense to use a prefix that clearly identifies a Kubernetes cluster. Before executing the command below, make sure you replace the placeholders with the right values.

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero -f values.yaml \
-n velero --version 2.13.6 \
--set configuration.backupStorageLocation.bucket=<your-bucket> \
--set configuration.backupStorageLocation.config.region=<aws-region> \
--set configuration.backupStorageLocation.prefix=<some-prefix> \
--set restic.podVolumePath=/var/lib/kubelet/pods \
--set restic.privileged=true

(Note: for TKG, restic.podVolumePath should be /var/lib/kubelet/pods, whereas for TKGI (formerly known as PKS) it should be /var/vcap/data/kubelet/pods.)
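Before moving on, it can help to confirm that the installation came up. The helper below is a sketch rather than part of the original walkthrough: the function name is hypothetical, and the resource names deployment/velero and daemonset/restic are assumptions based on the release name velero used above.

```shell
# Hypothetical helper: wait for the Velero server and the Restic daemonset,
# then list the configured backup storage locations.
verify_velero_install() {
  ns="${1:-velero}"   # namespace, defaults to velero
  kubectl -n "$ns" rollout status deployment/velero --timeout=120s &&
    kubectl -n "$ns" rollout status daemonset/restic --timeout=120s &&
    velero -n "$ns" backup-location get
}
```

Calling `verify_velero_install` with no argument checks the velero namespace. If the Restic pods never become ready, a wrong restic.podVolumePath for your distribution is a likely place to look.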

When using Restic for backups, you need to add an annotation to your pods that specifies which volumes to back up. See nginx-with-pv.yaml for an example. Here is the annotation:

annotations:
  backup.velero.io/backup-volumes: nginx-logs
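For a pod that is already running, the same annotation can also be applied from the command line. The wrapper below is a small sketch; the function name and its argument order are made up for illustration.

```shell
# Hypothetical helper: mark volumes of a running pod for Restic backup.
#   $1 = namespace, $2 = pod name, $3 = comma-separated volume names
annotate_backup_volumes() {
  kubectl -n "$1" annotate "pod/$2" \
    "backup.velero.io/backup-volumes=$3" --overwrite
}
```

An annotation set this way is lost when the pod is recreated, for example by a Deployment rollout, so for anything long-lived it is safer to keep it in the pod template as shown above.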

Here are the steps to create a backup and restore from it:

# Create application resources
kubectl apply -f example-app-with-pv.yaml
kubectl -n example-app get pods -w # wait till pod is running
# Write some data into persistent volume(PV)
kubectl -n example-app exec -it "$(kubectl get pods -n example-app -o name)" -- bash -c "echo 'I persisted' > /opt/my-pvc/hi.txt"
# Check if data has persisted into PV
kubectl -n example-app exec -it "$(kubectl get pods -n example-app -o name)" -- bash -c "cat /opt/my-pvc/hi.txt"
# Start velero backup
velero backup create backup1 --include-namespaces example-app --storage-location aws --snapshot-volumes --wait
# Delete application
kubectl delete namespaces example-app
# Make sure PV is gone
kubectl get pv -A | grep my-pvc #check no pv
# Restore the latest backup
velero restore create --from-backup backup1 --wait
kubectl get pods -n example-app -w # wait till pod is running
# Check if data has been restored
kubectl -n example-app exec -it "$(kubectl get pods -n example-app -o name)" -- bash -c "cat /opt/my-pvc/hi.txt"
# Clean up
kubectl delete ns velero
kubectl delete ns example-app
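Since the walkthrough deletes the application right after creating the backup, it is worth confirming that the backup really finished before that step. Velero records the state in the status.phase field of the Backup resource, and the sketch below reads it with kubectl. The function names are hypothetical, and the velero namespace is assumed from the installation above.

```shell
# Print the phase of a Velero backup (e.g. InProgress, Completed, Failed).
backup_phase() {
  kubectl -n velero get backups.velero.io "$1" -o jsonpath='{.status.phase}'
}

# Succeed only if the given backup has reached the Completed phase.
require_completed_backup() {
  phase="$(backup_phase "$1")"
  if [ "$phase" = "Completed" ]; then
    echo "backup $1 completed"
  else
    echo "backup $1 is in phase '$phase'" >&2
    return 1
  fi
}
```

In a script, something like `require_completed_backup backup1 && kubectl delete namespaces example-app` keeps the delete from running when the backup has not completed.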

We will cover vSphere plugin based velero configuration in a different blog post.

We will cover CSI VolumeSnapshots based velero configuration in a different blog post once it’s available.