Elasticsearch, Kubernetes, and Persistent Volumes
The Elasticsearch Operator on Kubernetes
Deploying and running high-performance Elasticsearch-as-a-service is about to become much easier!
The Elastic Cloud on Kubernetes (ECK) operator is now in beta and already shows great promise in simplifying Elastic-as-a-service operations; deploying an Elasticsearch cluster on Kubernetes is easier than ever. The ECK operator simplifies the process of expanding, upgrading, and modifying existing clusters by allowing simple declarative changes to a cluster yaml file. Then, the operator automatically handles all the work to non-disruptively provision nodes, upgrade software, and migrate data, in order to reach the specified cluster state.
ECK clusters can use PersistentVolumeClaims for the storage backing the Elastic nodes, seamlessly connecting with storage provisioners like the Pure Service Orchestrator CSI plugin. The CSI plugin can provision storage on either FlashArray or FlashBlade based on a single yaml storageClassName field, getting access to the all-flash performance needed to run ingest and query workloads with predictable performance. Even better, a single PSO installation can manage a fleet of FlashArrays AND FlashBlades. Those FlashBlades can also be a snapshot repository for Elasticsearch to protect your indexed data.
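As a quick illustration of what PSO provides on its own, a minimal PersistentVolumeClaim against the “pure-file” storage class might look like the following sketch (the claim name and size here are hypothetical):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-scratch-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: pure-file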
A key change in the ECK beta release is the use of a StatefulSet based deployment. The PersistentVolumes are reused during an upgrade instead of migrating all data from each node during a rolling upgrade. The result is an upgrade process that takes minutes instead of hours or days, reducing risk and simplifying operations.
The goal of this blog post is to show how easy it is to use ECK and PSO, demonstrating the benefits of using PersistentVolumes during software upgrades. In a follow-up post, I will also show examples of how to use Elastic for monitoring Kubernetes, how to run the Rally benchmark in Kubernetes, and how to index filesystem listings with the RapidFile Toolkit.
ECK: Deployment and Basic Interaction
Installation of both ECK and PSO is simple, and they work together with no extra effort. For PSO, download and configure the Pure-backed storage using the CSI operator plugin.
For the ECK beta operator, follow the instructions here, where installation is as simple as:
> kubectl apply -f https://download.elastic.co/downloads/eck/1.0.0-beta1/all-in-one.yaml
And then deploy your first cluster using the following example yaml file:
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.5.0
  nodeSets:
  - name: all-nodes
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Ti
        storageClassName: pure-file
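Assuming the manifest above is saved as quickstart.yaml (a hypothetical filename), deploying the cluster and watching its pods come up is a single apply plus a label-based query (the cluster-name label is applied by the ECK operator):
> kubectl apply -f quickstart.yaml
>
> kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=quickstart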
Behind the scenes, the ECK operator automatically sets up and manages the statefulset for the Elasticsearch cluster.
> kubectl get statefulsets
NAME                      READY   AGE
quickstart-es-all-nodes   3/3     48d
The storageClassName field is the only part of the spec required to leverage a FlashBlade for each node’s storage. Simply switching from “pure-file” to “pure-block” uses a FlashArray instead.
If the “pure-file” storageClass is set as default, then even this line is not necessary. Changing the default storageclass can be done as follows, but be careful to have only one default storageclass selected:
> kubectl patch storageclass pure-file -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/pure-file patched
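To confirm that only one class carries the default annotation, list the storage classes and look for the “(default)” marker:
> kubectl get storageclass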
Once deployed, check on the status of all Elasticsearch clusters:
> kubectl get elasticsearch --all-namespaces
NAMESPACE     NAME         HEALTH   NODES   VERSION   PHASE         AGE
kube-system   quickstart   green    3       7.5.0     Operational   3m31s
For Elasticsearch, you may need to change the max_map_count setting to a higher number on your Kubernetes hosts with a command like the following:
> sudo sysctl -w vm.max_map_count=262144
The above setting does not persist across restarts, so it is suggested to use your favorite method (e.g., using /etc/sysctl.conf) to ensure this is always set correctly.
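For example, on a systemd-based host the setting can be persisted by dropping it into a sysctl configuration file and reloading (the filename here is just one convention):
> echo "vm.max_map_count=262144" | sudo tee /etc/sysctl.d/99-elasticsearch.conf
>
> sudo sysctl --system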
Accessing the resulting Elasticsearch cluster can be either through a Kibana instance, also deployed via ECK, or via curl-based REST commands.
To issue commands to an Elasticsearch cluster from outside of Kubernetes, first retrieve the password from a Kubernetes secret and then send RPCs with curl:
> PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode)
>
> kubectl port-forward service/quickstart-es-http 9200
>
> curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/indices?v"
> curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/health?v"
> curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v"
In the example above, I leverage port-forwarding to temporarily expose the Elasticsearch service outside of my Kubernetes cluster. For production systems, there are better options, like ingress controllers and load balancers, for ingesting documents from a non-Kubernetes source.
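For instance, ECK can expose the cluster through a LoadBalancer service by adding a small stanza to the Elasticsearch spec; this is a sketch that assumes your Kubernetes cluster has a load-balancer provider available:
spec:
  http:
    service:
      spec:
        type: LoadBalancer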
Personally, I prefer to run a standalone pod, called “terminal,” with a basic container image (ubuntu:18.04) that I can use to tunnel commands using “kubectl exec”. The pod runs a trivial image that simply stays running, letting me access the Kubernetes cluster network much like establishing a remote session via ssh.
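A minimal sketch of that pod spec looks like the following; note that ubuntu:18.04 does not include curl, so it must be installed inside the container (e.g., with apt-get) before running the examples below:
apiVersion: v1
kind: Pod
metadata:
  name: terminal
spec:
  containers:
  - name: terminal
    image: ubuntu:18.04
    # keep the container alive so it can be used via kubectl exec
    command: ["sleep", "infinity"]
With the pod running, a command can be tunneled like this: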
> kubectl exec -it pod/terminal -- curl -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/_cat/indices?v"
The operation above issues a single command via “kubectl exec” but I can also use “/bin/bash” for the equivalent of a remote session for debugging.
An example of a useful command to run this way, other than health checks, is enabling replicas for a particular index (nyc_taxis):
> kubectl exec -it pod/terminal -- curl -XPUT -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_settings?pretty" -H 'Content-Type: application/json' -d '{"number_of_replicas":1}'
Enabling replicas after indexing all documents in an index is more efficient because it does not require re-indexing each document on each replica (computationally expensive work) and instead just copies the completed Lucene segments to the nodes hosting the replicas.
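As a sketch of that workflow, the nyc_taxis index could be set to zero replicas (and refreshes paused) before the bulk load, with the settings update shown above applied once indexing completes:
> kubectl exec -it pod/terminal -- curl -XPUT -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_settings?pretty" -H 'Content-Type: application/json' -d '{"number_of_replicas":0, "refresh_interval":"-1"}'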
A more advanced operation is to deal with an index with too few shards (and therefore not enough concurrency). You can split the index by 1) first stopping writes and then 2) issuing a split command:
> kubectl exec -it pod/terminal -- curl -X PUT -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_settings?pretty" -H 'Content-Type: application/json' -d '{ "settings": { "index.blocks.write": true } }'
>
> kubectl exec -it pod/terminal -- curl -X POST -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_split/nyc_taxis_new?pretty" -H 'Content-Type: application/json' -d '{ "settings": { "index.number_of_shards": 40 } }'
There are some restrictions around splitting an index; the new number of shards must be a whole multiple of the original number of shards, e.g., from 10 to 40. The end result is a better distribution of data, which is especially useful if the node count has grown since the index was first created. Once the operation finishes, it is possible to re-enable writes.
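For example, re-enabling writes on the source index is just another settings update (a sketch; depending on your workflow you may also want to clear the block on the new index):
> kubectl exec -it pod/terminal -- curl -X PUT -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_settings?pretty" -H 'Content-Type: application/json' -d '{ "settings": { "index.blocks.write": false } }'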
Next, let’s look at more advanced deployment scenarios with hot-warm architectures. The yaml below is a simple hot/warm Elastic cluster using a FlashBlade for persistent storage:
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.5.0
  nodeSets:
  - name: hot-nodes
    count: 5
    podTemplate:
      spec:
        nodeSelector:
          env: prod
    config:
      node.attr.data: hot
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Ti
        storageClassName: pure-file
  - name: warm-nodes
    count: 5
    podTemplate:
      spec:
        nodeSelector:
          env: prod
    config:
      node.attr.data: warm
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Ti
        storageClassName: pure-file
By using a tiered architecture, possibly also including a cold tier, you are able to select different resource profiles and volume sizes for nodes in different tiers. Mixing hot and warm nodes on the same or different hardware with affinity rules provides better flexibility to fully utilize resources. Not only can you select different volume sizes for storage, you can also choose different storage backends for each tier. As an example, by using labels, you could place the hot nodes on one FlashBlade (or FlashArray!) and the warm nodes on a different FlashBlade.
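For example, with the node.attr.data attribute defined above, shard allocation can be steered per index using standard allocation filtering; this sketch (reusing the hypothetical nyc_taxis index) moves an index from the hot nodes to the warm tier:
> kubectl exec -it pod/terminal -- curl -X PUT -u "elastic:$PASSWORD" -k "https://quickstart-es-http:9200/nyc_taxis/_settings?pretty" -H 'Content-Type: application/json' -d '{ "index.routing.allocation.require.data": "warm" }'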
Finally, notice that the simplicity of the ECK operator combines with the performance and scalability of Pure without any added complexity; the CSI plugin and PersistentVolumeClaims work behind the scenes to bring the two together!
Next, I will measure one of the key benefits of the new ECK version: significantly faster software upgrades.
Results: Upgrading Software Version with ECK
The first beta release of the ECK Kubernetes operator handles software upgrades differently. The operator leverages PersistentVolumes and shared storage to simplify the upgrade process while maintaining continuous availability.
In other words, this is why the combination of Kubernetes operators and shared storage makes administering Elastic easy!
In previous versions of Elastic Cloud, as well as in traditional bare-metal Elasticsearch clusters, data shards are migrated to new nodes one at a time during a rolling upgrade. This legacy upgrade strategy is common among most DAS-based applications, not just Elasticsearch. Instead, the ECK operator upgrades nodes in a rolling fashion but without migrations, preserving the PersistentVolume for each upgraded node. The result is a much faster upgrade.
To demonstrate the impact of this change, I measured overall upgrade time for 4 different Elasticsearch clusters, varying both node count and total index size. On the largest test, the upgrade with ECK/PSO is 17x faster than the legacy DAS upgrade strategy.
In the chart below, the upgrade timings are compared across two cluster sizes (5 nodes and 10 nodes) and two amounts of indexed data (75GB or 150GB). In all cases, the software upgrade is from 7.2.0 to 7.2.1.
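Triggering each upgrade is just another declarative change: bump the version field in the cluster manifest and re-apply it, then let the operator roll the nodes (the filename here is hypothetical):
> sed -i 's/version: 7.2.0/version: 7.2.1/' quickstart.yaml
>
> kubectl apply -f quickstart.yaml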
In the traditional DAS strategy, nodes are upgraded one at a time, with a full data migration for each node. Therefore, the total upgrade time is a function of both the per-node upgrade time and the per-node migration time, with the latter dominating. In contrast, ECK relies on the PersistentVolume to protect data in case of a failed upgrade, so the total cluster upgrade time is a function only of the time taken to upgrade the software on each node.
In fact, for the ECK/PSO setup, the upgrade time is a linear function of node count and is unrelated to the size of the indices in the cluster. Conversely, the traditional DAS upgrade time is dominated by total shard size, as this dictates how long it takes to migrate each node’s data.
The reason ECK is able to improve upgrades so much is that the backing store for the PersistentVolume ensures the shard node data is protected even if the compute fails, thus removing the need for a migration per-node.
The command I used to track upgrade progress was:
> while true; do echo "$(date) $(kubectl get elasticsearch quickstart | grep quickstart)" | tee -a esupgrade.log; sleep 1; done
Note that when doing software upgrades with ECE (the non-Kubernetes Elastic Cloud), the normal case is better than the results above. This is because the default upgrade behavior is to double the cluster size (e.g., from 7 nodes to 14 nodes) in order to migrate all shards once and in parallel. This results in a much faster upgrade time on DAS as long as there is sufficient headroom in the infrastructure. In contrast, using the PersistentVolumes in ECK means that clusters do not have to be provisioned 2x to afford this headroom.
Summary
Operating Elastic-as-a-service is dramatically simpler with the combination of Elastic Cloud on Kubernetes (ECK) and Pure Service Orchestrator (PSO). Now, you can easily administer many Elasticsearch clusters, all backed by high-performance all-flash storage, to meet the performance demands of multiple ingest and query workloads.
Part two of this blog dives into three example uses for ECK.