Log Analytics Pipelines as-a-Service

  • More efficient resource usage: no extra nodes deployed just to add storage, and no full data replicas needed for protection.
  • Faster failure handling by making pods (brokers or data nodes) near-stateless. With small, bounded amounts of storage attached to a pod, rebalance operations are orders of magnitude faster.
  • Fast historical searches, backed by the predictable all-flash performance of FlashBlade.
The architecture uses Confluent Kafka and Elasticsearch PersistentVolumes orchestrated by Portworx, along with S3 buckets for long-term shared storage. The Portworx storage can be backed by local drives, FlashArray volumes, or FlashBlade NFS (see the StorageClass sketch after the list below). This combination enables:
  • Log analytics as a service, so each team and project can create and operate independently with just the resources they need. The alternative is custom infrastructure silos for each team, all configured and managed slightly differently.
  • Easily scale up or down cluster resources (compute or storage) as needed and in a self-service manner
  • Modify resource requirements without changing hardware, e.g., more compute for one cluster and less storage for another
  • Run multiple heterogeneous clusters on a shared hardware pool
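As an illustration of how those backing options surface in Kubernetes, below is a minimal sketch of a Portworx StorageClass similar to the px-nvme class referenced later in the example values.yaml. The provisioner and parameter values are assumptions based on standard Portworx usage; adjust them to match your own Portworx installation and backing storage.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-nvme
provisioner: kubernetes.io/portworx-volume   # in-tree provisioner; a CSI-based install uses its own provisioner name
parameters:
  repl: "1"     # single replica, relying on the backing array for data protection
  fs: "ext4"    # filesystem created on each dynamically provisioned volume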

How Shared Storage Simplifies aaS Log Pipelines

Object Storage

PersistentVolume Dynamic Provisioning

  • Provisioning of storage is decoupled from CPU and RAM, so Kubernetes can schedule pods based on CPU and RAM alone, without storage capacity becoming an additional placement constraint.
  • Pod and node failure recovery is orders of magnitude faster because Kubernetes will restart a failed pod on a different node while reattaching to the same remote volume, thus avoiding expensive rebalances.
  • Volumes can be dynamically grown as needed without the restrictions of physical drives and drive bays.
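For example, a minimal PersistentVolumeClaim against the FlashBlade-backed pure-file class used later in this walkthrough is all a pod needs to request storage dynamically; the claim name and size below are placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-data-example          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pure-file    # FlashBlade NFS storage class
  resources:
    requests:
      storage: 100Gi             # can be grown later if the class allows volume expansion

Because the data lives on shared storage instead of a specific node, a rescheduled pod can reattach to the same volume from anywhere in the cluster.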

Log Pipeline Components

Prerequisites

FlashBlade Configuration

Flog: Synthetic Log Generator

> docker run -it --rm mingrammer/flog

137.97.114.3 - - [27/Aug/2020:19:50:11 +0000] "HEAD /brand HTTP/1.1" 416 16820
252.219.8.157 - - [27/Aug/2020:19:50:11 +0000] "PUT /maximize/synergize HTTP/1.0" 501 4208
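
For a larger sample, flog accepts flags controlling the format and number of lines; for example, the command below generates 1,000 lines in Apache common log format (run flog with --help to confirm the options available in your version):

> docker run -it --rm mingrammer/flog -f apache_common -n 1000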

Confluent Kafka

Filebeat

Elasticsearch

Example values.yaml File

flashblade:
  datavip: "10.62.64.200"
  mgmtvip: "10.62.64.20"
  token: "T-XXXXXX-YYYYY-ZZZZ-QQQQQ-813e5a9c1222"
zookeeper:
  storageclass: "px-nvme"
kafka:
  cpVersion: 6.1.1
  storageclass: "px-nvme"
  nodecount: 4
elasticsearch:
  nodecount: 6
  version: 7.12.1
  storageclass: "pure-file"
beats:
  nodecount: 12
flog:
  nodecount: 1
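
Assuming the pipeline is packaged as a Helm chart (the chart path, release name, and namespace below are placeholders), the values file above is applied at install time:

> helm install log-pipeline ./log-analytics -f values.yaml --namespace logging --create-namespace

Later adjustments, such as different node counts or storage classes, can then be rolled out by editing values.yaml and running helm upgrade with the same file.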

How to Adapt For Your Use Case

  • Disable the flog generators and replace them with real data sources that send to a topic in Kafka
  • Edit the filebeats ConfigMap and change the "topics" setting to reflect your real topic(s) (see the snippet after this list)
  • Edit the node counts in values.yaml to achieve the needed indexing performance
  • Modify the snapshot policy (SLM) in post-install-es-snaps.yaml to meet your protection/recovery requirements
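
As a sketch of the second step, the topics setting lives in the Filebeat kafka input inside that ConfigMap; the broker address, topic name, and group id below are placeholders:

filebeat.inputs:
  - type: kafka
    hosts: ["kafka-0.kafka:9092"]   # placeholder broker address
    topics: ["application-logs"]    # replace with your real topic(s)
    group_id: "filebeat"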

Storage Usage Visualized

Write (orange) and Read (blue) performance for Elasticsearch ingest
Write spikes to S3 as indices are moved to the Frozen Tier

Conclusion
