bitnami/dataplatform-bp1 (#5216)

* initial version of dataplaform blueprint1
* updated wavefront param
* updated values and added readme
* updated README
* updated values and README
* removed unnecessary parameters
* Update Chart.yaml and values.yaml
* Use templates for externalZookeeper
* Update Zookeeper hosts
* updated README with parameters and description
* updated cluster requirements in cluster
* updated cluster requirements in cluster
* updated Parameters section
* updated wavefront instructions
* updated zookeeper host
* Fix Zookeeper service reference
* removed wavefront references for first release
* updated verbiage for resources
* Update README.md Test change
* updated parameters section
* updated values and install command
* updated comments
* added notes
* added in detail notes for all sub-charts
* Set on values-metrics only the not default params
* Beautify README.md
* Point Solr to the bitnami registry
* Add several modifications
* Delete wavefront from the NOTES
* Fix indentation issues
* Fix indentation error
* Addres changes
* Update bitnami/dataplatform-bp1/README.md

Co-authored-by: Carlos Rodríguez Hernández <carrodher1179@gmail.com>

* Update README.md

Co-authored-by: Miguel A. Cabrera Minagorri <macabrera@bitnami.com>
Co-authored-by: hvanderl <hvanderl@users.noreply.github.com>
Co-authored-by: Miguel Ángel Cabrera Miñagorri <devgorri@gmail.com>
Co-authored-by: Carlos Rodríguez Hernández <carrodher1179@gmail.com>
Co-authored-by: Carlos Rodríguez Hernández <carlosrh@vmware.com>
2026-03-10 15:07:49 +08:00 · 2021-02-15 20:43:37 +05:30
parent e16aa93a07
commit 6d298ec8e6
5 changed files with 484 additions and 0 deletions
--- a/bitnami/dataplatform-bp1/Chart.lock
+++ b/bitnami/dataplatform-bp1/Chart.lock
@@ -0,0 +1,18 @@
+dependencies:
+- name: kafka
+  repository: https://charts.bitnami.com/bitnami
+  version: 12.7.5
+- name: spark
+  repository: https://charts.bitnami.com/bitnami
+  version: 5.1.2
+- name: solr
+  repository: https://charts.bitnami.com/bitnami
+  version: 0.1.0
+- name: zookeeper
+  repository: https://charts.bitnami.com/bitnami
+  version: 6.4.0
+- name: common
+  repository: https://charts.bitnami.com/bitnami
+  version: 1.3.9
+digest: sha256:8326e46b3915a947b127b901f8de190dfa55fd36a9c5d3723b38971f5b35c2e3
+generated: "2021-02-11T11:47:44.576918765Z"
--- a/bitnami/dataplatform-bp1/Chart.yaml
+++ b/bitnami/dataplatform-bp1/Chart.yaml
@@ -0,0 +1,50 @@
+annotations:
+  category: Infrastructure
+apiVersion: v2
+appVersion: 1.0.0
+dependencies:
+  - condition: kafka.enabled
+    name: kafka
+    repository: https://charts.bitnami.com/bitnami
+    version: 12.x.x
+  - condition: spark.enabled
+    name: spark
+    repository: https://charts.bitnami.com/bitnami
+    version: 5.x.x
+  - condition: solr.enabled
+    name: solr
+    repository: https://charts.bitnami.com/bitnami
+    version: 0.x.x
+  - condition: zookeeper.enabled
+    name: zookeeper
+    repository: https://charts.bitnami.com/bitnami
+    version: 6.x.x
+  - name: common
+    repository: https://charts.bitnami.com/bitnami
+    tags:
+      - bitnami-common
+    version: 1.x.x
+description: OCTO Data platform Kafka-Spark-Solr Helm Chart
+engine: gotpl
+home: https://github.com/bitnami/charts/tree/master/bitnami/dataplatform-bp1
+keywords:
+  - dataplatform
+  - kafka
+  - spark
+  - solr
+  - zookeeper
+  - apache
+maintainers:
+  - email: containers@bitnami.com
+    name: Bitnami
+name: dataplatform-bp1
+sources:
+  - https://github.com/bitnami/bitnami-docker-kafka
+  - https://kafka.apache.org/
+  - https://github.com/bitnami/bitnami-docker-spark
+  - https://spark.apache.org/
+  - https://lucene.apache.org/solr/
+  - https://github.com/bitnami/bitnami-docker-solr
+  - https://zookeeper.apache.org/
+  - https://github.com/bitnami/bitnami-docker-zookeeper
+version: 0.1.0
--- a/bitnami/dataplatform-bp1/README.md
+++ b/bitnami/dataplatform-bp1/README.md
@@ -0,0 +1,193 @@
+# Data Platform Blueprint 1 with Kafka-Spark-Solr
+
+Enterprise applications increasingly rely on large amounts of data that need to be distributed, processed, and stored.
+Open source and commercially supported software stacks are available to implement a data platform that offers
+common data management services, accelerating the development and deployment of data-hungry business applications.
+
+This Helm chart enables the fully automated Kubernetes deployment of such a multi-stack data platform, covering the following software components:
+
+-   Apache Kafka – Data distribution bus with buffering capabilities
+-   Apache Spark – In-memory data analytics
+-   Solr – Data persistence and search
+
+These containerized stateful software stacks are deployed in multi-node cluster configurations, which are defined by the
+Helm chart blueprint for this data platform deployment, covering:
+
+-   Pod placement rules – Affinity rules to ensure placement diversity, prevent single points of failure, and optimize load distribution
+-   Pod resource sizing rules – Pod and JVM sizing settings tuned for performance and efficient resource usage
+-   Default settings to ensure Pod access security
+
+In addition to the Pod resource optimizations, this blueprint is validated and tested to provide Kubernetes node count and sizing recommendations [(see Kubernetes Cluster Requirements)](#kubernetes-cluster-requirements) to facilitate cloud platform capacity planning. The goal is to minimize the number of required Kubernetes nodes, and so optimize server resource usage, while still ensuring runtime and resource diversity.
+
+The first release of this blueprint defines a small data platform deployment, running on 3 Kubernetes application nodes with physically diverse underlying server infrastructure.
+
+Use cases for this small size data platform setup include: data and application evaluation, development, and functional testing.
+
+## TL;DR
+
+```console
+$ helm repo add bitnami https://charts.bitnami.com/bitnami
+$ helm install my-release bitnami/dataplatform-bp1
+```
+
+## Introduction
+
+This chart bootstraps a Data Platform Blueprint 1 deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
+
+The "Small" size data platform in default configuration deploys the following:
+1. Zookeeper with 3 nodes to be used for both Kafka and Solr
+2. Kafka with 3 nodes using the zookeeper deployed above
+3. Solr with 2 nodes using the zookeeper deployed above
+4. Spark with 1 Master and 2 worker nodes
+
+Bitnami charts can be used with [Kubeapps](https://kubeapps.com/) for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of [Bitnami Kubernetes Production Runtime](https://kubeprod.io/) (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.
+
+## Prerequisites
+
+- Kubernetes 1.12+
+- Helm 3.1.0
+- PV provisioner support in the underlying infrastructure
+
+## Kubernetes Cluster requirements
+
+Below are the minimum Kubernetes Cluster requirements for "Small" size data platform:
+
+| Data Platform Size | Kubernetes Cluster Size                                                      | Usage                                                                       |
+|:-------------------|:-----------------------------------------------------------------------------|:----------------------------------------------------------------------------|
+| Small              | 1 Master Node (2 CPU, 4Gi Memory) <br /> 3 Worker Nodes (4 CPU, 32Gi Memory) | Data and application evaluation, development, and functional testing        |
+
+## Installing the Chart
+
+To install the chart with the release name `my-release`:
+
+```console
+$ helm repo add bitnami https://charts.bitnami.com/bitnami
+$ helm install my-release bitnami/dataplatform-bp1
+```
+
+These commands deploy the data platform on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) section lists the recommended parameter settings for bringing up an optimal and resilient data platform. Please refer to the individual charts for the remaining set of configurable parameters.
+
+> **Tip**: List all releases using `helm list`
+
+## Uninstalling the Chart
+
+To uninstall/delete the `my-release` deployment:
+
+```console
+$ helm delete my-release
+```
+
+The command removes all the Kubernetes components associated with the chart and deletes the release.
+
+## Parameters
+
+The following tables list the recommended configurations for each application used in the data platform. If you need to configure any parameters other than the ones mentioned below, refer to the corresponding chart and update `values.yaml` accordingly.
+
+### Global parameters
+
+| Parameter                 | Description                                     | Default                                                 |
+|:--------------------------|:------------------------------------------------|:--------------------------------------------------------|
+| `global.imageRegistry`    | Global Docker image registry                    | `nil`                                                   |
+| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
+| `global.storageClass`     | Global storage class for dynamic provisioning   | `nil`                                                   |
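+
+As a sketch of how these global parameters fit together in a values file: the registry host, secret name, and storage class below are placeholders assumed for illustration, not values shipped with the chart.
+
+```yaml
+## Hypothetical values: replace with names that exist in your environment
+global:
+  ## Pull every image of the platform from a private registry
+  imageRegistry: registry.example.com
+  ## Secrets must already exist in the release namespace
+  imagePullSecrets:
+    - my-registry-secret
+  ## Storage class used for all dynamically provisioned volumes
+  storageClass: my-storage-class
+```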
+
+### Zookeeper chart parameters
+
+The parameters below are set to the recommended values; they can be overridden if required.
+
+| Parameter                      | Description                                                                     | Default                                                                              |
+|:-------------------------------|:--------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+| `zookeeper.enabled`            | Switch to enable or disable the Zookeeper helm chart                            | `true`                                                                               |
+| `zookeeper.replicaCount`       | Number of Zookeeper nodes                                                       | `3`                                                                                  |
+| `zookeeper.heapSize`           | Zookeeper's Java Heap size                                                      | Zookeeper Java Heap size set for optimal resource usage                              |
+| `zookeeper.resources.limits`   | The resources limits for Zookeeper containers                                   | `{}`                                                                                 |
+| `zookeeper.resources.requests` | The requested resources for Zookeeper containers for a small kubernetes cluster | Zookeeper pods resource requests set for optimal resource usage                      |
+| `zookeeper.affinity`           | Affinity for pod assignment                                                     | Zookeeper pods Affinity rules for best possible resiliency (evaluated as a template) |
+
+### Kafka chart parameters
+
+The parameters below are set to the recommended values; they can be overridden if required.
+
+| Parameter                         | Description                                                                 | Default                                                                              |
+|:----------------------------------|:----------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+| `kafka.enabled`                   | Switch to enable or disable the Kafka helm chart                            | `true`                                                                               |
+| `kafka.replicaCount`              | Number of Kafka nodes                                                       | `3`                                                                                  |
+| `kafka.heapOpts`                  | Kafka's Java Heap size                                                      | Kafka Java Heap size set for optimal resource usage                                  |
+| `kafka.resources.limits`          | The resources limits for Kafka containers                                   | `{}`                                                                                 |
+| `kafka.resources.requests`        | The requested resources for Kafka containers for a small kubernetes cluster | Kafka pods Resource requests set for optimal resource usage                          |
+| `kafka.affinity`                  | Affinity for pod assignment                                                 | Kafka pods affinity rules set for best possible resiliency (evaluated as a template) |
+| `kafka.zookeeper.enabled`         | Switch to enable or disable the Zookeeper helm chart                        | `false` Common Zookeeper deployment used for kafka and solr                          |
+| `kafka.externalZookeeper.servers` | Server or list of external Zookeeper servers to use                         | Zookeeper installed as a subchart to be used                                         |
+
+### Solr chart parameters
+
+The parameters below are set to the recommended values; they can be overridden if required.
+
+| Parameter                        | Description                                         | Default                                                                             |
+|:---------------------------------|:----------------------------------------------------|:------------------------------------------------------------------------------------|
+| `solr.enabled`                   | Switch to enable or disable the Solr helm chart     | `true`                                                                              |
+| `solr.replicaCount`              | Number of Solr nodes                                | `2`                                                                                 |
+| `solr.authentication.enabled`    | Enable Solr authentication                          | `true`                                                                              |
+| `solr.resources.limits`          | The resources limits for Solr containers            | `{}`                                                                                |
+| `solr.resources.requests`        | The requested resources for Solr containers         | Solr pods resource requests set for optimal resource usage                          |
+| `solr.javaMem`                   | Java memory options to pass to the Solr container   | Solr Java Heap size set for optimal resource usage                                  |
+| `solr.heap`                      | Java Heap options to pass to the solr container     | `nil`                                                                               |
+| `solr.affinity`                  | Affinity for Solr pods assignment                   | Solr pods Affinity rules set for best possible resiliency (evaluated as a template) |
+| `solr.zookeeper.enabled`         | Enable Zookeeper deployment. Needed for Solr cloud. | `false` common zookeeper used between kafka and solr                                |
+| `solr.externalZookeeper.servers` | Servers for an already existing Zookeeper.          | Zookeeper installed as a subchart to be used                                        |
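+
+The `externalZookeeper.servers` parameters of Kafka and Solr default to the Zookeeper deployed by this chart. As a minimal sketch, assuming a Zookeeper ensemble that already exists outside the release, both components could be pointed at it as shown below. The host name `zookeeper.example.internal` is a placeholder, not something this chart provides.
+
+```yaml
+## Do not deploy the bundled Zookeeper
+zookeeper:
+  enabled: false
+## Point Kafka at the existing ensemble (hypothetical host)
+kafka:
+  externalZookeeper:
+    servers:
+      - zookeeper.example.internal:2181
+## Point Solr at the same ensemble (hypothetical host)
+solr:
+  externalZookeeper:
+    servers:
+      - zookeeper.example.internal:2181
+```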
+
+### Spark chart parameters
+
+The parameters below are set to the recommended values; they can be overridden if required.
+
+| Parameter                   | Description                                      | Default                                                                                     |
+|:----------------------------|:-------------------------------------------------|:--------------------------------------------------------------------------------------------|
+| `spark.enabled`             | Switch to enable or disable the Spark helm chart | `true`                                                                                      |
+| `spark.master.affinity`     | Spark master affinity for pod assignment         | Spark master pod Affinity rules set for best possible resiliency (evaluated as a template)  |
+| `spark.master.resources`    | CPU/Memory resource requests/limits for Master   | Spark master pods resource requests set for optimal resource usage                          |
+| `spark.worker.javaOptions`  | Set options for the JVM in the form `-Dx=y`      | No default                                                                                  |
+| `spark.worker.replicaCount` | Set the number of workers                        | `2`                                                                                         |
+| `spark.worker.affinity`     | Spark worker affinity for pod assignment         | Spark worker pods Affinity rules set for best possible resiliency (evaluated as a template) |
+| `spark.worker.resources`    | CPU/Memory resource requests/limits for worker   | Spark worker pods resource requests set for optimal resource usage                          |
+
+Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
+
+```console
+$ helm install my-release \
+  --set kafka.replicaCount=3 \
+  bitnami/dataplatform-bp1
+```
+
+The above command deploys the data platform with 3 Kafka nodes (replicas).
+
+If you need to deploy the data platform without one of its components, set that component's `enabled` parameter with the `--set <component>.enabled=false` argument to `helm install`. For example,
+
+```console
+$ helm install my-release \
+  --set solr.enabled=false \
+  bitnami/dataplatform-bp1
+```
+
+The above command deploys the data platform without Solr.
+
+Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
+
+```console
+$ helm install my-release -f values.yaml bitnami/dataplatform-bp1
+```
+
+> **Tip**: You can use the default [values.yaml](values.yaml)
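+
+For illustration, a custom values file might look like the sketch below. The file name `my-values.yaml` and the reduced Kafka sizing are assumptions made for this example, not tested recommendations; only the parameter names come from the tables above.
+
+```yaml
+## my-values.yaml (hypothetical file name)
+## Skip Solr entirely
+solr:
+  enabled: false
+## Run Kafka with a smaller heap and memory request than the defaults
+kafka:
+  heapOpts: -Xmx2048m -Xms2048m
+  resources:
+    requests:
+      cpu: 250m
+      memory: 3072Mi
+```
+
+Pass the file with `-f my-values.yaml` in place of `values.yaml` in the command above. Keys that are not listed in the file keep the defaults shipped with the chart.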
+
+## Configuration and installation details
+
+### [Rolling VS Immutable tags](https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/)
+
+It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.
+
+Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
+
+## Troubleshooting
+
+Find more information about how to deal with common errors related to Bitnami’s Helm charts in [this troubleshooting guide](https://docs.bitnami.com/general/how-to/troubleshoot-helm-chart-issues).
+
+To render complete information about the deployment, including the notes of all the sub-charts, use the `--render-subchart-notes` flag when installing the chart.
--- a/bitnami/dataplatform-bp1/templates/NOTES.txt
+++ b/bitnami/dataplatform-bp1/templates/NOTES.txt
@@ -0,0 +1,51 @@
+** Data Platform Blueprint 1 is being deployed. It could take some time to be ready **
+
+The following components are being deployed to your cluster:
+
+{{- if .Values.kafka.enabled }}
+
+***********
+** Kafka **
+***********
+
+To access the Kafka service from your local machine execute the following:
+
+   kubectl port-forward --namespace {{ .Release.Namespace }} svc/{{ .Release.Name }}-kafka 9092:9092 &
+   echo "Kafka service available at : 127.0.0.1:9092"
+{{- end -}}
+
+{{- if .Values.solr.enabled }}
+
+**********
+** Solr **
+**********
+
+To access the Solr service from your local machine execute the following:
+
+   kubectl port-forward --namespace {{ .Release.Namespace }} svc/{{ .Release.Name }}-solr 8983:8983 &
+   echo "Solr service available at : http://127.0.0.1:8983"
+{{- end -}}
+
+{{- if .Values.spark.enabled }}
+
+***********
+** Spark **
+***********
+
+To access the Spark service from your local machine execute the following:
+
+   kubectl port-forward --namespace {{ .Release.Namespace }} svc/{{ .Release.Name }}-spark-master-svc 8080:80 &
+   echo "Spark service available at : http://127.0.0.1:8080"
+{{- end -}}
+
+{{- if .Values.zookeeper.enabled }}
+
+***************
+** Zookeeper **
+***************
+
+To access the Zookeeper service from your local machine execute the following:
+
+   kubectl port-forward --namespace {{ .Release.Namespace }} svc/{{ .Release.Name }}-zookeeper 2181:2181 &
+   echo "Zookeeper service available at : 127.0.0.1:2181"
+{{- end -}}
--- a/bitnami/dataplatform-bp1/values.yaml
+++ b/bitnami/dataplatform-bp1/values.yaml
@@ -0,0 +1,172 @@
+## Global Docker image parameters
+## Please, note that this will override the image parameters, including dependencies, configured to use the global value
+## Currently available global parameters: imageRegistry, imagePullSecrets and storageClass
+##
+# global:
+#   imageRegistry: myRegistryName
+#   imagePullSecrets:
+#     - myRegistryKeySecretName
+#   storageClass: myStorageClass
+
+## Zookeeper Subchart parameters
+##
+zookeeper:
+  enabled: true
+  replicaCount: 3
+  ## Size in MB for the Java Heap options (Xmx and Xms). This env var is ignored if Xmx and Xms are configured via JVMFLAGS
+  ##
+  heapSize: 4096
+  resources:
+    ## Recommended values for cpu and memory requests
+    ##
+    limits: {}
+    requests:
+      cpu: 250m
+      memory: 5120Mi
+  ## Anti Affinity rules set for resiliency
+  ##
+  affinity:
+    podAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - labelSelector:
+            matchExpressions:
+              - key: app.kubernetes.io/name
+                operator: In
+                values:
+                  - zookeeper
+          topologyKey: "kubernetes.io/hostname"
+
+## Kafka Subchart parameters
+##
+kafka:
+  enabled: true
+  replicaCount: 3
+  ## Kafka's Java Heap size
+  ##
+  heapOpts: -Xmx4096m -Xms4096m
+  resources:
+    ## Recommended values for cpu and memory requests
+    ##
+    limits: {}
+    requests:
+      cpu: 250m
+      memory: 5120Mi
+  ## Anti Affinity rules set for resiliency and Affinity rules set for optimal performance
+  ##
+  affinity:
+    podAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - labelSelector:
+            matchExpressions:
+              - key: app.kubernetes.io/component
+                operator: In
+                values:
+                  - kafka
+          topologyKey: "kubernetes.io/hostname"
+    podAffinity:
+      preferredDuringSchedulingIgnoredDuringExecution:
+        - weight: 50
+          podAffinityTerm:
+            labelSelector:
+              matchExpressions:
+                - key: app.kubernetes.io/name
+                  operator: In
+                  values:
+                    - zookeeper
+            topologyKey: "kubernetes.io/hostname"
+  zookeeper:
+    enabled: false
+  ## This value is only used when zookeeper.enabled is set to false.
+  ##
+  externalZookeeper:
+    ## Server or list of external zookeeper servers to use. This is set to the zookeeper deployed as part of this chart
+    ##
+    servers:
+      - '{{ .Release.Name }}-zookeeper'
+
+## Spark Subchart parameters
+##
+spark:
+  enabled: true
+  ## Spark master specific configuration
+  ##
+  master:
+    resources:
+      ## Recommended values for cpu and memory requests
+      ##
+      limits: {}
+      requests:
+        cpu: 250m
+        memory: 5120Mi
+    ## Anti affinity rules set for resiliency
+    ##
+    affinity:
+      podAntiAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          - labelSelector:
+              matchExpressions:
+                - key: app.kubernetes.io/component
+                  operator: In
+                  values:
+                    - worker
+            topologyKey: "kubernetes.io/hostname"
+  ## Spark worker specific configuration
+  ##
+  worker:
+    replicaCount: 2
+    ## Anti affinity rules set for resiliency
+    ##
+    affinity:
+      podAntiAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          - labelSelector:
+              matchExpressions:
+                - key: app.kubernetes.io/component
+                  operator: In
+                  values:
+                    - worker
+                    - master
+            topologyKey: "kubernetes.io/hostname"
+    resources:
+      ## Recommended values for cpu and memory requests
+      ##
+      limits: {}
+      requests:
+        cpu: 250m
+        memory: 5120Mi
+
+## Solr Subchart parameters
+##
+solr:
+  enabled: true
+  replicaCount: 2
+
+  ## Java memory options recommended value
+  ##
+  javaMem: -Xmx4096m -Xms4096m
+
+  ## Anti affinity rules set for resiliency
+  ##
+  affinity:
+    podAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - labelSelector:
+            matchExpressions:
+              - key: app.kubernetes.io/component
+                operator: In
+                values:
+                  - solr
+          topologyKey: "kubernetes.io/hostname"
+  resources:
+    ## Recommended values for cpu and memory requests
+    ##
+    limits: {}
+    requests:
+      cpu: 250m
+      memory: 5120Mi
+  zookeeper:
+    enabled: false
+  ## This value is only used when zookeeper.enabled is set to false
+  ##
+  externalZookeeper:
+    ## Server or list of external zookeeper servers to use. In this case, it is set to the zookeeper deployed as part of this chart.
+    ##
+    servers:
+      - '{{ .Release.Name }}-zookeeper'