charts/bitnami/spring-cloud-dataflow/README.md
Alexey Zhokhov f49b5a1ade [bitnami/spring-cloud-data-flow] Fixes and improvements (#2945)
2020-07-06 14:52:47 +02:00


Spring Cloud Data Flow

Spring Cloud Data Flow is a microservices-based Streaming and Batch data processing pipeline in Cloud Foundry and Kubernetes.

TL;DR;

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/spring-cloud-dataflow

Introduction

This chart bootstraps a Spring Cloud Data Flow deployment on a Kubernetes cluster using the Helm package manager.

Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters.

Prerequisites

  • Kubernetes 1.12+
  • Helm 2.12+ or Helm 3.0-beta3+
  • PV provisioner support in the underlying infrastructure

Installing the Chart

To install the chart with the release name my-release:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/spring-cloud-dataflow

These commands deploy Spring Cloud Data Flow on the Kubernetes cluster with the default configuration. The parameters section lists the parameters that can be configured during installation.

Tip: List all releases using helm list

Uninstalling the Chart

To uninstall/delete the my-release chart:

helm uninstall my-release

Parameters

The following tables list the configurable parameters of the Spring Cloud Data Flow chart and their default values per section/component:

Global parameters

| Parameter | Description | Default |
| --- | --- | --- |
| global.imageRegistry | Global Docker image registry | nil |
| global.imagePullSecrets | Global Docker registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| global.storageClass | Global storage class for dynamic provisioning | nil |

Common parameters

| Parameter | Description | Default |
| --- | --- | --- |
| nameOverride | String to partially override scdf.fullname | nil |
| fullnameOverride | String to fully override scdf.fullname | nil |
| clusterDomain | Default Kubernetes cluster domain | cluster.local |
| deployer.resources.limits | Streaming applications resource limits | { cpu: "500m", memory: "1024Mi" } |
| deployer.resources.requests | Streaming applications resource requests | {} |
| deployer.resources.readinessProbe | Streaming applications readiness probe configuration | Check values.yaml file |
| deployer.resources.livenessProbe | Streaming applications liveness probe configuration | Check values.yaml file |

Dataflow Server parameters

| Parameter | Description | Default |
| --- | --- | --- |
| server.image.registry | Spring Cloud Dataflow image registry | docker.io |
| server.image.repository | Spring Cloud Dataflow image name | bitnami/spring-cloud-dataflow |
| server.image.tag | Spring Cloud Dataflow image tag | {TAG_NAME} |
| server.image.pullPolicy | Spring Cloud Dataflow image pull policy | IfNotPresent |
| server.image.pullSecrets | Specify docker-registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| server.configuration.streamingEnabled | Enables or disables streaming data processing | true |
| server.configuration.batchEnabled | Enables or disables batch data (tasks and schedules) processing | true |
| server.configuration.accountName | The name of the account to configure for the Kubernetes platform | default |
| server.configuration.trustK8sCerts | Trust K8s certificates when querying the Kubernetes API | false |
| server.configuration.containerRegistries | Container registries configuration | {} (check values.yaml for more information) |
| server.existingConfigmap | Name of existing ConfigMap with Dataflow server configuration | nil |
| server.extraEnvVars | Extra environment variables to be set on Dataflow server container | {} |
| server.extraEnvVarsCM | Name of existing ConfigMap containing extra env vars | nil |
| server.extraEnvVarsSecret | Name of existing Secret containing extra env vars | nil |
| server.replicaCount | Number of Dataflow server replicas to deploy | 1 |
| server.strategyType | Deployment Strategy Type | RollingUpdate |
| server.affinity | Affinity for pod assignment | {} (evaluated as a template) |
| server.nodeSelector | Node labels for pod assignment | {} (evaluated as a template) |
| server.tolerations | Tolerations for pod assignment | [] (evaluated as a template) |
| server.priorityClassName | Controller priorityClassName | nil |
| server.podSecurityContext | Dataflow server pods' Security Context | { fsGroup: "1001" } |
| server.containerSecurityContext | Dataflow server containers' Security Context | { runAsUser: "1001" } |
| server.resources.limits | The resources limits for the Dataflow server container | {} |
| server.resources.requests | The requested resources for the Dataflow server container | {} |
| server.podAnnotations | Annotations for Dataflow server pods | {} |
| server.livenessProbe | Liveness probe configuration for Dataflow server | Check values.yaml file |
| server.readinessProbe | Readiness probe configuration for Dataflow server | Check values.yaml file |
| server.customLivenessProbe | Override default liveness probe | nil |
| server.customReadinessProbe | Override default readiness probe | nil |
| server.service.type | Kubernetes service type | ClusterIP |
| server.service.port | Service HTTP port | 8080 |
| server.service.nodePort | Service HTTP node port | nil |
| server.service.clusterIP | Dataflow server service cluster IP | None |
| server.service.externalTrafficPolicy | Enable client source IP preservation | Cluster |
| server.service.loadBalancerIP | loadBalancerIP if service type is LoadBalancer | nil |
| server.service.loadBalancerSourceRanges | Addresses that are allowed when service is LoadBalancer | [] |
| server.service.annotations | Annotations for Dataflow server service | {} |
| server.ingress.enabled | Enable ingress controller resource | false |
| server.ingress.certManager | Add annotations for cert-manager | false |
| server.ingress.hostname | Default host for the ingress resource | dataflow.local |
| server.ingress.annotations | Ingress annotations | [] |
| server.ingress.extraHosts[0].name | Additional hostnames to be covered | nil |
| server.ingress.extraHosts[0].path | Path for the additional hostname | nil |
| server.ingress.extraTls[0].hosts[0] | TLS configuration for additional hostnames to be covered | nil |
| server.ingress.extraTls[0].secretName | TLS configuration for additional hostnames to be covered | nil |
| server.ingress.secrets[0].name | TLS Secret Name | nil |
| server.ingress.secrets[0].certificate | TLS Secret Certificate | nil |
| server.ingress.secrets[0].key | TLS Secret Key | nil |
| server.initContainers | Add additional init containers to the Dataflow server pods | {} (evaluated as a template) |
| server.sidecars | Add additional sidecar containers to the Dataflow server pods | {} (evaluated as a template) |
| server.pdb.create | Enable/disable a Pod Disruption Budget creation | false |
| server.pdb.minAvailable | Minimum number/percentage of pods that should remain scheduled | 1 |
| server.pdb.maxUnavailable | Maximum number/percentage of pods that may be made unavailable | nil |
| server.autoscaling.enabled | Enable autoscaling for Dataflow server | false |
| server.autoscaling.minReplicas | Minimum number of Dataflow server replicas | nil |
| server.autoscaling.maxReplicas | Maximum number of Dataflow server replicas | nil |
| server.autoscaling.targetCPU | Target CPU utilization percentage | nil |
| server.autoscaling.targetMemory | Target Memory utilization percentage | nil |
| server.jdwp.enabled | Enable Java Debug Wire Protocol (JDWP) | false |
| server.jdwp.port | JDWP TCP port | 5005 |

Dataflow Skipper parameters

| Parameter | Description | Default |
| --- | --- | --- |
| skipper.enabled | Enable Spring Cloud Skipper component | true |
| skipper.image.registry | Spring Cloud Skipper image registry | docker.io |
| skipper.image.repository | Spring Cloud Skipper image name | bitnami/spring-cloud-skipper |
| skipper.image.tag | Spring Cloud Skipper image tag | {TAG_NAME} |
| skipper.image.pullPolicy | Spring Cloud Skipper image pull policy | IfNotPresent |
| skipper.image.pullSecrets | Specify docker-registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| skipper.configuration.accountName | The name of the account to configure for the Kubernetes platform | default |
| skipper.configuration.trustK8sCerts | Trust K8s certificates when querying the Kubernetes API | false |
| skipper.existingConfigmap | Name of existing ConfigMap with Skipper server configuration | nil |
| skipper.extraEnvVars | Extra environment variables to be set on Skipper server container | {} |
| skipper.extraEnvVarsCM | Name of existing ConfigMap containing extra env vars | nil |
| skipper.extraEnvVarsSecret | Name of existing Secret containing extra env vars | nil |
| skipper.replicaCount | Number of Skipper server replicas to deploy | 1 |
| skipper.strategyType | Deployment Strategy Type | RollingUpdate |
| skipper.affinity | Affinity for pod assignment | {} (evaluated as a template) |
| skipper.nodeSelector | Node labels for pod assignment | {} (evaluated as a template) |
| skipper.tolerations | Tolerations for pod assignment | [] (evaluated as a template) |
| skipper.priorityClassName | Controller priorityClassName | nil |
| skipper.podSecurityContext | Skipper server pods' Security Context | { fsGroup: "1001" } |
| skipper.containerSecurityContext | Skipper server containers' Security Context | { runAsUser: "1001" } |
| skipper.resources.limits | The resources limits for the Skipper server container | {} |
| skipper.resources.requests | The requested resources for the Skipper server container | {} |
| skipper.podAnnotations | Annotations for Skipper server pods | {} |
| skipper.livenessProbe | Liveness probe configuration for Skipper server | Check values.yaml file |
| skipper.readinessProbe | Readiness probe configuration for Skipper server | Check values.yaml file |
| skipper.customLivenessProbe | Override default liveness probe | nil |
| skipper.customReadinessProbe | Override default readiness probe | nil |
| skipper.service.type | Kubernetes service type | ClusterIP |
| skipper.service.port | Service HTTP port | 8080 |
| skipper.service.nodePort | Service HTTP node port | nil |
| skipper.service.clusterIP | Skipper server service cluster IP | None |
| skipper.service.externalTrafficPolicy | Enable client source IP preservation | Cluster |
| skipper.service.loadBalancerIP | loadBalancerIP if service type is LoadBalancer | nil |
| skipper.service.loadBalancerSourceRanges | Addresses that are allowed when service is LoadBalancer | [] |
| skipper.service.annotations | Annotations for Skipper server service | {} |
| skipper.initContainers | Add additional init containers to the Skipper pods | {} (evaluated as a template) |
| skipper.sidecars | Add additional sidecar containers to the Skipper pods | {} (evaluated as a template) |
| skipper.pdb.create | Enable/disable a Pod Disruption Budget creation | false |
| skipper.pdb.minAvailable | Minimum number/percentage of pods that should remain scheduled | 1 |
| skipper.pdb.maxUnavailable | Maximum number/percentage of pods that may be made unavailable | nil |
| skipper.autoscaling.enabled | Enable autoscaling for Skipper server | false |
| skipper.autoscaling.minReplicas | Minimum number of Skipper server replicas | nil |
| skipper.autoscaling.maxReplicas | Maximum number of Skipper server replicas | nil |
| skipper.autoscaling.targetCPU | Target CPU utilization percentage | nil |
| skipper.autoscaling.targetMemory | Target Memory utilization percentage | nil |
| skipper.jdwp.enabled | Enable Java Debug Wire Protocol (JDWP) | false |
| skipper.jdwp.port | JDWP TCP port | 5005 |
| externalSkipper.host | Host of an external Skipper Server | localhost |
| externalSkipper.port | External Skipper Server port number | 7577 |

RBAC parameters

| Parameter | Description | Default |
| --- | --- | --- |
| serviceAccount.create | Enable the creation of a ServiceAccount for Dataflow server and Skipper server pods | true |
| serviceAccount.name | Name of the created serviceAccount | Generated using the scdf.fullname template |
| rbac.create | Whether to create & use RBAC resources or not | true |

Metrics parameters

| Parameter | Description | Default |
| --- | --- | --- |
| metrics.enabled | Enable the export of Prometheus metrics | false |
| metrics.image.registry | Prometheus Rsocket Proxy image registry | docker.io |
| metrics.image.repository | Prometheus Rsocket Proxy image name | bitnami/prometheus-rsocket-proxy |
| metrics.image.tag | Prometheus Rsocket Proxy image tag | {TAG_NAME} |
| metrics.image.pullPolicy | Prometheus Rsocket Proxy image pull policy | IfNotPresent |
| metrics.image.pullSecrets | Specify docker-registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| metrics.kafka.service.httpPort | Prometheus Rsocket Proxy HTTP port | 8080 |
| metrics.kafka.service.rsocketPort | Prometheus Rsocket Proxy Rsocket port | 8080 |
| metrics.kafka.service.annotations | Annotations for Prometheus Rsocket Proxy service | Check values.yaml file |
| metrics.serviceMonitor.enabled | If true, creates a Prometheus Operator ServiceMonitor (also requires metrics.enabled to be true) | false |
| metrics.serviceMonitor.namespace | Namespace in which Prometheus is running | nil |
| metrics.serviceMonitor.interval | Interval at which metrics should be scraped | nil (Prometheus Operator default value) |
| metrics.serviceMonitor.scrapeTimeout | Timeout after which the scrape is ended | nil (Prometheus Operator default value) |

Init Container parameters

| Parameter | Description | Default |
| --- | --- | --- |
| waitForBackends.enabled | Wait for the database and other services (such as Kafka or RabbitMQ) used when enabling streaming | true |
| waitForBackends.image.registry | Init container wait-for-backend image registry | docker.io |
| waitForBackends.image.repository | Init container wait-for-backend image name | bitnami/kubectl |
| waitForBackends.image.tag | Init container wait-for-backend image tag | {TAG_NAME} |
| waitForBackends.image.pullPolicy | Init container wait-for-backend image pull policy | IfNotPresent |
| waitForBackends.image.pullSecrets | Specify docker-registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| waitForBackends.resources.limits | Init container wait-for-backend resource limits | {} |
| waitForBackends.resources.requests | Init container wait-for-backend resource requests | {} |

Database parameters

| Parameter | Description | Default |
| --- | --- | --- |
| mariadb.enabled | Enable/disable MariaDB chart installation | true |
| mariadb.replication.enabled | MariaDB replication enabled | false |
| mariadb.db.name | Name for new database to create | dataflow |
| mariadb.db.user | Username of new user to create | dataflow |
| mariadb.db.password | Password for the new user | change-me_ |
| mariadb.initdbScripts | Dictionary of initdb scripts | Check values.yaml file |
| externalDatabase.host | Host of the external database | localhost |
| externalDatabase.port | External database port number | 3306 |
| externalDatabase.password | Password for the above username | "" |
| externalDatabase.dataflow.user | Existing username in the external db to be used by Dataflow server | dataflow |
| externalDatabase.dataflow.database | Name of the existing database to be used by Dataflow server | dataflow |
| externalDatabase.skipper.user | Existing username in the external db to be used by Skipper server | skipper |
| externalDatabase.skipper.database | Name of the existing database to be used by Skipper server | skipper |
| externalDatabase.hibernateDialect | Hibernate Dialect used by Dataflow/Skipper servers | "" |

RabbitMQ chart parameters

| Parameter | Description | Default |
| --- | --- | --- |
| rabbitmq.enabled | Enable/disable RabbitMQ chart installation | true |
| rabbitmq.auth.username | RabbitMQ username | user |
| rabbitmq.auth.password | RabbitMQ password | random 40 character alphanumeric string |
| externalRabbitmq.enabled | Enable/disable external RabbitMQ | false |
| externalRabbitmq.host | Host of the external RabbitMQ | localhost |
| externalRabbitmq.port | External RabbitMQ port number | 5672 |
| externalRabbitmq.username | External RabbitMQ username | guest |
| externalRabbitmq.password | External RabbitMQ password | guest |

Kafka chart parameters

| Parameter | Description | Default |
| --- | --- | --- |
| kafka.enabled | Enable/disable Kafka chart installation | false |
| kafka.replicaCount | Number of Kafka brokers | 1 |
| kafka.offsetsTopicReplicationFactor | Kafka offsets topic replication factor | 1 |
| kafka.zookeeper.enabled | Enable/disable Zookeeper chart installation | nil |
| kafka.zookeeper.replicaCount | Number of Zookeeper replicas | 1 |

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,

helm install my-release --set server.replicaCount=2 bitnami/spring-cloud-dataflow

The above command installs the Spring Cloud Data Flow chart with 2 Dataflow server replicas.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

helm install my-release -f values.yaml bitnami/spring-cloud-dataflow

Tip: You can use the default values.yaml
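
As a rough illustration, the --set example above could be written as a values file instead. The file name my-values.yaml is hypothetical; the key is taken from the Dataflow Server parameters table.

```yaml
# my-values.yaml (hypothetical file name)
# Equivalent to: --set server.replicaCount=2
server:
  replicaCount: 2
```

The file would then be passed with -f my-values.yaml in place of values.yaml.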

Configuration and installation details

Rolling VS Immutable tags

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.

Production configuration

This chart includes a values-production.yaml file where you can find some parameters oriented to production configuration in comparison to the regular values.yaml. You can use this file instead of the default one.

  • Enable Pod Disruption Budget for Server and Skipper:
- server.pdb.create: false
+ server.pdb.create: true
- skipper.pdb.create: false
+ skipper.pdb.create: true
  • Enable exposing Prometheus Metrics via Prometheus Rsocket Proxy:
- metrics.enabled: false
+ metrics.enabled: true
  • Force users to specify a password and mount secrets as volumes instead of using environment variables on MariaDB:
- mariadb.rootUser.forcePassword: false
- mariadb.rootUser.injectSecretsAsVolume: false
+ mariadb.rootUser.forcePassword: true
+ mariadb.rootUser.injectSecretsAsVolume: true
- mariadb.db.forcePassword: false
- mariadb.db.injectSecretsAsVolume: false
+ mariadb.db.forcePassword: true
+ mariadb.db.injectSecretsAsVolume: true
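
The dotted names in the list above map onto nested YAML keys. A minimal sketch of a custom values file applying the same production-oriented overrides, assuming the parameter names listed above match your chart version:

```yaml
# Production-oriented overrides, written as nested values.
server:
  pdb:
    create: true        # Pod Disruption Budget for the Dataflow server
skipper:
  pdb:
    create: true        # Pod Disruption Budget for Skipper
metrics:
  enabled: true         # expose Prometheus metrics via the Rsocket proxy
mariadb:
  rootUser:
    forcePassword: true
    injectSecretsAsVolume: true
  db:
    forcePassword: true
    injectSecretsAsVolume: true
```

Because forcePassword is enabled in this sketch, the MariaDB passwords (for example mariadb.rootUser.password and mariadb.db.password) would most likely have to be supplied explicitly as well.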

Features

If you only need to deploy tasks and schedules, streaming and Skipper can be disabled:

server.configuration.batchEnabled=true
server.configuration.streamingEnabled=false
skipper.enabled=false
rabbitmq.enabled=false

If you only need to deploy streams, tasks and schedules can be disabled:

server.configuration.batchEnabled=false
server.configuration.streamingEnabled=true
skipper.enabled=true
rabbitmq.enabled=true

NOTE: Both server.configuration.batchEnabled and server.configuration.streamingEnabled should not be set to false at the same time.
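
The first combination (tasks and schedules only) could be kept in a values file rather than passed flag by flag. A small sketch using only the parameters named above:

```yaml
# Tasks and schedules only: streaming, Skipper and RabbitMQ are switched off.
server:
  configuration:
    batchEnabled: true
    streamingEnabled: false
skipper:
  enabled: false
rabbitmq:
  enabled: false
```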

Messaging solutions

There are two supported messaging solutions in this chart:

  • RabbitMQ (default)
  • Kafka

To change the messaging layer to Kafka, use the following parameters:

rabbitmq.enabled=false
kafka.enabled=true

Only one messaging layer can be used at a given time.
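
As an illustration, the switch to Kafka might look like this in a values file. The replica counts are simply the defaults from the Kafka chart parameters table, shown here to make the available knobs visible; adjust them as an assumption about your own sizing needs.

```yaml
# Use Kafka instead of RabbitMQ as the messaging layer.
rabbitmq:
  enabled: false
kafka:
  enabled: true
  replicaCount: 1                    # number of Kafka brokers
  offsetsTopicReplicationFactor: 1
  zookeeper:
    replicaCount: 1
```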

Using an external database

Sometimes you may want Spring Cloud components to connect to an external database rather than installing one inside your cluster, e.g. to use a managed database service or to run a single database server for all your applications. To do this, the chart allows you to specify credentials for an external database under the externalDatabase parameter. You should also disable the MariaDB installation with the mariadb.enabled option. For example, use the following parameters:

mariadb.enabled=false
externalDatabase.host=myexternalhost
externalDatabase.port=3306
externalDatabase.password=mypassword
externalDatabase.dataflow.user=mydataflowuser
externalDatabase.dataflow.database=mydataflowdatabase
externalDatabase.skipper.user=myskipperuser
externalDatabase.skipper.database=myskipperdatabase

Note that if you disable MariaDB as shown above, you MUST supply values for the externalDatabase connection.
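
The same settings can be written as nested YAML in a values file. This is a minimal sketch, assuming the externalDatabase.dataflow and externalDatabase.skipper keys listed in the Database parameters table; host names, users and passwords are placeholders. The table lists a single externalDatabase.password, so the sketch assumes both users share it.

```yaml
# Disable the bundled MariaDB and point both servers at an external database.
mariadb:
  enabled: false
externalDatabase:
  host: myexternalhost
  port: 3306
  password: mypassword
  dataflow:
    user: mydataflowuser
    database: mydataflowdatabase
  skipper:
    user: myskipperuser
    database: myskipperdatabase
  # Left empty here (the documented default); set a Hibernate dialect
  # class name only if your database requires one.
  hibernateDialect: ""
```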

Adding extra flags

In case you want to add extra environment variables to any Spring Cloud component, you can use the XXX.extraEnvVars parameter(s), where XXX is a placeholder you need to replace with the actual component(s). For instance, to add extra environment variables to Spring Cloud Data Flow, use:

server:
  extraEnvVars:
    - name: FOO
      value: BAR

Using custom Dataflow configuration

This helm chart supports using custom configuration for Dataflow server.

You can specify the configuration for Dataflow server by setting the server.existingConfigmap parameter to an external ConfigMap with the configuration file.

Using custom Skipper configuration

This helm chart supports using custom configuration for Skipper server.

You can specify the configuration for Skipper server by setting the skipper.existingConfigmap parameter to an external ConfigMap with the configuration file.
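
A short sketch of how both parameters might be set together in a values file. The ConfigMap names are hypothetical, and the ConfigMaps are assumed to already exist in the release namespace; the file name the chart expects inside each ConfigMap is not stated here, so check the chart templates before relying on this.

```yaml
# Point each server at a pre-created ConfigMap holding its configuration file.
server:
  existingConfigmap: my-dataflow-config   # hypothetical ConfigMap name
skipper:
  existingConfigmap: my-skipper-config    # hypothetical ConfigMap name
```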

Sidecars and Init Containers

If you need additional containers to run within the same pod as Dataflow or Skipper components (e.g. an additional metrics or logging exporter), you can do so via the XXX.sidecars parameter(s), where XXX is a placeholder you need to replace with the actual component(s). Simply define your container according to the Kubernetes container spec.

server:
  sidecars:
    - name: your-image-name
      image: your-image
      imagePullPolicy: Always
      ports:
        - name: portname
          containerPort: 1234

Similarly, you can add extra init containers using the XXX.initContainers parameter(s).

server:
  initContainers:
    - name: your-image-name
      image: your-image
      imagePullPolicy: Always
      ports:
        - name: portname
          containerPort: 1234

Ingress

This chart provides support for ingress resources. If you have an ingress controller installed on your cluster, such as nginx-ingress or traefik, you can use it to serve your Spring Cloud Data Flow server.

To enable ingress integration, please set server.ingress.enabled to true.

Hosts

Most likely you will only want one hostname that maps to this Spring Cloud Data Flow installation. If so, set it with the server.ingress.hostname property. However, it is possible to have more than one host: the server.ingress.extraHosts object can be specified as an array. You can also use server.ingress.extraTls to add the TLS configuration for extra hosts.

For each host indicated at server.ingress.extraHosts, please indicate a name, path, and any annotations that you may want the ingress controller to know about.

For annotations, please see this document. Not all annotations are supported by all ingress controllers, but this document does a good job of indicating which annotation is supported by many popular ingress controllers.
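
To make the host settings concrete, here is a hedged sketch of an ingress section in a values file. It uses only keys from the Dataflow Server parameters table; the extra hostname and the TLS Secret name are hypothetical, and the Secret is assumed to exist already.

```yaml
server:
  ingress:
    enabled: true
    hostname: dataflow.local             # default host from the parameters table
    extraHosts:
      - name: dataflow.example.com       # hypothetical additional hostname
        path: /
    extraTls:
      - hosts:
          - dataflow.example.com
        secretName: dataflow-example-tls # hypothetical TLS Secret name
```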

Upgrading

It's necessary to set the mariadb.rootUser.password parameter when upgrading for readiness/liveness probes to work properly. When you install this chart for the first time, a random value is generated unless you set this parameter. Inspect the MariaDB secret to obtain the root password, then upgrade your chart using the command below:

helm upgrade my-release bitnami/spring-cloud-dataflow --set mariadb.rootUser.password=[MARIADB_ROOT_PASSWORD]

If you enabled the RabbitMQ chart as the messaging solution for Skipper to manage streaming content, it's also necessary to set the rabbitmq.auth.password and rabbitmq.auth.erlangCookie parameters when upgrading for readiness/liveness probes to work properly. Inspect the RabbitMQ secret to obtain the password and the Erlang cookie, then upgrade your chart using the command below:

helm upgrade my-release bitnami/spring-cloud-dataflow --set mariadb.rootUser.password=[MARIADB_ROOT_PASSWORD] --set rabbitmq.auth.password=[RABBITMQ_PASSWORD] --set rabbitmq.auth.erlangCookie=[RABBITMQ_ERLANG_COOKIE]