Apache Spark
Apache Spark is a high-performance engine for large-scale computing tasks, such as data processing, machine learning and real-time data streaming. It includes APIs for Java, Python, Scala and R.
TL;DR;
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/spark
Introduction
This chart bootstraps an Apache Spark deployment on a Kubernetes cluster using the Helm package manager.
Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of Bitnami Kubernetes Production Runtime (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.
Prerequisites
- Kubernetes 1.12+
- Helm 2.12+ or Helm 3.0-beta3+
Installing the Chart
To install the chart with the release name my-release:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/spark
These commands deploy Spark on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured during installation.
Tip: List all releases using helm list
Uninstalling the Chart
To uninstall/delete the my-release statefulset:
$ helm delete my-release
The command removes all the Kubernetes components associated with the chart and deletes the release. Use the option --purge to delete all persistent volumes too.
Parameters
The following table lists the configurable parameters of the Spark chart and their default values.
| Parameter | Description | Default |
|---|---|---|
| global.imageRegistry | Global Docker image registry | nil |
| global.imagePullSecrets | Global Docker registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| image.registry | Spark image registry | docker.io |
| image.repository | Spark image name | bitnami/spark |
| image.tag | Spark image tag | {TAG_NAME} |
| image.pullPolicy | Spark image pull policy | IfNotPresent |
| image.pullSecrets | Specify docker-registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| nameOverride | String to partially override spark.fullname template with a string (will prepend the release name) | nil |
| fullnameOverride | String to fully override spark.fullname template with a string | nil |
| master.debug | Specify if debug values should be set on the master | false |
| master.webPort | Specify the port where the web interface will listen on the master | 8080 |
| master.clusterPort | Specify the port where the master listens to communicate with workers | 7077 |
| master.daemonMemoryLimit | Set the memory limit for the master daemon | No default |
| master.configOptions | Optional configuration in the form -Dx=y | No default |
| master.securityContext.enabled | Enable security context | true |
| master.securityContext.fsGroup | Group ID for the container | 1001 |
| master.securityContext.runAsUser | User ID for the container | 1001 |
| master.podAnnotations | Annotations for pods in StatefulSet | {} (The value is evaluated as a template) |
| master.nodeSelector | Node affinity policy | {} (The value is evaluated as a template) |
| master.tolerations | Tolerations for pod assignment | [] (The value is evaluated as a template) |
| master.affinity | Affinity for pod assignment | {} (The value is evaluated as a template) |
| master.resources | CPU/Memory resource requests/limits | {} |
| master.extraEnvVars | Extra environment variables to pass to the master container | {} |
| master.extraVolumes | Array of extra volumes to be added to the Spark master deployment (evaluated as template). Requires setting master.extraVolumeMounts | nil |
| master.extraVolumeMounts | Array of extra volume mounts to be added to the Spark master deployment (evaluated as template). Normally used with master.extraVolumes | nil |
| master.livenessProbe.enabled | Turn on and off liveness probe | true |
| master.livenessProbe.initialDelaySeconds | Delay before liveness probe is initiated | 10 |
| master.livenessProbe.periodSeconds | How often to perform the probe | 10 |
| master.livenessProbe.timeoutSeconds | When the probe times out | 5 |
| master.livenessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed after having succeeded | 2 |
| master.livenessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
| master.readinessProbe.enabled | Turn on and off readiness probe | true |
| master.readinessProbe.initialDelaySeconds | Delay before readiness probe is initiated | 5 |
| master.readinessProbe.periodSeconds | How often to perform the probe | 10 |
| master.readinessProbe.timeoutSeconds | When the probe times out | 5 |
| master.readinessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed after having succeeded | 6 |
| master.readinessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
| worker.debug | Specify if debug values should be set on workers | false |
| worker.webPort | Specify the port where the web interface will listen on the worker | 8080 |
| worker.clusterPort | Specify the port where the worker listens to communicate with the master | 7077 |
| worker.daemonMemoryLimit | Set the memory limit for the worker daemon | No default |
| worker.memoryLimit | Set the maximum memory the worker is allowed to use | No default |
| worker.coreLimit | Set the maximum number of cores that the worker can use | No default |
| worker.dir | Set a custom working directory for the application | No default |
| worker.javaOptions | Set options for the JVM in the form -Dx=y | No default |
| worker.configOptions | Set extra options to configure the worker in the form -Dx=y | No default |
| worker.replicaCount | Set the number of workers | 2 |
| worker.autoscaling.enabled | Enable autoscaling depending on CPU | false |
| worker.autoscaling.CpuTargetPercentage | K8s HPA CPU targetPercentage | 50 |
| worker.autoscaling.replicasMax | Maximum number of workers when using autoscaling | 5 |
| worker.securityContext.enabled | Enable security context | true |
| worker.securityContext.fsGroup | Group ID for the container | 1001 |
| worker.securityContext.runAsUser | User ID for the container | 1001 |
| worker.podAnnotations | Annotations for pods in StatefulSet | {} |
| worker.nodeSelector | Node labels for pod assignment. Used as a template from the values. | {} |
| worker.tolerations | Toleration labels for pod assignment | [] |
| worker.affinity | Affinity and AntiAffinity rules for pod assignment | {} |
| worker.resources | CPU/Memory resource requests/limits | Memory: 256Mi, CPU: 250m |
| worker.livenessProbe.enabled | Turn on and off liveness probe | true |
| worker.livenessProbe.initialDelaySeconds | Delay before liveness probe is initiated | 10 |
| worker.livenessProbe.periodSeconds | How often to perform the probe | 10 |
| worker.livenessProbe.timeoutSeconds | When the probe times out | 5 |
| worker.livenessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed after having succeeded | 2 |
| worker.livenessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
| worker.readinessProbe.enabled | Turn on and off readiness probe | true |
| worker.readinessProbe.initialDelaySeconds | Delay before readiness probe is initiated | 5 |
| worker.readinessProbe.periodSeconds | How often to perform the probe | 10 |
| worker.readinessProbe.timeoutSeconds | When the probe times out | 5 |
| worker.readinessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed after having succeeded | 6 |
| worker.readinessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
| worker.extraEnvVars | Extra environment variables to pass to the worker container | {} |
| worker.extraVolumes | Array of extra volumes to be added to the Spark worker deployment (evaluated as template). Requires setting worker.extraVolumeMounts | nil |
| worker.extraVolumeMounts | Array of extra volume mounts to be added to the Spark worker deployment (evaluated as template). Normally used with worker.extraVolumes | nil |
| security.passwordsSecretName | Secret to use when using security configuration to set custom passwords | No default |
| security.rpc.authenticationEnabled | Enable RPC authentication | false |
| security.rpc.encryptionEnabled | Enable encryption for RPC | false |
| security.storageEncryptionEnabled | Enable encryption of the storage | false |
| security.ssl.enabled | Enable the SSL configuration | false |
| security.ssl.needClientAuth | Enable client authentication | false |
| security.ssl.protocol | Set the SSL protocol | TLSv1.2 |
| security.certificatesSecretName | Set the name of the secret that contains the certificates | No default |
| service.type | Kubernetes Service type | ClusterIP |
| service.webPort | Spark client port | 80 |
| service.clusterPort | Spark cluster port | 7077 |
| service.nodePort | Port to bind to for NodePort service type (client port) | nil |
| service.nodePorts.cluster | Kubernetes cluster node port | "" |
| service.nodePorts.web | Kubernetes web node port | "" |
| service.annotations | Annotations for Spark service | {} |
| service.loadBalancerIP | loadBalancerIP if Spark service type is LoadBalancer | nil |
| ingress.enabled | Enable the use of the ingress controller to access the web UI | false |
| ingress.certManager | Add annotations for cert-manager | false |
| ingress.annotations | Ingress annotations | {} |
| ingress.hosts[0].name | Hostname to your Spark installation | spark.local |
| ingress.hosts[0].path | Path within the URL structure | / |
| ingress.hosts[0].tls | Utilize TLS backend in ingress | false |
| ingress.hosts[0].tlsHosts | Array of TLS hosts for ingress record (defaults to ingress.hosts[0].name if nil) | nil |
| ingress.hosts[0].tlsSecret | TLS Secret (certificates) | spark.local-tls |
Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,
$ helm install my-release \
--set master.webPort=8081 bitnami/spark
The above command sets the spark master web port to 8081.
Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,
$ helm install my-release -f values.yaml bitnami/spark
Tip: You can use the default values.yaml
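For instance, a small custom values file overriding a few of the parameters above might look like the following sketch (the file name and the concrete values are illustrative only):
```yaml
# custom-values.yaml: illustrative overrides for the bitnami/spark chart
master:
  webPort: 8081            # port for the Spark master web UI
  daemonMemoryLimit: 1g    # memory limit for the master daemon
worker:
  replicaCount: 3          # number of Spark workers
  memoryLimit: 2g          # maximum memory each worker is allowed to use
service:
  type: ClusterIP          # Kubernetes Service type
```
It would then be applied with helm install my-release -f custom-values.yaml bitnami/spark.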
Configuration and installation details
Rolling VS Immutable tags
It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.
Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
Production configuration
This chart includes a values-production.yaml file with parameters oriented to a production configuration, in contrast to the regular values.yaml. You can use this file instead of the default one; the main differences are listed below, and an example install command follows the list.
- Enable ingress controller
- ingress.enabled: false
+ ingress.enabled: true
- Enable RPC authentication and encryption:
- security.rpc.authenticationEnabled: false
- security.rpc.encryptionEnabled: false
+ security.rpc.authenticationEnabled: true
+ security.rpc.encryptionEnabled: true
- Enable storage encryption:
- security.storageEncryptionEnabled: false
+ security.storageEncryptionEnabled: true
- Configure SSL parameters:
- security.ssl.enabled: false
- security.ssl.needClientAuth: false
+ security.ssl.enabled: true
+ security.ssl.needClientAuth: true
- Set a secret name for passwords:
+ security.passwordsSecretName: my-passwords-secret
- Set a secret name for certificates:
+ security.certificatesSecretName: my-certificates-secret
- Enable autoscaling depending on CPU:
- worker.autoscaling.enabled: false
- worker.autoscaling.replicasMax: 5
+ worker.autoscaling.enabled: true
+ worker.autoscaling.replicasMax: 10
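To use the production configuration, pass the file at install time just like any custom values file, for example (assuming the command is run from the chart directory, or adjusting the path to values-production.yaml accordingly):
$ helm install my-release -f values-production.yaml bitnami/spark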
Using custom configuration
To use a custom configuration, create a ConfigMap containing a spark-env.sh file. The ConfigMap name must be provided at deployment time: to set the configuration on the master, use master.configurationConfigMap=configMapName; to set it on the workers, use worker.configurationConfigMap=configMapName. Both can be set at the same time, either sharing a single ConfigMap or using two different ConfigMaps.
The ConfigMap may also provide a spark-defaults.conf file. Either file can be used without the other.
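As a minimal sketch, assuming a local spark-env.sh file and a hypothetical ConfigMap named my-spark-conf, the ConfigMap could be created and wired to both the master and the workers like this:
$ kubectl create configmap my-spark-conf --from-file=spark-env.sh
$ helm install my-release \
    --set master.configurationConfigMap=my-spark-conf \
    --set worker.configurationConfigMap=my-spark-conf \
    bitnami/spark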
Submit an application
To submit an application to the cluster, use the spark-submit script. You can obtain the script here. For example, to deploy one of the example applications:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-IP>:<master-cluster-port> --deploy-mode cluster ./examples/jars/spark-examples_2.11-2.4.3.jar 1000
Replace <master-IP> and <master-cluster-port> with your master's IP address and cluster port.
Be aware that it is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue here.
Enable security for spark
Configure ssl communication
In order to enable secure transport between workers and the master, deploy the Helm chart with these options: security.ssl.enabled=true
How to create the certificates secret
Two secrets need to be created, one for the passwords and one for the certificates. Their names should be configured in security.passwordsSecretName and security.certificatesSecretName. To generate certificates for testing purposes you can use this script.
In the certificates secret, the keys must be spark-keystore.jks and spark-truststore.jks, and the content must be in JKS format.
To generate the certificates secret, first generate the two certificates and rename them spark-keystore.jks and spark-truststore.jks.
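As an illustration only (this is not the script referenced above), a self-signed keystore/truststore pair suitable for testing could be produced with the JDK keytool; the alias, validity and the placeholder passwords in angle brackets are assumptions:
$ keytool -genkeypair -alias spark -keyalg RSA -keysize 2048 -validity 365 \
    -dname "CN=spark" -storetype JKS \
    -keystore spark-keystore.jks -storepass <keystore-password> -keypass <key-password>
$ keytool -exportcert -alias spark -file spark.cer \
    -keystore spark-keystore.jks -storepass <keystore-password>
$ keytool -importcert -alias spark -file spark.cer -noprompt -storetype JKS \
    -keystore spark-truststore.jks -storepass <truststore-password>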
Once the certificates are created, you can create the secret using the file names as keys.
The second secret, the one for passwords, should have four keys: rpc-authentication-secret, ssl-key-password, ssl-keystore-password and ssl-truststore-password.
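For example, using the file names and key names above (my-secret and my-passwords-secret are the secret names used in the next paragraph; the values in angle brackets are placeholders):
$ kubectl create secret generic my-secret \
    --from-file=spark-keystore.jks --from-file=spark-truststore.jks
$ kubectl create secret generic my-passwords-secret \
    --from-literal=rpc-authentication-secret=<rpc-secret> \
    --from-literal=ssl-key-password=<key-password> \
    --from-literal=ssl-keystore-password=<keystore-password> \
    --from-literal=ssl-truststore-password=<truststore-password>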
Now that the two secrets exist, deploy the chart enabling the security configuration, setting the name of the certificates secret (my-secret in this case) in security.certificatesSecretName and the name of the passwords secret (my-passwords-secret in this case) in security.passwordsSecretName.
To deploy the chart, use the following parameters:
security.certificatesSecretName=my-secret
security.passwordsSecretName=my-passwords-secret
security.rpc.authenticationEnabled=true
security.rpc.encryptionEnabled=true
security.storageEncryptionEnabled=true
security.ssl.enabled=true
security.ssl.needClientAuth=true
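Put together as a single install command (a sketch using the secret names created above):
$ helm install my-release \
    --set security.certificatesSecretName=my-secret \
    --set security.passwordsSecretName=my-passwords-secret \
    --set security.rpc.authenticationEnabled=true \
    --set security.rpc.encryptionEnabled=true \
    --set security.storageEncryptionEnabled=true \
    --set security.ssl.enabled=true \
    --set security.ssl.needClientAuth=true \
    bitnami/spark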
Be aware that it is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue here.