# Apache Spark
[Apache Spark](https://spark.apache.org/) is a high-performance engine for large-scale computing tasks, such as data processing, machine learning and real-time data streaming. It includes APIs for Java, Python, Scala and R.

## TL;DR;

```console
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/spark
```
## Introduction

This chart bootstraps a [spark](https://github.com/bitnami/bitnami-docker-spark) deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.

Bitnami charts can be used with [Kubeapps](https://kubeapps.com/) for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of [Bitnami Kubernetes Production Runtime](https://kubeprod.io/) (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.

## Prerequisites

- Kubernetes 1.12+
- Helm 2.11+ or Helm 3.0-beta3+

## Installing the Chart

To install the chart with the release name `my-release`:

```console
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/spark
```

These commands deploy Spark on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) section lists the parameters that can be configured during installation.

> **Tip**: List all releases using `helm list`
## Uninstalling the Chart

To uninstall/delete the `my-release` statefulset:

```console
$ helm delete my-release
```

The command removes all the Kubernetes components associated with the chart and deletes the release. With Helm 2, add the `--purge` option to remove the release history as well.
## Parameters

The following table lists the configurable parameters of the Spark chart and their default values.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `global.imageRegistry` | Global Docker image registry | `nil` |
| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `image.registry` | Spark image registry | `docker.io` |
| `image.repository` | Spark image name | `bitnami/spark` |
| `image.tag` | Spark image tag | `{TAG_NAME}` |
| `image.pullPolicy` | Spark image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `nameOverride` | String to partially override spark.fullname template with a string (will prepend the release name) | `nil` |
| `fullnameOverride` | String to fully override spark.fullname template with a string | `nil` |
| `master.debug` | Specify if debug values should be set on the master | `false` |
| `master.webPort` | Specify the port where the web interface will listen on the master | `8080` |
| `master.clusterPort` | Specify the port where the master listens to communicate with workers | `7077` |
| `master.daemonMemoryLimit` | Set the memory limit for the master daemon | No default |
| `master.configOptions` | Optional configuration in the form `-Dx=y` | No default |
| `master.securityContext.enabled` | Enable security context | `true` |
| `master.securityContext.fsGroup` | Group ID for the container | `1001` |
| `master.securityContext.runAsUser` | User ID for the container | `1001` |
| `master.nodeSelector` | Node labels for pod assignment | `{}` (The value is evaluated as a template) |
| `master.tolerations` | Tolerations for pod assignment | `[]` (The value is evaluated as a template) |
| `master.affinity` | Affinity for pod assignment | `{}` (The value is evaluated as a template) |
| `master.resources` | CPU/Memory resource requests/limits | `{}` |
| `master.extraEnvVars` | Extra environment variables to pass to the master container | `{}` |
| `master.extraVolumes` | Array of extra volumes to be added to the Spark master deployment (evaluated as template). Requires setting `master.extraVolumeMounts` | `nil` |
| `master.extraVolumeMounts` | Array of extra volume mounts to be added to the Spark master deployment (evaluated as template). Normally used with `master.extraVolumes`. | `nil` |
| `master.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `master.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `10` |
| `master.livenessProbe.periodSeconds` | How often to perform the probe | `10` |
| `master.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded | `2` |
| `master.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | `1` |
| `master.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `master.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `5` |
| `master.readinessProbe.periodSeconds` | How often to perform the probe | `10` |
| `master.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `master.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded | `6` |
| `master.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | `1` |
| `worker.debug` | Specify if debug values should be set on workers | `false` |
| `worker.webPort` | Specify the port where the web interface will listen on the worker | `8080` |
| `worker.clusterPort` | Specify the port where the worker listens to communicate with the master | `7077` |
| `worker.daemonMemoryLimit` | Set the memory limit for the worker daemon | No default |
| `worker.memoryLimit` | Set the maximum memory the worker is allowed to use | No default |
| `worker.coreLimit` | Set the maximum number of cores that the worker can use | No default |
| `worker.dir` | Set a custom working directory for the application | No default |
| `worker.javaOptions` | Set options for the JVM in the form `-Dx=y` | No default |
| `worker.configOptions` | Set extra options to configure the worker in the form `-Dx=y` | No default |
| `worker.replicaCount` | Set the number of workers | `2` |
| `worker.autoscaling.enabled` | Enable autoscaling depending on CPU | `false` |
| `worker.autoscaling.CpuTargetPercentage` | Target CPU utilization percentage for the Kubernetes HPA | `50` |
| `worker.autoscaling.replicasMax` | Maximum number of workers when using autoscaling | `5` |
| `worker.securityContext.enabled` | Enable security context | `true` |
| `worker.securityContext.fsGroup` | Group ID for the container | `1001` |
| `worker.securityContext.runAsUser` | User ID for the container | `1001` |
| `worker.nodeSelector` | Node labels for pod assignment. Used as a template from the values. | `{}` |
| `worker.tolerations` | Toleration labels for pod assignment | `[]` |
| `worker.affinity` | Affinity and AntiAffinity rules for pod assignment | `{}` |
| `worker.resources` | CPU/Memory resource requests/limits | Memory: `256Mi`, CPU: `250m` |
| `worker.livenessProbe.enabled` | Turn on and off liveness probe | `true` |
| `worker.livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `10` |
| `worker.livenessProbe.periodSeconds` | How often to perform the probe | `10` |
| `worker.livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded | `2` |
| `worker.livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | `1` |
| `worker.readinessProbe.enabled` | Turn on and off readiness probe | `true` |
| `worker.readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `5` |
| `worker.readinessProbe.periodSeconds` | How often to perform the probe | `10` |
| `worker.readinessProbe.timeoutSeconds` | When the probe times out | `5` |
| `worker.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded | `6` |
| `worker.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | `1` |
| `worker.extraEnvVars` | Extra environment variables to pass to the worker container | `{}` |
| `worker.extraVolumes` | Array of extra volumes to be added to the Spark worker deployment (evaluated as template). Requires setting `worker.extraVolumeMounts` | `nil` |
| `worker.extraVolumeMounts` | Array of extra volume mounts to be added to the Spark worker deployment (evaluated as template). Normally used with `worker.extraVolumes`. | `nil` |
| `security.passwordsSecretName` | Secret to use when using security configuration to set custom passwords | No default |
| `security.rpc.authenticationEnabled` | Enable the RPC authentication | `false` |
| `security.rpc.encryptionEnabled` | Enable the encryption for RPC | `false` |
| `security.storageEncryptionEnabled` | Enable the encryption of the storage | `false` |
| `security.ssl.enabled` | Enable the SSL configuration | `false` |
| `security.ssl.needClientAuth` | Enable the client authentication | `false` |
| `security.ssl.protocol` | Set the SSL protocol | `TLSv1.2` |
| `security.certificatesSecretName` | Set the name of the secret that contains the certificates | No default |
| `service.type` | Kubernetes Service type | `ClusterIP` |
| `service.webPort` | Spark client port | `80` |
| `service.clusterPort` | Spark cluster port | `7077` |
| `service.nodePort` | Port to bind to for NodePort service type (client port) | `nil` |
| `service.nodePorts.cluster` | Kubernetes cluster node port | `""` |
| `service.nodePorts.web` | Kubernetes web node port | `""` |
| `service.annotations` | Annotations for spark service | `{}` |
| `service.loadBalancerIP` | loadBalancerIP if spark service type is `LoadBalancer` | `nil` |
| `ingress.enabled` | Enable the use of the ingress controller to access the web UI | `false` |
| `ingress.certManager` | Add annotations for cert-manager | `false` |
| `ingress.annotations` | Ingress annotations | `{}` |
| `ingress.hosts[0].name` | Hostname to your Spark installation | `spark.local` |
| `ingress.hosts[0].path` | Path within the URL structure | `/` |
| `ingress.hosts[0].tls` | Utilize TLS backend in ingress | `false` |
| `ingress.hosts[0].tlsHosts` | Array of TLS hosts for ingress record (defaults to `ingress.hosts[0].name` if `nil`) | `nil` |
| `ingress.hosts[0].tlsSecret` | TLS Secret (certificates) | `spark.local-tls` |

Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,

```console
$ helm install my-release \
  --set master.webPort=8081 bitnami/spark
```

The above command sets the Spark master web port to `8081`.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

```console
$ helm install my-release -f values.yaml bitnami/spark
```
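The following is a minimal sketch of such a file, built from two parameters in the table above (the values shown are illustrative):

```console
$ cat > values.yaml <<EOF
master:
  webPort: 8081
worker:
  replicaCount: 3
EOF
$ helm install my-release -f values.yaml bitnami/spark
```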
> **Tip**: You can use the default [values.yaml](values.yaml)
## Configuration and installation details

### [Rolling VS Immutable tags](https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/)

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.

### Production configuration

This chart includes a `values-production.yaml` file where you can find some parameters oriented to production configuration in comparison to the regular `values.yaml`. You can use this file instead of the default one; an example install command is shown after the list below.
- Enable ingress controller

```diff
- ingress.enabled: false
+ ingress.enabled: true
```

- Enable RPC authentication and encryption:

```diff
- security.rpc.authenticationEnabled: false
- security.rpc.encryptionEnabled: false
+ security.rpc.authenticationEnabled: true
+ security.rpc.encryptionEnabled: true
```

- Enable storage encryption:

```diff
- security.storageEncryptionEnabled: false
+ security.storageEncryptionEnabled: true
```

- Configure SSL parameters:

```diff
- security.ssl.enabled: false
- security.ssl.needClientAuth: false
+ security.ssl.enabled: true
+ security.ssl.needClientAuth: true
```

- Set a secret name for passwords:

```diff
+ security.passwordsSecretName: my-passwords-secret
```

- Set a secret name for certificates:

```diff
+ security.certificatesSecretName: my-certificates-secret
```

- Enable autoscaling depending on CPU:

```diff
- worker.autoscaling.enabled: false
- worker.autoscaling.replicasMax: 5
+ worker.autoscaling.enabled: true
+ worker.autoscaling.replicasMax: 10
```
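To use the production configuration file, a minimal sketch (assuming you have a local copy of the chart's `values-production.yaml`):

```console
$ helm install my-release -f values-production.yaml bitnami/spark
```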
### Using custom configuration

To use a custom configuration, create a ConfigMap containing a `spark-env.sh` file and provide the ConfigMap name at deployment time. To set the configuration on the master, use `master.configurationConfigMap=configMapName`.

To set the configuration on the worker, use `worker.configurationConfigMap=configMapName`.

Both can be set at the same time, either with the same ConfigMap or with two different ones. The ConfigMap can also include a `spark-defaults.conf` file; each of the two files can be used without the other.
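For example, a minimal sketch that uses one ConfigMap for both roles (the name `my-spark-config` is illustrative, and the ConfigMap is assumed to contain both files):

```console
$ kubectl create configmap my-spark-config \
    --from-file=spark-env.sh --from-file=spark-defaults.conf
$ helm install my-release \
    --set master.configurationConfigMap=my-spark-config \
    --set worker.configurationConfigMap=my-spark-config \
    bitnami/spark
```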
### Submit an application

To submit an application to the cluster, use the `spark-submit` script. You can obtain the script [here](https://github.com/apache/spark/tree/master/bin). For example, to deploy one of the example applications:

```bash
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-IP>:<master-cluster-port> --deploy-mode cluster ./examples/jars/spark-examples_2.11-2.4.3.jar 1000
```

Replace `<master-IP>` and `<master-cluster-port>` with your master's IP address and cluster port.

> Be aware that it is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue [here](https://issues.apache.org/jira/browse/SPARK-25078).

### Enable security for spark

#### Configure ssl communication

In order to enable secure transport between workers and master, deploy the Helm chart with the option `security.ssl.enabled=true`. The certificates and passwords secrets described below are also needed.
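For example, a minimal sketch (the secret names are illustrative; the next section explains how to create them):

```console
$ helm install my-release \
    --set security.ssl.enabled=true \
    --set security.certificatesSecretName=my-secret \
    --set security.passwordsSecretName=my-passwords-secret \
    bitnami/spark
```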
#### How to create the certificates secret

You need to create two secrets: one for the passwords and one for the certificates. Their names are configured with `security.passwordsSecretName` and `security.certificatesSecretName`. To generate certificates for testing purposes you can use [this script](https://raw.githubusercontent.com/confluentinc/confluent-platform-security-tools/master/kafka-generate-ssl.sh).

In the certificates secret, the keys must be `spark-keystore.jks` and `spark-truststore.jks`, and their contents must be in JKS format. To build this secret, first generate the two stores and rename the files to `spark-keystore.jks` and `spark-truststore.jks`. Once the files are created, create the secret using the file names as keys.
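For example, assuming both JKS files are in the current directory (the secret name `my-secret` is illustrative):

```console
$ kubectl create secret generic my-secret \
    --from-file=spark-keystore.jks --from-file=spark-truststore.jks
```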
The second secret, for the passwords, must contain four keys: `rpc-authentication-secret`, `ssl-key-password`, `ssl-keystore-password` and `ssl-truststore-password`.
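A minimal sketch of creating it (the name `my-passwords-secret` and the literal values are placeholders):

```console
$ kubectl create secret generic my-passwords-secret \
    --from-literal=rpc-authentication-secret=<rpc-secret> \
    --from-literal=ssl-key-password=<key-password> \
    --from-literal=ssl-keystore-password=<keystore-password> \
    --from-literal=ssl-truststore-password=<truststore-password>
```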
Now that the two secrets are created, deploy the chart enabling the security configuration: set the name of the certificates secret (`my-secret` in this example) in `security.certificatesSecretName`, and the name of the passwords secret (`my-passwords-secret` in this example) in `security.passwordsSecretName`.

To deploy the chart, use the following parameters:

```bash
security.certificatesSecretName=my-secret
security.passwordsSecretName=my-passwords-secret
security.rpc.authenticationEnabled=true
security.rpc.encryptionEnabled=true
security.storageEncryptionEnabled=true
security.ssl.enabled=true
security.ssl.needClientAuth=true
```
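Put together, a sketch of the full install command with these parameters:

```console
$ helm install my-release \
    --set security.certificatesSecretName=my-secret \
    --set security.passwordsSecretName=my-passwords-secret \
    --set security.rpc.authenticationEnabled=true \
    --set security.rpc.encryptionEnabled=true \
    --set security.storageEncryptionEnabled=true \
    --set security.ssl.enabled=true \
    --set security.ssl.needClientAuth=true \
    bitnami/spark
```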
> Be aware that it is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue [here](https://issues.apache.org/jira/browse/SPARK-25078).