mirror of
https://github.com/bitnami/charts.git
synced 2026-03-11 07:17:45 +08:00
T40023 Updated Spark README (#6614)
Co-authored-by: Vikram Vaswani <vvaswani@vmware.com>
This commit is contained in:
@@ -22,4 +22,4 @@ name: spark
|
||||
sources:
|
||||
- https://github.com/bitnami/bitnami-docker-spark
|
||||
- https://spark.apache.org/
|
||||
version: 5.4.4
|
||||
version: 5.4.5
|
||||
|
||||
@@ -11,7 +11,7 @@ $ helm install my-release bitnami/spark
|
||||
|
||||
## Introduction
|
||||
|
||||
This chart bootstraps a [spark](https://github.com/bitnami/bitnami-docker-spark) deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
|
||||
This chart bootstraps an [Apache Spark](https://github.com/bitnami/bitnami-docker-spark) deployment on a [Kubernetes](http://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
|
||||
|
||||
Bitnami charts can be used with [Kubeapps](https://kubeapps.com/) for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of [Bitnami Kubernetes Production Runtime](https://kubeprod.io/) (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.
|
||||
|
||||
@@ -29,7 +29,7 @@ $ helm repo add bitnami https://charts.bitnami.com/bitnami
|
||||
$ helm install my-release bitnami/spark
|
||||
```
|
||||
|
||||
These commands deploy Spark on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) section lists the parameters that can be configured during installation.
|
||||
These commands deploy Apache Spark on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) section lists the parameters that can be configured during installation.
|
||||
|
||||
> **Tip**: List all releases using `helm list`
|
||||
|
||||
@@ -45,7 +45,7 @@ The command removes all the Kubernetes components associated with the chart and
|
||||
|
||||
## Parameters
|
||||
|
||||
The following tables lists the configurable parameters of the spark chart and their default values.
|
||||
The following tables lists the configurable parameters of the Apache Spark chart and their default values.
|
||||
|
||||
### Global parameters
|
||||
|
||||
@@ -250,67 +250,57 @@ $ helm install my-release -f values.yaml bitnami/spark
|
||||
|
||||
## Configuration and installation details
|
||||
|
||||
### [Rolling VS Immutable tags](https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/)
|
||||
### [Rolling vs Immutable tags](https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/)
|
||||
|
||||
It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.
|
||||
|
||||
Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
|
||||
|
||||
### Using custom configuration
|
||||
### Define custom configuration
|
||||
|
||||
To use a custom configuration a ConfigMap should be created with the `spark-env.sh` file inside the ConfigMap. The ConfigMap name must be provided at deployment time, to set the configuration on the master use: ` master.configurationConfigMap=configMapName`
|
||||
To use a custom configuration, a ConfigMap should be created with the `spark-env.sh` file inside the ConfigMap. The ConfigMap name must be provided at deployment time.
|
||||
|
||||
To set the configuration on the worker use: `worker.configurationConfigMap=configMapName`
|
||||
To set the configuration on the master use `master.configurationConfigMap=configMapName`. To set the configuration on the worker, use `worker.configurationConfigMap=configMapName`.
|
||||
|
||||
It can be set both at the same time with the same ConfigMap or using two ConfigMaps.
|
||||
Also, you can provide in the ConfigMap a `spark-defaults.conf` file.
|
||||
You can use both files without the other.
|
||||
These values can be set at the same time in a single ConfigMap or using two ConfigMaps. An additional `spark-defaults.conf` file can be provided in the ConfigMap. You can use both files or one without the other.
|
||||
|
||||
### Submit an application
|
||||
|
||||
To submit an application to the cluster use the `spark-submit` script. You can obtain the script [here](https://github.com/apache/spark/tree/master/bin). For example, to deploy one of the example applications:
|
||||
To submit an application to the Apache Spark cluster, use the `spark-submit` script, which is available at [https://github.com/apache/spark/tree/master/bin](https://github.com/apache/spark/tree/master/bin).
|
||||
|
||||
The command below illustrates the process of deploying one of the sample applications included with Apache Spark. Replace the MASTER-IP-ADDRESS and MASTER-PORT placeholders with the correct master IP address and port for your deployment.
|
||||
|
||||
```bash
|
||||
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-IP>:<master-cluster-port> --deploy-mode cluster ./examples/jars/spark-examples_2.11-2.4.3.jar 1000
|
||||
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://MASTER-IP-ADDRESS:MASTER-PORT --deploy-mode cluster ./examples/jars/spark-examples_2.11-2.4.3.jar 1000
|
||||
```
|
||||
|
||||
Where the master IP and port must be changed by you master IP address and port.
|
||||
> Be aware that currently is not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue [here](https://issues.apache.org/jira/browse/SPARK-25078).
|
||||
For a complete walkthrough of the process using a custom application, refer to the [detailed Apache Spark tutorial](https://docs.bitnami.com/tutorials/process-data-spark-kubernetes/).
|
||||
|
||||
### Enable security for spark
|
||||
> Be aware that it is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. [Learn more about the issue](https://issues.apache.org/jira/browse/SPARK-25078).
|
||||
|
||||
#### Configure ssl communication
|
||||
### Configure security for Apache Spark
|
||||
|
||||
In order to enable secure transport between workers and master deploy the helm chart with this options: `ssl.enabled=true`
|
||||
### Configure SSL communication
|
||||
|
||||
#### How to create the certificates secret
|
||||
In order to enable secure transport between workers and master, deploy the Helm chart with the `ssl.enabled=true` chart parameter.
|
||||
|
||||
It is needed to create two secrets to set the passwords and certificates. The name of the two secrets should be configured on `security.passwordsSecretName` and `security.certificatesSecretName`. To generate certificates for testing purpose you can use [this script](https://raw.githubusercontent.com/confluentinc/confluent-platform-security-tools/master/kafka-generate-ssl.sh).
|
||||
Into the certificates secret, the keys must be `spark-keystore.jks` and `spark-truststore.jks`, and the content must be text on JKS format.
|
||||
To generate the certificates secret, first it is needed to generate the two certificates and rename them as `spark-keystore.jks` and `spark-truststore.jks`.
|
||||
Once the certificates are created, you can create the secret having the file names as keys.
|
||||
### Create certificate and password secrets
|
||||
|
||||
The second secret, the secret for passwords should have four keys: `rpc-authentication-secret`, `ssl-key-password`, `ssl-keystore-password` and `ssl-truststore-password`.
|
||||
It is necessary to create two secrets for the passwords and certificates. The names of the two secrets should be configured using the `security.passwordsSecretName` and `security.certificatesSecretName` chart parameters.
|
||||
|
||||
Now that the two secrets are created, deploy the chart enabling security configuration and setting the name for the certificates secret (`my-secret` in this case) at the `security.certificatesSecretName` and setting the name for the passwords secret (`my-passwords-secret` in this case) at `security.passwordsSecretName`.
|
||||
The keys for the certificate secret must be named `spark-keystore.jks` and `spark-truststore.jks`, and the content must be text in JKS format. Use [this script to generate certificates](https://raw.githubusercontent.com/confluentinc/confluent-platform-security-tools/master/kafka-generate-ssl.sh) for test purposes if required.
|
||||
|
||||
To deploy chart, use the following parameters:
|
||||
```bash
|
||||
security.certificatesSecretName=my-secret
|
||||
security.passwordsSecretName=my-passwords-secret
|
||||
security.rpc.authenticationEnabled=true
|
||||
security.rpc.encryptionEnabled=true
|
||||
security.storageEncrytionEnabled=true
|
||||
security.ssl.enabled=true
|
||||
security.ssl.needClientAuth=true
|
||||
```
|
||||
The secret for passwords should have four keys: `rpc-authentication-secret`, `ssl-key-password`, `ssl-keystore-password` and `ssl-truststore-password`.
|
||||
|
||||
> Be aware that currently is not possible to submit an application to a standalone cluster if RPC authentication is configured. More info about the issue [here](https://issues.apache.org/jira/browse/SPARK-25078).
|
||||
Refer to the [chart documentation for more details on configuring security and an example](https://docs.bitnami.com/kubernetes/infrastructure/spark/administration/configure-security/).
|
||||
|
||||
### Setting Pod's affinity
|
||||
> It is currently not possible to submit an application to a standalone cluster if RPC authentication is configured. [Learn more about this issue](https://issues.apache.org/jira/browse/SPARK-25078).
|
||||
|
||||
This chart allows you to set your custom affinity using the `XXX.affinity` parameter(s). Find more information about Pod's affinity in the [kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
|
||||
### Set Pod affinity
|
||||
|
||||
As an alternative, you can use of the preset configurations for pod affinity, pod anti-affinity, and node affinity available at the [bitnami/common](https://github.com/bitnami/charts/tree/master/bitnami/common#affinities) chart. To do so, set the `XXX.podAffinityPreset`, `XXX.podAntiAffinityPreset`, or `XXX.nodeAffinityPreset` parameters.
|
||||
This chart allows you to set your custom affinity using the `XXX.affinity` parameter(s). Find more information about Pod affinity in the [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
|
||||
|
||||
As an alternative, you can use the preset configurations for pod affinity, pod anti-affinity, and node affinity available at the [bitnami/common](https://github.com/bitnami/charts/tree/master/bitnami/common#affinities) chart. To do so, set the `XXX.podAffinityPreset`, `XXX.podAntiAffinityPreset`, or `XXX.nodeAffinityPreset` parameters.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
@@ -324,26 +314,9 @@ This version standardizes the way of defining Ingress rules. When configuring a
|
||||
|
||||
### To 4.0.0
|
||||
|
||||
[On November 13, 2020, Helm v2 support was formally finished](https://github.com/helm/charts#status-of-the-project), this major version is the result of the required changes applied to the Helm Chart to be able to incorporate the different features added in Helm v3 and to be consistent with the Helm project itself regarding the Helm v2 EOL.
|
||||
[On November 13, 2020, Helm v2 support formally ended](https://github.com/helm/charts#status-of-the-project). This major version is the result of the required changes applied to the Helm Chart to be able to incorporate the different features added in Helm v3 and to be consistent with the Helm project itself regarding the Helm v2 EOL.
|
||||
|
||||
**What changes were introduced in this major version?**
|
||||
|
||||
- Previous versions of this Helm Chart use `apiVersion: v1` (installable by both Helm 2 and 3), this Helm Chart was updated to `apiVersion: v2` (installable by Helm 3 only). [Here](https://helm.sh/docs/topics/charts/#the-apiversion-field) you can find more information about the `apiVersion` field.
|
||||
- Move dependency information from the *requirements.yaml* to the *Chart.yaml*
|
||||
- After running `helm dependency update`, a *Chart.lock* file is generated containing the same structure used in the previous *requirements.lock*
|
||||
- The different fields present in the *Chart.yaml* file has been ordered alphabetically in a homogeneous way for all the Bitnami Helm Charts
|
||||
|
||||
**Considerations when upgrading to this version**
|
||||
|
||||
- If you want to upgrade to this version from a previous one installed with Helm v3, you shouldn't face any issues
|
||||
- If you want to upgrade to this version using Helm v2, this scenario is not supported as this version doesn't support Helm v2 anymore
|
||||
- If you installed the previous version with Helm v2 and wants to upgrade to this version with Helm v3, please refer to the [official Helm documentation](https://helm.sh/docs/topics/v2_v3_migration/#migration-use-cases) about migrating from Helm v2 to v3
|
||||
|
||||
**Useful links**
|
||||
|
||||
- https://docs.bitnami.com/tutorials/resolve-helm2-helm3-post-migration-issues/
|
||||
- https://helm.sh/docs/topics/v2_v3_migration/
|
||||
- https://helm.sh/blog/migrate-from-helm-v2-to-helm-v3/
|
||||
[Learn more about this change and related upgrade considerations](https://docs.bitnami.com/kubernetes/infrastructure/spark/administration/upgrade-helm3/).
|
||||
|
||||
### To 3.0.0
|
||||
|
||||
|
||||
Reference in New Issue
Block a user