[bitnami/spark] Adds support for Spark metrics (#4605)

* Initial metrics support

* Adds different annotations for master and worker

* Adds support for prometheus-operator, and update the values-production and README

* Fix indentation in values.yaml and values-production.yaml

* Fixes annotations, and use PodMonitor instead of ServiceMonitor

* Renames servicemonitor to podmonitor

Co-authored-by: rafael <rafael@bitnami.com>
Rafael Ríos Saavedra
2020-12-04 15:41:47 +01:00
committed by GitHub
parent 129c744a38
commit 53bec24c61
8 changed files with 262 additions and 11 deletions

View File

@@ -21,4 +21,4 @@ name: spark
sources:
- https://github.com/bitnami/bitnami-docker-spark
- https://spark.apache.org/
version: 4.1.0
version: 4.2.0

View File

@@ -47,17 +47,35 @@ The command removes all the Kubernetes components associated with the chart and
The following table lists the configurable parameters of the Spark chart and their default values.
### Global parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `global.imageRegistry` | Global Docker image registry | `nil` |
| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
### Common parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `nameOverride`                              | String to partially override the common.names.fullname template (the release name will be prepended)                                         | `nil`                                                    |
| `fullnameOverride`                          | String to fully override the common.names.fullname template                                                                                  | `nil`                                                    |
### Spark parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `image.registry` | spark image registry | `docker.io` |
| `image.repository` | spark Image name | `bitnami/spark` |
| `image.tag` | spark Image tag | `{TAG_NAME}` |
| `image.pullPolicy` | spark image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `nameOverride`                              | String to partially override the common.names.fullname template (the release name will be prepended)                                         | `nil`                                                    |
| `fullnameOverride`                          | String to fully override the common.names.fullname template                                                                                  | `nil`                                                    |
### Spark master parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `master.debug` | Specify if debug values should be set on the master | `false` |
| `master.webPort` | Specify the port where the web interface will listen on the master | `8080` |
| `master.clusterPort` | Specify the port where the master listens to communicate with workers | `7077` |
@@ -87,6 +105,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `master.readinessProbe.timeoutSeconds` | When the probe times out | 5 |
| `master.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | 6 |
| `master.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
### Spark worker parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `worker.debug` | Specify if debug values should be set on workers | `false` |
| `worker.webPort` | Specify the port where the web interface will listen on the worker | `8080` |
| `worker.clusterPort` | Specify the port where the worker listens to communicate with the master | `7077` |
@@ -124,6 +147,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `worker.extraEnvVars`                       | Extra environment variables to pass to the worker container                                                                                  | `{}`                                                     |
| `worker.extraVolumes` | Array of extra volumes to be added to the Spark worker deployment (evaluated as template). Requires setting `worker.extraVolumeMounts` | `nil` |
| `worker.extraVolumeMounts` | Array of extra volume mounts to be added to the Spark worker deployment (evaluated as template). Normally used with `worker.extraVolumes`. | `nil` |
### Security parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `security.passwordsSecretName`              | Name of the secret that contains the custom passwords used by the security configuration                                                     | No default                                               |
| `security.rpc.authenticationEnabled` | Enable the RPC authentication | `false` |
| `security.rpc.encryptionEnabled` | Enable the encryption for RPC | `false` |
@@ -132,6 +160,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `security.ssl.needClientAuth` | Enable the client authentication | `false` |
| `security.ssl.protocol` | Set the SSL protocol | `TLSv1.2` |
| `security.certificatesSecretName` | Set the name of the secret that contains the certificates | No default |
### Exposure parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `service.type` | Kubernetes Service type | `ClusterIP` |
| `service.webPort` | Spark client port | `80` |
| `service.clusterPort` | Spark cluster port | `7077` |
@@ -149,6 +182,26 @@ The following tables lists the configurable parameters of the spark chart and th
| `ingress.hosts[0].tlsHosts` | Array of TLS hosts for ingress record (defaults to `ingress.hosts[0].name` if `nil`) | `nil` |
| `ingress.hosts[0].tlsSecret` | TLS Secret (certificates) | `spark.local-tls` |
### Metrics parameters
| Parameter | Description | Default |
|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| `metrics.enabled`                         | Start a sidecar Prometheus exporter                                                                                                           | `false`                                                      |
| `metrics.service.port`                    | Metrics service port                                                                                                                          | `9117`                                                       |
| `metrics.service.annotations` | Annotations for enabling prometheus to access the metrics endpoints | `{prometheus.io/scrape: "true", prometheus.io/port: "9117"}` |
| `metrics.resources.limits` | The resources limits for the metrics exporter container | `{}` |
| `metrics.resources.requests` | The requested resources for the metrics exporter container | `{}` |
| `metrics.podMonitor.enabled`              | Create a PodMonitor resource for scraping metrics with the Prometheus Operator                                                                | `false`                                                      |
| `metrics.podMonitor.namespace`            | Namespace where the PodMonitor resource should be created                                                                                     | `nil`                                                        |
| `metrics.podMonitor.interval` | Specify the interval at which metrics should be scraped | `30s` |
| `metrics.podMonitor.scrapeTimeout` | Specify the timeout after which the scrape is ended | `nil` |
| `metrics.masterAnnotations` | Annotations for enabling prometheus to access the metrics endpoint of the master nodes | `{prometheus.io/scrape: "true", prometheus.io/port: "8080"}` |
| `metrics.workerAnnotations` | Annotations for enabling prometheus to access the metrics endpoint of the worker nodes | `{prometheus.io/scrape: "true", prometheus.io/port: "8081"}` |
| `metrics.prometheusRule.enabled`          | If `true`, create a PrometheusRule resource for the Prometheus Operator                                                                       | `false`                                                      |
| `metrics.prometheusRule.additionalLabels` | Additional labels so the PrometheusRule will be discovered by Prometheus                                                                      | `{}`                                                         |
| `metrics.prometheusRule.namespace`        | Namespace where the PrometheusRule resource should be created                                                                                 | the same namespace as spark                                  |
| `metrics.prometheusRule.rules` | [rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) to be created, check values for an example. | `[]` |
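For instance, the metrics stack introduced by this chart version could be enabled from a custom values file (a minimal sketch; the interval and timeout shown are illustrative, adjust them to your environment):

```yaml
## custom-values.yaml (hypothetical file name)
## Enables the metrics endpoints and a PodMonitor for the Prometheus Operator
metrics:
  enabled: true
  podMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s
```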
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console

View File

@@ -0,0 +1,29 @@
{{- if and .Values.metrics.enabled .Values.metrics.podMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: {{ include "common.names.fullname" . }}
{{- if .Values.metrics.podMonitor.namespace }}
namespace: {{ .Values.metrics.podMonitor.namespace }}
{{- end }}
labels:
{{- include "common.labels.standard" . | nindent 4 }}
app.kubernetes.io/component: metrics
{{- if .Values.metrics.podMonitor.additionalLabels }}
{{- toYaml .Values.metrics.podMonitor.additionalLabels | nindent 4 }}
{{- end }}
spec:
podMetricsEndpoints:
- port: http
{{- if .Values.metrics.podMonitor.interval }}
interval: {{ .Values.metrics.podMonitor.interval }}
{{- end }}
{{- if .Values.metrics.podMonitor.scrapeTimeout }}
scrapeTimeout: {{ .Values.metrics.podMonitor.scrapeTimeout }}
{{- end }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchLabels: {{- include "common.labels.matchLabels" . | nindent 6 }}
{{- end }}
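For reference, with `metrics.enabled=true`, `metrics.podMonitor.enabled=true`, and a hypothetical release named `my-release` in the `default` namespace, the template above would render roughly as follows (standard labels abridged):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-release-spark
  labels:
    app.kubernetes.io/name: spark
    app.kubernetes.io/instance: my-release
    app.kubernetes.io/component: metrics
spec:
  podMetricsEndpoints:
    - port: http
      interval: 30s
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: spark
      app.kubernetes.io/instance: my-release
```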

View File

@@ -0,0 +1,23 @@
{{- if and .Values.metrics.enabled .Values.metrics.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: {{ include "common.names.fullname" . }}
{{- with .Values.metrics.prometheusRule.namespace }}
namespace: {{ . }}
{{- end }}
labels:
{{- include "common.labels.standard" . | nindent 4 }}
{{- with .Values.metrics.prometheusRule.additionalLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- if .Values.commonAnnotations }}
annotations: {{- include "common.tplvalues.render" ( dict "value" .Values.commonAnnotations "context" $ ) | nindent 4 }}
{{- end }}
spec:
{{- with .Values.metrics.prometheusRule.rules }}
groups:
- name: {{ include "common.names.fullname" . }}
rules: {{ tpl (toYaml .) $ | nindent 8 }}
{{- end }}
{{- end }}
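As a sketch of how `metrics.prometheusRule.rules` might be populated for this chart, the values below define a single alert; the `spark_master_workers` metric name is hypothetical, so check the metric names actually exposed by your exporter before using it:

```yaml
metrics:
  prometheusRule:
    enabled: true
    rules:
      ## Fires when no workers are registered with the master (hypothetical metric)
      - alert: SparkWorkersDown
        expr: spark_master_workers < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: No Spark workers registered with the master
```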

View File

@@ -17,8 +17,14 @@ spec:
{{- if .Values.master.extraPodLabels }}
{{- include "common.tplvalues.render" (dict "value" .Values.master.extraPodLabels "context" $) | nindent 8 }}
{{- end }}
{{- if or .Values.master.podAnnotations .Values.metrics.enabled }}
annotations:
{{- if .Values.master.podAnnotations }}
annotations: {{- include "common.tplvalues.render" (dict "value" .Values.master.podAnnotations "context" $) | nindent 8 }}
{{- include "common.tplvalues.render" (dict "value" .Values.master.podAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- if .Values.metrics.enabled }}
{{- include "common.tplvalues.render" ( dict "value" .Values.metrics.masterAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- end }}
spec:
{{- include "spark.imagePullSecrets" . | nindent 6 }}
@@ -66,6 +72,10 @@ spec:
- name: BASH_DEBUG
value: {{ ternary "1" "0" .Values.image.debug | quote }}
{{- end }}
{{- if .Values.metrics.enabled }}
- name: SPARK_METRICS_ENABLED
value: "true"
{{- end }}
- name: SPARK_DAEMON_MEMORY
value: {{ .Values.master.daemonMemoryLimit | quote }}
{{- if .Values.master.clusterPort }}

View File

@@ -17,8 +17,14 @@ spec:
{{- if .Values.worker.extraPodLabels }}
{{- include "common.tplvalues.render" (dict "value" .Values.worker.extraPodLabels "context" $) | nindent 8 }}
{{- end }}
{{- if or .Values.worker.podAnnotations .Values.metrics.enabled }}
annotations:
{{- if .Values.worker.podAnnotations }}
annotations: {{- include "common.tplvalues.render" (dict "value" .Values.worker.podAnnotations "context" $) | nindent 8 }}
{{- include "common.tplvalues.render" (dict "value" .Values.worker.podAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- if .Values.metrics.enabled }}
{{- include "common.tplvalues.render" ( dict "value" .Values.metrics.workerAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- end }}
spec:
{{- include "spark.imagePullSecrets" . | nindent 6 }}
@@ -68,6 +74,10 @@ spec:
value: {{ ternary "1" "0" .Values.image.debug | quote }}
- name: SPARK_DAEMON_MEMORY
value: {{ .Values.worker.daemonMemoryLimit | quote }}
{{- if .Values.metrics.enabled }}
- name: SPARK_METRICS_ENABLED
value: "true"
{{- end }}
## There are some environment variables whose existence needs
## to be checked because Spark checks if they are null instead of an
## empty string

View File

@@ -259,6 +259,69 @@ worker:
## Max number of workers when using autoscaling
replicasMax: 10
## Metrics configuration
##
metrics:
enabled: true
## Prometheus metrics service parameters
##
service:
## Metrics port
##
port: 9117
## Annotations for the Prometheus metrics service
##
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.metrics.service.port }}"
## Prometheus Pod Monitor
## ref: https://github.com/coreos/prometheus-operator
## https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
##
podMonitor:
## If the Prometheus Operator is installed in your cluster, set this to true to create a PodMonitor entry
##
enabled: false
## Specify the namespace in which the podMonitor resource will be created
# namespace: ""
## Specify the interval at which metrics should be scraped
##
# interval: 30s
## Specify the timeout after which the scrape is ended
# scrapeTimeout: 10s
masterAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.master.webPort }}"
workerAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.worker.webPort }}"
## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
##
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ''
## These are just example rules, please adapt them to your needs.
## Make sure to constrain the rules to the relevant service.
## rules:
## - alert: HugeReplicationLag
## expr: pg_replication_lag{service="{{ template "postgresql.fullname" . }}-metrics"} / 3600 > 1
## for: 1m
## labels:
## severity: critical
## annotations:
## description: replication for {{ template "postgresql.fullname" . }} PostgreSQL is lagging by {{ "{{ $value }}" }} hour(s).
## summary: PostgreSQL replication is lagging by {{ "{{ $value }}" }} hour(s).
##
rules: []
## Security configuration
##
security:
@@ -276,6 +339,11 @@ security:
##
storageEncryptionEnabled: true
## Name of the secret that contains the certificates
## It should contain two keys named "spark-keystore.jks" and "spark-truststore.jks", with the files in JKS format.
##
certificatesSecretName: my-certificates-secret
## SSL configuration
##
ssl:
@@ -283,11 +351,6 @@ security:
needClientAuth: true
protocol: TLSv1.2
## Name of the secret that contains the certificates
## It should contain two keys named "spark-keystore.jks" and "spark-truststore.jks", with the files in JKS format.
##
certificatesSecretName: my-certificates-secret
## Service parameters
##
service:

View File

@@ -259,6 +259,69 @@ worker:
## Max number of workers when using autoscaling
replicasMax: 5
## Metrics configuration
##
metrics:
enabled: false
## Prometheus metrics service parameters
##
service:
## Metrics port
##
port: 9117
## Annotations for the Prometheus metrics service
##
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.metrics.service.port }}"
## Prometheus Pod Monitor
## ref: https://github.com/coreos/prometheus-operator
## https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
##
podMonitor:
## If the Prometheus Operator is installed in your cluster, set this to true to create a PodMonitor entry
##
enabled: false
## Specify the namespace in which the podMonitor resource will be created
# namespace: ""
## Specify the interval at which metrics should be scraped
##
# interval: 30s
## Specify the timeout after which the scrape is ended
# scrapeTimeout: 10s
masterAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.master.webPort }}"
workerAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.worker.webPort }}"
## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
##
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ''
## These are just example rules, please adapt them to your needs.
## Make sure to constrain the rules to the relevant service.
## rules:
## - alert: HugeReplicationLag
## expr: pg_replication_lag{service="{{ template "postgresql.fullname" . }}-metrics"} / 3600 > 1
## for: 1m
## labels:
## severity: critical
## annotations:
## description: replication for {{ template "postgresql.fullname" . }} PostgreSQL is lagging by {{ "{{ $value }}" }} hour(s).
## summary: PostgreSQL replication is lagging by {{ "{{ $value }}" }} hour(s).
##
rules: []
## Security configuration
##
security:
@@ -319,7 +382,7 @@ service:
## set the LoadBalancer service type to internal only.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
##
annotations: {}
annotations:
## Ingress parameters
##