[bitnami/spark] Adds support for Spark metrics (#4605)

* Initial metrics support

* Adds different annotations for master and worker

* Adds support for prometheus-operator, and update the values-production and README

* Fix indentation in values.yaml and values-production.yaml

* Fixes annotations, and use PodMonitor instead of ServiceMonitor

* Renames servicemonitor to podmonitor

Co-authored-by: rafael <rafael@bitnami.com>
Rafael Ríos Saavedra
2020-12-04 15:41:47 +01:00
committed by GitHub
parent 129c744a38
commit 53bec24c61
8 changed files with 262 additions and 11 deletions

View File

@@ -21,4 +21,4 @@ name: spark
sources:
- https://github.com/bitnami/bitnami-docker-spark
- https://spark.apache.org/
version: 4.1.0
version: 4.2.0

View File

@@ -47,17 +47,35 @@ The command removes all the Kubernetes components associated with the chart and
The following table lists the configurable parameters of the Spark chart and their default values.
### Global parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `global.imageRegistry` | Global Docker image registry | `nil` |
| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
### Common parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `nameOverride`                              | String to partially override the common.names.fullname template (the release name will be prepended)                                         | `nil`                                                    |
| `fullnameOverride`                          | String to fully override the common.names.fullname template                                                                                  | `nil`                                                    |
### Spark parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `image.registry` | spark image registry | `docker.io` |
| `image.repository` | spark Image name | `bitnami/spark` |
| `image.tag` | spark Image tag | `{TAG_NAME}` |
| `image.pullPolicy` | spark image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `nameOverride`                              | String to partially override the common.names.fullname template (the release name will be prepended)                                         | `nil`                                                    |
| `fullnameOverride`                          | String to fully override the common.names.fullname template                                                                                  | `nil`                                                    |
### Spark master parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `master.debug` | Specify if debug values should be set on the master | `false` |
| `master.webPort` | Specify the port where the web interface will listen on the master | `8080` |
| `master.clusterPort` | Specify the port where the master listens to communicate with workers | `7077` |
@@ -87,6 +105,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `master.readinessProbe.timeoutSeconds` | When the probe times out | 5 |
| `master.readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | 6 |
| `master.readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed | 1 |
### Spark worker parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `worker.debug` | Specify if debug values should be set on workers | `false` |
| `worker.webPort` | Specify the port where the web interface will listen on the worker | `8080` |
| `worker.clusterPort` | Specify the port where the worker listens to communicate with the master | `7077` |
@@ -124,6 +147,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `worker.extraEnvVars`                       | Extra environment variables to pass to the worker container                                                                                  | `{}`                                                     |
| `worker.extraVolumes` | Array of extra volumes to be added to the Spark worker deployment (evaluated as template). Requires setting `worker.extraVolumeMounts` | `nil` |
| `worker.extraVolumeMounts` | Array of extra volume mounts to be added to the Spark worker deployment (evaluated as template). Normally used with `worker.extraVolumes`. | `nil` |
### Security parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `security.passwordsSecretName`              | Name of the secret that contains the custom passwords used by the security configuration                                                     | No default                                               |
| `security.rpc.authenticationEnabled` | Enable the RPC authentication | `false` |
| `security.rpc.encryptionEnabled` | Enable the encryption for RPC | `false` |
@@ -132,6 +160,11 @@ The following tables lists the configurable parameters of the spark chart and th
| `security.ssl.needClientAuth` | Enable the client authentication | `false` |
| `security.ssl.protocol` | Set the SSL protocol | `TLSv1.2` |
| `security.certificatesSecretName` | Set the name of the secret that contains the certificates | No default |
### Exposure parameters
| Parameter | Description | Default |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `service.type` | Kubernetes Service type | `ClusterIP` |
| `service.webPort` | Spark client port | `80` |
| `service.clusterPort` | Spark cluster port | `7077` |
@@ -149,6 +182,26 @@ The following tables lists the configurable parameters of the spark chart and th
| `ingress.hosts[0].tlsHosts` | Array of TLS hosts for ingress record (defaults to `ingress.hosts[0].name` if `nil`) | `nil` |
| `ingress.hosts[0].tlsSecret` | TLS Secret (certificates) | `spark.local-tls` |
### Metrics parameters
| Parameter | Description | Default |
|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| `metrics.enabled`                         | Start a sidecar Prometheus exporter                                                                                                           | `false`                                                      |
| `metrics.service.port`                    | Metrics service port                                                                                                                          | `9117`                                                       |
| `metrics.service.annotations` | Annotations for enabling prometheus to access the metrics endpoints | `{prometheus.io/scrape: "true", prometheus.io/port: "9117"}` |
| `metrics.resources.limits` | The resources limits for the metrics exporter container | `{}` |
| `metrics.resources.requests` | The requested resources for the metrics exporter container | `{}` |
| `metrics.podMonitor.enabled`              | Create a PodMonitor resource for scraping metrics with the Prometheus Operator                                                                | `false`                                                      |
| `metrics.podMonitor.namespace`            | Namespace where the PodMonitor resource should be created                                                                                     | `nil`                                                        |
| `metrics.podMonitor.interval` | Specify the interval at which metrics should be scraped | `30s` |
| `metrics.podMonitor.scrapeTimeout` | Specify the timeout after which the scrape is ended | `nil` |
| `metrics.masterAnnotations` | Annotations for enabling prometheus to access the metrics endpoint of the master nodes | `{prometheus.io/scrape: "true", prometheus.io/port: "8080"}` |
| `metrics.workerAnnotations` | Annotations for enabling prometheus to access the metrics endpoint of the worker nodes | `{prometheus.io/scrape: "true", prometheus.io/port: "8081"}` |
| `metrics.prometheusRule.enabled`          | If `true`, create a PrometheusRule resource for the Prometheus Operator                                                                       | `false`                                                      |
| `metrics.prometheusRule.additionalLabels` | Additional labels so the PrometheusRule will be discovered by Prometheus                                                                      | `{}`                                                         |
| `metrics.prometheusRule.namespace`        | Namespace where the PrometheusRule resource should be created                                                                                 | the same namespace as spark                                  |
| `metrics.prometheusRule.rules` | [rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) to be created, check values for an example. | `[]` |
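For instance, the metrics stack introduced by this chart version could be enabled from a custom values file (a minimal sketch; the interval and timeout shown are illustrative, adjust them to your environment):

```yaml
## custom-values.yaml (hypothetical file name)
## Enables the metrics endpoints and a PodMonitor for the Prometheus Operator
metrics:
  enabled: true
  podMonitor:
    enabled: true
    interval: 30s
    scrapeTimeout: 10s
```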
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console

View File

@@ -0,0 +1,29 @@
{{- if and .Values.metrics.enabled .Values.metrics.podMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: {{ include "common.names.fullname" . }}
{{- if .Values.metrics.podMonitor.namespace }}
namespace: {{ .Values.metrics.podMonitor.namespace }}
{{- end }}
labels:
{{- include "common.labels.standard" . | nindent 4 }}
app.kubernetes.io/component: metrics
{{- if .Values.metrics.podMonitor.additionalLabels }}
{{- toYaml .Values.metrics.podMonitor.additionalLabels | nindent 4 }}
{{- end }}
spec:
podMetricsEndpoints:
- port: http
{{- if .Values.metrics.podMonitor.interval }}
interval: {{ .Values.metrics.podMonitor.interval }}
{{- end }}
{{- if .Values.metrics.podMonitor.scrapeTimeout }}
scrapeTimeout: {{ .Values.metrics.podMonitor.scrapeTimeout }}
{{- end }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchLabels: {{- include "common.labels.matchLabels" . | nindent 6 }}
{{- end }}
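For reference, with `metrics.enabled=true`, `metrics.podMonitor.enabled=true`, and a hypothetical release named `my-release` in the `default` namespace, the template above would render roughly as follows (standard labels abridged):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-release-spark
  labels:
    app.kubernetes.io/name: spark
    app.kubernetes.io/instance: my-release
    app.kubernetes.io/component: metrics
spec:
  podMetricsEndpoints:
    - port: http
      interval: 30s
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: spark
      app.kubernetes.io/instance: my-release
```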

View File

@@ -0,0 +1,23 @@
{{- if and .Values.metrics.enabled .Values.metrics.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: {{ include "common.names.fullname" . }}
{{- with .Values.metrics.prometheusRule.namespace }}
namespace: {{ . }}
{{- end }}
labels:
{{- include "common.labels.standard" . | nindent 4 }}
{{- with .Values.metrics.prometheusRule.additionalLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- if .Values.commonAnnotations }}
annotations: {{- include "common.tplvalues.render" ( dict "value" .Values.commonAnnotations "context" $ ) | nindent 4 }}
{{- end }}
spec:
{{- with .Values.metrics.prometheusRule.rules }}
groups:
- name: {{ include "common.names.fullname" . }}
rules: {{ tpl (toYaml .) $ | nindent 8 }}
{{- end }}
{{- end }}
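As a sketch of how `metrics.prometheusRule.rules` might be populated for this chart, the values below define a single alert; the `spark_master_workers` metric name is hypothetical, so check the metric names actually exposed by your exporter before using it:

```yaml
metrics:
  prometheusRule:
    enabled: true
    rules:
      ## Fires when no workers are registered with the master (hypothetical metric)
      - alert: SparkWorkersDown
        expr: spark_master_workers < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: No Spark workers registered with the master
```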

View File

@@ -17,8 +17,14 @@ spec:
{{- if .Values.master.extraPodLabels }}
{{- include "common.tplvalues.render" (dict "value" .Values.master.extraPodLabels "context" $) | nindent 8 }}
{{- end }}
{{- if or .Values.master.podAnnotations .Values.metrics.enabled }}
annotations:
{{- if .Values.master.podAnnotations }}
annotations: {{- include "common.tplvalues.render" (dict "value" .Values.master.podAnnotations "context" $) | nindent 8 }}
{{- include "common.tplvalues.render" (dict "value" .Values.master.podAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- if .Values.metrics.enabled }}
{{- include "common.tplvalues.render" ( dict "value" .Values.metrics.masterAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- end }}
spec:
{{- include "spark.imagePullSecrets" . | nindent 6 }}
@@ -66,6 +72,10 @@ spec:
- name: BASH_DEBUG
value: {{ ternary "1" "0" .Values.image.debug | quote }}
{{- end }}
{{- if .Values.metrics.enabled }}
- name: SPARK_METRICS_ENABLED
value: "true"
{{- end }}
- name: SPARK_DAEMON_MEMORY
value: {{ .Values.master.daemonMemoryLimit | quote }}
{{- if .Values.master.clusterPort }}

View File

@@ -17,8 +17,14 @@ spec:
{{- if .Values.worker.extraPodLabels }}
{{- include "common.tplvalues.render" (dict "value" .Values.worker.extraPodLabels "context" $) | nindent 8 }}
{{- end }}
{{- if or .Values.worker.podAnnotations .Values.metrics.enabled }}
annotations:
{{- if .Values.worker.podAnnotations }}
annotations: {{- include "common.tplvalues.render" (dict "value" .Values.worker.podAnnotations "context" $) | nindent 8 }}
{{- include "common.tplvalues.render" (dict "value" .Values.worker.podAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- if .Values.metrics.enabled }}
{{- include "common.tplvalues.render" ( dict "value" .Values.metrics.workerAnnotations "context" $) | nindent 8 }}
{{- end }}
{{- end }}
spec:
{{- include "spark.imagePullSecrets" . | nindent 6 }}
@@ -68,6 +74,10 @@ spec:
value: {{ ternary "1" "0" .Values.image.debug | quote }}
- name: SPARK_DAEMON_MEMORY
value: {{ .Values.worker.daemonMemoryLimit | quote }}
{{- if .Values.metrics.enabled }}
- name: SPARK_METRICS_ENABLED
value: "true"
{{- end }}
## There are some environment variables whose existence needs
## to be checked because Spark checks if they are null instead of an
## empty string

View File

@@ -259,6 +259,69 @@ worker:
## Max number of workers when using autoscaling
replicasMax: 10
## Metrics configuration
##
metrics:
enabled: true
## Prometheus metrics service parameters
##
service:
## Metrics port
##
port: 9117
## Annotations for the Prometheus metrics service
##
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.metrics.service.port }}"
## Prometheus Pod Monitor
## ref: https://github.com/coreos/prometheus-operator
## https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
##
podMonitor:
## If the Prometheus Operator is installed in your cluster, set this to true to create a PodMonitor entry
##
enabled: false
## Specify the namespace in which the podMonitor resource will be created
# namespace: ""
## Specify the interval at which metrics should be scraped
##
# interval: 30s
## Specify the timeout after which the scrape is ended
# scrapeTimeout: 10s
masterAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.master.webPort }}"
workerAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.worker.webPort }}"
## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
##
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ''
## These are just example rules, please adapt them to your needs.
## Make sure to constrain the rules to the relevant service.
## rules:
## - alert: HugeReplicationLag
## expr: pg_replication_lag{service="{{ template "postgresql.fullname" . }}-metrics"} / 3600 > 1
## for: 1m
## labels:
## severity: critical
## annotations:
## description: replication for {{ template "postgresql.fullname" . }} PostgreSQL is lagging by {{ "{{ $value }}" }} hour(s).
## summary: PostgreSQL replication is lagging by {{ "{{ $value }}" }} hour(s).
##
rules: []
## Security configuration
##
security:
@@ -276,6 +339,11 @@ security:
##
storageEncryptionEnabled: true
## Name of the secret that contains the certificates
## It should contain two keys named "spark-keystore.jks" and "spark-truststore.jks", with the files in JKS format.
##
certificatesSecretName: my-certificates-secret
## SSL configuration
##
ssl:
@@ -283,11 +351,6 @@ security:
needClientAuth: true
protocol: TLSv1.2
## Name of the secret that contains the certificates
## It should contain two keys named "spark-keystore.jks" and "spark-truststore.jks", with the files in JKS format.
##
certificatesSecretName: my-certificates-secret
## Service parameters
##
service:

View File

@@ -259,6 +259,69 @@ worker:
## Max number of workers when using autoscaling
replicasMax: 5
## Metrics configuration
##
metrics:
enabled: false
## Prometheus metrics service parameters
##
service:
## Metrics port
##
port: 9117
## Annotations for the Prometheus metrics service
##
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.metrics.service.port }}"
## Prometheus Pod Monitor
## ref: https://github.com/coreos/prometheus-operator
## https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#endpoint
##
podMonitor:
## If the Prometheus Operator is installed in your cluster, set this to true to create a PodMonitor entry
##
enabled: false
## Specify the namespace in which the podMonitor resource will be created
# namespace: ""
## Specify the interval at which metrics should be scraped
##
# interval: 30s
## Specify the timeout after which the scrape is ended
# scrapeTimeout: 10s
masterAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.master.webPort }}"
workerAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: "{{ .Values.worker.webPort }}"
## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
##
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ''
## These are just example rules, please adapt them to your needs.
## Make sure to constrain the rules to the relevant service.
## rules:
## - alert: HugeReplicationLag
## expr: pg_replication_lag{service="{{ template "postgresql.fullname" . }}-metrics"} / 3600 > 1
## for: 1m
## labels:
## severity: critical
## annotations:
## description: replication for {{ template "postgresql.fullname" . }} PostgreSQL is lagging by {{ "{{ $value }}" }} hour(s).
## summary: PostgreSQL replication is lagging by {{ "{{ $value }}" }} hour(s).
##
rules: []
## Security configuration
##
security:
@@ -319,7 +382,7 @@ service:
## set the LoadBalancer service type to internal only.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
##
annotations: {}
annotations:
## Ingress parameters
##