Address comments suggested by @juan131

PR: https://github.com/bitnami/charts/pull/1184
Alejandro Moreno
2019-05-16 11:55:51 +02:00
parent 6a069a7e4a
commit f23cedb9f4
8 changed files with 186 additions and 228 deletions


@@ -45,71 +45,67 @@ The command removes all the Kubernetes components associated with the chart and
The following table lists the configurable parameters of the PyTorch chart and their default values.
| Parameter | Description | Default |
| -------------------------------------- | -------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| `global.imageRegistry` | Global Docker image registry | `nil` |
| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `image.registry` | PyTorch image registry | `docker.io` |
| `image.repository` | PyTorch image name | `bitnami/pytorch` |
| `image.tag` | PyTorch image tag | `{VERSION}` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `image.debug` | Specify if debug logs should be enabled | `false` |
| `git.registry` | Git image registry | `docker.io` |
| `git.repository` | Git image name | `bitnami/git` |
| `git.tag` | Git image tag | `latest` |
| `git.pullPolicy` | Git image pull policy | `Always` |
| `git.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `pytorch.entrypoint.file` | Main entrypoint to your application | `''` |
| `pytorch.entrypoint.args` | Args required by your entrypoint | `nil` |
| `pytorch.distributed.enabled` | Enable distributed mode for PyTorch | `false` |
| `pytorch.distributed.worldSize` | Number of nodes that will execute your code | `4` |
| `pytorch.configMap` | Config map that contains the files you want to load in PyTorch | `nil` |
| `pytorch.cloneFilesFromGit.enabled` | Enable in order to download files from git repository | `false` |
| `pytorch.cloneFilesFromGit.repository` | Repository that holds the files | `nil` |
| `pytorch.cloneFilesFromGit.revision` | Revision from the repository to checkout | `master` |
| `pytorch.extraEnvVars` | Extra environment variables to add to master and workers pods | `nil` |
| `service.type` | Kubernetes Service type | `ClusterIP` |
| `service.port` | PyTorch master service port | `49875` |
| `service.nodePort` | Port to bind to for NodePort service type | `nil` |
| `service.loadBalancerIP` | Static IP Address to use for LoadBalancer service type | `nil` |
| `service.annotations` | Kubernetes service annotations | `{}` |
| `nodeSelector` | Node labels for pod assignment | `{}` |
| `tolerations` | Toleration labels for pod assignment | `[]` |
| `affinity` | Map of node/pod affinities | `{}` |
| `resources` | Pod resources | `{}` |
| `securityContext.enabled` | Enable security context | `true` |
| `securityContext.fsGroup` | Group ID for the container | `1001` |
| `securityContext.runAsUser` | User ID for the container | `1001` |
| `livenessProbe.enabled` | Enable/disable the Liveness probe | `true` |
| `livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `5` |
| `livenessProbe.periodSeconds` | How often to perform the probe | `5` |
| `livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed. | `1` |
| `livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | `5` |
| `readinessProbe.enabled` | Enable/disable the Readiness probe | `true` |
| `readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `5` |
| `readinessProbe.periodSeconds` | How often to perform the probe | `5` |
| `readinessProbe.timeoutSeconds` | When the probe times out | `1` |
| `readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed. | `1` |
| `readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | `5` |
| `persistence.enabled` | Use a PVC to persist data | `true` |
| `persistence.mountPath` | Path to mount the volume at | `/bitnami/pytorch` |
| `persistence.storageClass` | Storage class of backing PVC | `nil` (uses alpha storage class annotation) |
| `persistence.accessMode` | Use volume as ReadOnly or ReadWrite | `ReadWriteOnce` |
| `persistence.size` | Size of data volume | `8Gi` |
| `persistence.annotations` | Persistent Volume annotations | `{}` |
| Parameter | Description | Default |
| ------------------------------------ | -------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| `global.imageRegistry` | Global Docker image registry | `nil` |
| `global.imagePullSecrets` | Global Docker registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `image.registry` | PyTorch image registry | `docker.io` |
| `image.repository` | PyTorch image name | `bitnami/pytorch` |
| `image.tag` | PyTorch image tag | `{VERSION}` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `image.debug` | Specify if debug logs should be enabled | `false` |
| `git.registry` | Git image registry | `docker.io` |
| `git.repository` | Git image name | `bitnami/git` |
| `git.tag` | Git image tag | `latest` |
| `git.pullPolicy` | Git image pull policy | `Always` |
| `git.pullSecrets` | Specify docker-registry secret names as an array | `[]` (does not add image pull secrets to deployed pods) |
| `entrypoint.file` | Main entrypoint to your application | `''` |
| `entrypoint.args` | Args required by your entrypoint | `nil` |
| `mode` | Run PyTorch in standalone or distributed mode (possible values: `standalone`, `distributed`) | `standalone` |
| `worldSize` | Number of nodes that will execute your code | `nil` |
| `port` | PyTorch master port | `49875` |
| `configMap` | Config map that contains the files you want to load in PyTorch | `nil` |
| `cloneFilesFromGit.enabled` | Enable in order to download files from git repository | `false` |
| `cloneFilesFromGit.repository` | Repository that holds the files | `nil` |
| `cloneFilesFromGit.revision` | Revision from the repository to checkout | `master` |
| `extraEnvVars` | Extra environment variables to add to master and workers pods | `nil` |
| `nodeSelector` | Node labels for pod assignment | `{}` |
| `tolerations` | Toleration labels for pod assignment | `[]` |
| `affinity` | Map of node/pod affinities | `{}` |
| `resources` | Pod resources | `{}` |
| `securityContext.enabled` | Enable security context | `true` |
| `securityContext.fsGroup` | Group ID for the container | `1001` |
| `securityContext.runAsUser` | User ID for the container | `1001` |
| `livenessProbe.enabled` | Enable/disable the Liveness probe | `true` |
| `livenessProbe.initialDelaySeconds` | Delay before liveness probe is initiated | `5` |
| `livenessProbe.periodSeconds` | How often to perform the probe | `5` |
| `livenessProbe.timeoutSeconds` | When the probe times out | `5` |
| `livenessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed. | `1` |
| `livenessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | `5` |
| `readinessProbe.enabled` | Enable/disable the Readiness probe | `true` |
| `readinessProbe.initialDelaySeconds` | Delay before readiness probe is initiated | `5` |
| `readinessProbe.periodSeconds` | How often to perform the probe | `5` |
| `readinessProbe.timeoutSeconds` | When the probe times out | `1` |
| `readinessProbe.successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed. | `1` |
| `readinessProbe.failureThreshold` | Minimum consecutive failures for the probe to be considered failed after having succeeded. | `5` |
| `persistence.enabled` | Use a PVC to persist data | `true` |
| `persistence.mountPath` | Path to mount the volume at | `/bitnami/pytorch` |
| `persistence.storageClass` | Storage class of backing PVC | `nil` (uses alpha storage class annotation) |
| `persistence.accessMode` | Use volume as ReadOnly or ReadWrite | `ReadWriteOnce` |
| `persistence.size` | Size of data volume | `8Gi` |
| `persistence.annotations` | Persistent Volume annotations | `{}` |
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
$ helm install --name my-release \
--set pytorch.distributed.enabled=true \
--set pytorch.distributed.worldSize=8 \
--set mode=distributed \
--set worldSize=4 \
bitnami/pytorch
```
The above command creates 8 pods for PyTorch: one master and seven workers.
The above command creates 4 pods for PyTorch: one master and three workers.
Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,
@@ -133,7 +129,7 @@ In order to use an existing config map:
```console
$ helm install --name my-release \
--set pytorch.configMap=my-config-map \
--set configMap=my-config-map \
bitnami/pytorch
```
@@ -148,9 +144,9 @@ Finally, if you want to clone a git repository:
```console
$ helm install --name my-release \
--set pytorch.cloneFilesFromGit.enabled=true \
--set pytorch.cloneFilesFromGit.repository=https://github.com/my-user/my-repo \
--set pytorch.cloneFilesFromGit.revision=master \
--set cloneFilesFromGit.enabled=true \
--set cloneFilesFromGit.repository=https://github.com/my-user/my-repo \
--set cloneFilesFromGit.revision=master \
bitnami/pytorch
```
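
The renamed keys used with `--set` in the examples above could also be collected in a values file. The following is a minimal sketch that uses only keys listed in the parameter table; the file name `my-values.yaml` is a hypothetical choice, the repository URL is the placeholder from the README example, and combining distributed mode with a Git clone in one file is an illustrative assumption rather than something this commit prescribes.

```yaml
# my-values.yaml (hypothetical file name, for illustration only)

# Run in distributed mode; the default per the parameter table is `standalone`.
mode: distributed

# Total number of nodes: one master plus (worldSize - 1) workers.
worldSize: 4

# Clone the application files from a Git repository before starting PyTorch.
cloneFilesFromGit:
  enabled: true
  repository: https://github.com/my-user/my-repo  # placeholder URL from the README example
  revision: master
```

Such a file would be passed with `helm install --name my-release -f my-values.yaml bitnami/pytorch`. With `worldSize: 4`, the worker StatefulSet is rendered with three replicas, matching the "one master and three workers" description.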


@@ -1,6 +1,6 @@
{{- if or (.Values.pytorch.configMap) (.Files.Glob "files/*") (.Values.pytorch.cloneFilesFromGit.enabled) }}
{{- if .Values.pytorch.entrypoint.file }}
The provided file {{ .Values.pytorch.entrypoint.file }} is being executed. You can see the logs of each running node with:
{{- if or (.Values.configMap) (.Files.Glob "files/*") (.Values.cloneFilesFromGit.enabled) }}
{{- if .Values.entrypoint.file }}
The provided file {{ .Values.entrypoint.file }} is being executed. You can see the logs of each running node with:
kubectl logs [POD_NAME]
and the list of pods:
@@ -20,9 +20,9 @@ To run it, you can either deploy again using the `pytorch.entrypoint.file` optio
{{- else }}
You haven't loaded any file. This chart allows three different methods to load your files:
1. Load the files from an existing ConfigMap, using the `pytorch.configMap` option.
1. Load the files from an existing ConfigMap, using the `configMap` option.
2. Putting your files in a `files` folder in the root of the Chart.
3. Cloning a Git repository with the `pytorch.cloneFilesFromGit` option.
3. Cloning a Git repository with the `cloneFilesFromGit` option.
Examples for the different methods can be found in the README.
{{- end }}


@@ -1,7 +1,7 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "pytorch.fullname" . }}-master
name: {{ include "pytorch.fullname" . }}{{ if eq .Values.mode "distributed" }}-master{{ end }}
labels:
app.kubernetes.io/name: {{ include "pytorch.name" . }}
helm.sh/chart: {{ include "pytorch.chart" . }}
@@ -31,23 +31,26 @@ spec:
runAsUser: {{ .Values.securityContext.runAsUser }}
{{- end }}
{{- if .Values.nodeSelector }}
nodeSelector:
{{ toYaml .Values.nodeSelector | indent 8 }}
nodeSelector: {{ toYaml .Values.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.tolerations }}
tolerations:
{{ toYaml .Values.tolerations | indent 8 }}
tolerations: {{ toYaml .Values.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.affinity }}
affinity:
{{ toYaml .Values.affinity | indent 8 }}
affinity: {{ toYaml .Values.affinity | nindent 8 }}
{{- end }}
{{- if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- if .Values.cloneFilesFromGit.enabled }}
initContainers:
- name: git-clone-repository
image: {{ include "git.image" . }}
imagePullPolicy: {{ .Values.git.pullPolicy | quote }}
command: [ '/bin/sh', '-c' , 'git clone {{ .Values.pytorch.cloneFilesFromGit.repository }} /app && cd /app && git checkout {{ .Values.pytorch.cloneFilesFromGit.revision }}']
command:
- /bin/sh
- -c
- |
git clone {{ .Values.cloneFilesFromGit.repository }} /app
cd /app
git checkout {{ .Values.cloneFilesFromGit.revision }}
volumeMounts:
- name: git-cloned-files
mountPath: /app
@@ -60,27 +63,27 @@ spec:
- bash
- -c
- |
{{- if .Values.pytorch.entrypoint.file }}
python {{ .Values.pytorch.entrypoint.file }} {{ if .Values.pytorch.entrypoint.args }}{{ .Values.pytorch.entrypoint.args }}{{ end }}
{{- if .Values.entrypoint.file }}
python {{ .Values.entrypoint.file }} {{ if .Values.entrypoint.args }}{{ .Values.entrypoint.args }}{{ end }}
{{- end }}
sleep infinity
env:
{{- if .Values.pytorch.distributed.enabled }}
{{- if eq .Values.mode "distributed" }}
- name: MASTER_ADDR
value: "127.0.0.1"
- name: MASTER_PORT
value: {{ .Values.service.port | quote }}
value: {{ .Values.port | quote }}
- name: WORLD_SIZE
value: {{ .Values.pytorch.distributed.worldSize | quote }}
value: {{ .Values.worldSize | quote }}
- name: RANK
value: "0"
{{- end }}
{{- if .Values.pytorch.extraEnvVars }}
{{ toYaml .Values.pytorch.extraEnvVars | indent 8 }}
{{- if .Values.extraEnvVars }}
{{ toYaml .Values.extraEnvVars | indent 8 }}
{{- end }}
ports:
- name: pytorch
containerPort: {{ .Values.service.port }}
containerPort: {{ .Values.port }}
{{- if .Values.livenessProbe.enabled }}
livenessProbe:
exec:
@@ -109,28 +112,28 @@ spec:
{{- end }}
resources: {{ toYaml .Values.resources | nindent 12 }}
volumeMounts:
{{- if .Values.pytorch.configMap }}
{{- if .Values.configMap }}
- name: ext-files
mountPath: /app
{{- else if .Files.Glob "files/*" }}
- name: local-files
mountPath: /app
{{- else if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- else if .Values.cloneFilesFromGit.enabled }}
- name: git-cloned-files
mountPath: /app
{{- end }}
- name: data
mountPath: {{ .Values.persistence.mountPath }}
volumes:
{{- if .Values.pytorch.configMap }}
{{- if .Values.configMap }}
- name: ext-files
configMap:
name: {{ .Values.pytorch.configMap }}
name: {{ .Values.configMap }}
{{- else if .Files.Glob "files/*" }}
- name: local-files
configMap:
name: {{ include "pytorch.fullname" . }}-files
{{- else if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- else if .Values.cloneFilesFromGit.enabled }}
- name: git-cloned-files
emptyDir: {}
{{- end }}


@@ -2,7 +2,7 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ include "pytorch.fullname" . }}-master
name: {{ include "pytorch.fullname" . }}{{ if eq .Values.mode "distributed" }}-master{{ end }}
labels:
app.kubernetes.io/name: {{ include "pytorch.name" . }}
helm.sh/chart: {{ include "pytorch.chart" . }}


@@ -9,9 +9,9 @@ metadata:
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/component: "master"
spec:
type: {{ .Values.service.type }}
type: ClusterIP
ports:
- port: {{ .Values.service.port }}
- port: {{ .Values.port }}
targetPort: pytorch
name: pytorch
selector:


@@ -1,4 +1,4 @@
{{- if .Values.pytorch.distributed.enabled }}
{{- if eq .Values.mode "distributed" }}
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
@@ -15,7 +15,7 @@ spec:
app.kubernetes.io/name: {{ include "pytorch.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: "worker"
replicas: {{ sub .Values.pytorch.distributed.worldSize 1 }}
replicas: {{ sub .Values.worldSize 1 }}
template:
metadata:
labels:
@@ -31,23 +31,26 @@ spec:
runAsUser: {{ .Values.securityContext.runAsUser }}
{{- end }}
{{- if .Values.nodeSelector }}
nodeSelector:
{{ toYaml .Values.nodeSelector | indent 8 }}
nodeSelector: {{ toYaml .Values.nodeSelector | nindent 8 }}
{{- end }}
{{- if .Values.tolerations }}
tolerations:
{{ toYaml .Values.tolerations | indent 8 }}
tolerations: {{ toYaml .Values.tolerations | nindent 8 }}
{{- end }}
{{- if .Values.affinity }}
affinity:
{{ toYaml .Values.affinity | indent 8 }}
affinity: {{ toYaml .Values.affinity | nindent 8 }}
{{- end }}
{{- if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- if .Values.cloneFilesFromGit.enabled }}
initContainers:
- name: git-clone-repository
image: {{ include "git.image" . }}
imagePullPolicy: {{ .Values.git.pullPolicy | quote }}
command: [ '/bin/sh', '-c' , 'git clone {{ .Values.pytorch.cloneFilesFromGit.repository }} /app && cd /app && git checkout {{ .Values.pytorch.cloneFilesFromGit.revision }}']
command:
- /bin/sh
- -c
- |
git clone {{ .Values.cloneFilesFromGit.repository }} /app
cd /app
git checkout {{ .Values.cloneFilesFromGit.revision }}
volumeMounts:
- name: git-cloned-files
mountPath: /app
@@ -63,8 +66,8 @@ spec:
RANK=${POD_NAME##*-}
((RANK++))
export RANK
{{- if .Values.pytorch.entrypoint.file }}
python {{ .Values.pytorch.entrypoint.file }} {{ if .Values.pytorch.entrypoint.args }}{{ .Values.pytorch.entrypoint.args }}{{ end }}
{{- if .Values.entrypoint.file }}
python {{ .Values.entrypoint.file }} {{ if .Values.entrypoint.args }}{{ .Values.entrypoint.args }}{{ end }}
{{- end }}
sleep infinity
env:
@@ -75,11 +78,11 @@ spec:
- name: MASTER_ADDR
value: {{ include "pytorch.fullname" . }}
- name: MASTER_PORT
value: {{ .Values.service.port | quote }}
value: {{ .Values.port | quote }}
- name: WORLD_SIZE
value: {{ .Values.pytorch.distributed.worldSize | quote }}
{{- if .Values.pytorch.extraEnvVars }}
{{ toYaml .Values.pytorch.extraEnvVars | indent 8 }}
value: {{ .Values.worldSize | quote }}
{{- if .Values.extraEnvVars }}
{{ toYaml .Values.extraEnvVars | indent 8 }}
{{- end }}
{{- if .Values.livenessProbe.enabled }}
livenessProbe:
@@ -109,28 +112,28 @@ spec:
{{- end }}
resources: {{ toYaml .Values.resources | nindent 12 }}
volumeMounts:
{{- if .Values.pytorch.configMap }}
{{- if .Values.configMap }}
- name: ext-files
mountPath: /app
{{- else if .Files.Glob "files/*" }}
- name: local-files
mountPath: /app
{{- else if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- else if .Values.cloneFilesFromGit.enabled }}
- name: git-cloned-files
mountPath: /app
{{- end }}
- name: data
mountPath: {{ .Values.persistence.mountPath }}
volumes:
{{- if .Values.pytorch.configMap }}
{{- if .Values.configMap }}
- name: ext-files
configMap:
name: {{ .Values.pytorch.configMap }}
name: {{ .Values.configMap }}
{{- else if .Files.Glob "files/*" }}
- name: local-files
configMap:
name: {{ include "pytorch.fullname" . }}-files
{{- else if .Values.pytorch.cloneFilesFromGit.enabled }}
{{- else if .Values.cloneFilesFromGit.enabled }}
- name: git-cloned-files
emptyDir: {}
{{- end }}


@@ -48,67 +48,45 @@ git:
## PyTorch configuration
##
pytorch:
## The main entrypoint of your app, this will be executed as:
## python [file] [args]
entrypoint:
file:
#args:
## The main entrypoint of your app, this will be executed as:
## python [file] [args]
entrypoint:
file:
#args:
## To enable distributed mode
##
distributed:
enabled: true
worldSize: 4
## Name of an existing config map containing all the files you want to load in PyTorch
##
#configMap:
## Enable in order to download files from git repository.
##
cloneFilesFromGit:
enabled: false
# repository:
# revision: master
## Additional environment variables
##
# extraEnvVars:
# - name: NCCL_DEBUG
# value: "INFO"
# - name: NCCL_DEBUG_SUBSYS
# value: "ALL"
## Kubernetes Service Configuration
service:
## For minikube, set this to NodePort, elsewhere use LoadBalancer
##
type: ClusterIP
port: 49875
## Specify the nodePort value for the LoadBalancer and NodePort service types.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
##
# nodePort:
## Provide any additional annotations which may be required. This can be used to
## set the LoadBalancer service type to internal only.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
##
annotations: {}
# loadBalancerIP:
## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## Set to `distributed` in order to enable distributed mode
## mode: distributed
##
persistence:
mode: distributed
## Number of nodes that will run the code
## WORLD_SIZE will be set to this value
##
worldSize: 4
## The port used to communicate with the master
## MASTER_PORT will be set to this value
##
port: 49875
## Name of an existing config map containing all the files you want to load in PyTorch
##
#configMap:
## Enable in order to download files from git repository.
##
cloneFilesFromGit:
enabled: false
path: /app
## If defined, volume.beta.kubernetes.io/storage-class: <storageClass>
## Default: volume.alpha.kubernetes.io/storage-class: default
##
# storageClass:
accessMode: ReadWriteOnce
size: 1Gi
# repository:
# revision: master
## Additional environment variables
##
# extraEnvVars:
# - name: NCCL_DEBUG
# value: "INFO"
# - name: NCCL_DEBUG_SUBSYS
# value: "ALL"
## Node labels for pod assignment
## Ref: https://kubernetes.io/docs/user-guide/node-selection/


@@ -48,67 +48,45 @@ git:
## PyTorch configuration
##
pytorch:
## The main entrypoint of your app, this will be executed as:
## python [file] [args]
entrypoint:
file:
#args:
## The main entrypoint of your app, this will be executed as:
## python [file] [args]
entrypoint:
file:
#args:
## To enable distributed mode
##
distributed:
enabled: false
worldSize: 4
## Name of an existing config map containing all the files you want to load in PyTorch
##
#configMap:
## Enable in order to download files from git repository.
##
cloneFilesFromGit:
enabled: false
# repository:
# revision: master
## Additional environment variables
##
# extraEnvVars:
# - name: NCCL_DEBUG
# value: "INFO"
# - name: NCCL_DEBUG_SUBSYS
# value: "ALL"
## Kubernetes Service Configuration
service:
## For minikube, set this to NodePort, elsewhere use LoadBalancer
##
type: ClusterIP
port: 49875
## Specify the nodePort value for the LoadBalancer and NodePort service types.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
##
# nodePort:
## Provide any additional annotations which may be required. This can be used to
## set the LoadBalancer service type to internal only.
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
##
annotations: {}
# loadBalancerIP:
## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## Set to `distributed` in order to enable distributed mode
## mode: distributed
##
persistence:
mode: standalone
## Number of nodes that will run the code
## WORLD_SIZE will be set to this value
##
#worldSize:
## The port used to communicate with the master
## MASTER_PORT will be set to this value
##
port: 49875
## Name of an existing config map containing all the files you want to load in PyTorch
##
#configMap:
## Enable in order to download files from git repository.
##
cloneFilesFromGit:
enabled: false
path: /app
## If defined, volume.beta.kubernetes.io/storage-class: <storageClass>
## Default: volume.alpha.kubernetes.io/storage-class: default
##
# storageClass:
accessMode: ReadWriteOnce
size: 1Gi
# repository:
# revision: master
## Additional environment variables
##
# extraEnvVars:
# - name: NCCL_DEBUG
# value: "INFO"
# - name: NCCL_DEBUG_SUBSYS
# value: "ALL"
## Node labels for pod assignment
## Ref: https://kubernetes.io/docs/user-guide/node-selection/