Apache MXNet (Incubating)

Apache MXNet (Incubating) is a deep learning platform that accelerates the transition from research prototyping to production deployment. It is built for full integration with Python, so you can use it with Python's libraries and main packages.

TL;DR

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/mxnet

Introduction

This chart bootstraps an Apache MXNet (Incubating) deployment on a Kubernetes cluster using the Helm package manager.

Bitnami charts can be used with Kubeapps for deployment and management of Helm Charts in clusters. This Helm chart has been tested on top of Bitnami Kubernetes Production Runtime (BKPR). Deploy BKPR to get automated TLS certificates, logging and monitoring for your applications.

Prerequisites

  • Kubernetes 1.12+
  • Helm 3.1.0+
  • PV provisioner support in the underlying infrastructure
  • ReadWriteMany volumes for deployment scaling

Installing the Chart

To install the chart with the release name my-release:

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/mxnet

These commands deploy Apache MXNet (Incubating) on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured.

Tip: List all releases using helm list

Uninstalling the Chart

To uninstall/delete the my-release deployment:

$ helm delete my-release

The command removes all the Kubernetes components associated with the chart and deletes the release.

Parameters

Global parameters

| Name | Description | Value |
|---|---|---|
| global.imageRegistry | Global Docker image registry | "" |
| global.imagePullSecrets | Global Docker registry secret names as an array | [] |
| global.storageClass | Global StorageClass for Persistent Volume(s) | "" |

Common parameters

| Name | Description | Value |
|---|---|---|
| nameOverride | String to partially override common.names.fullname template (will maintain the release name) | "" |
| fullnameOverride | String to fully override common.names.fullname template | "" |

Common Mxnet parameters

| Name | Description | Value |
|---|---|---|
| image.registry | Apache MXNet (Incubating) image registry | docker.io |
| image.repository | Apache MXNet (Incubating) image repository | bitnami/mxnet |
| image.tag | Apache MXNet (Incubating) image tag (immutable tags are recommended) | 1.8.0-debian-10-r141 |
| image.pullPolicy | Image pull policy | IfNotPresent |
| image.pullSecrets | Specify docker-registry secret names as an array | [] |
| image.debug | Specify if debug logs should be enabled | false |
| git.registry | Git image registry | docker.io |
| git.repository | Git image repository | bitnami/git |
| git.tag | Git image tag (immutable tags are recommended) | 2.33.0-debian-10-r8 |
| git.pullPolicy | Git image pull policy | IfNotPresent |
| git.pullSecrets | Specify docker-registry secret names as an array | [] |
| volumePermissions.enabled | Enable init container that changes volume permissions in the data directory | false |
| volumePermissions.image.registry | Init container volume-permissions image registry | docker.io |
| volumePermissions.image.repository | Init container volume-permissions image repository | bitnami/bitnami-shell |
| volumePermissions.image.tag | Init container volume-permissions image tag (immutable tags are recommended) | 10-debian-10-r172 |
| volumePermissions.image.pullPolicy | Init container volume-permissions image pull policy | Always |
| volumePermissions.image.pullSecrets | Specify docker-registry secret names as an array | [] |
| volumePermissions.resources.limits | The resources limits for the container | {} |
| volumePermissions.resources.requests | The requested resources for the container | {} |
| service.type | Kubernetes service type | ClusterIP |
| service.port | Scheduler Service port | 9092 |
| service.annotations | Provide any additional annotations which may be required. | {} |
| entrypoint | The main entrypoint of your app, this will be executed as: | {} |
| mode | Apache MXNet (Incubating) deployment mode. Can be standalone or distributed | standalone |
| existingSecret | Name of a secret with sensitive data to mount in the pods | "" |
| configMap | Name of an existing config map containing all the files you want to load in Apache MXNet (Incubating) | "" |
| hostAliases | Deployment pod host aliases | [] |
| cloneFilesFromGit.enabled | Enable in order to download files from git repository | false |
| cloneFilesFromGit.extraVolumeMounts | Add extra volume mounts for the GIT container | [] |
| commonExtraEnvVars | Additional environment variables for all node types | [] |
| podManagementPolicy | StatefulSet (worker and server nodes) pod management policy | Parallel |
| podAffinityPreset | Pod affinity preset. Ignored if affinity is set. Allowed values: soft or hard | "" |
| podAntiAffinityPreset | Pod anti-affinity preset. Ignored if affinity is set. Allowed values: soft or hard | soft |
| nodeAffinityPreset.type | Node affinity preset type. Ignored if affinity is set. Allowed values: soft or hard | "" |
| nodeAffinityPreset.key | Node label key to match. Ignored if affinity is set. | "" |
| nodeAffinityPreset.values | Node label values to match. Ignored if affinity is set. | [] |
| affinity | Affinity for pod assignment. Evaluated as a template. | {} |
| nodeSelector | Node labels for pod assignment. Evaluated as a template. | {} |
| tolerations | Tolerations for pod assignment. Evaluated as a template. | [] |
| resources.limits | The resources limits for the Mxnet container | {} |
| resources.requests | The requested resources for the Mxnet container | {} |
| securityContext.enabled | Enable security context | true |
| securityContext.fsGroup | Group ID for the container | 1001 |
| securityContext.runAsUser | User ID for the container | 1001 |
| livenessProbe.enabled | Enable livenessProbe | true |
| livenessProbe.initialDelaySeconds | Initial delay seconds for livenessProbe | 5 |
| livenessProbe.periodSeconds | Period seconds for livenessProbe | 5 |
| livenessProbe.timeoutSeconds | Timeout seconds for livenessProbe | 15 |
| livenessProbe.failureThreshold | Failure threshold for livenessProbe | 5 |
| livenessProbe.successThreshold | Success threshold for livenessProbe | 1 |
| readinessProbe.enabled | Enable readinessProbe | true |
| readinessProbe.initialDelaySeconds | Initial delay seconds for readinessProbe | 5 |
| readinessProbe.periodSeconds | Period seconds for readinessProbe | 5 |
| readinessProbe.timeoutSeconds | Timeout seconds for readinessProbe | 15 |
| readinessProbe.failureThreshold | Failure threshold for readinessProbe | 5 |
| readinessProbe.successThreshold | Success threshold for readinessProbe | 1 |
| persistence.enabled | Use a PVC to persist data | false |
| persistence.mountPath | Path to mount the volume at | /bitnami/mxnet |
| persistence.accessModes | Persistent Volume Access Mode | [] |
| persistence.size | Size of data volume | 8Gi |
| persistence.annotations | Persistent Volume annotations | {} |
| sidecars | Attach additional containers to the pods (scheduler, worker and server nodes) | [] |
| initContainers | Attach additional init containers to the pods (scheduler, worker and server nodes) | [] |
| extraVolumes | Array to add extra volumes (evaluated as a template) | [] |
| extraVolumeMounts | Array to add extra mounts (normally used with extraVolumes, evaluated as a template) | [] |

Mxnet Server parameters (only for distributed mode)

| Name | Description | Value |
|---|---|---|
| server.replicaCount | Number of Server nodes that will execute your code | 1 |
| server.extraEnvVars | Extra environment variables to add to the Server nodes | [] |
| server.hostAliases | Deployment pod host aliases | [] |
| server.resources.limits | The resources limits for the Mxnet Server container | {} |
| server.resources.requests | The requested resources for the Mxnet Server container | {} |
| server.podAffinityPreset | Mxnet Server pod affinity preset. Ignored if affinity is set. Allowed values: soft or hard | "" |
| server.podAntiAffinityPreset | Mxnet Server pod anti-affinity preset. Ignored if affinity is set. Allowed values: soft or hard | soft |
| server.nodeAffinityPreset.type | Mxnet Server node affinity preset type. Ignored if affinity is set. Allowed values: soft or hard | "" |
| server.nodeAffinityPreset.key | Mxnet Server node label key to match. Ignored if affinity is set. | "" |
| server.nodeAffinityPreset.values | Mxnet Server node label values to match. Ignored if affinity is set. | [] |
| server.affinity | Mxnet Server affinity for pod assignment | {} |
| server.nodeSelector | Mxnet Server node labels for pod assignment | {} |
| server.tolerations | Mxnet Server tolerations for pod assignment | [] |

Mxnet Worker parameters (only for distributed mode)

| Name | Description | Value |
|---|---|---|
| worker.replicaCount | Number of Worker nodes that will execute your code | 1 |
| worker.hostAliases | Add deployment host aliases | [] |
| worker.extraEnvVars | Extra environment variables to add to the Worker nodes | [] |
| worker.resources.limits | The resources limits for the Mxnet Worker container | {} |
| worker.resources.requests | The requested resources for the Mxnet Worker container | {} |
| worker.podAffinityPreset | Mxnet Worker pod affinity preset. Ignored if affinity is set. Allowed values: soft or hard | "" |
| worker.podAntiAffinityPreset | Mxnet Worker pod anti-affinity preset. Ignored if affinity is set. Allowed values: soft or hard | soft |
| worker.nodeAffinityPreset.type | Mxnet Worker node affinity preset type. Ignored if affinity is set. Allowed values: soft or hard | "" |
| worker.nodeAffinityPreset.key | Mxnet Worker node label key to match. Ignored if affinity is set. | "" |
| worker.nodeAffinityPreset.values | Mxnet Worker node label values to match. Ignored if affinity is set. | [] |
| worker.affinity | Mxnet Worker affinity for pod assignment | {} |
| worker.nodeSelector | Mxnet Worker node labels for pod assignment | {} |
| worker.tolerations | Mxnet Worker tolerations for pod assignment | [] |

Mxnet Scheduler parameters (only for distributed mode)

| Name | Description | Value |
|---|---|---|
| scheduler.port | The port used to communicate with the scheduler | 9092 |
| scheduler.hostAliases | Deployment pod host aliases | [] |
| scheduler.extraEnvVars | An array to add extra env vars | [] |
| scheduler.resources.limits | The resources limits for the Mxnet Scheduler container | {} |
| scheduler.resources.requests | The requested resources for the Mxnet Scheduler container | {} |
| scheduler.podAffinityPreset | Mxnet Scheduler pod affinity preset. Ignored if affinity is set. Allowed values: soft or hard | "" |
| scheduler.podAntiAffinityPreset | Mxnet Scheduler pod anti-affinity preset. Ignored if affinity is set. Allowed values: soft or hard | soft |
| scheduler.nodeAffinityPreset.type | Mxnet Scheduler node affinity preset type. Ignored if affinity is set. Allowed values: soft or hard | "" |
| scheduler.nodeAffinityPreset.key | Mxnet Scheduler node label key to match. Ignored if affinity is set. | "" |
| scheduler.nodeAffinityPreset.values | Mxnet Scheduler node label values to match. Ignored if affinity is set. | [] |
| scheduler.affinity | Mxnet Scheduler affinity for pod assignment | {} |
| scheduler.nodeSelector | Mxnet Scheduler node labels for pod assignment | {} |
| scheduler.tolerations | Mxnet Scheduler tolerations for pod assignment | [] |

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,

$ helm install my-release \
  --set mode=distributed \
  --set server.replicaCount=2 \
  --set worker.replicaCount=3 \
    bitnami/mxnet

The above command creates 6 pods for Apache MXNet (Incubating): one scheduler, two servers, and three workers.

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

$ helm install my-release -f values.yaml bitnami/mxnet

Tip: You can use the default values.yaml
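
As a rough sketch, the --set example shown earlier (distributed mode with two servers and three workers) could be expressed as a values file like the one below. The keys come from the parameter tables; the file name values.yaml is simply the one used in the command above.

```yaml
# values.yaml: equivalent of the --set flags in the earlier example
mode: distributed

server:
  replicaCount: 2   # two server pods

worker:
  replicaCount: 3   # three worker pods
```

Keeping overrides in a file rather than on the command line makes them easier to review and reuse across installs and upgrades.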

Configuration and installation details

Rolling VS Immutable tags

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
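
A minimal sketch of pinning the image to an immutable tag in a values file follows. The tag shown is the default listed in the parameters table; in practice you would substitute whichever immutable tag you have validated.

```yaml
image:
  registry: docker.io
  repository: bitnami/mxnet
  # Immutable tag (default from the parameters table); avoid rolling tags in production
  tag: 1.8.0-debian-10-r141
  pullPolicy: IfNotPresent
```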

Loading your files

The Apache MXNet (Incubating) chart supports three different ways to load your files. In order of priority, they are:

  1. Existing config map
  2. Files under the files directory
  3. Cloning a git repository

This means that if you specify a config map with your files, the chart will not look in the files/ directory or the git repository.

To use an existing config map, set the configMap=my-config-map parameter.

To load your files from the files/ directory you don't have to set any option. Just copy your files inside and don't specify a ConfigMap.

Finally, if you want to clone a git repository you can use the following parameters:

cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/my-user/my-repo
cloneFilesFromGit.revision=master

In case you want to add a file that includes sensitive information, pass a secret object using the existingSecret parameter. All the files in the secret will be mounted in the /secrets folder.
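
The git and secret options described above might be combined in a values file along these lines. This is only a sketch: the repository URL is the placeholder used above, and the secret name my-secret is hypothetical and must refer to a secret that already exists in the release namespace.

```yaml
# Leave configMap unset and the files/ directory empty,
# otherwise those higher-priority sources are used instead of git.
cloneFilesFromGit:
  enabled: true
  repository: https://github.com/my-user/my-repo
  revision: master

# Hypothetical secret name; its files are mounted in the /secrets folder
existingSecret: my-secret
```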

Distributed training example

We will use the gluon example from the Apache MXNet (Incubating) official repository. Launch it with the following values:

mode=distributed
cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/apache/incubator-mxnet.git
cloneFilesFromGit.revision=master
entrypoint.file=image_classification.py
entrypoint.args="--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
entrypoint.workDir=/app/example/gluon/

Check the logs of the worker node:

INFO:root:Starting new image-classification task:, Namespace(batch_norm=False, batch_size=32, builtin_profiler=0, data_dir='', dataset='cifar10', dtype='float32', epochs=1, gpus='', kvstore='dist_sync', log_interval=50, lr=0.1, lr_factor=0.1, lr_steps='30,60,90', mode=None, model='vgg11', momentum=0.9, num_workers=4, prefix='', profile=False, resume='', save_frequency=10, seed=123, start_epoch=0, use_pretrained=False, use_thumbnail=False, wd=0.0001)
INFO:root:downloaded http://data.mxnet.io/mxnet/data/cifar10.zip into data/cifar10.zip successfully
[10:05:40] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: data/cifar/train.rec, use 1 threads for decoding..
[10:05:45] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: data/cifar/test.rec, use 1 threads for decoding..

To increase the verbosity, set the environment variable PS_VERBOSE=1 or PS_VERBOSE=2 using the commonExtraEnvVars value, as in the following values:

mode=distributed
cloneFilesFromGit.enabled=true
cloneFilesFromGit.repository=https://github.com/apache/incubator-mxnet.git
cloneFilesFromGit.revision=master
entrypoint.file=image_classification.py
entrypoint.args="--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
entrypoint.workDir=/app/example/gluon/
commonExtraEnvVars[0].name=PS_VERBOSE
commonExtraEnvVars[0].value=1

You will now see log entries in the scheduler and server nodes.

[14:22:44] src/van.cc:290: Bind to role=scheduler, id=1, ip=10.32.0.11, port=9092, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=9 to node role=worker, ip=10.32.0.17, port=55423, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=11 to node role=worker, ip=10.32.0.16, port=60779, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=13 to node role=worker, ip=10.32.0.15, port=39817, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=15 to node role=worker, ip=10.32.0.14, port=48119, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=8 to node role=server, ip=10.32.0.13, port=56713, is_recovery=0
[14:22:53] src/van.cc:56: assign rank=10 to node role=server, ip=10.32.0.12, port=57099, is_recovery=0
[14:22:53] src/van.cc:83: the scheduler is connected to 4 workers and 2 servers
[14:22:53] src/van.cc:183: Barrier count for 7 : 1
[14:22:53] src/van.cc:183: Barrier count for 7 : 2
[14:22:53] src/van.cc:183: Barrier count for 7 : 3
[14:22:53] src/van.cc:183: Barrier count for 7 : 4
...
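
For reference, the distributed training values listed above could be collected into a single values file, sketched below. The replica counts are an assumption chosen to match the sample scheduler log (4 workers and 2 servers); they are not part of the listed values and can be omitted to use the defaults.

```yaml
mode: distributed

cloneFilesFromGit:
  enabled: true
  repository: https://github.com/apache/incubator-mxnet.git
  revision: master

entrypoint:
  file: image_classification.py
  args: "--dataset cifar10 --model vgg11 --epochs 1 --kvstore dist_sync"
  workDir: /app/example/gluon/

commonExtraEnvVars:
  - name: PS_VERBOSE
    value: "1"        # quoted, since environment variable values are strings

# Assumed counts, matching the sample log output
server:
  replicaCount: 2
worker:
  replicaCount: 4
```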

Sidecars and Init Containers

If you need additional containers to run within the same pod as Apache MXNet (Incubating) (e.g. an additional metrics or logging exporter), you can add them with the sidecars config parameter. Define each container according to the Kubernetes container spec.

sidecars:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234
Similarly, you can add extra init containers using the initContainers parameter.

initContainers:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234
Persistence

The Bitnami Apache MXNet (Incubating) image can persist data. If enabled, the persisted path is /bitnami/mxnet by default.

The chart mounts a Persistent Volume at this location. The volume is created using dynamic volume provisioning.
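
A sketch of enabling persistence through a values file is shown below. The keys and defaults come from the parameters table; the ReadWriteMany access mode is an assumption based on the prerequisites for deployment scaling, and your storage class must support it.

```yaml
persistence:
  enabled: true
  mountPath: /bitnami/mxnet   # default persisted path
  accessModes:
    - ReadWriteMany           # assumed; needed when several pods share the volume
  size: 8Gi
```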

Adjust permissions of persistent volume mountpoint

As the image runs as non-root by default, it is necessary to adjust the ownership of the persistent volume so that the container can write data into it.

By default, the chart is configured to use Kubernetes Security Context to automatically change the ownership of the volume. However, this feature does not work in all Kubernetes distributions. As an alternative, this chart supports using an initContainer to change the ownership of the volume before mounting it in the final destination.

You can enable this initContainer by setting volumePermissions.enabled to true.
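
In a values file, the init container approach might look like the sketch below. The securityContext values are the chart defaults from the parameters table, shown only to make clear which user and group the volume is expected to be writable by.

```yaml
volumePermissions:
  enabled: true    # run the init container that fixes volume ownership

securityContext:
  enabled: true
  fsGroup: 1001    # default group ID
  runAsUser: 1001  # default user ID
```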

Setting Pod's affinity

This chart allows you to set your custom affinity using the XXX.affinity parameter(s). Find more information about Pod's affinity in the Kubernetes documentation.

As an alternative, you can use one of the preset configurations for pod affinity, pod anti-affinity, and node affinity available in the bitnami/common chart. To do so, set the XXX.podAffinityPreset, XXX.podAntiAffinityPreset, or XXX.nodeAffinityPreset parameters.
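
As an illustration, the presets could be set for the worker nodes as sketched below (XXX replaced by worker). The node label key and value are hypothetical placeholders; use labels that actually exist on your nodes.

```yaml
worker:
  # Require workers to be scheduled on different nodes
  podAntiAffinityPreset: hard
  nodeAffinityPreset:
    type: soft                     # prefer, but do not require, matching nodes
    key: example.com/accelerator   # hypothetical node label key
    values:
      - gpu                        # hypothetical node label value
```

Note that these presets are ignored if worker.affinity is set.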

Troubleshooting

Find more information about how to deal with common errors related to Bitnami's Helm charts in this troubleshooting guide.

Upgrading

To 2.1.0

Some parameters disappeared in favor of new ones:

  • schedulerExtraEnvVars and schedulerPort -> deprecated in favor of scheduler.extraEnvVars and scheduler.port, respectively.
  • serverExtraEnvVars and serverCount -> deprecated in favor of server.extraEnvVars and server.replicaCount, respectively.
  • workerExtraEnvVars and workerCount -> deprecated in favor of worker.extraEnvVars and worker.replicaCount, respectively.
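
The renames above can be sketched as a before/after values file. The port, counts, and environment variable are illustrative values only.

```yaml
# Before 2.1.0 (deprecated names):
#   schedulerPort: 9092
#   schedulerExtraEnvVars: []
#   serverCount: 2
#   serverExtraEnvVars: []
#   workerCount: 3
#   workerExtraEnvVars: []

# From 2.1.0 onwards:
scheduler:
  port: 9092
  extraEnvVars: []
server:
  replicaCount: 2
  extraEnvVars: []
worker:
  replicaCount: 3
  extraEnvVars:
    - name: PS_VERBOSE   # illustrative variable
      value: "1"
```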

This version also introduces bitnami/common, a library chart, as a dependency. More documentation about this new utility can be found here. Please make sure that you have updated the chart dependencies before executing any upgrade.

To 2.0.0

Helm v2 support formally ended on November 13, 2020. This major version is the result of the changes required to incorporate the features added in Helm v3 and to stay consistent with the Helm project itself regarding the Helm v2 EOL.

What changes were introduced in this major version?

  • Previous versions of this Helm Chart used apiVersion: v1 (installable by both Helm 2 and 3). This Helm Chart was updated to apiVersion: v2 (installable by Helm 3 only). Here you can find more information about the apiVersion field.
  • The fields in the Chart.yaml file have been ordered alphabetically, consistently across all the Bitnami Helm Charts

Considerations when upgrading to this version

  • If you want to upgrade to this version from a previous one installed with Helm v3, you shouldn't face any issues
  • If you want to upgrade to this version using Helm v2, this scenario is not supported as this version doesn't support Helm v2 anymore
  • If you installed the previous version with Helm v2 and want to upgrade to this version with Helm v3, please refer to the official Helm documentation about migrating from Helm v2 to v3

Useful links