Configuration Setup and Runtime Behavior for the Apache Spark Client

Apache Spark comes with a wide range of configuration properties.

Passing every configuration property on the command line is cumbersome, so Apache Spark supports properties files, which allow the user to reuse settings across submissions.

In addition, the user can still add or override configuration values on the command line.
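To make the file-plus-override idea concrete, here is a minimal Python sketch. It assumes a simplified reading of the `key value` / `key=value` line format used by files such as spark-defaults.conf (comment lines start with `#`; no line continuations or escapes). The helper names `parse_properties` and `apply_overrides` are hypothetical and are not part of Spark or spark-client.

```python
import re
from typing import Dict, Iterable

# Key and value are separated by the first '=' or run of whitespace.
_SEPARATOR = re.compile(r"\s*[=\s]\s*")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse properties-file text into a dict (simplified sketch).

    Assumptions: one property per line, '#' starts a comment line, and no
    support for line continuations or escape sequences.
    """
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = _SEPARATOR.split(line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) == 2 else ""
    return props


def apply_overrides(props: Dict[str, str], confs: Iterable[str]) -> Dict[str, str]:
    """Return a copy of props with each 'key=value' entry applied on top."""
    merged = dict(props)
    for entry in confs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {entry!r}")
        merged[key.strip()] = value.strip()
    return merged


if __name__ == "__main__":
    defaults = parse_properties(
        "# shared settings\n"
        "spark.executor.instances 2\n"
        "spark.executor.memory=2g\n"
    )
    print(apply_overrides(defaults, ["spark.executor.memory=4g"]))
```

Running it prints `{'spark.executor.instances': '2', 'spark.executor.memory': '4g'}`: the file supplies reusable defaults, and the command-line style entry replaces only the one value it names.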

The spark-client tools provide the same rich set of options as Apache Spark for specifying configuration properties and overriding them.
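Spark's own spark-submit exposes these options as `--properties-file FILE` and a repeatable `--conf PROP=VALUE`. The following Python sketch shows one way such a command line could be split into the file path, the ordered list of `--conf` entries, and everything else. It uses the standard-library `argparse` and is only an illustration; `extract_conf_options` is a hypothetical helper, not how spark-client is implemented.

```python
import argparse
from typing import List, Optional, Tuple


def extract_conf_options(argv: List[str]) -> Tuple[Optional[str], List[str], List[str]]:
    """Split argv into (properties_file, conf_entries, remaining_args).

    Only --properties-file and the repeatable --conf option are recognised;
    every other argument is passed through untouched in remaining_args.
    """
    # allow_abbrev=False stops unknown options such as "--co" from being
    # silently matched to "--conf" by prefix.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--properties-file", dest="properties_file")
    # action="append" collects every --conf occurrence in command-line order.
    parser.add_argument("--conf", action="append", default=[])
    known, remaining = parser.parse_known_args(argv)
    return known.properties_file, known.conf, remaining


if __name__ == "__main__":
    print(extract_conf_options([
        "--properties-file", "job.conf",
        "--conf", "spark.executor.memory=4g",
        "--master", "k8s://https://example:6443",
        "--conf", "spark.executor.instances=3",
        "app.py",
    ]))
```

Keeping the `--conf` entries as an ordered list matters: if the same property is given twice, applying the entries in order lets the last one win.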

The spark-client commands merge configuration from the following sources, listed from lowest to highest precedence:

  • Snap Configuration: Immutable defaults provided in the snap.
  • Service Account Configuration: Immutable defaults generated at setup time and kept as a secret collection in Kubernetes. They are valid across sessions and machines. For details, refer to the setup section, specifically the part about service accounts.
  • Environment Configuration: Properties in the file specified by the environment variable $SPARK_CLIENT_ENV_CONF, valid across spark-submit commands within a shell session.
  • CLI Properties File: Properties file specified as a parameter (--properties-file).
  • CLI Configuration: Properties specified as parameters (a list of --conf options).

The final configuration is resolved by merging the sources listed above in order: when a property is defined at more than one level, the value from the later source overrides the value from the earlier one.
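The precedence rule amounts to an ordered dictionary update. The Python sketch below assumes each source has already been read into a plain mapping; the property values in the example layers are hypothetical, and `merge_layers` is an illustrative helper rather than part of spark-client.

```python
from typing import Dict, Mapping, Sequence


def merge_layers(layers: Sequence[Mapping[str, str]]) -> Dict[str, str]:
    """Merge config layers ordered from lowest to highest precedence."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)  # a later layer overrides keys from earlier ones
    return merged


if __name__ == "__main__":
    # Hypothetical contents for each source, lowest precedence first.
    layers = [
        {"spark.executor.instances": "2", "spark.executor.memory": "1g"},  # snap
        {"spark.kubernetes.namespace": "spark"},  # service account
        {"spark.executor.memory": "2g"},  # $SPARK_CLIENT_ENV_CONF file
        {"spark.executor.instances": "4"},  # --properties-file
        {"spark.executor.memory": "4g"},  # --conf entries
    ]
    print(merge_layers(layers))
```

Here `spark.executor.memory` ends up as `4g` (from `--conf`), `spark.executor.instances` as `4` (from the properties file), and `spark.kubernetes.namespace` survives from the service-account layer because no later source redefines it. An equivalent formulation is `dict(collections.ChainMap(*reversed(layers)))`, which looks keys up in the highest-precedence mapping first.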

Last updated 10 months ago.