
Batch pipeline (Spark) Metrics

Spark Metrics is a default dashboard available to you in Grafana that displays the standard metrics described below. Custom metrics can be added using Spark accumulators.

Spark Accumulators

Spark allows you to create custom numerical metrics using accumulators. Batch pipelines using Apache Spark support numerical accumulators only. Once created, these accumulators become available as named metrics that Grafana can query and add to dashboards. Metric names are commonly prefixed with spark_accumulators_.

For more information on using accumulators, see Custom Metrics and the documentation on Spark accumulators.

Spark Metrics for Pipelines

| Metric | Description |
| --- | --- |
| `driver_DAGScheduler_job_allJobs` | Number of Pipeline Jobs |
| `driver_DAGScheduler_job_activeJobs` | Number of Running Pipeline Jobs |
| `executor_threadpool_activeTasks` | Number of Workers per Running Job |
| `executor_threadpool_completeTasks` | Number of Completed Spark Tasks per Running Job |
| `driver_DAGScheduler_job_allJobs` | Number of Spark Jobs per Pipeline Job |
| `driver_DAGScheduler_stage_failedStages` | Number of Failed Stages per Pipeline Job |
| `driver_accumulators_.*` | Accumulator Values |

Additional Spark Metrics for Pipelines

The following metrics are not displayed in the default dashboard but are available for use in custom dashboards.

Container Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `container_cpu_usage_seconds_total` | Seconds | Total CPU time used by the container |
| `container_memory_working_set_bytes` | Bytes | Memory (working set) used by the container |
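Because container CPU usage is a cumulative counter in seconds, a dashboard panel typically charts its rate rather than its raw value. As a sketch (the job ID and the 5-minute window are placeholders), a PromQL query for per-pod CPU usage might look like:

```promql
rate(container_cpu_usage_seconds_total{pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0"}[5m])
```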

Spark Driver Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `driver_jvm_total_committed` | Bytes | Memory available for use by the JVM for the driver |
| `driver_jvm_total_init` | Bytes | Amount of memory available for use by the JVM at initialization for the driver |
| `driver_jvm_total_max` | Bytes | Maximum amount of memory available to the JVM for the driver |
| `driver_jvm_total_used` | Bytes | Amount of memory currently used by the driver |
| `driver_jvm_heap_used` | Bytes | Amount of heap memory currently being used by the driver |
| `driver_jvm_non_heap_used` | Bytes | Amount of non-heap memory currently being used by the driver |

Spark Executor Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `executor_threadpool_activeTasks` | Count | Number of active executor tasks |
| `executor_threadpool_completeTasks` | Count | Number of completed executor tasks |
| `jvm_G1_Young_Generation_time` | Seconds | G1 young generation garbage collection time |
| `jvm_G1_Old_Generation_time` | Seconds | G1 old generation garbage collection time |
| `jvm_G1_Young_Generation_count` | Count | G1 young generation garbage collection count |
| `jvm_G1_Old_Generation_count` | Count | G1 old generation garbage collection count |
| `jvm_heap_usage` | Bytes | Amount of heap memory currently being used by the executor |
| `jvm_non_heap_usage` | Bytes | Amount of non-heap memory currently being used by the executor |

Filtering Pipeline Metrics

You can filter pipeline metrics using these Prometheus filters:

| Filter By | Key | Example |
| --- | --- | --- |
| Pipeline Id | `PipeLineId` | `PipeLineId="00112233-4455-6677-8899-aabbccddeeff"` |
| Job Id | `DeploymentId` | `DeploymentId="00112233-4455-6677-8899-aabbccddeeff"` |
| Pod Name | `pod_name` (format `job-<job_id>-worker-<executor_id>`) | `pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0"` |
| Executor Id | `executorId` | `executorId="0"` |

For example, to get the G1 young generation garbage collection time for all executors of a given pipeline ID and job ID, use this filter:

```promql
jvm_G1_Young_Generation_time{DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",PipeLineId="00112233-4455-6677-8899-aabbccddeeff"}
```
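To narrow the same query to a single executor, you could additionally apply the `executorId` filter from the table above (the IDs shown are placeholders):

```promql
jvm_G1_Young_Generation_time{DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",PipeLineId="00112233-4455-6677-8899-aabbccddeeff",executorId="0"}
```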

Spark Metrics for Notebooks

| Metric | Description |
| --- | --- |
| Average Memory per Executor | Average memory per executor and the Spark driver |
| Average and Total Spark Memory Usage for All Units | Aggregate of average memory per executor and driver; also aggregates all memory of the cluster |
| Active Cores | Number of active cores |
| Stages | Stages by status, such as running, pending, and failed |
| Tasks by All Executors | Tasks by executor, active, and pool. This is another way to observe the active and available cores |
| Message Processing Time | Average message processing time |
| Completed Tasks by Each Executor | Completed tasks by executor, with counters |
| File System Reads/Writes by Executors | File system reads and writes in bytes (only when the filesystem is used within jobs) |