
Batch pipeline (Spark) Metrics

Spark Metrics is a default dashboard available to you in Grafana that displays the standard metrics described below. Custom metrics can be added using Spark accumulators.

Spark Accumulators

Spark allows you to create custom numerical metrics using accumulators. Batch pipelines using Apache Spark support numerical accumulators only. Once created, these accumulators become available as named metrics that Grafana can query and add to dashboards. Metric names are commonly prefixed with spark_accumulators_.

For more information on using accumulators, see Custom Metrics and the documentation on Spark accumulators.

Spark Metrics for Pipelines

| Metric | Description |
| --- | --- |
| `driver_DAGScheduler_job_allJobs` | Number of Pipeline Jobs |
| `driver_DAGScheduler_job_activeJobs` | Number of Running Pipeline Jobs |
| `executor_threadpool_activeTasks` | Number of Workers per Running Job |
| `executor_threadpool_completeTasks` | Number of Completed Spark Tasks per Running Job |
| `driver_DAGScheduler_job_allJobs` | Number of Spark Jobs per Pipeline Job |
| `driver_DAGScheduler_stage_failedStages` | Number of Failed Stages per Pipeline Job |
| `driver_accumulators_.*` | Accumulator Values |

Additional Spark Metrics for Pipelines

The following metrics are not displayed in the default dashboard but are available for use in custom dashboards.

Container Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `container_cpu_usage_seconds_total` | Seconds | Total CPU time used by the container |
| `container_memory_working_set_bytes` | Bytes | Memory (working set) used by the container |
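Because container CPU usage is a cumulative counter in seconds, a dashboard panel typically charts its rate rather than its raw value. As a sketch (the job ID and the 5-minute window are placeholders), a PromQL query for per-pod CPU usage might look like:

```promql
rate(container_cpu_usage_seconds_total{pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0"}[5m])
```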

Spark Driver Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `driver_jvm_total_committed` | Bytes | Memory available for use by the JVM for the driver |
| `driver_jvm_total_init` | Bytes | Amount of memory available for use by the JVM at initialization for the driver |
| `driver_jvm_total_max` | Bytes | Maximum amount of memory available to the JVM for the driver |
| `driver_jvm_total_used` | Bytes | Amount of memory currently used by the driver |
| `driver_jvm_heap_used` | Bytes | Amount of heap memory currently being used by the driver |
| `driver_jvm_non_heap_used` | Bytes | Amount of non-heap memory currently being used by the driver |

Spark Executor Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| `executor_threadpool_activeTasks` | Count | Number of active executor tasks |
| `executor_threadpool_completeTasks` | Count | Number of completed executor tasks |
| `jvm_G1_Young_Generation_time` | Seconds | G1 young generation garbage collection time |
| `jvm_G1_Old_Generation_time` | Seconds | G1 old generation garbage collection time |
| `jvm_G1_Young_Generation_count` | Count | G1 young generation garbage collection count |
| `jvm_G1_Old_Generation_count` | Count | G1 old generation garbage collection count |
| `jvm_heap_usage` | Bytes | Amount of heap memory currently being used by the executor |
| `jvm_non_heap_usage` | Bytes | Amount of non-heap memory currently being used by the executor |

Filtering Pipeline Metrics

You can filter pipeline metrics using these Prometheus filters:

| Filter By | Key | Example |
| --- | --- | --- |
| Pipeline Id | `PipeLineId` | `PipeLineId="00112233-4455-6677-8899-aabbccddeeff"` |
| Job Id | `DeploymentId` | `DeploymentId="00112233-4455-6677-8899-aabbccddeeff"` |
| Pod Name | `pod_name` (format `job-<job_id>-worker-<executor_id>`) | `pod_name="job-00112233-4455-6677-8899-aabbccddeeff-worker-0"` |
| Executor Id | `executorId` | `executorId="0"` |

For example, to get the G1 young generation garbage collection time for all executors of a given pipeline ID and job ID, use this filter:

```promql
jvm_G1_Young_Generation_time{DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",PipeLineId="00112233-4455-6677-8899-aabbccddeeff"}
```
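To narrow the same query to a single executor, you could additionally apply the `executorId` filter from the table above (the IDs shown are placeholders):

```promql
jvm_G1_Young_Generation_time{DeploymentId="ffeeddcc-bbaa-9988-7766-554433221100",PipeLineId="00112233-4455-6677-8899-aabbccddeeff",executorId="0"}
```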

Spark Metrics for Notebooks

| Metric | Description |
| --- | --- |
| Average Memory per Executor | Average memory per executor and the Spark driver |
| Average and Total Spark Memory Usage for All Units | Aggregate of average memory per executor and driver; also aggregates all memory of the cluster |
| Active Cores | Number of active cores |
| Stages | Stages by status, such as running, pending, and failed |
| Tasks by All Executors | Tasks by executor, active, and pool. This is another way to observe the active and available cores |
| Message Processing Time | Average message processing time |
| Completed Tasks by Each Executor | Completed tasks by executor, with counters |
| File System Reads/Writes by Executors | File system reads and writes in bytes (only when the filesystem is used within jobs) |