Stream pipeline (Flink) Metrics

Flink Metrics is a default Grafana dashboard that shows the standard metrics listed on this page for Flink pipelines. You can also add custom metrics in your pipeline code. For more information about Flink metrics, see the official Flink documentation.

Flink Accumulators

Flink lets you create custom numerical metrics using accumulators. Stream pipelines that use Apache Flink support the following types of accumulators: Long and Double. Once created, these accumulators become available as named metrics that Grafana can query and add to dashboards. The metric names are commonly prefixed with `flink_accumulators_`.

For more information on using accumulators, see Custom Metrics and the documentation on Flink Accumulators.

Standard Metrics

CPU/Memory Metrics

METRIC	UNIT	DESCRIPTION
`flink_jobmanager_Status_JVM_CPU_Load`	Percentage	JobManager - recent CPU usage of the JVM. This metric does not currently function as expected, for reasons that are unclear. (For workarounds, see: How can I see the percentage CPU usage of jobmanager or taskmanagers of a Stream pipeline.)
`flink_jobmanager_Status_JVM_CPU_Time`	Nanoseconds	JobManager - CPU Time used by the JVM
`flink_jobmanager_Status_JVM_Memory_Heap_Used`	Bytes	JobManager - amount of heap memory currently used
`flink_jobmanager_Status_JVM_Memory_Heap_Committed`	Bytes	JobManager - amount of heap memory guaranteed to be available to the JVM
`flink_jobmanager_Status_JVM_Memory_Heap_Max`	Bytes	JobManager - maximum amount of heap memory that can be used for memory management
`flink_jobmanager_Status_JVM_Memory_NonHeap_Used`	Bytes	JobManager - amount of non-heap memory currently used
`flink_jobmanager_Status_JVM_Memory_NonHeap_Committed`	Bytes	JobManager - amount of non-heap memory guaranteed to be available to the JVM
`flink_jobmanager_Status_JVM_Memory_NonHeap_Max`	Bytes	JobManager - maximum amount of non-heap memory that can be used for memory management
`flink_jobmanager_Status_JVM_Memory_Direct_Count`	Count	JobManager - number of buffers in the direct buffer pool
`flink_jobmanager_Status_JVM_Memory_Direct_MemoryUsed`	Bytes	JobManager - amount of memory used by the JVM for the direct buffer pool
`flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity`	Bytes	JobManager - total capacity of all buffers in the direct buffer pool
`flink_jobmanager_Status_JVM_Memory_Mapped_Count`	Count	JobManager - number of buffers in the mapped buffer pool
`flink_jobmanager_Status_JVM_Memory_Mapped_MemoryUsed`	Bytes	JobManager - amount of memory used by the JVM for the mapped buffer pool
`flink_jobmanager_Status_JVM_Memory_Mapped_TotalCapacity`	Bytes	JobManager - total capacity of all buffers in the mapped buffer pool
`flink_taskmanager_Status_JVM_CPU_Load`	Percentage	TaskManager - recent CPU usage of the JVM. This metric does not currently function as expected, for reasons that are unclear. (For workarounds, see: How can I see the percentage CPU usage of jobmanager or taskmanagers of a Stream pipeline.)
`flink_taskmanager_Status_JVM_CPU_Time`	Nanoseconds	TaskManager - CPU Time used by the JVM
`flink_taskmanager_Status_JVM_Memory_Heap_Used`	Bytes	TaskManager - amount of heap memory currently used
`flink_taskmanager_Status_JVM_Memory_Heap_Committed`	Bytes	TaskManager - amount of heap memory guaranteed to be available to the JVM
`flink_taskmanager_Status_JVM_Memory_Heap_Max`	Bytes	TaskManager - maximum amount of heap memory that can be used for memory management
`flink_taskmanager_Status_JVM_Memory_NonHeap_Used`	Bytes	TaskManager - amount of non-heap memory currently used
`flink_taskmanager_Status_JVM_Memory_NonHeap_Committed`	Bytes	TaskManager - amount of non-heap memory guaranteed to be available to the JVM
`flink_taskmanager_Status_JVM_Memory_NonHeap_Max`	Bytes	TaskManager - maximum amount of non-heap memory that can be used for memory management
`flink_taskmanager_Status_JVM_Memory_Direct_Count`	Count	TaskManager - number of buffers in the direct buffer pool
`flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed`	Bytes	TaskManager - amount of memory used by the JVM for the direct buffer pool
`flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity`	Bytes	TaskManager - total capacity of all buffers in the direct buffer pool
`flink_taskmanager_Status_JVM_Memory_Mapped_Count`	Count	TaskManager - number of buffers in the mapped buffer pool
`flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed`	Bytes	TaskManager - amount of memory used by the JVM for the mapped buffer pool
`flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity`	Bytes	TaskManager - total capacity of all buffers in the mapped buffer pool
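
Because the `CPU_Load` gauges are unreliable, one possible way to approximate CPU usage is to take the change in `CPU_Time` between two samples and divide it by the wall-clock interval. Heap utilization can be derived in a similar way from `Heap_Used` and `Heap_Max`. The sketch below only illustrates the arithmetic: the function names, the `cores` parameter, and the sample values are hypothetical, and this is not necessarily the workaround described in the linked article.

```python
NANOS_PER_SECOND = 1_000_000_000


def cpu_usage_percent(cpu_time_start_ns, cpu_time_end_ns, interval_seconds, cores=1):
    """Approximate JVM CPU usage (percent of `cores`) between two CPU_Time samples."""
    if interval_seconds <= 0 or cores <= 0:
        raise ValueError("interval_seconds and cores must be positive")
    # CPU_Time is cumulative, so the difference is the CPU time spent in the interval.
    used_seconds = (cpu_time_end_ns - cpu_time_start_ns) / NANOS_PER_SECOND
    return 100.0 * used_seconds / (interval_seconds * cores)


def heap_utilization_percent(heap_used_bytes, heap_max_bytes):
    """Share of the maximum heap currently in use, from Heap_Used and Heap_Max."""
    if heap_max_bytes <= 0:
        raise ValueError("heap_max_bytes must be positive")
    return 100.0 * heap_used_bytes / heap_max_bytes


# Hypothetical samples taken 60 seconds apart: 30 s of CPU time were consumed.
print(cpu_usage_percent(120_000_000_000, 150_000_000_000, 60))
# Hypothetical heap readings: 512 MiB used out of a 2 GiB maximum.
print(heap_utilization_percent(512 * 1024**2, 2 * 1024**3))
```

In a Grafana panel the same idea would usually be expressed directly in the query rather than in client code; the Python version is only meant to make the calculation explicit.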

Flink Cluster Metrics

METRIC	DESCRIPTION
`flink_jobmanager_numRegisteredTaskManagers`	Total Number of Registered Task Managers
`flink_jobmanager_numRunningJobs`	Total Number of Running Jobs
`flink_jobmanager_taskSlotsTotal`	Total Number of Task Slots
`flink_jobmanager_taskSlotsAvailable`	Total Number of Task Slots Available
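
The two slot metrics can be combined into a single utilization figure. The following is a minimal sketch of that arithmetic, assuming the values have already been read from `flink_jobmanager_taskSlotsTotal` and `flink_jobmanager_taskSlotsAvailable`; the function name and the sample readings are hypothetical.

```python
def slot_utilization_percent(task_slots_total, task_slots_available):
    """Percentage of task slots currently in use."""
    if task_slots_total <= 0:
        # No TaskManager has registered yet, so there is nothing to utilize.
        return 0.0
    used = task_slots_total - task_slots_available
    return 100.0 * used / task_slots_total


# Hypothetical readings: 8 slots in total, 2 still free.
print(slot_utilization_percent(8, 2))
```

A value that stays close to 100 suggests the cluster has no spare slots for additional parallelism or for restarting failed tasks.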

Flink I/O Metrics

METRIC	DESCRIPTION
`flink_taskmanager_job_task_currentLowWatermark`	Task - currentLowWatermark: the lowest watermark this task has received
`flink_taskmanager_job_task_numBytesInLocal`	Task - numBytesInLocal: the total number of bytes this task has read from a local source
`flink_taskmanager_job_task_numBytesInLocalPerSecond`	Task - numBytesInLocalPerSecond: the number of bytes this task reads from a local source per second
`flink_taskmanager_job_task_numBytesInRemote`	Task - numBytesInRemote: the total number of bytes this task has read from a remote source
`flink_taskmanager_job_task_numBytesInRemotePerSecond`	Task - numBytesInRemotePerSecond: the number of bytes this task reads from a remote source per second
`flink_taskmanager_job_task_numBytesOut`	Task - numBytesOut: the total number of bytes this task has emitted
`flink_taskmanager_job_task_numBytesOutPerSecond`	Task - numBytesOutPerSecond: the number of bytes this task emits per second
`flink_taskmanager_job_task_numRecordsIn`	Task/Operator - numRecordsIn: the total number of records this operator/task has received
`flink_taskmanager_job_task_numRecordsInPerSecond`	Task/Operator - numRecordsInPerSecond: the number of records this operator/task receives per second
`flink_taskmanager_job_task_numRecordsOut`	Task/Operator - numRecordsOut: the total number of records this operator/task has emitted
`flink_taskmanager_job_task_numRecordsOutPerSecond`	Task/Operator - numRecordsOutPerSecond: the number of records this operator/task sends per second
`flink_taskmanager_job_task_operator_latency`	Operator - latency: the latency distributions from all incoming sources
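
A common use of `currentLowWatermark` is to estimate how far event time trails wall-clock time. The sketch below assumes that watermarks are epoch timestamps in milliseconds and that a task that has not yet received a watermark reports the smallest 64-bit integer; both assumptions should be checked against your pipeline. The function and constant names are hypothetical.

```python
import time

# Assumed sentinel: the smallest signed 64-bit value, reported before the
# first watermark arrives.
NO_WATERMARK = -(2**63)


def watermark_lag_ms(current_low_watermark_ms, now_ms=None):
    """Return how far the watermark trails the clock in ms, or None if unset."""
    # Use <= so the check also works when the value arrives as a float.
    if current_low_watermark_ms <= NO_WATERMARK:
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - current_low_watermark_ms


# Hypothetical reading: the watermark is 5 seconds behind the reference clock.
print(watermark_lag_ms(1_700_000_000_000, now_ms=1_700_000_005_000))
```

A lag that keeps growing while `numRecordsInPerSecond` stays steady may indicate that a source partition is idle or that the pipeline is falling behind.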

Kafka Producer and Consumer Metrics

Standard Kafka metrics are available when enabled in the configuration settings of the HERE platform Data Client. Their names start with one of the following prefixes:

METRIC	DESCRIPTION
`flink_taskmanager_job_task_operator_KafkaProducer`	Kafka Producer metrics
`flink_taskmanager_job_task_operator_KafkaConsumer`	Kafka Consumer metrics

The complete list of Kafka Producer and Consumer metrics can be found in the Apache Kafka documentation (see links below).

Note: Querying Prometheus

When querying these metrics with PromQL (Prometheus Query Language), you can use label matchers on metric names by matching against the internal `__name__` label. For example, the metric `flink_taskmanager_job_task_operator_KafkaConsumer_client_id_consumer_fetch_manager_metrics_fetch_rate` can also be selected with `{__name__=~".*consumer_fetch_manager_metrics_fetch_rate"}`, which matches every metric whose name ends in that suffix.
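
To see which metric names such a matcher would select, the behaviour can be imitated offline. The sketch below relies on the fact that Prometheus regex matchers are fully anchored, which `re.fullmatch` reproduces; Prometheus uses RE2 syntax, which behaves the same as Python's `re` for a simple pattern like this one. The helper name and the list of metric names are illustrative only.

```python
import re


def select_metrics(names, pattern):
    """Return the names a PromQL `{__name__=~"pattern"}` matcher would select."""
    compiled = re.compile(pattern)
    # fullmatch mirrors the implicit ^...$ anchoring of PromQL regex matchers.
    return [name for name in names if compiled.fullmatch(name)]


# Illustrative metric names; only the first one ends in the requested suffix.
names = [
    "flink_taskmanager_job_task_operator_KafkaConsumer_client_id_consumer_fetch_manager_metrics_fetch_rate",
    "flink_taskmanager_job_task_operator_KafkaConsumer_client_id_consumer_fetch_manager_metrics_fetch_latency_avg",
    "flink_taskmanager_job_task_numRecordsIn",
]

selected = select_metrics(names, r".*consumer_fetch_manager_metrics_fetch_rate")
print(selected)
```

Because of the anchoring, the leading `.*` is required: the bare suffix on its own would select nothing.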

See Also