# How to run pipelines

Once you have deployed a pipeline and created a pipeline version as described in the [Deploy a pipeline via the web portal](portal-deployment.md) section,
the typical next step is to activate the version so that it starts processing data from the data source.
However, in addition to activation, there are several other actions that can be performed on a pipeline version.
The full list of these operations includes:

* [Activate pipeline version](#activate-pipeline-version)
  * [Activate stream pipeline](#activate-stream-pipeline)
  * [Activate batch pipeline](#activate-batch-pipeline)
    * [Activate batch pipeline in `Run now` mode](#activate-batch-pipeline-in-run-now-mode)
    * [Activate batch pipeline in `Schedule` mode](#activate-batch-pipeline-in-schedule-mode)
* [Pause pipeline version](#pause-pipeline-version)
* [Resume pipeline version](#resume-pipeline-version)
* [Cancel pipeline version](#cancel-pipeline-version)
* [Deactivate pipeline version](#deactivate-pipeline-version)

These operations can be performed either in the platform portal or via the OLP CLI.
This section covers the platform portal.
For information on how to perform them via the OLP CLI, see the [Pipeline workflows](https://docs.here.com/workspace/docs/olp-cli-topics-pipeline-workflows) article.

For more information on deploying pipelines, creating pipeline versions, and managing these instances, see the following articles:

* [Deploy a pipeline via the web portal](https://docs.here.com/workspace/docs/portal-deployment)
* [Manage pipelines](https://docs.here.com/workspace/docs/managing-pipelines)

## Activate pipeline version

Once you have created the pipeline version, it is displayed on a portal page similar to the following:

![manage-pipeline-list-versions-2.png](https://files.readme.io/1be029457595dbaaa53fcfa2cfc1c3f158061d4ed5cf5021b88bbdbd87313612-manage-pipeline-list-versions-2.png "Pipeline versions")

For more information about this page and the properties available for each version, refer to the
[Manage pipelines - Display a list of pipeline versions](https://docs.here.com/workspace/docs/managing-pipelines#display-a-list-of-pipeline-versions) chapter.

To start processing the data from the input catalogs, click the `Activate` button for the version you wish to activate:

![run-pipelines-activate-1.png](https://files.readme.io/d961f5cf30fe6ecb7a9a285dbc3fabfc2f857de8c730186a4075f4d417eb39cf-run-pipelines-activate-1.png "Activate pipeline version")

From an activation perspective, there are several differences between stream and batch pipelines.

### Activate stream pipeline

Once you have clicked the `Activate` button for a stream pipeline version, the following dialog box opens:

![run-pipelines-activate-2.png](https://files.readme.io/887a3124388c5c2f908aedab7df07508e0592f477124202e44578182e2e7e7da-run-pipelines-activate-2.png "Activation options for stream pipeline version")

In this dialog box, select the runtime credentials used to run this pipeline version.
These can be your user credentials or application credentials. For more information, see the [Identity & Access Management Guide](https://docs.here.com/identity-and-access-management/docs/here-identity-and-access-management-readme).

This dialog box also contains a switch to run the `JobManager` of the stream pipeline version in `High Availability` mode.
When enabled, a second `JobManager` is deployed as a standby for the pipeline. Both `JobManagers` are managed
via `ZooKeeper`, which coordinates leader election and the state of the pipeline. The standby `JobManager` is deployed in a different
`Availability Zone` than the primary one. If the primary `JobManager` fails, the standby `JobManager` quickly takes over and the pipeline continues to run.
The failed primary `JobManager` is also restarted and becomes the new standby `JobManager`, which reestablishes `High Availability` and protects against future failures.

The option to enable `High Availability` is available during the `Activate`, `Resume`, and `Upgrade` operations.

Note that enabling this feature incurs additional cost.
Running a stream pipeline `JobManager` with high availability requires the following additional resources:

* Resources for the second `JobManager` (same size as the first one).
* Resources for the `ZooKeeper`: `1.5` CPU and `1.5` GB of RAM.

The cost of these additional resources is added to the original cost of the pipeline.
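
The resource overhead listed above can be estimated with a few lines of arithmetic. The following is a minimal sketch, not a platform API: the `Resources` type, the function name, and the example `JobManager` size are hypothetical and chosen only for illustration. The only figure taken from this page is the `ZooKeeper` requirement of `1.5` CPU and `1.5` GB of RAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: float     # CPU units
    ram_gb: float  # GB of RAM

    def __add__(self, other):
        return Resources(self.cpu + other.cpu, self.ram_gb + other.ram_gb)


# From the list above: ZooKeeper needs 1.5 CPU and 1.5 GB of RAM.
ZOOKEEPER = Resources(cpu=1.5, ram_gb=1.5)


def high_availability_overhead(job_manager):
    """Extra resources: a standby JobManager of the same size plus ZooKeeper."""
    return job_manager + ZOOKEEPER


# Hypothetical JobManager size, used only as an example input.
job_manager = Resources(cpu=1.0, ram_gb=7.0)
overhead = high_availability_overhead(job_manager)
print(overhead)
```

The result covers only the additional resources. The resources of the primary `JobManager` and of the rest of the pipeline are unchanged and are not included.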

For more information on the `High Availability` feature, see the [Best practices for high availability - Enable high availability option for stream pipelines](https://docs.here.com/workspace/docs/highly-available-pipelines#enable-high-availability-option-for-stream-pipelines) chapter.

### Activate batch pipeline

Once you have clicked the `Activate` button for a batch pipeline version, the following dialog box opens:

![run-pipelines-activate-3.png](https://files.readme.io/fb1c84da8b2d84a435497cc3438b97a0fa2336872279f98b614e1dd5a812aacd-run-pipelines-activate-3.png "Activation options for batch pipeline version")

In addition to the drop-down list for selecting the runtime credentials, this dialog box offers two activation modes for
the batch pipeline version: `Run now` and `Schedule`.

#### Activate batch pipeline in `Run now` mode

The `Run now` activation mode forces the pipeline version to run immediately without waiting for input data to change.
This mode is selected by default and requires additional information about input catalogs.
The available options are `Reprocess latest catalog version` and `Reprocess specific catalog version`.

If you select the `Reprocess latest catalog version` option (which is the default), the system will identify and reprocess the latest input catalog version:

![run-pipelines-activate-4.png](https://files.readme.io/8e0e3c9d796925f5e337ea2c5a923cc3f160c1a31b8c737dc5758ccf64f47255-run-pipelines-activate-4.png "Reprocess latest catalog version")

If you select the `Reprocess specific catalog version` option, you also need to specify a version of the input catalog to be processed:

![run-pipelines-activate-5.png](https://files.readme.io/1641f7e7332d055e3d493b012266f680144ee6f5292969ac8bc0412c6b7b0fe0-run-pipelines-activate-5.png "Reprocess specific catalog version")

#### Activate batch pipeline in `Schedule` mode

The second activation mode is `Schedule`. In this mode, the pipeline version runs only when the input data changes.
Two trigger options are available in `Schedule` mode, `Data change` and `Time schedule`:

![run-pipelines-activate-6.png](https://files.readme.io/d4ceffc8a9b7c3b6fcfa188c4fb3eb05b4a3977d806903bc6da473006142bd8a-run-pipelines-activate-6.png "Activate a batch pipeline in Schedule mode")

With the `Data change` trigger option, a pipeline version will run when input data changes.
If you select this option, the pipeline version will wait in the [`Scheduled`](https://docs.here.com/workspace/docs/pipeline-states#pipeline-version-states) state
until the input catalogs are updated with new data:

![run-pipelines-activate-7.png](https://files.readme.io/3985b7b016fc52dde8dd6d51bcb377ceeb1f455f85180c4c7bfe10b8c6c8cf9b-run-pipelines-activate-7.png "Data change trigger option")

With the `Time schedule` trigger option, a pipeline version will run on a set schedule, but **only** when there is new data to process.
If you select this option, you also need to provide a `CRON` schedule using the [Unix `CRON` expression format](https://en.wikipedia.org/wiki/Cron):

![run-pipelines-activate-8.png](https://files.readme.io/ee36f494d81f8a2c3febef64a03886d1e36cba964d9e8d4faadd3d109ac712f1-run-pipelines-activate-8.png "Time schedule trigger option")

The interval between consecutive attempts to run the pipeline version cannot be less than an hour.
The `CRON` expression is evaluated in the UTC time zone. For example, the expression `30 * * * *` results in an attempt to
run the pipeline version at 30 minutes past every hour, UTC.
An attempt is skipped if the pipeline version is still running when the attempt is due.
A job will only be run if there are pending changes to be processed in the input catalogs.
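
Before entering an expression in the portal, it can help to check locally when it would fire and how close together the attempts are. The sketch below is an illustration under stated assumptions, not the platform's own validation logic: it implements a simplified parser for numeric five-field `CRON` expressions (no month or weekday names, no `7` for Sunday), lists the matching minutes in UTC over a sample window, and reports the shortest gap between two consecutive attempts. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (min, max) per field: minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(text, lo, hi):
    """Expand one numeric field (*, a, a-b, a,b, with optional /step) into a set."""
    values = set()
    for part in text.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            end = hi if step else start
        if not lo <= start <= end <= hi:
            raise ValueError(f"field {text!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, int(step or 1)))
    return values


def fire_times(expression, start, days=31):
    """List the UTC minutes within `days` days of `start` that match the expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    minutes, hours, doms, months, dows = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    either_day_open = fields[2] == "*" or fields[4] == "*"
    t = start.astimezone(timezone.utc).replace(second=0, microsecond=0)
    end = t + timedelta(days=days)
    matches = []
    while t < end:
        dom_ok = t.day in doms
        dow_ok = (t.weekday() + 1) % 7 in dows  # Python: Monday = 0; CRON: Sunday = 0
        # Classic CRON rule: if both day fields are restricted, either may match.
        day_ok = (dom_ok and dow_ok) if either_day_open else (dom_ok or dow_ok)
        if day_ok and t.month in months and t.hour in hours and t.minute in minutes:
            matches.append(t)
        t += timedelta(minutes=1)
    return matches


def min_interval(expression, start, days=31):
    """Shortest gap between consecutive matches, or None if fewer than two match."""
    times = fire_times(expression, start, days)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return min(gaps) if gaps else None


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for expression in ("30 * * * *", "*/30 * * * *"):
    gap = min_interval(expression, start)
    verdict = "ok" if gap is None or gap >= timedelta(hours=1) else "too frequent"
    print(expression, gap, verdict)
```

The check only samples a 31-day window, so it is a sanity check rather than a proof. An expression that passes here can still be rejected by the portal if the platform applies stricter rules.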

More advanced scheduling functionality for batch pipeline versions is available with the [OLP CLI](https://docs.here.com/workspace/docs/olp-cli-topics-pipeline-workflows).

> #### Info
>
> **Run latency**\
> There are always a few moments of latency before the pipeline actually starts processing (during which the pipeline is in the `Scheduled` state).
> This is true even for the `Run now` option.
> Scheduled operations can be even more delayed because they are triggered by the availability of data and system resources to start processing.

## Pause pipeline version

Once a pipeline version has been activated, it will eventually run when all activation conditions are met:

![run-pipelines-activate-9.png](https://files.readme.io/d9007da1bd0b43f985a9a8fedac33efb12080090a6adda8e53f1f8121df270a0-run-pipelines-activate-9.png "Pipeline version is running")

Several operations can be performed on a pipeline version that is in the `Running` state.
One of them is `Pause`, which temporarily stops data processing.

Click the `Pause` button for the running pipeline version you wish to pause:

![run-pipelines-activate-9-1.png](https://files.readme.io/a84a665ae7ff3861bf953f31db583dd80c08fce3615214c876ccbd6efb0a1baf-run-pipelines-activate-9-1.png "Pause pipeline version")

Pausing a running pipeline version requires special consideration, because the result of the `Pause` operation depends on the type of pipeline job being executed.
If you pause a batch pipeline version that was activated in `Schedule` mode, it does not stop processing immediately.
Instead, the current job runs to completion, and the pipeline version state then changes to `Paused`:

![run-pipelines-activate-10.png](https://files.readme.io/1cad73043faa42376fe1dc2b2ff70c7899e7b4cea06ef4bcfc22d45271e8c66f-run-pipelines-activate-10.png "Pipeline version is being paused")

On the other hand, if a batch pipeline version has been activated in `Run now` mode, it cannot be paused, only cancelled:

![run-pipelines-activate-11.png](https://files.readme.io/bd9f12c7aef8830b58c1a538e7f8a46844a4587dd2f6b987aeaa533c92de28e2-run-pipelines-activate-11.png "Batch pipeline version that was activated in Run now mode cannot be paused")

When you pause a running stream pipeline version, the current state of the job is saved and the job is gracefully stopped at that point.
When a `Resume` command is issued, a new job is submitted to restart the pipeline version from the previously saved state.

If the paused job is cancelled, the saved state of the paused job is discarded and the pipeline version moves to the `Ready` state.

## Resume pipeline version

The `Resume` operation is used to resume the data processing after the pipeline version has been paused.
To do this, select the `Resume` option for the pipeline version that is in the `Paused` state:

![run-pipelines-activate-12.png](https://files.readme.io/785b7297f4e35a0b26f5a83bcb2eeec3653f13ae9ea021dbaffd840b1d884678-run-pipelines-activate-12.png "Resuming pipeline version")

If you resume a stream pipeline version, the following dialog box appears:

![run-pipelines-activate-13-1.png](https://files.readme.io/d75cf07b7ba305d8eda9bf399c7cb1d7472a49286472074e509dc6e664bf3242-run-pipelines-activate-13-1.png "Resuming stream pipeline version")

Select the runtime credentials with which to resume this pipeline version.
This dialog box also contains a switch to resume the pipeline version with the [`High Availability` feature](highly-available-pipelines#enable-high-availability-option-for-stream-pipelines) enabled.

If you resume a batch pipeline version, you only need to select the runtime credentials:

![run-pipelines-activate-13-1.png](https://files.readme.io/d75cf07b7ba305d8eda9bf399c7cb1d7472a49286472074e509dc6e664bf3242-run-pipelines-activate-13-1.png "Resuming batch pipeline version")

When a paused pipeline version is resumed, there is typically a delay of 30 to 90 seconds, which can extend to several minutes if resources are limited.
While the pipeline version is being resumed, it is in the `Scheduled` state:

![run-pipelines-activate-14.png](https://files.readme.io/2edfdcfa0debd3c44674b94bb308e19fcfe0f7a1fac80657ee9a03348eef752c-run-pipelines-activate-14.png "Pipeline version is being resumed")

When a pipeline version is resumed, it will eventually run again:

![run-pipelines-activate-15.png](https://files.readme.io/3bf01ee72e3ee77ca51af86b8801e6b7db9ef1238582f85aa019572e3cdb3808-run-pipelines-activate-15.png "Pipeline version is resumed")

Since both batch and stream pipelines have mechanisms to mark the point at which the data processing was paused,
when a pipeline version is resumed, a new data processing job starts from that point.
In the case of stream pipelines, [Flink Savepoints](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/ops/state/savepoints/)
are used to resume data processing.
For batch pipelines, data processing will only resume if there is new data to process and if the `Time schedule` trigger
requirements (if any) are met.

> #### Warning
>
> A **stream** pipeline version can only be resumed if it has been paused for less than 7 days.
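
If you track when a stream pipeline version was paused, the 7-day limit can be turned into a simple deadline check. This is a minimal sketch with hypothetical function names and a hypothetical pause timestamp. It assumes that the limit is counted from the moment of pausing and that exactly 7 days is already too late, which this page does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# From the warning above: the version must have been paused for less than 7 days.
RESUME_WINDOW = timedelta(days=7)


def resume_deadline(paused_at):
    """Moment from which a paused stream version can no longer be resumed."""
    return paused_at + RESUME_WINDOW


def can_resume(paused_at, now=None):
    """True while the version has been paused for less than the resume window."""
    now = now or datetime.now(timezone.utc)
    return now - paused_at < RESUME_WINDOW


# Hypothetical pause time, recorded by you when pausing the version.
paused_at = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(resume_deadline(paused_at))
print(can_resume(paused_at, now=datetime(2024, 3, 8, 11, 59, tzinfo=timezone.utc)))
print(can_resume(paused_at, now=datetime(2024, 3, 8, 12, 0, tzinfo=timezone.utc)))
```

Using timezone-aware UTC timestamps avoids off-by-hours errors when the pause and the check happen in different local time zones.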

## Cancel pipeline version

Another operation available for pipeline versions is `Cancel`.
It cancels the specified pipeline version and any future jobs scheduled for that version.

You can cancel a pipeline version that is in the `Running` or `Paused` state:

![run-pipelines-activate-16.png](https://files.readme.io/d0f511fb70a5f57f000b18cfecec26588af0dd2ecc60493c8e1483f26a12b01d-run-pipelines-activate-16.png "Cancel running pipeline version")

![run-pipelines-activate-17.png](https://files.readme.io/f5f91248f9915cec1cf375fccbee6c7e3558757144f7e6feb7fb5750d7ea2405-run-pipelines-activate-17.png "Cancel paused pipeline version")

Once you have clicked on the `Cancel` button, you will be asked to confirm the operation:

![run-pipelines-activate-18.png](https://files.readme.io/4c449783f98f3224de46d8f14e567b57ceed84e344851c49c0b609747b5a1345-run-pipelines-activate-18.png "Confirm operation")

After you confirm the operation, the pipeline job is immediately interrupted and cancelled.
Finally, the cancelled pipeline version is returned to the `Ready` state:

![run-pipelines-activate-19.png](https://files.readme.io/3c4af19afc23229ff11e0f4d374065ffdfb5f46a48fc83b763009710a736205f-run-pipelines-activate-19.png "Pipeline version was cancelled")

## Deactivate pipeline version

If a pipeline version is still in a `Scheduled` state after activation, deactivate it by clicking `Deactivate` as shown below:

![run-pipelines-activate-20.png](https://files.readme.io/24674d8ae85eeffed91a35ad017bb19992555eb2f382cdeb4b1fd3c6d452aba2-run-pipelines-activate-20.png "Deactivate a pipeline version")

After deactivation, the pipeline version returns to a `Ready` state where it is available for activation again:

![run-pipelines-activate-21.png](https://files.readme.io/38d8a42911360ebb3087636d69d5a882b687f05fdeb5f9f244e934237638ad81-run-pipelines-activate-21.png "Pipeline version was deactivated")

Please note that the `Deactivate` operation may not be available in certain circumstances.
The screenshot below was taken immediately after the pipeline version was activated.
As you can see, the version is in the `Scheduled` state, but the `Deactivate` button is not active because the `Run` operation
is currently being performed and is in the `Pending` state:

![run-pipelines-activate-22.png](https://files.readme.io/292bac42ec9d50e5c510ce981cdc7d4060b44add82c398e91bc1092c77d130af-run-pipelines-activate-22.png "Deactivate operation is not available")
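
The operations on this page can be summarized as a small state machine. The sketch below models only the transitions described in the sections above. It is an illustration, not the platform's implementation: the function names are hypothetical, and `Start` is a made-up label for the moment the platform itself moves a `Scheduled` version to `Running`. Job completion and the other states listed in the pipeline states documentation are not modeled.

```python
# (current state, operation) -> next state, as described on this page.
TRANSITIONS = {
    ("Ready", "Activate"): "Scheduled",
    ("Scheduled", "Deactivate"): "Ready",
    ("Scheduled", "Start"): "Running",  # hypothetical label: done by the platform, not the user
    ("Running", "Pause"): "Paused",
    ("Running", "Cancel"): "Ready",
    ("Paused", "Resume"): "Scheduled",
    ("Paused", "Cancel"): "Ready",
}


def next_state(state, operation):
    """Return the state that follows `operation`, or raise if it is not available."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"{operation!r} is not available in the {state!r} state") from None


def available_operations(state):
    """List the operations modeled for a given state."""
    return sorted(op for (s, op) in TRANSITIONS if s == state)


# Walk a version through activation, a pause/resume cycle, and cancellation.
state = "Ready"
for operation in ["Activate", "Start", "Pause", "Resume", "Start", "Cancel"]:
    state = next_state(state, operation)
    print(f"{operation:>10} -> {state}")
```

The table is deliberately coarse. As described above, a batch pipeline version activated in `Run now` mode cannot be paused, a batch version activated in `Schedule` mode reaches `Paused` only after the current job completes, and `Deactivate` can be temporarily unavailable while the `Run` operation is pending.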