# Pipelines API pipeline lifecycle

The operational lifecycle of a typical pipeline follows a consistent pattern. Whether you use the portal GUI or the CLI, this pattern has three major phases:

1. [Create the pipeline application](#create-the-pipeline-application)
2. [Deploy the pipeline](#deploy-the-pipeline)
3. [Manage the pipeline](#manage-the-pipeline)

The first two phases are summarized in Figure 1. Phase 1 is done in a local development environment. Phase 2 can be done using the portal GUI or the OLP CLI.
`{% if book.product == 'internal' %}`
![Process diagram of pipeline creation and deployment](https://files.readme.io/f2e08039a11ed7339b2d36f69ddf6c48133b2e0a13946f79a941999d4f261271-pipeline-lifecycle-2a.png "Pipeline v3 lifecycle")
`{% else %}`
![Process diagram of pipeline creation and deployment](https://files.readme.io/208134055eaa0cc6c2bcf8c4b85190b516af895f67c060064e55520f919ab1c5-pipeline-lifecycle-2.png "Pipeline v2 lifecycle")
`{% endif %}`

## Create the pipeline application

The goal of Phase 1 is to create a *pipeline JAR file*. This JAR file contains the code for the pipeline framework, the data ingestion, the data output, and all of the data transformation logic required to implement the intended data processing workflow.

To simplify this task, HERE provides *project archetypes* to supply as much of the boilerplate code as possible, and a framework to contain everything else. Different *Maven archetypes* set up a project for either a batch pipeline or a stream pipeline. These archetypes also provide all of the interface code needed to execute the pipeline in the proper framework within the platform. The only thing you need to provide is the data processing code itself. The pipeline JAR file is actually a Fat JAR file containing all of the libraries and other assets needed by the pipeline.

1. Define the *business requirements* for the pipeline: data source, data type/schema, process flow, and desired results of data processing.
2. Based on the business requirements, determine the *workflow*, formally define the *data schema*, and develop the *data transformation algorithms*. Implement the algorithms and the data ingestion/output in Java or Scala and integrate them into the pipeline project.
3. Compile the Java/Scala code. The result is a JAR file that contains the code for data ingestion, data processing, and outputting the processed data. All the required libraries and other assets are added to make a Fat JAR file. The resulting pipeline JAR file is unique, transportable, and reusable.

<Callout icon="📘" theme="info">
  Note

  **Credentials required**
  You must register every pipeline application with the HERE platform before it can be used. This process is described in the [Identity & Access Management Guide](/identity-and-access-management/docs/). For specific procedural information, see the article [Manage apps](/identity-and-access-management/docs/manage-apps).
  The Phase 1 process shown here is actually more complex than Phase 2, since it is not a simple task to design the transformation algorithms and translate them into compilable code. Nor does this process address the ancillary steps of testing, reviewing, or validating the pipeline code.
</Callout>

> A good description of the detailed process of creating a batch pipeline can be found in the *Data Processing Library Developer Guide* using both [Java](/data-sdk/docs/) and [Scala](/data-sdk/docs/).
> See the following articles for more information:

* [Develop pipelines](https://docs.here.com/workspace/docs/develop-pipelines)
* [Develop a Spark application](/docs/)
* [Develop a Flink application](/docs/)

## Deploy the pipeline

You use pipeline JAR files to deploy a pipeline. Pipeline JAR files are designed for either batch or stream processing. They are also designed to implement a specific data processing workflow for a specific data schema. In addition, some runtime considerations are specified during deployment.

Select the pipeline JAR file to be deployed and do the following to prepare it for deployment:

1. *Create a pipeline object* - Set up an instance of a pipeline and obtain a pipeline ID\{\{ ' (or pipeline HRN if Pipelines API v3 is used)' if book.product == 'internal' else ''}}.

`{% if book.product == 'internal' %}`
![Create a pipeline object](https://files.readme.io/d3464fe7f5b1a7fd2d2e4b645005bc534f68ecf910b2502f148ee0c89ef981a3-pipeline-lifecycle-6a-2.png "Create a pipeline")
`{% else %}`
![Create a pipeline object](https://files.readme.io/891c0a1cb2fa284fd1720e6f639721f5b4a3c0a5417ab3b01d7de9535a5d41fc-pipeline-lifecycle-6a.png "Create a pipeline")
`{% endif %}`
2\. *Create a template* - Upload the pipeline JAR file and obtain a template ID\{\{ ' (or template HRN if Pipelines API v3 is used)' if book.product == 'internal' else ''}}. Specify the input and output catalog identifiers.

`{% if book.product == 'internal' %}`
![Create a package](https://files.readme.io/1405c95cabbb136667e82452efcb7f86479a5cbcafc683159015578856a088b7-pipeline-lifecycle-6b-2.png "Create a template")
`{% else %}`
![Create a package](https://files.readme.io/138225dd22c49f7e69092f880746f242036ab8f886a1bff63fd7cf6baa09f9b5-pipeline-lifecycle-6b.png "Create a template")
`{% endif %}`
3\. *Create a pipeline version* - Create an executable instance of the pipeline and register the runtime requirements for the deployed pipeline. A pipeline version ID\{\{' (UUID for Pipelines API v2 or a sequence number for Pipelines API v3)' if book.product == 'internal' else ''}} is assigned upon successful completion of this step. The pipeline is now deployed and ready to be activated.

`{% if book.product == 'internal' %}`
![Create a pipeline version](https://files.readme.io/3d35ed9d822951d5ff8c84b975ac07eb7bb6d1300b393e3235a283aae4683fcd-pipeline-lifecycle-6c-2.png "Create a pipeline version")
`{% else %}`
![Create a pipeline version](https://files.readme.io/c5ced2e0545c79132daf5a1a757bd48d68102adc74eeda2c9e30c77c0025db04-pipeline-lifecycle-6c.png "Create a pipeline version")
`{% endif %}`
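
The relationship between the three objects created during deployment can be pictured with a small in-memory model. The following Python sketch is illustrative only: the class names, field names, the sample catalog identifiers, the use of `uuid4` for identifiers, and the initial `Ready` state are assumptions made for this example. It does not call the Pipelines API or the OLP CLI.

```python
import uuid
from dataclasses import dataclass, field


def _new_id():
    # Hypothetical ID generator; the real service assigns its own identifiers.
    return str(uuid.uuid4())


@dataclass
class Pipeline:
    """Step 1: a pipeline object that receives a pipeline ID."""
    name: str
    pipeline_id: str = field(default_factory=_new_id)


@dataclass
class Template:
    """Step 2: the uploaded pipeline JAR file plus its catalog identifiers."""
    jar_file: str
    input_catalogs: list
    output_catalog: str
    template_id: str = field(default_factory=_new_id)


@dataclass
class PipelineVersion:
    """Step 3: an executable instance that ties a pipeline to a template."""
    pipeline_id: str
    template_id: str
    runtime_config: dict
    version_id: str = field(default_factory=_new_id)
    state: str = "Ready"  # assumption: deployed but not yet activated


def deploy(name, jar_file, input_catalogs, output_catalog, runtime_config):
    """Run the three deployment steps in order and return all three objects."""
    pipeline = Pipeline(name)
    template = Template(jar_file, input_catalogs, output_catalog)
    version = PipelineVersion(pipeline.pipeline_id, template.template_id, runtime_config)
    return pipeline, template, version


# Usage with made-up values.
pipeline, template, version = deploy(
    "demo-pipeline", "demo-fat.jar", ["example-input-catalog"], "example-output-catalog", {"workers": 1}
)
print(version.pipeline_id == pipeline.pipeline_id)
```

The point of the sketch is the ordering: a pipeline version cannot exist until both a pipeline ID and a template ID are available.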

### Activate the pipeline

To execute a pipeline, you must activate one of its pipeline versions.

To activate the pipeline version, perform an *Activate* operation on the pipeline version ID. A batch pipeline can be activated to run *On-demand (Run Now)* or it can be *Scheduled*. With the *Scheduled* option, the batch pipeline version is executed either when the input catalogs are updated with new data or based on a time schedule. The following section describes the available execution modes.

#### Execution modes for activating a pipeline

There are several execution modes available for activating a pipeline version, as summarized in the following table:

| Pipeline type | Execution mode: On-demand                                                                                                                                                                                                                                                                                                                        | Execution mode: Scheduled                                                                                                                                                                                                                                                                                                                                                | Execution mode: Time Schedule                                                                                                                                                                                                                                                                                                                          |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Batch         | The pipeline enters the `Scheduled` state and immediately changes to the `Running` state to attempt to process the specified input data catalogs. When the job is done, the pipeline returns to the `Ready` state. No further processing is done, even if the input catalogs receive new data. Additional processing must be initiated manually. | The pipeline version enters the `Scheduled` state for a brief period of time and then changes to the `Running` state to begin processing the existing data in the input catalogs. After the job is completed, it returns to the `Scheduled` state where it waits for new data to be available in the input catalogs. Only new data is processed for each subsequent run. | The pipeline enters the `Scheduled` state and waits for the *next attempt* time of the time schedule. Once the *next attempt* time has arrived, it changes to the `Running` state to begin processing the existing data in the input catalogs. After the job is completed, it returns to the `Scheduled` state where it waits for *next attempt* time. |
| Stream        | Not supported. At the moment, there's no option to specify an end time for a stream pipeline. Therefore, it cannot be run once.                                                                                                                                                                                                                  | The pipeline begins in the `Scheduled` state for a brief period of time and then changes to the `Running` state to begin processing the data stream from the specified input catalog. The pipeline continues to run (and stays in the `Running` state) until it is paused, canceled, or deactivated.                                                                     | Not supported because stream pipelines process data continuously.                                                                                                                                                                                                                                                                                      |

When you activate a pipeline version, a request is made to the pipeline service to initiate the execution of the pipeline version. A *job* is created to start the execution and a job ID is generated. When the job starts, the pipeline service returns a URL for all the logs of the job.
`{% if book.product == 'internal' %}`
![Typical pipeline lifecycle from predeployment through deployment in a runtime environment](https://files.readme.io/a1dbe3d7f192be3a8b43964e9ca1cc482e85ebc807975123287c220315c356af-pipeline-lifecycle-6d-2.png "Activate the pipeline version")
`{% else %}`
![Typical pipeline lifecycle from predeployment through deployment in a runtime environment](https://files.readme.io/5b06f4946e8514f894561d1f6c831c975c633ce9e91dbccaf565515d8c9efd77-pipeline-lifecycle-6d.png "Activate the pipeline version")
`{% endif %}`
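
The execution-mode table can also be read as a lookup from pipeline type and mode to the states a pipeline version passes through after activation. The Python sketch below transcribes the table; the function name and the lowercase mode keys are assumptions chosen for this example, and the code is not part of any HERE SDK.

```python
# States visited after activation, transcribed from the execution modes table.
# None marks a combination that the table lists as not supported.
TRANSITIONS = {
    ("batch", "on-demand"): ["Scheduled", "Running", "Ready"],
    ("batch", "scheduled"): ["Scheduled", "Running", "Scheduled"],
    ("batch", "time-schedule"): ["Scheduled", "Running", "Scheduled"],
    ("stream", "on-demand"): None,
    ("stream", "scheduled"): ["Scheduled", "Running"],
    ("stream", "time-schedule"): None,
}


def states_after_activation(pipeline_type, mode):
    """Return the ordered states for one activation cycle."""
    key = (pipeline_type, mode)
    if key not in TRANSITIONS:
        raise ValueError(f"unknown combination: {key}")
    states = TRANSITIONS[key]
    if states is None:
        raise ValueError(f"{mode} is not supported for {pipeline_type} pipelines")
    return list(states)


print(states_after_activation("batch", "on-demand"))
```

Note that a scheduled batch pipeline ends each cycle back in `Scheduled`, while an on-demand batch pipeline ends in `Ready` and a stream pipeline stays in `Running`.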

#### Job failures in Scheduled and Time Scheduled batch pipelines

If an individual job in a Scheduled batch pipeline version fails, the pipeline version returns to the `Scheduled` state. If the failure occurred after the job published a new version of the output catalog, the pipeline runs again when new data is present in the input catalog. Otherwise, the pipeline runs immediately, reprocessing the same data as the failed job.

Individual job failures in a Time Scheduled batch pipeline are processed the same as successful completions. The pipeline version returns to the `Scheduled` state and runs again at the next attempt per the scheduled time.

This mechanism provides resiliency to intermittent failure. There are cases, however, where a pipeline fails consistently and requires direct intervention to fix (for example, corrupted data in the input catalog). If several jobs in a pipeline version fail consecutively, the pipeline version returns to the `Ready` state and must be activated again after the failures have been addressed.

The consecutive failure count is reset by `Activate`, `Resume`, and `Upgrade` operations. The number of failures at which the pipeline version is deactivated varies by realm, with 10 being the default.

A notification email is sent to the email address configured for the pipeline when it's deactivated due to consecutive failures.
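
The consecutive-failure rule can be sketched as a simple counter. This Python example is a minimal illustration under stated assumptions: the class and method names are hypothetical, a successful job is assumed to end the failure streak (implied by "consecutive"), and the default threshold of 10 is taken from the text above. The actual service logic may differ.

```python
class FailureTracker:
    """Tracks consecutive job failures for one batch pipeline version."""

    RESET_OPERATIONS = ("Activate", "Resume", "Upgrade")

    def __init__(self, max_consecutive_failures=10):
        # 10 is the documented default; the real limit varies by realm.
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.state = "Scheduled"

    def record_job(self, succeeded):
        """Update the counter after a job finishes and return the new state."""
        if succeeded:
            # Assumption: a success ends the streak of consecutive failures.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

        if self.consecutive_failures >= self.max_consecutive_failures:
            # Deactivated; this is where the notification email would be sent.
            self.state = "Ready"
        else:
            self.state = "Scheduled"
        return self.state

    def reset(self, operation):
        """Activate, Resume, and Upgrade reset the consecutive failure count."""
        if operation not in self.RESET_OPERATIONS:
            raise ValueError(f"{operation} does not reset the failure count")
        self.consecutive_failures = 0
        self.state = "Scheduled"  # simplification: back to waiting for the next job


tracker = FailureTracker()
for _ in range(10):
    state = tracker.record_job(succeeded=False)
print(state)
```

After ten failed jobs in a row the tracked version ends up in `Ready`, and only an explicit operation such as `Activate` puts it back into rotation.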

<Callout icon="📘" theme="info">
  Note

  **Deployment tips**

  * The logging URL is returned automatically when you activate from the web portal or the CLI.
  * Additional pipeline versions can be created using the same template or another template. Each pipeline version is distinguished by its own unique pipeline version ID.
  * A pipeline can have only one pipeline version running (active) at any time. This lifecycle applies, with minor variations, to both batch and stream pipelines.
</Callout>

`{% if book.product == 'internal' %}`

> * It is important to remember that the deployment of any pipeline begins with creating an instance of that pipeline in the pipeline service. That instance is assigned an identifier - [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) (for instances created with Pipelines API v2) or [HERE Resource Name (HRN)](/identity-and-access-management/docs/faq#what-is-a-resource) (for instances created with Pipelines API v3). Everything else is managed by the pipeline service under that identifier, so it cannot change - it is immutable. The metadata associated with the `UUID` or `HRN` is simply used as a convenient way to talk about the pipeline instance. So, names and descriptions may be changed, but as far as the pipeline service is concerned it is the same pipeline instance.
>   `{% else %}`
> * It is important to remember that the deployment of any pipeline begins with creating an instance of that pipeline in the pipeline service. That instance is assigned a UUID for identification: the `pipeline ID`. Everything else is managed by the pipeline service under that `pipeline ID`, so it cannot change - it is immutable. The metadata associated with the `pipeline ID` is simply used as a convenient way to talk about the pipeline instance. So, names and descriptions may be changed, but as far as the pipeline service is concerned it is the same pipeline instance.
>   `{% endif %}`
>   For more detailed information on how this all works, see:

* [Deploy pipelines](deployment.md)
* [Run a Flink application on the platform](/docs/)
* [Run a Spark application on the platform](/docs/)

## Manage the pipeline

Once the pipeline is activated and running, it responds to the following operations:

* Cancel
* Deactivate
* Delete
* Pause
* Resume
* Show
* Upgrade

To check the current state of the pipeline version, you can review it in the web portal or use the CLI commands.

The basic pipeline runtime environment looks like this:
![Typical pipeline lifecycle from predeployment through deployment in a runtime environment](https://files.readme.io/dce7b7fa7817de98c93fa78940a804007a10eda1cc80914df6145694697333a2-pipeline-lifecycle-6e.png "Runtime environment")

### Terminate a pipeline version

A running pipeline version can be terminated via the following operations:

* *Pause*
  * For a batch pipeline version, the current job is completed and future jobs are paused. Thus, the pause may not happen quickly.

  * For a batch pipeline version that is run on-demand, the *Pause* operation is not available. Such a pipeline can only be canceled.

  * For a stream pipeline version, the current state is saved and the job is gracefully terminated.

* *Cancel*
  * For a batch or stream pipeline version, the running job is immediately terminated without saving the state and the pipeline version moves to the `Ready` state.

* *Terminate* (internal)
  * This is an internal operation only. The current job terminates with a success or failure. If the pipeline version is configured to run again, it is set to the `Scheduled` state; otherwise, it is set to the `Ready` state.

<Callout icon="📘" theme="info">
  Note

  **Resume a paused pipeline version**
  A paused pipeline version can be restarted using the *Resume* operation. For a stream pipeline version, the job resumes from the saved state of the paused job. For a batch pipeline version, the pipeline version state is changed to `Scheduled` and the next job is created based on the execution mode.
</Callout>

> A canceled pipeline version cannot be resumed. Instead, it must be activated to return to a `Running` or `Scheduled` state.
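
The Pause, Cancel, and Resume rules above can be summarized as a small transition function. The following Python sketch is an illustration only: the function name and argument values are hypothetical, it accepts Pause and Cancel only from the `Running` state, and it simplifies a resumed stream pipeline version to `Running` even though the service may pass through other states first.

```python
def apply_operation(pipeline_type, mode, state, operation):
    """Return the next state of a pipeline version after an operation."""
    if operation == "Pause":
        if state != "Running":
            raise ValueError("only a running pipeline version can be paused")
        if pipeline_type == "batch" and mode == "on-demand":
            # On-demand batch pipeline versions can only be canceled.
            raise ValueError("Pause is not available for on-demand batch pipelines")
        # Batch: the current job completes first. Stream: state is saved.
        return "Paused"

    if operation == "Cancel":
        if state != "Running":
            raise ValueError("only a running pipeline version can be canceled")
        # The running job is terminated immediately without saving state.
        return "Ready"

    if operation == "Resume":
        if state != "Paused":
            # A canceled version is in Ready and must be activated instead.
            raise ValueError("only a paused pipeline version can be resumed")
        # Stream resumes from the saved state; batch waits for its next job.
        return "Running" if pipeline_type == "stream" else "Scheduled"

    raise ValueError(f"unsupported operation: {operation}")


print(apply_operation("batch", "scheduled", "Paused", "Resume"))
```

The key asymmetry is that Pause preserves enough context for Resume, whereas Cancel always leads to `Ready` and requires a fresh activation.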

### Delete a pipeline

To delete a pipeline, along with its pipeline versions and associated content, follow [these instructions](https://docs.here.com/workspace/docs/managing-pipelines#delete-a-pipeline).

Note that no running or paused pipeline versions can be deleted, which means that all pipeline versions to be deleted must be in the `Ready` state.

An error is returned if one or more of its pipeline versions are either running or paused.
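
The delete precondition amounts to checking the state of every pipeline version first. The Python sketch below shows the idea; the exception class, function name, and the version identifiers in the usage example are hypothetical, and the error raised here only stands in for whatever error the service actually returns.

```python
class PipelineDeleteError(Exception):
    """Raised when a pipeline still has versions that are not in Ready."""


def delete_pipeline(version_states):
    """Delete a pipeline given a mapping of version ID to current state.

    Returns the IDs of the versions removed, or raises if any version
    is not in the Ready state (for example, Running or Paused).
    """
    not_ready = {vid: state for vid, state in version_states.items() if state != "Ready"}
    if not_ready:
        raise PipelineDeleteError(f"versions not in Ready state: {not_ready}")
    return sorted(version_states)


# Usage with made-up version IDs.
print(delete_pipeline({"version-a": "Ready", "version-b": "Ready"}))
```

In practice this means canceling or deactivating every active version before attempting the delete.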

### Upgrade a pipeline

The purpose of upgrading a pipeline is to replace the existing pipeline version with a new pipeline version that is based on a different pipeline JAR file and/or configuration than the original.

Upgrading a pipeline is possible for both Stream and Batch pipelines, but there are subtle differences:

* For a stream pipeline, a savepoint is taken of the running job and processing is terminated immediately. The savepoint is passed to the upgraded pipeline version, which starts processing from it.
* For a batch pipeline, the running job is not terminated. Instead, it is allowed to complete its processing, after which the pipeline version returns to the `Ready` state and the upgraded pipeline version enters the `Scheduled` state. In other words, the upgrade does not take effect until the next time the pipeline is scheduled to run.

#### Upgrade states

Stream or batch pipelines can only be upgraded when the current version is in the `Running` or `Paused` state. The version you upgrade to is in the `Ready` state, since only one version of a pipeline can be in a state other than `Ready`. To upgrade a pipeline that is in the `Scheduled` state, deactivate it and then activate the version you want to upgrade to.
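
The state rules for upgrading can be expressed as a short decision function. This Python sketch is a hedged illustration: the function name and the returned action labels are invented for this example and do not correspond to API or CLI values.

```python
def plan_upgrade(current_state, target_state):
    """Decide how to move from the current version to the target version.

    Returns "upgrade" when the Upgrade operation applies, or
    "deactivate-then-activate" when the current version is only Scheduled.
    """
    if target_state != "Ready":
        # Only one version of a pipeline may be in a state other than Ready.
        raise ValueError("the version to upgrade to must be in the Ready state")

    if current_state in ("Running", "Paused"):
        return "upgrade"
    if current_state == "Scheduled":
        # Upgrade is not available; switch versions manually instead.
        return "deactivate-then-activate"
    raise ValueError(f"cannot upgrade from state {current_state}")


print(plan_upgrade("Scheduled", "Ready"))
```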

#### Upgrade sequence

1. Create a new pipeline version using an existing or new template.
2. Execute the *Upgrade* operation from the [portal](https://docs.here.com/workspace/docs/managing-pipelines#upgrade-a-pipeline-version).

As part of the upgrade process, the old pipeline version is paused and the new pipeline version is activated. After a couple of minutes, the old pipeline version moves to the `Ready` state and the new pipeline version moves to the `Scheduled` state.

See the image below to understand the process.
![Sequence diagram of pipeline upgrade process execution](https://files.readme.io/e45197c4ffc0203b20af3335f5f4762be248f58b6d47578e69776fe908920978-upgrade-sequence-diagram-2.png "Upgrade sequence")

### Update a pipeline

You can change the name, description, and contact email properties associated with your pipeline instance. No other properties can be updated.
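
The restriction on editable properties can be sketched as a whitelist check. In the Python example below, the dictionary keys (`name`, `description`, `contact_email`, `pipeline_id`) and the function name are assumptions made for illustration; they are not the field names used by the Pipelines API.

```python
# Only these properties may change; everything else is fixed at creation.
MUTABLE_FIELDS = {"name", "description", "contact_email"}


def update_pipeline(pipeline, changes):
    """Return a copy of the pipeline metadata with the allowed changes applied."""
    rejected = set(changes) - MUTABLE_FIELDS
    if rejected:
        raise ValueError(f"properties cannot be updated: {sorted(rejected)}")
    # The pipeline ID is untouched, so the service sees the same instance.
    return {**pipeline, **changes}


original = {"pipeline_id": "example-id", "name": "old name", "description": "", "contact_email": "a@example.com"}
updated = update_pipeline(original, {"name": "new name"})
print(updated["name"], updated["pipeline_id"])
```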

#### Update sequence

1. Cancel the running pipeline version. The job will stop processing and transition the pipeline version into a `Ready` state.
2. Use the `Edit pipeline` option of the `More` menu in the top right-hand corner of the page for the specific pipeline to start editing it:

![pipeline-edit.png](https://files.readme.io/9048f70238f0087f2f6f1c1caa9dd2693abc0ef3f5cd56087cac6d1821836e8f-pipeline-edit.png)
3\. After you save the changes, the pipeline instance has its metadata updated and pipeline versions associated with that pipeline can now be run with the new metadata associated with them.

### Group ID/Project ID

Whenever you create a pipeline, you must assign it either to a group by specifying the group ID or to a project by specifying the project ID.

Only users and applications that are part of the group or project can access the pipeline. To keep your pipelines private,
restrict the access to yourself or to a group of registered users. These users are identified by the group ID or project ID.

You must have a valid group ID or project ID to work with a pipeline.
`{% if book.product == 'internal' %}`

<Callout icon="🚧" theme="warning">
  Caution

  With the Pipelines API v3, you can only create pipelines that use projects for access control. Group-based access is not supported by this version of the API.
</Callout>

`{% endif %}`
For more details on groups and projects, see the [Identity & Access Management Guide](/identity-and-access-management/docs/).

<Callout icon="🚧" theme="warning">
  Caution

  **Stream pipelines must use a unique application ID**
  Using the same group (or project) ID for a given combination of application ID, layer ID,
  and catalog ID can lead to partial data consumption. To avoid this situation, create a different group
  or project for every stream pipeline, thus ensuring that each pipeline uses a unique application ID.
  For more information, see [Stream processing best practices](https://docs.here.com/workspace/docs/stream-processing#multiple-pipeline-use).
</Callout>

## See also

* [Build a Batch Pipeline with Maven Archetypes (Java)](/data-sdk/docs/)
* [Build a Batch Pipeline with Maven Archetypes (Scala)](/data-sdk/docs/)
* [Maven Archetypes](https://maven.apache.org/guides/introduction/introduction-to-archetypes.html)
* [Configurations available for pipeline developers](https://docs.here.com/workspace/docs/configurations-for-pipeline-developers)
* [Pipeline Components](https://docs.here.com/workspace/docs/pipeline-components)
* [Pipeline patterns](https://docs.here.com/workspace/docs/pipeline-patterns)