Azure Blob Storage

The Azure Blob Storage data connector is implemented for source and sink connections.

This connector supports all URI schemes available in its HERE Anonymizer Self-Hosted version. To learn more, see the Azure Blob Storage connector documentation.

Additionally, the following schemes are supported:

az+blob+index scheme

This scheme reads the raw data from the input container and prepares an index to be used in the next phase.

This is a minimal example of an az+blob+index data connector URI:

az+blob+index://accName:AccKey@container/path?endpoint=https://<account-name>.blob.core.windows.net&cache_endpoint=ignite://host:10800

az+blob+preprocess scheme

This scheme reads the index prepared in the previous phase from the cache, and then reads the raw data from the input container as defined by the index.

This is a minimal example of an az+blob+preprocess data connector URI:

az+blob+preprocess://accName:AccKey@container/path?endpoint=https://<account-name>.blob.core.windows.net&cache_endpoint=ignite://host:10800
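
Because the two phases differ only in the scheme, both URIs can be derived from one set of values. The Python sketch below is illustrative only: build_connector_uri is a hypothetical helper, not part of the connector, and the endpoint host is a placeholder. Percent-encoding the credentials is an assumption made here because Azure account keys are Base64 strings that may contain "/", "+", and "=".

```python
from urllib.parse import quote


def build_connector_uri(scheme, account, key, container, path,
                        endpoint, cache_endpoint):
    """Assemble a connector URI in the shape shown in the examples above.

    Hypothetical helper for illustration; it is not part of the connector.
    """
    # Assumption: reserved characters in the credentials are percent-encoded
    # so that they cannot be mistaken for URI delimiters.
    userinfo = f"{quote(account, safe='')}:{quote(key, safe='')}"
    # Keep ':' and '/' readable in query values, as in the documented examples.
    query = (
        f"endpoint={quote(endpoint, safe=':/')}"
        f"&cache_endpoint={quote(cache_endpoint, safe=':/')}"
    )
    return f"{scheme}://{userinfo}@{container}/{path.lstrip('/')}?{query}"


shared = dict(
    account="accName",
    key="AccKey",
    container="container",
    path="path",
    endpoint="https://accname.blob.core.windows.net",  # placeholder account
    cache_endpoint="ignite://host:10800",
)

# Same values, different scheme for each phase.
index_uri = build_connector_uri("az+blob+index", **shared)
preprocess_uri = build_connector_uri("az+blob+preprocess", **shared)
print(index_uri)
print(preprocess_uri)
```

Keeping the shared values in one place makes it harder for the index phase and the preprocess phase to point at different containers or cache servers by accident.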

Authentication

This connector supports authentication through a static credentials provider, where the credentials are provided explicitly in the connector URI.

For example, the following URI carries the credentials AccName:AccKey in its user information part: az+blob+preprocess://AccName:AccKey@container/path?endpoint=...
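
To see which parts of a URI carry the static credentials, the URI can be split with a generic URI parser. This is a minimal Python sketch that assumes the connector URI follows standard URI syntax; the connector's own parsing may differ, and the endpoint value used here is a placeholder.

```python
from urllib.parse import urlsplit, unquote

# Example URI in the documented shape; the endpoint value is a placeholder.
uri = ("az+blob+preprocess://AccName:AccKey@container/path"
       "?endpoint=https://accname.blob.core.windows.net")

parts = urlsplit(uri)
account_name = unquote(parts.username)  # text before ':' in the user information
account_key = unquote(parts.password)   # text between ':' and '@'
container = parts.hostname              # text after '@'; urlsplit lower-cases it
path = parts.path.lstrip("/")           # remainder up to the '?'

print(account_name, account_key, container, path)
```

Because the credentials are embedded in the URI itself, any place that logs or stores the URI also exposes the account key.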

Configuration

Use the following parameters to configure the connector.

The endpoint parameter

Mandatory parameter. Defines the Azure Blob Storage endpoint URL, either in the https://AccountName.blob.core.windows.net/ format or as a custom domain URL, for example https://myblob.example.com.

The cache_endpoint parameter

Mandatory parameter. Defines the endpoint where the cache server is hosted.
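
A URI can be checked for the two mandatory parameters before it is handed to the connector. The Python sketch below is an illustration, not connector code: missing_mandatory_params is a hypothetical helper, and treating an empty value as missing is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# The two parameters documented above as mandatory.
MANDATORY_PARAMS = ("endpoint", "cache_endpoint")


def missing_mandatory_params(uri):
    """Return the mandatory query parameters that are absent or empty in uri."""
    # parse_qs drops parameters with blank values by default,
    # so "endpoint=" is reported as missing here.
    query = parse_qs(urlsplit(uri).query)
    return [name for name in MANDATORY_PARAMS if not query.get(name)]


complete = ("az+blob+index://accName:AccKey@container/path"
            "?endpoint=https://accname.blob.core.windows.net"
            "&cache_endpoint=ignite://host:10800")
incomplete = ("az+blob+index://accName:AccKey@container/path"
              "?endpoint=https://accname.blob.core.windows.net")

print(missing_mandatory_params(complete))
print(missing_mandatory_params(incomplete))
```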

The path_resolver_recursive parameter

Optional parameter. Defines whether all files under the specified path are processed recursively.

Default value: true

The path_resolver_filter parameter

Optional parameter. Defines a pattern used to filter files based on their paths. Only files matching the specified pattern will be included during processing.

The value must use the format accepted by the Java FileSystem path matcher, that is, a syntax prefix such as glob: followed by the pattern.

For example:

  • glob:*.json: Matches files with the .json extension.
  • glob:*.{json,pb}: Matches files with the .json or .pb extension.

By default, no filter is defined, so all files found under the specified path are processed.
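
To get a feel for how the example patterns behave, the Python sketch below approximates a small subset of Java glob syntax ("*", "**", "?", and "{a,b}" groups). It is not the connector's matcher: the connector evaluates the pattern with Java, and glob_to_regex and matches are hypothetical helpers written for this illustration. Whether the connector applies the pattern to the full blob path or only to the file name is not stated here, so the sketch simply tests whatever string it is given.

```python
import re


def glob_to_regex(pattern):
    """Translate a small subset of Java glob syntax into a compiled regex.

    Supported: '*', '**', '?', and '{a,b}' groups. Everything else is literal.
    """
    if pattern.startswith("glob:"):
        pattern = pattern[len("glob:"):]
    out = []
    i = 0
    in_group = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            if pattern[i:i + 2] == "**":
                out.append(".*")      # '**' may cross directory boundaries
                i += 1                # skip the second '*'
            else:
                out.append("[^/]*")   # '*' stays within one path component
        elif ch == "?":
            out.append("[^/]")        # exactly one character, not a separator
        elif ch == "{":
            out.append("(?:")
            in_group = True
        elif ch == "}" and in_group:
            out.append(")")
            in_group = False
        elif ch == "," and in_group:
            out.append("|")           # alternatives inside a group
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def matches(pattern, name):
    """Return True if the whole of name matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(name) is not None


for name in ("data.json", "data.pb", "data.txt", "dir/data.json"):
    print(name, matches("glob:*.{json,pb}", name))
```

Under Java glob rules, "*" does not cross directory separators while "**" does, which matters if a pattern is tested against a path with several components.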