AWS S3

The AWS S3 connector is designed for use with the Amazon Simple Storage Service (S3).

It supports two operational modes, batch data processing and runtime configuration, described below.

Batch data processing

This is a minimal example of an S3 connector URI for processing batch data:

s3+batch+files://bucket/path?region=eu-west-1

When batch data processing is configured to use the example URI as a source, HERE Anonymizer Self-Hosted reads the files in the given bucket that match the defined filter.

When batch data processing is configured to use the example URI as a sink, HERE Anonymizer Self-Hosted writes the anonymized data as files to that location.
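To make the structure of the connector URI easier to see, the following is a minimal sketch that splits the example URI into its parts with the Python standard library. The helper name parse_connector_uri and the keys of the returned dictionary are hypothetical and only serve the illustration; this is not how HERE Anonymizer Self-Hosted parses the URI internally.

```python
from urllib.parse import urlsplit, parse_qs


def parse_connector_uri(uri):
    """Split a connector URI into scheme, bucket, path prefix and parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    # parse_qs returns a list per key; keep the first value of each parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,            # selects the operational mode
        "bucket": parts.hostname,          # S3 bucket name
        "prefix": parts.path.lstrip("/"),  # key prefix inside the bucket
        "params": params,                  # query parameters such as region
    }


info = parse_connector_uri("s3+batch+files://bucket/path?region=eu-west-1")
print(info)
```

The scheme carries the operational mode, the authority part names the bucket, the path is the key prefix, and the query string carries parameters such as region.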

For explicit authentication and the optional parameters, see the Authentication and Configuration sections below.

Runtime configuration

This is a minimal example of an S3 connector URI for runtime configuration:

s3+management://bucket/path?region=eu-west-1

When the anonymization management queue is configured to use the example URI, HERE Anonymizer Self-Hosted reads the following predefined S3 objects at one-minute intervals (the default; see the poll_interval parameter):

  • s3://bucket/path/anonymization.conf
  • s3://bucket/path/HERE_ANONYMIZER_LICENSE

For explicit authentication and the optional parameters, see the Authentication and Configuration sections below.

Logging

HERE Anonymizer Self-Hosted logs one of the following messages, depending on the state of the objects:

  • Both objects recognized, detected changes to objects

    INFO  c.h.a.f.c.s3.S3FilesSourceConnector - Configured s3 files source of s3://bucket/path/ reading every PT1M all the files: [HERE_ANONYMIZER_LICENSE, anonymization.conf]
    ...
    INFO  c.h.a.f.c.s3.S3FilesSourceFunction - File bucket/path/HERE_ANONYMIZER_LICENSE has been changed, downloading new version
    INFO  c.h.a.f.c.s3.S3FilesSourceFunction - File bucket/path/anonymization.conf has been changed, downloading new version

  • Objects detected, no changes to objects (checked by the S3 object's ETag): the system doesn't log any messages.

  • One or both objects unavailable

    WARN  c.h.a.f.c.s3.S3FilesSourceFunction - Unable to read s3://bucket/path/anonymization.conf: null (Service: S3, Status Code: 404, Request ID: *******, Extended Request ID: ******) (Service: S3, Status Code: 404, Request ID: *******)
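The three states above can be pictured as a comparison of ETags between two polls. The following is a minimal sketch of that idea, assuming the ETags have already been fetched into plain dictionaries. The function name detect_changes and the use of None for an unreadable object are assumptions made for the illustration, not the product's implementation.

```python
def detect_changes(previous, current):
    """Compare the ETags seen in two consecutive polls.

    Both arguments map an object key to its ETag string, or to None when
    the object could not be read (hypothetical convention).
    Returns a tuple (changed, unavailable) of key lists.
    """
    changed = [
        key for key, etag in current.items()
        if etag is not None and previous.get(key) != etag
    ]
    unavailable = [key for key, etag in current.items() if etag is None]
    return changed, unavailable


first_poll = {"anonymization.conf": "etag-a1", "HERE_ANONYMIZER_LICENSE": "etag-b1"}
second_poll = dict(first_poll)  # nothing changed between polls
third_poll = {"anonymization.conf": "etag-a2", "HERE_ANONYMIZER_LICENSE": None}

print(detect_changes({}, first_poll))
print(detect_changes(first_poll, second_poll))
print(detect_changes(second_poll, third_poll))
```

A changed key corresponds to the "has been changed" state, an empty result to the silent state, and an unavailable key to the warning state.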

Authentication

This connector supports two authentication methods:

  • Default authentication

    This method is used when credentials aren't provided in the connector URI, for example: s3+management://bucket/path?region=... In that case, the default AWS SDK credentials provider is used, which reads environment variables, system properties, and ~/.aws/credentials.

  • Explicit (static) credentials provider

    This method is used when the credentials are provided explicitly in the connector URI, for example: s3+batch+files://aws-access-id:aws-secret-key@bucket/path?region=...
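The choice between the two methods depends only on whether the URI carries a user-info part before the bucket name. The following is a minimal sketch of that rule; the function name auth_mode and its return values are hypothetical, and the region value is taken from the earlier examples.

```python
from urllib.parse import urlsplit


def auth_mode(uri):
    """Return "explicit" if the URI embeds credentials, otherwise "default".

    Hypothetical helper that mirrors the rule described above.
    """
    parts = urlsplit(uri)
    # Credentials appear as access-id:secret-key@ in front of the bucket name.
    if parts.username and parts.password:
        return "explicit"
    return "default"


print(auth_mode("s3+management://bucket/path?region=eu-west-1"))
print(auth_mode("s3+batch+files://aws-access-id:aws-secret-key@bucket/path?region=eu-west-1"))
```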

Configuration

Use the following parameters to configure the connector.

The region parameter

Mandatory parameter. Defines the AWS region.

The poll_interval parameter

Optional parameter. Defines the time interval for checking for changes in the configured files. The value must be in the Java Duration format, which is based on ISO-8601 (for example, PT30S for 30 seconds).

This parameter is applicable for runtime configuration mode only.

Default value: PT1M (1 minute)
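To check what a poll_interval value means before putting it into a URI, the following sketch converts a duration string into a Python timedelta. It is a simplified parser that covers only the day, hour, minute and second components without signs; the function name parse_duration is hypothetical, and Java's own Duration parsing accepts more forms than this.

```python
import re
from datetime import timedelta

# Simplified subset of the ISO-8601 duration syntax: PnDTnHnMnS.
_PATTERN = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text):
    """Convert a duration such as PT1M into a timedelta (illustrative only)."""
    value = text.upper()
    match = _PATTERN.fullmatch(value)
    # Reject non-matching input, a dangling "T", and strings without any component.
    if match is None or value.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"not a supported duration: {text!r}")
    parts = {name: float(num) for name, num in match.groupdict().items() if num}
    return timedelta(**parts)


print(parse_duration("PT1M").total_seconds())
print(parse_duration("PT1H30M").total_seconds())
```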

The endpoint parameter

Optional parameter. Enables the use of an alternatively hosted S3 API by overriding the endpoint that the connector calls.

The files parameter

Optional parameter. Comma-separated list of files to read. Enables the use of alternative filenames for the managed objects (configuration and license).

This parameter is applicable for runtime configuration mode only.

Default value: anonymization.conf,HERE_ANONYMIZER_LICENSE

For example, with the URI s3+management://bucket/path/to/folder?files=new-config-name.conf,CUSTOM_LICENSE_NAME the connector observes these S3 objects:

  • s3://bucket/path/to/folder/new-config-name.conf
  • s3://bucket/path/to/folder/CUSTOM_LICENSE_NAME
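The mapping from a runtime configuration URI to the observed objects can be sketched as follows. The function name watched_objects is hypothetical; the default file list is the one documented above, and the logic only illustrates the documented behavior.

```python
from urllib.parse import urlsplit, parse_qs

# Default value of the files parameter, as documented above.
DEFAULT_FILES = "anonymization.conf,HERE_ANONYMIZER_LICENSE"


def watched_objects(uri):
    """List the S3 object URIs a runtime configuration URI points at.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    params = parse_qs(parts.query)
    names = params.get("files", [DEFAULT_FILES])[0].split(",")
    prefix = parts.path.strip("/")
    return [
        "s3://" + parts.hostname + "/" + "/".join(p for p in (prefix, name) if p)
        for name in names
    ]


custom = watched_objects(
    "s3+management://bucket/path/to/folder?files=new-config-name.conf,CUSTOM_LICENSE_NAME"
)
default = watched_objects("s3+management://bucket/path?region=eu-west-1")
print(custom)
print(default)
```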

The path_resolver_recursive parameter

Optional parameter. Defines whether the S3 keys are traversed recursively in a directory-like way, with the / character as a delimiter.

This parameter is applicable for batch data processing mode only.

Default value: true
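The difference between recursive and non-recursive traversal can be sketched on a plain list of keys. This is an illustration under assumptions: the function name list_relative_keys and the sample keys are hypothetical, and the connector's actual listing logic may differ.

```python
def list_relative_keys(keys, prefix, recursive=True):
    """Return the keys below a prefix, relative to that prefix.

    With recursive=False, only keys directly below the prefix are kept,
    treating "/" as a directory-like delimiter (illustrative only).
    """
    base = prefix.rstrip("/") + "/"
    relative = [key[len(base):] for key in keys if key.startswith(base)]
    if recursive:
        return relative
    return [key for key in relative if "/" not in key]


sample_keys = ["path/a.csv", "path/2024/b.csv", "path/2024/01/c.csv", "other/d.csv"]
print(list_relative_keys(sample_keys, "path", recursive=True))
print(list_relative_keys(sample_keys, "path", recursive=False))
```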

The path_resolver_filter parameter

Optional parameter. Defines the PathMatcher pattern that is applied as a filter to the list of S3 objects. Object keys are matched relative to the bucket and path given in the URI.

This parameter is applicable for batch data processing mode only.

Default value: glob:*
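In Java's PathMatcher glob syntax, * matches within a single name component, ** also crosses / boundaries, and ? matches one character of a name component. The following sketch imitates only that subset (no brace or bracket expressions) to show how a filter value selects keys. The function names and sample keys are hypothetical, and how the connector combines the filter with path_resolver_recursive is not specified in this guide.

```python
import re


def glob_to_regex(pattern):
    """Translate a small subset of glob syntax (*, **, ?) into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # ** crosses "/" boundaries
            i += 2
            continue
        ch = pattern[i]
        if ch == "*":
            out.append("[^/]*")   # * stays within one name component
        elif ch == "?":
            out.append("[^/]")    # ? is one character of a name component
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))


def filter_keys(keys, filter_value):
    """Apply a value such as glob:*.csv to keys relative to the URI path."""
    syntax, _, pattern = filter_value.partition(":")
    if syntax != "glob":
        raise ValueError("this sketch only handles the glob syntax")
    regex = glob_to_regex(pattern)
    return [key for key in keys if regex.fullmatch(key)]


relative_keys = ["a.csv", "b.json", "2024/c.csv"]
print(filter_keys(relative_keys, "glob:*.csv"))
print(filter_keys(relative_keys, "glob:**.csv"))
```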