Amazon S3
The Amazon Simple Storage Service (S3) data connector is available for both source and sink connections.
This connector supports all URI schemes that its HERE Anonymizer Self-Hosted version supports. To learn more, see the AWS S3 data connector documentation.
Additionally, the following schemes are supported:
s3+index
This scheme reads the raw data from the input bucket and prepares an index to be used in the next phase (s3+preprocess).
This is a minimal example of an s3+index data connector URI:
s3+index://bucket/path?region=eu-west-1&cache_endpoint=ignite://host:10800
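The following Java sketch shows how the parts of the minimal URI above map onto standard java.net.URI components: the bucket is the host, the path is the URI path, and the settings are query parameters. The S3IndexUriAnatomy class and its queryParams helper are hypothetical and only illustrate the URI layout; they are not the connector's own parser.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: maps the parts of an s3+index URI onto java.net.URI
// components. Hypothetical helper, not the connector's parser.
final class S3IndexUriAnatomy {

    // Splits the query ("a=1&b=2") into an insertion-ordered map.
    // Assumes values contain no '&' characters.
    static Map<String, String> queryParams(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getQuery(); // decoded query, or null if absent
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('='); // split at the first '=' only
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "s3+index://bucket/path?region=eu-west-1&cache_endpoint=ignite://host:10800");
        System.out.println(uri.getScheme()); // s3+index
        System.out.println(uri.getHost());   // bucket
        System.out.println(uri.getPath());   // /path
        System.out.println(queryParams(uri)); // {region=eu-west-1, cache_endpoint=ignite://host:10800}
    }
}
```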
s3+preprocess
This scheme reads the index prepared in the previous phase (s3+index) from the cache, and the raw data from the input bucket as defined by the index.
This is a minimal example of an s3+preprocess data connector URI:
s3+preprocess://bucket/path?region=eu-west-1&cache_endpoint=ignite://host:10800
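Because the s3+preprocess phase reads the index that the s3+index phase wrote to the cache, both URIs have to point at the same cache_endpoint. A minimal sketch, assuming both phases also read the same input bucket and path, is a hypothetical helper that derives both URIs from one set of settings:

```java
import java.net.URI;

// Hypothetical helper: builds the URIs for both phases from one set of
// settings, so preprocess reads the index from the cache that index wrote to.
final class TwoPhaseUris {

    // Builds "<scheme>://<bucket>/<path>?region=...&cache_endpoint=...".
    // Assumes the inputs need no escaping.
    static URI build(String scheme, String bucket, String path,
                     String region, String cacheEndpoint) {
        return URI.create(scheme + "://" + bucket + "/" + path
                + "?region=" + region
                + "&cache_endpoint=" + cacheEndpoint);
    }

    public static void main(String[] args) {
        String bucket = "bucket";
        String path = "path";
        String region = "eu-west-1";
        String cache = "ignite://host:10800"; // shared by both phases

        URI indexPhase = build("s3+index", bucket, path, region, cache);
        URI preprocessPhase = build("s3+preprocess", bucket, path, region, cache);

        System.out.println(indexPhase);
        System.out.println(preprocessPhase);
    }
}
```

Keeping the shared settings in one place prevents the two phases from drifting apart, for example a preprocess job pointing at a different cache than the one the index was written to.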
Authentication
This connector supports two authentication methods:
- Default authentication
  This method is used when credentials aren't provided in the connector URI, for example:
  s3+index://bucket/path?region=...
  In that case, the default AWS SDK credentials provider (reading environment variables, system properties, ~/.aws/credentials) is used.
- Explicit (static) credentials provider
  This method is used when the credentials are provided explicitly in the connector URI, for example:
  s3+index://aws-access-id:aws-secret-key@bucket/path?region=...
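The selection rule for the two authentication methods can be sketched in Java as follows: the presence of user information in the URI decides between them. The AuthSelection class and AuthMethod enum are hypothetical names for illustration, and the sketch only inspects the URI; it does not call the AWS SDK.

```java
import java.net.URI;

// Hypothetical sketch of the rule: user info present -> static credentials,
// otherwise -> default AWS SDK credentials provider. Not the connector's code.
final class AuthSelection {

    enum AuthMethod { DEFAULT_PROVIDER_CHAIN, STATIC_CREDENTIALS }

    static AuthMethod select(URI uri) {
        String userInfo = uri.getUserInfo(); // decoded "id:secret", or null
        return userInfo == null ? AuthMethod.DEFAULT_PROVIDER_CHAIN
                                : AuthMethod.STATIC_CREDENTIALS;
    }

    // Splits "aws-access-id:aws-secret-key" at the first colon.
    static String[] credentials(URI uri) {
        String userInfo = uri.getUserInfo();
        if (userInfo == null) {
            throw new IllegalArgumentException("no credentials in URI");
        }
        int colon = userInfo.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected <access-id>:<secret-key>");
        }
        return new String[] { userInfo.substring(0, colon), userInfo.substring(colon + 1) };
    }

    public static void main(String[] args) {
        URI implicit = URI.create("s3+index://bucket/path?region=eu-west-1");
        URI explicit = URI.create(
                "s3+index://aws-access-id:aws-secret-key@bucket/path?region=eu-west-1");
        System.out.println(select(implicit)); // DEFAULT_PROVIDER_CHAIN
        System.out.println(select(explicit)); // STATIC_CREDENTIALS
    }
}
```

java.net.URI.getUserInfo() returns the decoded value, so a secret key that contains '/' would have to be percent-encoded (%2F) to stay inside the authority part of the URI. Whether the connector itself decodes such escapes is an assumption not covered by this page.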
Configuration
Use the following parameters to configure the connector.
The region parameter
Mandatory parameter. Defines the AWS region. To learn more, see Regions, Availability Zones, and Local Zones in the AWS documentation.
The cache_endpoint parameter
Mandatory parameter. Defines the endpoint where the cache server is hosted, for example ignite://host:10800.
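Because region and cache_endpoint are both mandatory, a client could check a URI before submitting it. The following is a hypothetical pre-flight check, not part of the connector, that lists which mandatory parameters are missing or empty:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check: reports mandatory parameters that are
// absent or have an empty value in a connector URI's query.
final class MandatoryParams {

    static final List<String> MANDATORY = List.of("region", "cache_endpoint");

    static List<String> missing(URI uri) {
        Set<String> present = new HashSet<>();
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                if (!value.isEmpty()) {
                    present.add(name); // count only parameters with a value
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (String name : MANDATORY) {
            if (!present.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI ok = URI.create("s3+index://bucket/path?region=eu-west-1&cache_endpoint=ignite://host:10800");
        URI bad = URI.create("s3+index://bucket/path?region=eu-west-1");
        System.out.println(missing(ok));  // []
        System.out.println(missing(bad)); // [cache_endpoint]
    }
}
```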
The path_resolver_recursive parameter
Optional parameter. Defines whether all files under the specified path are processed recursively.
Default value: true
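This page does not spell out what a non-recursive run selects. One plausible reading, sketched below under the assumption that S3 "folders" are '/'-delimited key prefixes, is that false keeps only objects directly under the path, while true also includes objects under nested prefixes. The RecursiveSelection class, its select method, and the sample keys are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of one plausible reading of path_resolver_recursive (assumption):
//   recursive = true  -> every key under the prefix is selected
//   recursive = false -> only keys directly under the prefix are selected
final class RecursiveSelection {

    static List<String> select(List<String> keys, String prefix, boolean recursive) {
        String base = prefix.endsWith("/") ? prefix : prefix + "/";
        return keys.stream()
                .filter(k -> k.startsWith(base))
                // non-recursive: reject keys with another '/' after the prefix
                .filter(k -> recursive || k.indexOf('/', base.length()) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "path/a.json",
                "path/b.pb",
                "path/2024/c.json",
                "other/d.json");
        System.out.println(select(keys, "path", true));  // [path/a.json, path/b.pb, path/2024/c.json]
        System.out.println(select(keys, "path", false)); // [path/a.json, path/b.pb]
    }
}
```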
The path_resolver_filter parameter
Optional parameter. Defines a pattern used to filter files by their paths. Only files that match the pattern are included in processing.
The value must use a format compatible with the Java FileSystem path matcher (FileSystem.getPathMatcher), such as a glob: pattern.
For example:
- glob:*.json: Matches a path that represents a file with the .json extension.
- glob:*.{json,pb}: Matches file names with the .json or .pb extension.
No filter is defined by default, meaning all the files found under the specified path will be processed.
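The glob: patterns above follow the standard java.nio PathMatcher syntax, so their behaviour can be checked locally with FileSystems.getDefault().getPathMatcher. The sketch below matches plain file names; whether the connector applies the pattern to the file name or to the full object path is an assumption. It also shows that * does not cross '/' boundaries, while the standard ** glob does. The GlobFilterDemo class is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Evaluates Java FileSystem "glob:" patterns against sample paths.
// Hypothetical helper for trying out path_resolver_filter values locally.
final class GlobFilterDemo {

    static boolean matches(String pattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("glob:*.json", "a.json"));         // true
        System.out.println(matches("glob:*.json", "b.pb"));           // false
        System.out.println(matches("glob:*.{json,pb}", "b.pb"));      // true
        // '*' stops at directory separators, so a nested path fails here:
        System.out.println(matches("glob:*.json", "2024/c.json"));    // false
        // '**' crosses directory separators:
        System.out.println(matches("glob:**/*.json", "2024/c.json")); // true
    }
}
```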
The endpoint parameter
Optional parameter. Lets the connector use an S3-compatible API hosted elsewhere, such as a MinIO deployment.
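As a sketch of how endpoint combines with the other parameters, the hypothetical ConnectorUriBuilder below assembles an s3+index URI for a MinIO deployment. The MinIO host name, port, and region value are placeholders, and the assumption that endpoint takes a full http:// URL is not confirmed by this page. Values are appended unencoded, matching the style of the URI examples on this page.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical builder for a connector URI that targets an S3-compatible
// service through `endpoint`. Values are appended unencoded; values with
// '&', '#', '{' or '}' would need extra handling.
final class ConnectorUriBuilder {

    static URI build(String scheme, String bucket, String path, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((name, value) -> query.add(name + "=" + value));
        return URI.create(scheme + "://" + bucket + "/" + path + "?" + query);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("region", "us-east-1");                            // still mandatory
        params.put("cache_endpoint", "ignite://host:10800");
        params.put("endpoint", "http://minio.example.internal:9000"); // placeholder MinIO URL
        params.put("path_resolver_recursive", "false");

        System.out.println(build("s3+index", "bucket", "path", params));
    }
}
```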