Guides

AWS EKS

Prerequisites

  1. Install the required tools: the aws CLI, eksctl, kubectl, docker, and envsubst (all are used by the commands in this guide).
  2. Create an AWS user with the policies listed in Example of required IAM policies attached.

Deploy HERE Anonymizer Self-Hosted playbook

You can complete the deployment by running the supplied scripts from the directory where you unpacked the HERE Anonymizer.

Export the environment variables and run the commands shown in the code block below.

# The full hostname of AWS ECR registry, for example "{ACCOUNT}.dkr.ecr.eu-west-1.amazonaws.com"
export AWS_ECR_HOST="{ACCOUNT}.dkr.ecr.eu-west-1.amazonaws.com"

# AWS region, for example "eu-west-1"
export AWS_REGION="{AWS_REGION}"

# Configure aws-cli credentials in ~/.aws/credentials or export these variables
export AWS_ACCESS_KEY_ID={USER_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY={USER_SECRET_ACCESS_KEY}

# Optionally, configure two private subnets in different availability zones
# where the EKS cluster will be deployed; see https://eksctl.io/usage/vpc-configuration/
# Otherwise, a new VPC will be created.
# AWS_SUBNET1="subnet-xxxxxx"
# AWS_SUBNET2="subnet-xxxxxx"

# Path where the HERE Anonymizer is unzipped
export HERE_ANONYMIZER_DIST={PATH_TO_UNPACKED_ANONYMIZER}

# Build and push container images, create EKS cluster, deploy
# HERE Anonymizer images and RabbitMQ server for demo purposes
./deployments/kubernetes/aws/aws-deploy-eks.sh

# To deploy Flink in session mode for batch processing, run:
# ./deployments/kubernetes/aws/aws-deploy-eks.sh batch


# The HERE Anonymizer is now deployed and ready to accept data
# through the RabbitMQ queue.
# Push test data:
kubectl exec -i deployment/rabbit -- rabbitmqadmin publish \
  exchange=amq.default \
  routing_key="input-queue" \
  < ./deployments/common/here-probe-example.json

# To run a batch sample, set the environment variables
# APP_CLIENT_AWS_ACCESS_KEY_ID, APP_CLIENT_AWS_SECRET_ACCESS_KEY, APP_CLIENT_AWS_REGION,
# APP_S3_BUCKET_IN, and APP_S3_BUCKET_OUT, then run:
# ./deployments/kubernetes/aws/batch-anonymize-sample-data.sh

# Shut down the cluster and all its resources and
# delete the HERE Anonymizer container image
./deployments/kubernetes/aws/aws-shutdown-eks.sh

Additional variables

List of optional variables for configuring aws-deploy-eks.sh and aws-shutdown.sh with their default values:

APP_NAME=hereanoneks
APP_VERSION=latest
AWS_CLUSTER_NAME=hereanoneks-latest
AWS_REGION=eu-west-1
BILLING_TAG=test
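As a sketch, the variables can be overridden before invoking the scripts. All values below are hypothetical placeholders, not defaults:

```shell
# Hypothetical example: deploy under a custom name, version, and region.
export APP_NAME=myanon
export APP_VERSION=1.2.0
# The cluster name defaults to "${APP_NAME}-${APP_VERSION}".
export AWS_CLUSTER_NAME="${APP_NAME}-${APP_VERSION}"
export AWS_REGION=eu-central-1
export BILLING_TAG=team-geo
```

With these overrides exported, run aws-deploy-eks.sh as before, and export the same values again before shutting down so both scripts resolve the same cluster.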

Deploy manually

Push container images to ECR

  1. Log in to ECR.

    aws ecr get-login-password --region "$AWS_REGION" | docker login --username AWS --password-stdin $AWS_ECR_HOST
  2. Build and push the HERE Anonymizer container image. The commands check whether the ECR repository exists and create it if it does not.

    APP_IMAGE=${AWS_ECR_HOST}/${APP_NAME}-flink:${APP_VERSION}
    docker build --tag "$APP_IMAGE" $HERE_ANONYMIZER_DIST
    aws ecr describe-repositories --repository-names ${APP_NAME}-flink || \
      aws ecr create-repository --repository-name ${APP_NAME}-flink
    docker push "$APP_IMAGE"
  3. Push the RabbitMQ image. Alternatively, you can use any of the supported connectors.

    RABBIT_IMAGE=$AWS_ECR_HOST/rabbit
    docker pull docker.io/rabbitmq:4.1.0-management
    docker tag rabbitmq:4.1.0-management "$RABBIT_IMAGE"
    aws ecr describe-repositories --repository-names rabbit || \
      aws ecr create-repository --repository-name rabbit
    docker push "$RABBIT_IMAGE"
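The describe-then-create pattern used in steps 2 and 3 can be wrapped in a small helper. This is a sketch; `ecr_image_name` and `ecr_push` are hypothetical names, and it assumes AWS_ECR_HOST is exported and docker is already logged in to ECR:

```shell
# Compose the full ECR image reference from a repository name and tag.
ecr_image_name() {
  local repo="$1" tag="${2:-latest}"
  echo "${AWS_ECR_HOST}/${repo}:${tag}"
}

# Create the repository on first use, then push the image.
ecr_push() {
  local repo="$1" tag="${2:-latest}"
  aws ecr describe-repositories --repository-names "$repo" >/dev/null 2>&1 || \
    aws ecr create-repository --repository-name "$repo" >/dev/null
  docker push "$(ecr_image_name "$repo" "$tag")"
}

# Example calls mirroring steps 2 and 3:
# ecr_push "${APP_NAME}-flink" "${APP_VERSION}"
# ecr_push rabbit
```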

Create Kubernetes cluster

  1. Use eksctl to create a cluster and all required resources. The creation process takes about 10-15 minutes.

    eksctl create cluster \
        --version 1.27 \
        -n $AWS_CLUSTER_NAME \
        --region $AWS_REGION \
        --tags "app_name=${APP_NAME},app_version=${APP_VERSION}"
  2. Check Kubernetes connection and cluster readiness.

    kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s
    kubectl get nodes,pods,deployments -A
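If you exported the optional AWS_SUBNET1/AWS_SUBNET2 variables from the playbook section, they can be forwarded to eksctl. A sketch; the `subnet_flags` helper is a hypothetical name, while the flags themselves are standard eksctl options:

```shell
# Emit eksctl subnet options only when both subnet IDs are configured;
# otherwise eksctl creates a new dedicated VPC.
subnet_flags() {
  if [ -n "${AWS_SUBNET1:-}" ] && [ -n "${AWS_SUBNET2:-}" ]; then
    echo "--vpc-private-subnets=${AWS_SUBNET1},${AWS_SUBNET2} --node-private-networking"
  fi
}

# eksctl create cluster \
#     --version 1.27 \
#     -n "$AWS_CLUSTER_NAME" \
#     --region "$AWS_REGION" \
#     $(subnet_flags) \
#     --tags "app_name=${APP_NAME},app_version=${APP_VERSION}"
```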

Create S3 bucket for streaming mode (optional)

To use HERE Anonymizer Self-Hosted in streaming mode, you must create an S3 bucket to store state and checkpoints. This ensures reliable recovery and state management.

Run the following commands to create the required checkpoint and savepoint prefixes (the bucket referenced by $APP_S3_BUCKET_IN must already exist):

aws s3api put-object --bucket "$APP_S3_BUCKET_IN" --key "${APP_NAME}"/"${APP_VERSION}"/checkpoints/
aws s3api put-object --bucket "$APP_S3_BUCKET_IN" --key "${APP_NAME}"/"${APP_VERSION}"/savepoints/
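The put-object calls above assume the bucket already exists. A sketch for creating it first and deriving the state URIs used later during deployment; the `state_dir` helper is a hypothetical name:

```shell
# Derive the S3 URI for Flink state ("checkpoints" or "savepoints").
state_dir() { echo "s3://${APP_S3_BUCKET_IN}/${APP_NAME}/${APP_VERSION}/$1/"; }

# Create the bucket if it is missing. Outside us-east-1 the
# LocationConstraint must match the target region.
# aws s3api head-bucket --bucket "$APP_S3_BUCKET_IN" 2>/dev/null || \
#   aws s3api create-bucket --bucket "$APP_S3_BUCKET_IN" \
#     --region "$AWS_REGION" \
#     --create-bucket-configuration "LocationConstraint=${AWS_REGION}"
```

`state_dir checkpoints` and `state_dir savepoints` produce the same URIs that are exported as CHECKPOINTS_DIR and SAVEPOINTS_DIR in the deployment steps.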

Deploy HERE Anonymizer Self-Hosted

  1. Configure the HERE Anonymizer license and deploy it as a Kubernetes Secret:

    export HERE_ANONYMIZER_LICENSE={YOUR_LICENSE}
    envsubst < "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/secrets.template.yml" | kubectl apply -f -
  2. Edit configuration files and environment variables. See Configuration of HERE Anonymizer Self-Hosted for details.

    kubectl create -f "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/env-configmap.yml"
    
    # To run Flink with rescaling and checkpointing enabled for stream anonymization:
    export AWS_SECRET_ACCESS_KEY={AWS_SECRET_ACCESS_KEY}
    export AWS_ACCESS_KEY_ID={AWS_ACCESS_KEY_ID}
    export CHECKPOINTS_DIR="s3://${APP_S3_BUCKET_IN}/${APP_NAME}/${APP_VERSION}/checkpoints/"
    export SAVEPOINTS_DIR="s3://${APP_S3_BUCKET_IN}/${APP_NAME}/${APP_VERSION}/savepoints/"
    kubectl create -f "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/aws/conf-files-configmap.yml"
    # For Flink in session mode (batch anonymization):
    kubectl create -f "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/conf-files-configmap.yml"
  3. Configure container registry credentials as a Kubernetes secret. Note that the command exports all local Docker credentials.

    kubectl create secret generic regcred \
      --from-file=.dockerconfigjson="$HOME/.docker/config.json" \
      --type=kubernetes.io/dockerconfigjson
  4. Because a private container registry is used, substitute the container image references before each deployment.

    IMAGE=$RABBIT_IMAGE envsubst < "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/rabbit-deployment.template.yml" | kubectl apply -f -
    
    # Change to 'jobmanager' to run Flink in session mode for batch anonymization.
    export JM_CMD='jobmanager-stream'
    IMAGE=$APP_IMAGE envsubst < "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/jobmanager-deployment.template.yml" | kubectl apply -f -
    
    IMAGE=$APP_IMAGE envsubst < "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/taskmanager-deployment.template.yml" | kubectl apply -f -
    
    IMAGE=$APP_IMAGE envsubst < "${HERE_ANONYMIZER_DIST}/deployments/kubernetes/bootstrap-deployment.template.yml" | kubectl apply -f -
  5. Review all deployed resources:

    kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s
    kubectl get nodes,pods,deployments,jobs,configmaps,secrets,services
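Beyond `kubectl wait`, each workload's rollout can be checked explicitly. A sketch, assuming the Deployment names match the template file names above (rabbit, jobmanager, taskmanager, bootstrap); `wait_rollout` is a hypothetical helper:

```shell
# Wait until every given Deployment finishes its rollout.
wait_rollout() {
  for d in "$@"; do
    kubectl rollout status "deployment/$d" --timeout=300s || return 1
  done
}

# wait_rollout rabbit jobmanager taskmanager bootstrap
```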

Run smoke test

Run a smoke test on the deployed cluster by anonymizing sample probe data.

  1. Run this command:

    kubectl exec -i deployment/rabbit -- rabbitmqadmin publish \
      exchange=amq.default \
      routing_key="input-queue" \
      < ./deployments/common/here-probe-example.json
  2. In a new terminal, run this command to access the Flink UI:

    kubectl port-forward deployment/jobmanager 8081:8081
  3. Go to http://localhost:8081, open Running Jobs -> HERE Anonymizer, and then open the Accumulators tab of the Decode by ... task.

  4. Check that essential metrics such as HERE_decoding_point_info_all and HERE_output_point_info_all have values greater than zero. See the anonymization metrics reference for an explanation of all metrics.
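The UI check can also be scripted against the Flink REST API served on the port-forwarded 8081. A sketch; `flink_running_jobs` is a hypothetical helper built on the standard `/jobs` overview endpoint:

```shell
# Count jobs reported as RUNNING by the Flink REST API
# (requires the port-forward from step 2 to be active).
flink_running_jobs() {
  curl -s http://localhost:8081/jobs | grep -c '"status":"RUNNING"'
}

# [ "$(flink_running_jobs)" -ge 1 ] && echo "anonymization job is running"
```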

Cleanup

  1. Use eksctl to clean up all AWS resources created for the Kubernetes cluster, including all EC2 resources. Run the following command:

    eksctl delete cluster --force -n $AWS_CLUSTER_NAME --region $AWS_REGION

    It's recommended to check whether the CloudFormation stack eksctl-$APP_NAME-$APP_VERSION-cluster (by default eksctl-hereanoneks-latest-cluster) was deleted successfully. If eksctl does not delete it, you can delete the stack and its remaining resources manually.

  2. Remove the HERE Anonymizer Self-Hosted container image:

    aws ecr batch-delete-image \
      --repository-name ${APP_NAME}-flink \
      --image-ids imageTag=${APP_VERSION}
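The stack check recommended in step 1 can be scripted. A sketch; `check_stack_deleted` is a hypothetical helper, relying on `describe-stacks` failing once the stack is gone:

```shell
# Report whether the eksctl CloudFormation stack is fully deleted.
check_stack_deleted() {
  local stack="eksctl-${APP_NAME}-${APP_VERSION}-cluster"
  if aws cloudformation describe-stacks --stack-name "$stack" >/dev/null 2>&1; then
    echo "$stack still exists - delete it manually"
  else
    echo "$stack is deleted"
  fi
}

# check_stack_deleted
```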

Example of required IAM policies

ECR_CreateRepository

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "ecr:DescribeRegistry",
                "ecr:DescribeRepositories",
                "ecr:CreateRepository"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

ECR_Push

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CompleteLayerUpload",
                "ecr:GetAuthorizationToken",
                "ecr:UploadLayerPart",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "*"
        }
    ]
}

ECR_DeleteImageOrRepository

Optional policy. Required for deleting the recently deployed version of the HERE Anonymizer container image in the example script ./deployments/kubernetes/aws/aws-shutdown.sh.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchDeleteImage",
                "ecr:DeleteRepository"
            ],
            "Resource": "*"
        }
    ]
}
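To attach these policies to the deployment user, you can save each JSON document to a file named after the policy and run something like the sketch below. The user name, account ID, and `attach_ecr_policies` helper are hypothetical:

```shell
# Create each policy from its JSON file and attach it to the given user.
attach_ecr_policies() {
  local user="$1" account="$2"
  for p in ECR_CreateRepository ECR_Push ECR_DeleteImageOrRepository; do
    aws iam create-policy --policy-name "$p" \
      --policy-document "file://${p}.json" || true  # tolerate "already exists"
    aws iam attach-user-policy --user-name "$user" \
      --policy-arn "arn:aws:iam::${account}:policy/${p}"
  done
}

# attach_ecr_policies anonymizer-deployer 123456789012
```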

IRSA for RBAC

HERE Anonymizer Self-Hosted supports token-based authentication in Kubernetes through IRSA (IAM Roles for Service Accounts) for RBAC (Role-Based Access Control) on the EKS cluster.

To learn more about IRSA and IAM in EKS clusters, see IAM Roles for Service Accounts.
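As a sketch of what an IRSA setup can look like with eksctl (the service-account name and attached policy ARN below are placeholder assumptions, not values required by the HERE Anonymizer):

```shell
# Associate an OIDC provider with the cluster, then create a Kubernetes
# service account backed by an IAM role with the desired policy.
create_irsa_service_account() {
  eksctl utils associate-iam-oidc-provider \
    --cluster "$AWS_CLUSTER_NAME" --region "$AWS_REGION" --approve
  eksctl create iamserviceaccount \
    --cluster "$AWS_CLUSTER_NAME" --region "$AWS_REGION" \
    --name flink-service-account --namespace default \
    --attach-policy-arn "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess" \
    --approve
}

# create_irsa_service_account
```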