
Azure VM

Prerequisites

  1. Install tools:
  2. Create an Azure Storage Account

Deploy HERE Anonymizer Self-Hosted playbook

You can complete the deployment scenario by running the supplied scripts from the path where you unpacked HERE Anonymizer Self-Hosted.

Run the commands and export the environment variables shown in the following code block.

# Azure Storage Account name (just the name, not the FQDN)
export AZURE_STORAGE_ACCOUNT={YOUR_STORAGE_ACCOUNT_NAME}
export HERE_ANONYMIZER_DIST={PATH_TO_UNPACKED_ANONYMIZER}
export HERE_ANONYMIZER_LICENSE={YOUR_LICENSE_FILE_CONTENT}

# You can use login from a service principal, see an example in
# ./deployments/kubernetes/azure/azure-login.sh
az login

# Main deployment scenario: uploads here-anonymizer.jar to Azure Storage and creates
# the RabbitMQ VM, the JobManager VM, and a scale set of two Task Manager instances.
# The script produces two files:
# - ./azure-deploy-vm.env: contains env variables for connecting to the JobManager and RabbitMQ VMs
# - ./azure-deploy-vm.jobmanager.log: contains logs of JobManager
./deployments/vm/azure/azure-deploy-vm.sh

# To deploy Flink in session mode for batch processing, run:
# ./deployments/vm/azure/azure-deploy-vm.sh batch

# Export SSH_RABBITMQ_CMD and SSH_JOBMANAGER_CMD from ./azure-deploy-vm.env
set -a ; . ./azure-deploy-vm.env ; set +a

# The anonymizer is now deployed and ready to accept data through the RabbitMQ queue.
# Push test data:
$SSH_RABBITMQ_CMD 'rabbitmqadmin publish exchange=amq.default routing_key="input-queue"' \
  < ./deployments/common/here-probe-example.json
# The last command must print: Message published

# Get the job ID and print the metrics statistics
JOB_ID=$($SSH_JOBMANAGER_CMD curl -s "http://localhost:8081/jobs" | jq -r '.jobs[0].id')
$SSH_JOBMANAGER_CMD curl -s "http://localhost:8081/jobs/${JOB_ID}/accumulators" | jq

# To run a batch sample, set the environment variables
# APP_CLIENT_AZURE_STORAGE_ACCOUNT and APP_CLIENT_AZURE_STORAGE_KEY, then run:
# ./deployments/vm/azure/batch-anonymize-sample-data.sh

# Shut down the cluster and all its resources
./deployments/vm/azure/azure-shutdown-vm.sh
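
The deployment script relies on the three exported variables at the top of the block above. The following is a minimal sketch of a pre-flight check that reports any of them that is unset or empty before the script is started. The function name `require_vars` is hypothetical and not part of the distribution.

```shell
# Hypothetical helper (not part of the distribution): report every
# required variable that is unset or empty, and return non-zero if any is.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage before calling ./deployments/vm/azure/azure-deploy-vm.sh:
# require_vars AZURE_STORAGE_ACCOUNT HERE_ANONYMIZER_DIST HERE_ANONYMIZER_LICENSE || exit 1
```

The check collects all missing names in one pass instead of stopping at the first one, so a single run shows everything that still needs to be exported.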

Additional variables

Optional variables for configuring azure-deploy-vm.sh and azure-shutdown-vm.sh, with their default values:

export APP_NAME="hereanonvm"
export APP_VERSION="latest"
export AZURE_RESOURCE_GROUP="hereanonvm-latest"
export AZURE_LOCATION="germanywestcentral"
export AZURE_VM_IMAGE="Ubuntu2204"
export AZURE_VM_SIZE="Standard_B2s"
export TAGS="no_user_tags=true"
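
The manual steps further down read the same variables, so they need values in the current shell. As a sketch, the defaults listed above can be applied only where nothing has been exported yet. Deriving AZURE_RESOURCE_GROUP from APP_NAME and APP_VERSION is an assumption based on the default value "hereanonvm-latest"; the scripts may compute it differently.

```shell
# Sketch only: apply the documented defaults unless a value is already set.
# Assumption: the resource group name follows "${APP_NAME}-${APP_VERSION}".
: "${APP_NAME:=hereanonvm}"
: "${APP_VERSION:=latest}"
: "${AZURE_RESOURCE_GROUP:=${APP_NAME}-${APP_VERSION}}"
: "${AZURE_LOCATION:=germanywestcentral}"
: "${AZURE_VM_IMAGE:=Ubuntu2204}"
: "${AZURE_VM_SIZE:=Standard_B2s}"
: "${TAGS:=no_user_tags=true}"
export APP_NAME APP_VERSION AZURE_RESOURCE_GROUP AZURE_LOCATION \
  AZURE_VM_IMAGE AZURE_VM_SIZE TAGS

echo "Resource group: ${AZURE_RESOURCE_GROUP}"
```

With this pattern, exporting a single variable such as AZURE_VM_SIZE beforehand overrides that value and leaves the remaining defaults in place.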

Deploy manually

Login and create resource group

  1. Log in to Azure using the CLI.

    # User must have the Contributor role
    az login
  2. Create a resource group.

    az group create --name "$AZURE_RESOURCE_GROUP" --location "$AZURE_LOCATION"

Upload HERE Anonymizer Self-Hosted files

  1. Create an Azure Storage container.

    CONTAINER="${APP_NAME}-${APP_VERSION}"
    az storage container create -n "${CONTAINER}"
  2. Upload distribution files and anonymization config.

    az storage blob upload -f "./here-anonymizer.jar" -c "${CONTAINER}"
    az storage blob upload -f "./azure-blob-connector.jar" -c "${CONTAINER}"
    az storage blob upload -f "./rabbit-connector.jar" -c "${CONTAINER}"
    az storage blob upload -f "./simple-anonymization.conf" -c "${CONTAINER}" -n "anonymization.conf"
    TMP_DIR=$(mktemp -d -t "${APP_NAME}-${APP_VERSION}-XXXXXX")
    curl -L "${FLINK_DIST_DOWNLOAD_URL}" -o "${TMP_DIR}/flink.tgz"
    az storage blob upload -f "${TMP_DIR}/flink.tgz" -c "${CONTAINER}"
    az storage blob upload -f "./deployments/vm/azure/flink-install.sh" -c "${CONTAINER}"
    az storage blob upload -f "./deployments/vm/azure/rabbit-install-and-run.sh" -c "${CONTAINER}"
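
The container name is built from APP_NAME and APP_VERSION, so a custom version string such as "1.2.0" would produce a name that Azure Blob Storage rejects, because dots are not allowed. The sketch below checks a candidate name against the container naming rules (3 to 63 characters, lowercase letters, digits and single hyphens, starting and ending with a letter or digit). The function name `is_valid_container_name` is hypothetical.

```shell
# Hypothetical helper: succeed only if the argument is a valid
# Azure Blob Storage container name.
is_valid_container_name() {
  name=$1
  len=${#name}
  # Length must be between 3 and 63 characters.
  [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
  # Lowercase letters and digits, separated by single hyphens only.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Possible usage before "az storage container create":
# is_valid_container_name "${CONTAINER}" || echo "Invalid container name: ${CONTAINER}" >&2
```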

Deploy RabbitMQ virtual machine

📘

Note

The RabbitMQ deployment is used for demo purposes only. For production deployments, configure SOURCE_URI and SINK_URI in all of the cloud-config-***.template.yml files to point to production data streams.

  1. Create a virtual machine.

    VM_RABBIT="${APP_NAME}-${APP_VERSION}-rabbit"
    az vm create \
    --name "${VM_RABBIT}" \
    --image "${AZURE_VM_IMAGE}" \
    --size "${AZURE_VM_SIZE}" \
    --vnet-name "${AZURE_RESOURCE_GROUP}-vnet" \
    --subnet "${AZURE_RESOURCE_GROUP}-subnet" \
    --admin-username "azureuser" \
    --generate-ssh-keys \
    --tags ${TAGS} \
    --public-ip-sku Standard
    
    IP_RABBIT=$(az vm show --name "${VM_RABBIT}" --show-details -o tsv --query publicIps)
  2. Apply the CustomScript Azure VM extension.

    envsubst < "./deployments/vm/azure/rabbit-custom-script.template.json" > "${TMP_DIR}/rabbit-custom-script.json"
    az vm extension set --vm-name "${VM_RABBIT}" -n CustomScript --publisher Microsoft.Azure.Extensions \
    --protected-settings "${TMP_DIR}/rabbit-custom-script.json"
📘

Note

Optionally, you can open the RabbitMQ UI port 15672 and access the management console at http://${IP_RABBIT}:15672/. Note that this link uses the default, insecure guest:guest credentials.

az vm open-port --port 15672 --resource-group $AZURE_RESOURCE_GROUP --name ${VM_RABBIT}
  3. Prepare the SOURCE_URI and SINK_URI variables for configuring HERE Anonymizer Self-Hosted.

    export SOURCE_URI="rabbit://guest:guest@${VM_RABBIT}:5672/input-queue"
    export SINK_URI="rabbit://guest:guest@${VM_RABBIT}:5672/output-queue"
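
Both URIs share the same shape: scheme, credentials, host, port, and queue name. As an illustration, a small helper can assemble them from their parts so that the host or credentials only need to change in one place. The function name `rabbit_uri` and its argument order are hypothetical, and the sketch does not escape special characters in the password.

```shell
# Hypothetical helper: assemble a rabbit:// URI in the form used above.
# Arguments: host, queue, then optional user, password and port.
rabbit_uri() {
  host=$1
  queue=$2
  user=${3:-guest}
  password=${4:-guest}
  port=${5:-5672}
  printf 'rabbit://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$queue"
}

# Example with the demo VM name built from the default APP_NAME and APP_VERSION.
SOURCE_URI=$(rabbit_uri "hereanonvm-latest-rabbit" "input-queue")
SINK_URI=$(rabbit_uri "hereanonvm-latest-rabbit" "output-queue")
export SOURCE_URI SINK_URI
echo "$SOURCE_URI"
```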

Deploy JobManager virtual machine

  1. Create a virtual machine.

    VM_FLINK_JM="${APP_NAME}-${APP_VERSION}-jobmanager"
    az vm create \
    --name "${VM_FLINK_JM}" \
    --image "${AZURE_VM_IMAGE}" \
    --size "${AZURE_VM_SIZE}" \
    --vnet-name ${AZURE_RESOURCE_GROUP}-vnet \
    --subnet ${AZURE_RESOURCE_GROUP}-subnet \
    --admin-username "azureuser" \
    --generate-ssh-keys \
    --tags ${TAGS} \
    --public-ip-sku Standard
    
    IP_FLINK_JM=$(az vm show --name "${VM_FLINK_JM}" --show-details -o tsv --query publicIps)
  2. Configure and upload environment configuration.

    JM_RPC_HOST=$VM_FLINK_JM PARALLELISM_DEFAULT=$TASKMANAGER_INSTANCE_COUNT \
    SOURCE_URI=$SOURCE_URI SINK_URI=$SINK_URI \
    envsubst \
      < "./deployments/vm/azure/flink-config.template.env" \
      > "${TMP_DIR}/config.env"
    az storage blob upload -f "${TMP_DIR}/config.env" -c "${CONTAINER}" --overwrite
  3. Apply the CustomScript Azure VM extension to install Flink.

    JM_CMD=jobmanager-stream
    # For running Flink in session mode for batch processing
    #JM_CMD=jobmanager
    FLINK_CMD=$JM_CMD \
    envsubst \
      < "./deployments/vm/azure/flink-custom-script.template.json" \
      > "${TMP_DIR}/flink-jm-custom-script.json"
    az vm extension set --no-wait --vm-name "${VM_FLINK_JM}" -n CustomScript --publisher Microsoft.Azure.Extensions \
    --protected-settings "${TMP_DIR}/flink-jm-custom-script.json"
  4. Start the jobmanager process.

    # Start the bootstrap service that holds the license and anonymization config during streaming
    ssh azureuser@${IP_FLINK_JM} "nohup java -cp '/opt/flink/usrlib/*:/opt/flink/lib/*' com.here.anonymization.opa.flink.BootstrapServer > /opt/flink/bootstrap-server.log 2>&1 < /dev/null &"
    # Load initial license and anonymization config into bootstrap service
    ssh azureuser@${IP_FLINK_JM} "java -cp '/opt/flink/usrlib/*:/opt/flink/lib/*' com.here.anonymization.opa.flink.Bootstrap"
    # Start main anonymization stream
    ssh azureuser@${IP_FLINK_JM} "/opt/flink/bin/standalone-job.sh start --job-classname com.here.anonymization.opa.flink.MainStream"
📘

Note

Optionally, you can open the Flink UI port 8081 and access the UI at http://${IP_FLINK_JM}:8081/. Note that this link is not secured.

az vm open-port --port 8081 --resource-group $AZURE_RESOURCE_GROUP --name ${VM_FLINK_JM}
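
Step 2 of this section renders config.env with envsubst, which silently replaces an unset variable with an empty string. The sketch below lists keys that ended up with no value, so that a missing TASKMANAGER_INSTANCE_COUNT or SOURCE_URI is noticed before the file is uploaded. It assumes the rendered file consists of KEY=VALUE lines, which this guide does not confirm. The function name `empty_keys` and the sample content are made up for illustration.

```shell
# Hypothetical helper: print the keys that have an empty value.
# Assumption: the file holds KEY=VALUE lines; other lines are ignored.
empty_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=[[:space:]]*$/\1/p' "$1"
}

# Demonstration on a made-up file; the real keys may differ.
demo_env=$(mktemp)
printf '%s\n' \
  'JM_RPC_HOST=hereanonvm-latest-jobmanager' \
  'PARALLELISM_DEFAULT=' \
  'SOURCE_URI=rabbit://guest:guest@host:5672/input-queue' > "$demo_env"
empty_keys "$demo_env"   # prints PARALLELISM_DEFAULT
rm -f "$demo_env"
```

Against the real file, the call would be `empty_keys "${TMP_DIR}/config.env"`, run between the envsubst command and the blob upload.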

Deploy Flink Task Managers Scale Set

  1. Create a virtual machine scale set.

    VM_FLINK_TM="${APP_NAME}-${APP_VERSION}-taskmanager"
    az vmss create \
    --orchestration-mode Uniform \
    --upgrade-policy-mode Automatic \
    --name "$VM_FLINK_TM" \
    --image "$AZURE_VM_IMAGE" \
    --vnet-name ${AZURE_RESOURCE_GROUP}-vnet \
    --subnet ${AZURE_RESOURCE_GROUP}-subnet \
    --vm-sku "$AZURE_VM_SIZE" \
    --instance-count $TASKMANAGER_INSTANCE_COUNT \
    --admin-username "azureuser" \
    --generate-ssh-keys \
    --public-ip-address-allocation static --public-ip-per-vm \
    --lb-sku Standard \
    --tags ${TAGS}
  2. Apply the CustomScript Azure VM extension to install Flink.

    FLINK_CMD="taskmanager" \
      envsubst \
      < "./deployments/vm/azure/flink-custom-script.template.json" \
      > "${TMP_DIR}/flink-tm-custom-script.json"
    # The extension is rolled out to all VMs in the scale set only with --orchestration-mode Uniform and --upgrade-policy-mode Automatic
    az vmss extension set --vmss-name "${VM_FLINK_TM}" -n CustomScript --publisher Microsoft.Azure.Extensions \
    --protected-settings "${TMP_DIR}/flink-tm-custom-script.json"
  3. Start the taskmanager process.

    for ip in $(az vmss list-instance-public-ips --name $VM_FLINK_TM --query "[*].ipAddress" -o tsv); do
      echo "Starting flink on taskmanager $ip"
      ssh azureuser@${ip} "/opt/flink/bin/taskmanager.sh start"
    done
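
The SSH commands in this section can fail if a VM is not yet reachable or the Flink installation has not finished. A generic retry wrapper is one way to make such steps more tolerant. The following is a sketch; the function name `retry` and the attempt and delay values in the usage comment are hypothetical.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Arguments: maximum attempts, delay in seconds, then the command to run.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after ${n} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Possible usage inside the loop above: wait until the Flink script exists.
# retry 10 15 ssh azureuser@${ip} "test -x /opt/flink/bin/taskmanager.sh"
```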

Check the deployed HERE Anonymizer Self-Hosted

To smoke test HERE Anonymizer Self-Hosted, publish an example probe file and check whether the anonymization metrics have changed.

  1. Publish the example probe file.

    ssh azureuser@${IP_RABBIT} 'rabbitmqadmin publish exchange=amq.default routing_key="input-queue"' \
      < ./deployments/common/here-probe-example.json
  2. Check that the anonymization metrics are not empty.

    JOB_ID=$(ssh azureuser@${IP_FLINK_JM} curl -s "http://localhost:8081/jobs" | jq -r '.jobs[0].id')
    ssh azureuser@${IP_FLINK_JM} curl -s "http://localhost:8081/jobs/${JOB_ID}/accumulators" | jq
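
The command above takes the first entry of the job list, which may be a finished or failed job if the stream was restarted. As a sketch, the jq filter can instead select the first job whose status is RUNNING. It relies on the `id` and `status` fields that the Flink REST API returns for `/jobs`. The function name `running_job_id` is hypothetical.

```shell
# Hypothetical helper: read the JSON returned by the Flink /jobs endpoint on
# stdin and print the ID of the first job whose status is RUNNING.
# Prints nothing if no job is running.
running_job_id() {
  jq -r '[.jobs[] | select(.status == "RUNNING") | .id][0] // empty'
}

# Possible usage:
# JOB_ID=$(ssh azureuser@${IP_FLINK_JM} curl -s "http://localhost:8081/jobs" | running_job_id)
# [ -n "$JOB_ID" ] || echo "No running job found" >&2
```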

Cleanup

  1. Remove the Azure Resource Group to delete all associated resources.

    az group delete --name "$AZURE_RESOURCE_GROUP" -y
  2. Delete the container created for the uploaded application files.

    az storage container delete \
    --name "${APP_NAME}-${APP_VERSION}" \
    --account-name "$AZURE_STORAGE_ACCOUNT"