How to run HERE Anonymizer Preprocessor locally in Docker

This section outlines how to quickly get started with and evaluate HERE Anonymizer Preprocessor by running the application locally in Docker. This allows you to get familiar with the product before deploying to a production environment.

Prerequisites

  • Recommended: Linux-based or macOS machine.
  • Windows machines: install WSL2 to run the supplied shell scripts.
  • Install Docker with docker and docker-compose commands in your PATH.
  • Download the HERE Anonymizer Preprocessor ZIP archive.

Note

To get started, you must run the supplied shell scripts on your machine. Windows machines can't run these scripts natively. To run them on Windows, install WSL2.
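
The prerequisites above can be checked from a terminal before you start the demo. The following is a minimal sketch, assuming a POSIX shell; the helper name missing_commands is hypothetical and is not part of the HERE Anonymizer Preprocessor package.

```shell
#!/bin/sh
# Print the name of every given command that is not found in PATH.
# missing_commands is an illustrative helper, not part of the HERE package.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Report whether the tools required by the demo are installed.
missing=$(missing_commands docker docker-compose)
if [ -n "$missing" ]; then
  # The unquoted expansion joins the names onto a single line.
  echo "Missing from PATH:" $missing
else
  echo "docker and docker-compose found"
fi
```

If the script reports a missing command, install it or add it to your PATH before continuing.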

Start the application

To run HERE Anonymizer Preprocessor locally in Docker, open a terminal window and navigate to the directory where you unpacked the HERE Anonymizer Preprocessor ZIP archive. Then run ./demo-start.sh (or demo-start.ps1 on Windows). The script sets up a local Docker cluster and starts the application.

Note

If you can't run the script on macOS or Linux, try fixing the executable permission by running chmod a+x ./demo-*.sh or run $SHELL ./demo-start.sh instead.
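
The start step and the permission fix can be combined in one small wrapper. This is only a sketch, assuming a POSIX shell and that it is run from the directory where the ZIP archive was unpacked; ensure_executable is a hypothetical helper name.

```shell
#!/bin/sh
# Add the executable bit to each existing file that lacks it.
# ensure_executable is an illustrative helper, not part of the HERE package.
ensure_executable() {
  for script in "$@"; do
    if [ -f "$script" ] && [ ! -x "$script" ]; then
      chmod a+x "$script"
    fi
  done
}

# Fix permissions if needed, then start the demo.
ensure_executable ./demo-start.sh ./demo-stop.sh
if [ -x ./demo-start.sh ]; then
  ./demo-start.sh
else
  echo "demo-start.sh not found in $(pwd)" >&2
fi
```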

When the application starts, you can access the following information:

Input data

When you run the demo, HERE probe example data is copied to the MINIO S3 container and used as the input data set. The example data is included in the HERE Anonymizer Preprocessor package at dist/deployments/common/here-probe-example-data.

Preprocessed data

The HERE Anonymizer Preprocessor is a batch data processing application. It keeps the Flink cluster online only for the time required to process the input data set and shuts the cluster down immediately afterwards.

To check the preprocessed data:

  1. Check the MINIO web console at http://localhost:9002:
     • To check the report of the indexing phase, go to Object Browser > input-bucket > input > INDEXER_REPORT.json. You should see HERE_indexer_input_files_total: 2 and HERE_indexer_input_files_dropped_corrupted: 0.
     • To check the report of the preprocessing phase, go to Object Browser > output-bucket > output > PREPROCESSOR_REPORT.json. You should see HERE_preprocessor_input_traces_total: 2, HERE_preprocessor_input_points_total: 200, HERE_preprocessor_output_points_total: 200, HERE_preprocessor_output_files_total: 2, and HERE_preprocessor_output_traces_total: 2.
     • To see the preprocessed data, go to Object Browser > output-bucket > output > preprocessed-data.
  2. Check Dozzle at http://localhost:8888 to see the logs of the *_jobmanager and *_taskmanager containers.

Stop the app

Run ./demo-stop.sh to stop the app and shut down all running containers.

Further reading

To better understand how Docker deployments work and take this concept to production, see Standalone deployment.

Troubleshooting

Consult these procedures if you run into issues when running the example.

Can't start the example

  • For macOS and Linux: ensure that the demo-start.sh and demo-stop.sh scripts have the +x permission. Run chmod +x ./demo-*.sh

  • Make sure that Docker and Docker Compose are installed.

    $ docker -v
    # Docker version 20.10.23, build 7155243
    $ docker-compose -v
    # docker-compose version 1.29.2, build 5becea4c
  • Ensure that no containers are left running from a previous run of the example. Run ./demo-stop.sh, then list all running containers. The list should be empty.

    $ docker ps
    # CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
    $
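
The empty-list check can also be scripted. The sketch below assumes a POSIX shell; no_containers_running is a hypothetical helper that reads container names on standard input, so it reports every running container on the machine, including ones unrelated to the demo.

```shell
#!/bin/sh
# Read container names on stdin (one per line) and fail if any are listed.
# no_containers_running is an illustrative helper, not part of the HERE package.
no_containers_running() {
  names=$(cat)
  if [ -n "$names" ]; then
    echo "Still running:"
    echo "$names"
    return 1
  fi
  echo "No containers running"
}

# Usage, assuming Docker is installed:
#   docker ps --format '{{.Names}}' | no_containers_running
```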

Demo starts, but the ***-jobmanager container fails

  1. Configure the demo log viewer (Dozzle) at http://localhost:8888/settings to show stopped containers.
  2. Find the ***-jobmanager container.
  3. Check its logs for ERROR or Exception messages.
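
As an alternative to browsing the logs in the web UI, the same search can be sketched on the command line. This assumes a POSIX shell and the standard docker logs command; find_errors is a hypothetical helper name, and the container name placeholder must be replaced with the name shown by docker ps -a.

```shell
#!/bin/sh
# Print only the log lines on stdin that contain ERROR or Exception,
# each prefixed with its line number. Prints nothing when there is no match.
# find_errors is an illustrative helper, not part of the HERE package.
find_errors() {
  grep -n -E 'ERROR|Exception' || true
}

# Usage, with the container name taken from `docker ps -a`:
#   docker logs <jobmanager-container-name> 2>&1 | find_errors
```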

Error connecting to IgniteCache container

If you get the IgniteCache container error:

Exception in thread "main" com.here.anonymization.data.preprocessor.cache.IgniteCacheException: Fail to initiate cache for the preprocessor.
	at com.here.anonymization.data.preprocessor.Indexer.initiateCache(Indexer.java:84)
	at com.here.anonymization.data.preprocessor.Indexer.main(Indexer.java:46)
	/
	/ ...

Check that the container is running with the correct port mapping (10800) and that the cache_endpoint parameter is correctly defined in the SOURCE_URI.

Error connecting to MINIO S3 container

If you get the MINIO S3 container error:

Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at com.here.anonymization.data.preprocessor.Indexer.main(Indexer.java:67)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
	/
	/...

Check that the s3 container is running with the correct port mapping (9000), that the init-s3 container has completed successfully, and that the endpoint parameter is correctly defined in both the SOURCE_URI and the SINK_URI.
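
A port-mapping check like this one can be sketched with docker ps. The example assumes a POSIX shell and the usual docker ps port notation (for example 0.0.0.0:9000->9000/tcp); has_host_port is a hypothetical helper name. The same check applies to port 10800 of the cache container.

```shell
#!/bin/sh
# Read "name ports" lines on stdin and print those that publish host port $1.
# Returns a non-zero status when no container publishes that port.
# has_host_port is an illustrative helper, not part of the HERE package.
has_host_port() {
  grep -E ":$1->"
}

# Usage, assuming Docker is installed:
#   docker ps --format '{{.Names}} {{.Ports}}' | has_host_port 9000
#   docker ps --format '{{.Names}} {{.Ports}}' | has_host_port 10800
```

To see whether the init-s3 container finished, docker ps -a also lists exited containers together with their exit status.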

Data preprocessing doesn't work

Check the INDEXER_REPORT in the MINIO web console.

  • If HERE_indexer_input_files_dropped_corrupted is greater than 0 (for example, 2), check that the SOURCE_FORMAT is configured correctly and that the input data follows the configured format. Enable the DEBUG logging level and check the logs for more details.
  • If HERE_indexer_input_files_dropped_corrupted is 0 and HERE_indexer_input_files_total is 2, the indexer ran correctly. Check the PREPROCESSOR_REPORT.
  • If the PREPROCESSOR_REPORT doesn't exist or deviates from any of the following expected values, check the preprocessor configuration (SOURCE_URI, SINK_URI) and look for errors or exceptions in the job manager logs:
HERE_preprocessor_input_traces_total: 2
HERE_preprocessor_input_points_total: 200
HERE_preprocessor_output_traces_total: 2
HERE_preprocessor_output_points_total: 200
HERE_preprocessor_output_files_total: 2
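
The comparison against the expected values can be automated once a report has been downloaded from the MINIO web console. The following is a sketch under assumptions: it assumes a POSIX shell with a grep that supports -o, and it assumes each metric appears in the report file as a name followed by a colon and an integer (quotes around the name are optional). The report layout is not documented here, so adjust the pattern if your file differs. The helper names report_metric and check_report are hypothetical.

```shell
#!/bin/sh
# Extract the integer value of metric $2 from report file $1.
# Assumes the metric appears as  "name": 123  (quotes optional).
report_metric() {
  grep -o "$2\"*[[:space:]]*:[[:space:]]*[0-9][0-9]*" "$1" \
    | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Compare name=value pairs against report file $1 and print each mismatch.
# Returns a non-zero status if at least one metric deviates or is missing.
check_report() {
  report=$1
  shift
  failures=0
  for pair in "$@"; do
    name=${pair%%=*}
    expected=${pair#*=}
    actual=$(report_metric "$report" "$name") || actual=""
    if [ "$actual" != "$expected" ]; then
      echo "$name: expected $expected, got ${actual:-nothing}"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Usage with a locally saved copy of the report:
#   check_report ./PREPROCESSOR_REPORT.json \
#     HERE_preprocessor_input_traces_total=2 \
#     HERE_preprocessor_input_points_total=200 \
#     HERE_preprocessor_output_traces_total=2 \
#     HERE_preprocessor_output_points_total=200 \
#     HERE_preprocessor_output_files_total=2
```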

Manifest file size exceeds JDK limits

Certain versions of the Java Development Kit (JDK) require the manifest file to stay within a file size limit. When the file exceeds the limit, you get this error:

Unsupported size: xxx for JarEntry META-INF/MANIFEST.MF. Allowed max size: 8000000 bytes

To fix this issue, increase the maximum manifest file size by setting the jdk.jar.maxSignatureFileSize system property through the JAVA_TOOL_OPTIONS environment variable. JAVA_TOOL_OPTIONS is a mechanism for passing startup options and arguments to the Java Virtual Machine (JVM).

For example, use the following configuration to set the maximum file size to approximately 22 MB:

JAVA_TOOL_OPTIONS=-Djdk.jar.maxSignatureFileSize=22000000
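
If JAVA_TOOL_OPTIONS already carries other options, appending to it avoids overwriting them. The sketch below assumes a POSIX shell; add_java_tool_option is a hypothetical helper name. The JVM reads the variable from the environment it runs in, so for a JVM inside a container the value has to reach that container's environment. Where to set it for this demo is not covered here.

```shell
#!/bin/sh
# Append one JVM option to JAVA_TOOL_OPTIONS, keeping options already set.
# add_java_tool_option is an illustrative helper, not part of the HERE package.
add_java_tool_option() {
  JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$1"
  export JAVA_TOOL_OPTIONS
}

# Raise the limit to 22000000 bytes, as in the example above.
add_java_tool_option "-Djdk.jar.maxSignatureFileSize=22000000"
printf '%s\n' "$JAVA_TOOL_OPTIONS"
```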