
Custom data formats

You can use custom data formats with HERE Anonymizer Self-Hosted, which lets you avoid adding separate converter components to your infrastructure.

Custom data formats must be implemented as .jar files and deployed with the main HERE Anonymizer Self-Hosted modules.

Implement custom format

Follow these steps to implement a custom data format using Gradle.

📘

Note

You can follow a similar process to implement a custom format using Maven.

  1. Create a new project using Java 17.

  2. Add here-anonymizer.jar as a dependency. Use gradle.properties to externalize the absolute path to here-anonymizer.jar:

    here.anonymizer.jar=/home/user/HERE-Anonymizer/here-anonymizer.jar

    In build.gradle, reference that property:

    // ...
    dependencies {
      implementation files(project['here.anonymizer.jar'])
    }
  3. Make sure the created .jar doesn't include here-anonymizer.jar. Configure build.gradle:

    // ...
    jar {
      // Bundle third-party dependencies into the .jar, but exclude here-anonymizer.jar.
      duplicatesStrategy = DuplicatesStrategy.EXCLUDE
      from {
        configurations.runtimeClasspath
            .filter { !it.name.containsIgnoreCase("here-anonymizer.jar") }
            .collect { it.isDirectory() ? it : zipTree(it) }
      }
    }
📘

Note

In Maven implementations, you can instead declare here-anonymizer.jar with the provided dependency scope, which keeps it out of the packaged .jar.

  4. Implement the data format converter:

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;

    import com.here.anonymization.probedata.ProbeData;
    import com.here.anonymization.probedata.ProbePoint;
    import com.here.anonymization.probedata.TraceChunk;
    import com.here.anonymization.probedata.converter.ConverterConfiguration;
    import com.here.anonymization.probedata.converter.DataFormatConverter;
    // ... (also import the @NonNull annotation that your project uses)
    public class MyFormatConverter implements DataFormatConverter {
    
      @Override
      public ProbeData toProbeData(byte @NonNull [] bytes) {
          // Mock result: replace it with real parsing of the input bytes.
          return new ProbeData(new TraceChunk(
              "some-trace-id",
              Collections.singleton(new ProbePoint()))
          );
      }
    
      @Override
      public byte[] fromProbeData(@NonNull ProbeData probeData) {
          // Mock result: replace it with real encoding to the desired format.
          return "[mock-my-format-data]".getBytes(StandardCharsets.UTF_8);
      }
    
      @Override
      public DataFormatConverter configure(ConverterConfiguration configuration) {
          // Apply any converter configuration here. Always override this method:
          // the default implementation throws UnsupportedOperationException.
          return this;
      }
    }
  5. Implement the data format provider:

    import java.util.Collection;
    import java.util.Collections;

    import com.here.anonymization.probedata.converter.DataFormat;
    import com.here.anonymization.probedata.converter.DataFormatProvider;
    // ...
    public class MyFormatProvider implements DataFormatProvider {
      @Override
      public Collection<DataFormat> provideDataFormats() {
        // "MY_FORMAT" is the format name that SINK_FORMAT or SOURCE_FORMAT must match.
        return Collections.singleton(new DataFormat("MY_FORMAT", MyFormatConverter.class));
      }
    }
  6. Build the .jar and verify that it contains no com.here.* classes but does contain all third-party classes you use. If the .jar is packaged correctly, the grep command prints nothing:

    ./gradlew clean jar
    jar -tf ./build/libs/custom-format-example.jar | grep "com/here"
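
If you prefer to run the same check from Java, for example inside a build verification test, the following is a minimal sketch. It is not part of HERE Anonymizer Self-Hosted: the class name JarContentCheck and the method entriesUnder are hypothetical, and it uses only the standard java.util.jar API. Unlike the grep command, it matches only entries whose path starts with the given prefix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical helper: lists the entries of a .jar that live under a path prefix. */
final class JarContentCheck {

    private JarContentCheck() {}

    /** Returns the names of all entries in the jar that start with the prefix, e.g. "com/here/". */
    static List<String> entriesUnder(Path jarPath, String prefix) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(prefix))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: JarContentCheck <path-to-jar>");
            return;
        }
        List<String> bundled = entriesUnder(Path.of(args[0]), "com/here/");
        if (bundled.isEmpty()) {
            System.out.println("OK: no com/here/ entries found");
        } else {
            // Any hit means classes under com/here/ were packaged into the custom format .jar.
            bundled.forEach(System.err::println);
            System.exit(1);
        }
    }
}
```

Run it with the path to the built artifact, such as ./build/libs/custom-format-example.jar. A non-zero exit code means the exclusion filter in build.gradle did not take effect.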

Deploy custom format

The generic deployment scenario consists of three steps:

  1. Copy the custom data format .jar into the /opt/flink/usrlib/ directory of the deployed Task Managers and the Job Manager.
  2. Set the SINK_FORMAT or SOURCE_FORMAT environment variable of HERE Anonymizer Self-Hosted to your format's name (for example, MY_FORMAT).
  3. Redeploy HERE Anonymizer Self-Hosted.
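
Before you redeploy, you can check that the environment really references your format. The sketch below is a hypothetical helper, not something that HERE Anonymizer Self-Hosted ships. It assumes that the variable value must equal the name passed to the DataFormat constructor exactly, including case, which you should confirm for your version.

```java
import java.util.Map;

/** Hypothetical pre-deployment check for the format-related environment variables. */
final class FormatEnvCheck {

    private FormatEnvCheck() {}

    /** Returns true if SINK_FORMAT or SOURCE_FORMAT equals the given format name (assumed exact match). */
    static boolean referencesFormat(Map<String, String> env, String formatName) {
        return formatName.equals(env.get("SINK_FORMAT"))
                || formatName.equals(env.get("SOURCE_FORMAT"));
    }

    public static void main(String[] args) {
        // MY_FORMAT is the example name used by MyFormatProvider in this guide.
        String formatName = args.length > 0 ? args[0] : "MY_FORMAT";
        boolean ok = referencesFormat(System.getenv(), formatName);
        System.out.println(ok
                ? "OK: SINK_FORMAT or SOURCE_FORMAT is set to " + formatName
                : "Neither SINK_FORMAT nor SOURCE_FORMAT is set to " + formatName);
    }
}
```

Only one of the two variables needs to name the custom format. The other can keep pointing at a different format.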

Kubernetes-based deployments

  1. Copy the custom data format .jar into the HERE Anonymizer Self-Hosted distribution directory.

    cp $HOME/project/my-format/build/my-format.jar $HERE_ANONYMIZER_DIST/
  2. Add the following line to ./Dockerfile:

    COPY ./my-format.jar /opt/flink/usrlib/my-format.jar
  3. Edit $HERE_ANONYMIZER_DIST/deployments/kubernetes/env-configmap.yml and set the SINK_FORMAT or SOURCE_FORMAT variable to your format's name.

Virtual machine-based deployments

  1. Upload the custom data format .jar to a cloud storage of your choice.
  2. Generate a download URL for this file.
  3. In cloud-config-jobmanager.template.yml and cloud-config-taskmanager.template.yml:
    • Add the command curl -L "${MY_FORMAT_DOWNLOAD_URL}" -o /opt/flink/usrlib/my-format.jar.
    • Set the SINK_FORMAT or SOURCE_FORMAT variable to your format's name.