
Data Processing Library compilation patterns

A key feature of the Data Processing Library is its compilation patterns. These patterns guide you in implementing incremental, distributed compilers. Each task executes one compiler, which can consist either of your code alone or of your code combined with one of the provided patterns.

There are two types of patterns:

  • Functional patterns: provide narrowly defined interfaces that structure how you develop the compiler. Spark is hidden inside the pattern implementation, so the compiler contains only business logic while the processing library takes care of the distributed processing and incremental compilation details.
  • Spark RDD-based patterns: expose Spark RDDs, allowing the compiler implementation to perform parallel operations on data and metadata using Spark, such as join, cogroup, filter, or map. These interfaces are less rigid, and you may need to actively support incremental compilation yourself.
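
To make the distinction concrete, the sketch below contrasts the two styles using deliberately simplified, hypothetical interfaces; the real library's classes and signatures differ. In the functional style, the compiler is a pure per-tile function and the framework (not modeled here) owns distribution; in the RDD-based style, the compiler itself transforms the dataset, with a Java stream standing in for a Spark RDD.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PatternSketch {
    // Functional style (hypothetical interface): the compiler is a pure
    // per-tile function; the library would handle distribution and
    // incremental recompilation around it.
    public interface TileCompiler {
        String compile(long tileId, String payload);
    }

    // Business logic only: no Spark, no partitioning concerns.
    public static final TileCompiler upperCase =
        (tileId, payload) -> payload.toUpperCase();

    // RDD-based style (hypothetical shape): the compiler sees the whole
    // dataset and applies parallel-style operations itself.
    public static Map<Long, String> rddStyleCompile(Map<Long, String> tiles) {
        return tiles.entrySet().stream()
            .filter(e -> !e.getValue().isEmpty())         // analogous to RDD.filter
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().toUpperCase()));        // analogous to RDD.map
    }

    public static void main(String[] args) {
        System.out.println(upperCase.compile(1L, "road-segment")); // ROAD-SEGMENT
        Map<Long, String> tiles = new HashMap<>();
        tiles.put(1L, "link");
        tiles.put(2L, "");
        System.out.println(rddStyleCompile(tiles));
    }
}
```

Note how the functional compiler never mentions the dataset as a whole, which is what lets the library decide which tiles to recompute, while the RDD-style compiler takes explicit responsibility for the dataset-wide transformation.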

Table 1: Compilation patterns overview

| Compiler Class         | Incremental Processing | References to Other Tiles | Global Algorithms | Functional or RDD-Based | Complexity |
|------------------------|------------------------|---------------------------|-------------------|-------------------------|------------|
| DirectCompiler         | Yes                    | No                        | No                | Functional              | Simple     |
| MapGroupCompiler       | Yes                    | No                        | No                | Functional              | Simple     |
| RefTreeCompiler        | Yes                    | Yes                       | No                | Functional              | Medium     |
| NonIncrementalCompiler | No                     | Yes                       | Yes               | RDD                     | Simple     |
| DepCompiler            | Partially              | No                        | Yes               | RDD                     | Medium     |
| IncrementalDepCompiler | Yes                    | No                        | Yes               | RDD                     | Complex    |

Note: Where possible, prefer functional patterns over Spark RDD-based patterns.
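
To make the "Incremental Processing" column concrete, the sketch below (a hypothetical illustration, not the library's actual API) shows why pure per-tile compilers lend themselves to incrementality: a framework can reuse the previous output for every tile whose input is unchanged and recompile only the rest.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class IncrementalSketch {
    // Hypothetical illustration: with a pure per-tile compile function,
    // the framework can skip tiles whose input did not change since the
    // previous run and reuse the cached output instead.
    public static Map<Long, String> incrementalCompile(
            Map<Long, String> currentInputs,
            Map<Long, String> previousInputs,
            Map<Long, String> previousOutputs,
            Function<String, String> compile) {
        Map<Long, String> outputs = new HashMap<>();
        for (Map.Entry<Long, String> e : currentInputs.entrySet()) {
            if (e.getValue().equals(previousInputs.get(e.getKey()))) {
                // Input unchanged: reuse the cached output.
                outputs.put(e.getKey(), previousOutputs.get(e.getKey()));
            } else {
                // Input new or changed: recompile this tile only.
                outputs.put(e.getKey(), compile.apply(e.getValue()));
            }
        }
        return outputs;
    }
}
```

This shortcut is only valid because the compile function depends on nothing but the tile's own input. Patterns such as RefTreeCompiler extend the idea by also tracking references to other tiles, while in RDD-based patterns you may need to implement this kind of change tracking yourself.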