- init method: Sets private variables in the UDF. This method may be a no-op if no variables need to be set. See the Example - Concatenate strings below.
Tip: In this method, perform your data validation on the input parameters, including count, data type, and other constraints.
NOTE: Each UDF requires at least one input parameter. The init method must be specified but can be empty if there are no input parameters.
- exec method: Contains the functionality of the UDF. The output of the exec method must be one of the supported types, and it must also match the generic type as described. In the following example, TrifactaUDF<String> implements a UDF that returns a String.
This method is run on each record.
Tip: In this method, you should check the number of input columns.
Keeping state that varies across calls to the exec method can lead to unexpected behavior. One-time initialization, such as initializing the regex compiler, is safe, but do not allow state information to mutate across calls to exec. This is a known issue.
- inputSchema method: The inputSchema method describes the schema of the list on which the exec method is acting. The classes in the schema must be supported. Essentially, you should support the I/O types described earlier.
- finish method: The finish method is run at the end of the UDF. Typically, it is a no-op.
NOTE: If you are executing your UDF on the Spark running environment, the finish method cannot be invoked at this point. Instead, it is invoked as part of the shutdown of the Java VM. Because of this deferred execution, the finish method may never be invoked in some situations, such as a JVM crash.
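
The four methods above can be sketched together as follows. This is a minimal illustration, not the SDK's own sample: the TrifactaUDF interface declared here is a local stand-in so that the snippet compiles by itself, and its method signatures are assumptions that should be checked against the SDK. The class names ConcatWithSeparatorUDF and ConcatUdfDemo are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Local stand-in for the SDK interface (assumption: the real interface
// declares these four methods with comparable signatures).
interface TrifactaUDF<T> {
    void init(List<Object> initArgs);
    T exec(List<Object> input) throws IOException;
    Class<?>[] inputSchema();
    void finish() throws IOException;
}

// Hypothetical UDF that joins two String columns with a separator.
class ConcatWithSeparatorUDF implements TrifactaUDF<String> {
    // Set once in init and never modified afterward.
    private String separator = "";

    @Override
    public void init(List<Object> initArgs) {
        // Nothing to configure: keep the default separator.
        if (initArgs == null || initArgs.isEmpty()) {
            return;
        }
        // Validate the count and data type of the init arguments.
        if (initArgs.size() != 1 || !(initArgs.get(0) instanceof String)) {
            throw new IllegalArgumentException("expected a single String separator");
        }
        separator = (String) initArgs.get(0);
    }

    @Override
    public String exec(List<Object> input) throws IOException {
        // Runs once per record: check the number of input columns first.
        if (input == null || input.size() != 2) {
            throw new IOException("expected exactly 2 input columns");
        }
        // The result depends only on the inputs and on state fixed in init;
        // no field is mutated here.
        return String.valueOf(input.get(0)) + separator + String.valueOf(input.get(1));
    }

    @Override
    public Class<?>[] inputSchema() {
        // Describes the list that exec receives: two String columns.
        return new Class<?>[] { String.class, String.class };
    }

    @Override
    public void finish() throws IOException {
        // No resources to release, so this is a no-op.
    }
}

class ConcatUdfDemo {
    public static void main(String[] args) throws IOException {
        ConcatWithSeparatorUDF udf = new ConcatWithSeparatorUDF();
        udf.init(Arrays.<Object>asList("-"));
        System.out.println(udf.exec(Arrays.<Object>asList("a", "b"))); // prints "a-b"
        udf.finish();
    }
}
```

In this sketch, the only field is assigned in init, so repeated calls to exec cannot influence one another, and nothing depends on finish being called.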