...
- Access to the deployment
- IDE
The Java UDF is stored in the deployment in the following location: libs/custom-udfs-sdk/build/distributions/java-custom-udf-sdk.zip
NOTE: custom-udf-sdk.zip is required for compiling and executing the unit test. Any JAR files present in custom-udf-sdk.zip, such as trifacta-base-udf.jar, do not need to be packaged in the custom UDF JAR.
NOTE: If you are installing custom UDFs and the
...
init method: Used for setting private variables in the UDF. This method may be a no-op if no variables must be set. See Example - Concatenate strings below.
Tip: In this method, validate the input parameters, including count, data type, and other constraints.
NOTE: The init method must be specified, but it can be empty if there are no input parameters.
exec method: Contains the functionality of the UDF. The output of the exec method must be one of the supported types, and it must match the generic type parameter, as described. In the following example, TrifactaUDF<String> returns a String. This method is run on each record.
Tip: In this method, you should check the number of input columns.
Warning: Keeping state that varies across calls to the exec method can lead to unexpected behavior. One-time initialization, such as initializing the regex compiler, is safe, but do not allow state information to mutate across calls to exec. This is a known issue.
inputSchema method: Describes the schema of the list on which the exec method acts. The classes in the schema must be supported types. Essentially, you should support the I/O types described earlier.
finish method: Run at the end of the UDF. Typically, it is a no-op.
NOTE: If you are executing your UDF on the Spark running environment, the finish method cannot be invoked at this point. Instead, it is invoked as part of the shutdown of the Java VM. Because of this later execution, the finish method may never be invoked in situations like a JVM crash.
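The guidance on state can be sketched as follows. This is a minimal illustration under stated assumptions, not SDK code: `SketchUDF` is a hypothetical stand-in for the `TrifactaUDF` interface so that the block compiles without the SDK, with method signatures copied from the AdderUDF example on this page, and `DigitRunCountUDF` is an invented example. The compiled regex is set once in init, and everything that changes per record is a local variable in exec.

```java
import java.io.IOException;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the SDK's TrifactaUDF interface, so that this
// sketch compiles on its own. Signatures mirror the AdderUDF example.
interface SketchUDF<T> {
  void init(List<Object> initArgs);
  T exec(List<Object> input);
  @SuppressWarnings("rawtypes")
  Class[] inputSchema();
  void finish() throws IOException;
}

// Invented example: counts the runs of digits in a String column.
class DigitRunCountUDF implements SketchUDF<Long> {
  // Assigned once in init and never modified afterward (one-time initialization).
  private Pattern _digits;

  @Override
  public void init(List<Object> initArgs) {
    // No init parameters are needed; only the regex is compiled here.
    _digits = Pattern.compile("\\d+");
  }

  @Override
  public Long exec(List<Object> input) {
    if (input == null || input.size() != 1 || input.get(0) == null) {
      return null;
    }
    // The Matcher and the counter are locals, so nothing carries over
    // from one call to the next.
    Matcher m = _digits.matcher((String) input.get(0));
    long count = 0;
    while (m.find()) {
      count++;
    }
    return count;
  }

  @Override
  @SuppressWarnings("rawtypes")
  public Class[] inputSchema() {
    return new Class[]{String.class};
  }

  @Override
  public void finish() throws IOException {
    // No-op.
  }
}
```

A field such as a running total that exec updates on each call would be the kind of mutating state that the warning advises against.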
...
- The first line indicates that the function is part of the com.trifacta.trifactaudfs package.
- The defined UDF class implements TrifactaUDF, which is the base interface for UDFs.
  - It is parameterized with the return type of the UDF (a Java String in this case).
  - The input into the function is a list with input parameters in the order they are passed to the function within the platform. See Running Your UDF below.
- The UDF checks the input data for null values, and if any nulls are detected, returns a null.
- The inputSchema describes the input list passed into the exec method.
  - An error is thrown if the type of the data that is passed into the UDF does not match the schema.
  - The UDF must handle improper data. See Error Handling below.
Example - Add by constant
...
- The init method consumes a list of objects, each of which can be used to set a variable in the UDF. The input into the init function is a list with parameters in the order they are passed to the function within the platform. See Running Your UDF below.
```java
package com.trifacta.trifactaudfs;

import java.io.IOException;
import java.util.List;

/**
 * Example UDF. Adds a constant amount to an Integer column.
 */
public class AdderUDF implements TrifactaUDF<Long> {
  private Long _addAmount;

  @Override
  public void init(List<Object> initArgs) {
    if (initArgs.size() != 1) {
      System.out.println("AdderUDF takes in exactly one init argument");
    }
    Long addAmount = (Long) initArgs.get(0);
    _addAmount = addAmount;
  }

  @Override
  public Long exec(List<Object> input) {
    if (input == null) {
      return null;
    }
    if (input.size() != 1) {
      return null;
    }
    return (Long) input.get(0) + _addAmount;
  }

  @SuppressWarnings("rawtypes")
  public Class[] inputSchema() {
    return new Class[]{Long.class};
  }

  @Override
  public void finish() throws IOException {
  }
}
```
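The tip about validating parameters in the init method can be applied to the AdderUDF example. The following sketch is one possible hardening, not documented behavior. Throwing `IllegalArgumentException` from init is an assumption about how a failure should surface; check Error Handling for the approach that the platform expects. `StandInUDF` is a hypothetical stand-in for `TrifactaUDF` so that the block compiles without the SDK, and `StrictAdderUDF` and `StrictAdderDemo` are invented names.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SDK's TrifactaUDF interface.
interface StandInUDF<T> {
  void init(List<Object> initArgs);
  T exec(List<Object> input);
  @SuppressWarnings("rawtypes")
  Class[] inputSchema();
  void finish() throws IOException;
}

// Variant of AdderUDF that validates the count and type of its init arguments.
class StrictAdderUDF implements StandInUDF<Long> {
  private Long _addAmount;

  @Override
  public void init(List<Object> initArgs) {
    // Assumption: failing fast is acceptable here.
    if (initArgs == null || initArgs.size() != 1) {
      throw new IllegalArgumentException("StrictAdderUDF takes exactly one init argument");
    }
    Object arg = initArgs.get(0);
    if (!(arg instanceof Long)) {
      throw new IllegalArgumentException("StrictAdderUDF init argument must be a Long");
    }
    _addAmount = (Long) arg;
  }

  @Override
  public Long exec(List<Object> input) {
    // Returns null for a missing, miscounted, null, or non-Long input.
    if (input == null || input.size() != 1 || !(input.get(0) instanceof Long)) {
      return null;
    }
    return (Long) input.get(0) + _addAmount;
  }

  @Override
  @SuppressWarnings("rawtypes")
  public Class[] inputSchema() {
    return new Class[]{Long.class};
  }

  @Override
  public void finish() throws IOException {
    // No-op.
  }
}

class StrictAdderDemo {
  public static void main(String[] args) {
    StrictAdderUDF udf = new StrictAdderUDF();
    udf.init(Arrays.<Object>asList(5L));
    System.out.println(udf.exec(Arrays.<Object>asList(10L))); // prints 15
  }
}
```

Validating in init keeps the per-record exec path limited to cheap checks on the record itself.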
...
NOTE: Custom UDFs should be compiled to one or more JAR files. Avoid using the example JAR filename, which can be overwritten on upgrade.
...
JDK version mismatches
To avoid an Unsupported major.minor version error during execution, the JDK version used to compile the UDF JAR file should be less than or equal to the JDK version on the Hadoop cluster.
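One way to see which JDK level a compiled UDF class targets is to read the version fields at the start of the class file. The following is a minimal sketch, not part of the SDK: `ClassVersionCheck` is an invented helper, and it reads its own class file only to stay self-contained. In practice you would point it at a class extracted from your UDF JAR.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassVersionCheck {
  // Returns the class-file major version, for example 52 for Java 8.
  static int majorVersion(InputStream classFile) throws IOException {
    DataInputStream in = new DataInputStream(classFile);
    // Every class file starts with the 4-byte magic number 0xCAFEBABE.
    if (in.readInt() != 0xCAFEBABE) {
      throw new IOException("Not a Java class file");
    }
    in.readUnsignedShort();        // minor version, not needed here
    return in.readUnsignedShort(); // major version
  }

  public static void main(String[] args) throws IOException {
    // Demo input: this class's own compiled bytes.
    try (InputStream in =
        ClassVersionCheck.class.getResourceAsStream("ClassVersionCheck.class")) {
      int major = majorVersion(in);
      // For Java 5 and later, the release number is the major version minus 44.
      System.out.println("major=" + major + " (Java " + (major - 44) + ")");
    }
  }
}
```

If the reported release is higher than the JDK release on the cluster, recompiling the UDF for the lower release should avoid the error. Determining the cluster's JDK version is outside the scope of this sketch.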
...