...

  1. Access to the deployment
  2. IDE
  3. The Java UDF SDK is stored in the deployment in the following location: libs/custom-udfs-sdk/build/distributions/java-custom-udf-sdk.zip

    Info

    NOTE: custom-udf-sdk.zip is required for compiling and executing the unit test. JAR files already present in custom-udf-sdk.zip, such as trifacta-base-udf.jar, do not need to be packaged in the custom UDF JAR.


Info

NOTE: If you are installing custom UDFs and the platform node does not have an Internet connection, download the Java UDF SDK in an Internet-accessible location, build your custom UDF JAR there, and then upload the JAR to the platform node.

...

  1. init method: Used for setting private variables in the UDF. This method may be a no-op function if no variables must be set. See Example - Concatenate strings below.

    Tip

    Tip: In this method, perform your data validation on the input parameters, including count, data type, and other constraints.


    Info

    NOTE: The init method must be specified but can be empty, if there are no input parameters.


  2. exec method: Contains the functionality of the UDF. The output of the exec method must be one of the supported types, and it must also match the generic type parameter of the class. In the following example, TrifactaUDF<String> returns a String. This method is run on each record.

    Tip

    Tip: In this method, you should check the number of input columns.


    Warning

    Keeping state that varies across calls to the exec method can lead to unexpected behavior. One-time initialization, such as initializing the regex compiler, is safe, but do not allow state information to mutate across calls to exec. This is a known issue.


  3. inputSchema method: The inputSchema method describes the schema of the list on which the exec method is acting. The classes in the schema must be supported. Essentially, you should support the I/O types described earlier.
  4. finish method: The finish method is run at the end of the UDF. Typically, it is a no-op.

    Info

    NOTE: If you are executing your UDF on the Spark running environment, the finish method cannot be invoked at this point. Instead, it is invoked as part of the shutdown of the Java VM. Because it runs this late, the finish method may never be invoked in situations such as a JVM crash.
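
The four methods above can be sketched together as follows. This is a minimal illustration, not SDK code: the TrifactaUDF interface declared here is a stand-in whose signatures are inferred from the examples on this page, so that the block compiles without the SDK, and RegexExtractUDF is a hypothetical UDF name. It shows the safe pattern described in the warning: the regex is compiled once in init, and exec only reads that state.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stand-in for the SDK interface (assumption), declared so this sketch compiles alone. */
interface TrifactaUDF<T> {
  void init(List<Object> initArgs);
  T exec(List<Object> input);
  @SuppressWarnings("rawtypes")
  Class[] inputSchema();
  void finish() throws IOException;
}

/** Hypothetical UDF: returns the first substring that matches a pattern set at init. */
class RegexExtractUDF implements TrifactaUDF<String> {
  // Written once in init and only read in exec, so no state mutates across calls.
  private Pattern _pattern;

  @Override
  public void init(List<Object> initArgs) {
    // Validate count and type of the init arguments before using them.
    if (initArgs == null || initArgs.size() != 1 || !(initArgs.get(0) instanceof String)) {
      throw new IllegalArgumentException("RegexExtractUDF takes exactly one String init argument");
    }
    _pattern = Pattern.compile((String) initArgs.get(0));
  }

  @Override
  public String exec(List<Object> input) {
    // Check the number of input columns and reject null or non-String values.
    if (input == null || input.size() != 1 || !(input.get(0) instanceof String)) {
      return null;
    }
    Matcher m = _pattern.matcher((String) input.get(0));
    return m.find() ? m.group() : null;
  }

  @SuppressWarnings("rawtypes")
  public Class[] inputSchema() {
    return new Class[]{String.class};
  }

  @Override
  public void finish() throws IOException {
    // No-op: nothing to release.
  }

  public static void main(String[] args) throws IOException {
    RegexExtractUDF udf = new RegexExtractUDF();
    udf.init(Arrays.<Object>asList("[0-9]+"));
    System.out.println(udf.exec(Arrays.<Object>asList("abc123def"))); // prints 123
    udf.finish();
  }
}
```

Whether init should throw on invalid arguments or handle them another way depends on the platform's error-handling rules, so treat the exception here as one possible choice.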


...

  • The first line indicates that the function is part of the com.trifacta.trifactaudfs package.
  • The defined UDF class implements the TrifactaUDF class, which is the base interface for UDFs. 
    • It is parameterized with the return type of the UDF (a Java String in this case). 
    • The input into the function is a list with input parameters in the order they are passed to the function within the platform. See Running Your UDF below.
  • The UDF checks the input data for null values, and if any nulls are detected, returns a null. 
  • The inputSchema describes the input list passed into the exec method. 
    • An error is thrown if the type of the data that is passed into the UDF does not match the schema.
    • The UDF must handle improper data. See Error Handling below.
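
To make the idea of input matching the schema concrete, the following sketch compares a row of values against a class array of the kind returned by inputSchema. SchemaCheck and its matches method are hypothetical names for illustration only, and this is not the platform's actual validation code. It also assumes that null values pass the check and are left for the UDF to handle.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks a row of inputs against an inputSchema-style class array. */
class SchemaCheck {
  @SuppressWarnings("rawtypes")
  static boolean matches(Class[] schema, List<Object> input) {
    // The row must have exactly one value per schema entry.
    if (input == null || input.size() != schema.length) {
      return false;
    }
    for (int i = 0; i < schema.length; i++) {
      Object value = input.get(i);
      // Assumption: nulls are not a type mismatch; the UDF itself decides what to do with them.
      if (value != null && !schema[i].isInstance(value)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Class[] schema = {Long.class};
    System.out.println(matches(schema, Arrays.<Object>asList(5L)));  // prints true
    System.out.println(matches(schema, Arrays.<Object>asList("5"))); // prints false
  }
}
```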

Example - Add by constant

...

  • The init method consumes a list of objects, each of which can be used to set a variable in the UDF. The input into the init function is a list with parameters in the order they are passed to the function within the platform. See Running Your UDF below.

Example UDF: AdderUDF
package com.trifacta.trifactaudfs;
import java.io.IOException;
import java.util.List;

/**
 * Example UDF. Adds a constant amount to an Integer column.
 */
public class AdderUDF implements TrifactaUDF<Long> {
  // Set once in init; never modified in exec.
  private Long _addAmount;
  @Override
  public void init(List<Object> initArgs) {
    // Validate the argument count and type before reading the argument.
    if (initArgs == null || initArgs.size() != 1 || !(initArgs.get(0) instanceof Long)) {
      throw new IllegalArgumentException("AdderUDF takes in exactly one Long init argument");
    }
    _addAmount = (Long) initArgs.get(0);
  }
  @Override
  public Long exec(List<Object> input) {
    // Return null for a missing row, a wrong column count, or a null value.
    if (input == null || input.size() != 1 || input.get(0) == null) {
      return null;
    }
    return (Long) input.get(0) + _addAmount;
  }
  @SuppressWarnings("rawtypes")
  public Class[] inputSchema() {
    return new Class[]{Long.class};
  }
  @Override
  public void finish() throws IOException {
    // No cleanup is required.
  }
}

...

Info

NOTE: Custom UDFs should be compiled to one or more JAR files. Avoid using the example JAR filename, which can be overwritten on upgrade.

...


JDK version mismatches

To avoid an Unsupported major.minor version error during execution, the JDK version used to compile the UDF JAR file should be less than or equal to the JDK version on the Hadoop cluster.
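
One way to check which Java release a compiled UDF class targets is to read the version fields at the start of the class file. The sketch below relies on the standard class file layout (a 4-byte magic number 0xCAFEBABE, then a 2-byte minor and a 2-byte major version) and on the fact that major version 52 corresponds to Java 8. ClassFileVersion and its methods are hypothetical names; the path passed to main is whatever .class file you extract from your JAR.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reads the class-file major version from compiled bytecode. */
class ClassFileVersion {
  private static final int MAGIC = 0xCAFEBABE;

  /** Returns the major version stored in bytes 6-7 of a class file. */
  static int majorVersion(byte[] classBytes) {
    if (classBytes == null || classBytes.length < 8) {
      throw new IllegalArgumentException("Not a class file: too short");
    }
    ByteBuffer buf = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
    if (buf.getInt() != MAGIC) {
      throw new IllegalArgumentException("Not a class file: bad magic number");
    }
    buf.getShort(); // minor version, not needed here
    return buf.getShort() & 0xFFFF;
  }

  /** Maps a major version to the Java release that introduced it, for example 52 to 8. */
  static int javaRelease(int majorVersion) {
    return majorVersion - 44;
  }

  public static void main(String[] args) throws IOException {
    // args[0]: path to a .class file extracted from the UDF JAR
    byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
    int major = majorVersion(bytes);
    System.out.println("major=" + major + ", compiled for Java " + javaRelease(major));
  }
}
```

If the reported release is newer than the JDK on the cluster, recompiling with javac's --release option (JDK 9 and later) or the -source and -target options is one way to target the older version.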

...

  1. Open the platform configuration for editing.
  2. Increase the value of the udf-service.udfCommunicationTimeout setting. Raise this value a little at a time to see if that allows the UDF to execute.

    Info

    NOTE: Avoid setting this value too high, which can cause the Java heap size to be exceeded and another Photon crash. Maximum value is 2147483646.


  3. Save your changes and restart the platform.