Overview

Through the APIs, you can separately manage the outputs associated with an individual recipe. This workflow describes how to create output objects, which are associated with your recipe, and how to publish those outputs to different datastores in varying formats. You can continue to modify the output objects and their related write settings and publications independently of managing the wrangling process. Whenever you need new results, you can reference the wrangled dataset with which your outputs have been associated, and the job is executed and published in the appropriate manner to your targets. 

Relevant terms:

Term Description
outputObjects An outputObject is a definition of one or more types of outputs and how they are generated. It must be associated with a recipe.

NOTE: An outputObject must be created for a recipe before you can run a job on it. One and only one outputObject can be associated with a recipe.

writeSettings A writeSettings object defines file-based outputs within an outputObject. Settings include path, format, compression, and delimiters.
publications A publications object is used to specify a table-based output and is associated with an outputObject. Settings include the connection to use, path, table type, and write action to apply.

NOTE: If you need to make changes for purposes of a specific job run, you can add overrides to the request for the job. These overrides apply only for the current job. For more information, see API JobGroups Create v4.

Basic Workflow

  1. Get the internal identifier for the recipe for which you are building outputs.
  2. Create the outputobject for the recipe.
  3. Create a writesettings object and associate it with the outputobject.
  4. Run a test job, if desired.
  5. For any publication, get the internal identifier for the connection to use.
  6. Create a publication object and associate it with the outputobject.
  7. Run your job.
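
The workflow above can be sketched as an ordered plan of API calls. This is a minimal illustration, not part of the platform: the tuple layout and the `describe` helper are hypothetical, while the methods and paths are the ones used in the steps on this page.

```python
# Ordered plan of the calls in this workflow. The tuple layout is illustrative;
# the methods and paths are the ones shown in the steps on this page.
WORKFLOW_STEPS = [
    ("Get recipe ID", "GET", "/v4/wrangleddatasets"),
    ("Create outputobject", "POST", "/v4/outputobjects"),
    ("Create writesettings", "POST", "/v4/writesettings"),
    ("Run test job", "POST", "/v4/jobGroups"),
    ("Get connection ID", "GET", "/v4/connections"),
    ("Create publication", "POST", "/v4/publications"),
    ("Run job", "POST", "/v4/jobGroups"),
]


def describe(steps):
    """Render each step as a numbered 'label: METHOD path' line."""
    return [
        f"{n}. {label}: {method} {path}"
        for n, (label, method, path) in enumerate(steps, start=1)
    ]


for line in describe(WORKFLOW_STEPS):
    print(line)
```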

Step - Get Recipe ID

To begin, you need the internal identifier for the recipe. 

NOTE: In the APIs, a recipe is referred to by its internal name: a wrangled dataset (wrangledDataset).

 

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/wrangleddatasets
Authentication Required
Method GET
Request Body

None.

Response:

Status Code 200 - OK
Response Body
{
    "data": [
        {
            "id": 11,
            "wrangled": true,
            "createdAt": "2018-11-12T23:06:36.473Z",
            "updatedAt": "2018-11-12T23:06:36.539Z",
            "recipe": {
                "id": 10
            },
            "name": "POS-r01",
            "description": null,
            "referenceInfo": null,
            "activeSample": {
                "id": 11
            },
            "creator": {
                "id": 1
            },
            "updater": {
                "id": 1
            },
            "flow": {
                "id": 4
            }
        },
        {
            "id": 1,
            "wrangled": true,
            "createdAt": "2018-11-12T23:19:57.650Z",
            "updatedAt": "2018-11-12T23:20:47.297Z",
            "recipe": {
                "id": 19
            },
            "name": "member_info",
            "description": null,
            "referenceInfo": null,
            "activeSample": {
                "id": 20
            },
            "creator": {
                "id": 1
            },
            "updater": {
                "id": 1
            },
            "flow": {
                "id": 6
            }
        }
    ]
}

cURL example:

curl -X GET \
  http://www.wrangle-dev.example.com:3005/v4/wrangleddatasets \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache'

Relevant terms:

Term Description
URL URL and method to execute.
authorization Authorization token to pass to the platform. Basic authorization works.

NOTE: This token must be passed with each request to the platform.

cache-control Cache control setting.
content-type HTTP content type to send. These applications use application/json.
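
For readers scripting these calls instead of using cURL, the headers above can be assembled as in the following sketch. It uses only the Python standard library. The helper names `basic_auth_token` and `build_request` are hypothetical, and the credentials are placeholders. The token is built the standard HTTP Basic way (base64 of `user:password`); whether your deployment expects a different token type is something to confirm separately.

```python
import base64
import json
import urllib.request

# Host and port are taken from the examples on this page.
BASE_URL = "http://www.wrangle-dev.example.com:3005"


def basic_auth_token(username, password):
    """Encode credentials as an HTTP Basic token (base64 of 'user:password')."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_request(method, path, token, body=None):
    """Prepare, but do not send, a request carrying the headers listed above."""
    headers = {
        "Authorization": f"Basic {token}",
        "Cache-Control": "no-cache",
    }
    data = None
    if body is not None:
        # Only requests with a JSON body need the content-type header.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Placeholder credentials; nothing is sent until urlopen() is called.
req = build_request(
    "GET", "/v4/wrangleddatasets", basic_auth_token("user@example.com", "secret")
)
# To send: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it possible to inspect the URL, method, and headers before any call reaches the platform.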

Checkpoint: In the response above, assume that the recipe of interest is the wrangledDataset with id=11, which is hosted in the flow with flow.id=4. Retain both values for later.
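
Picking the identifiers out of the response can be automated. The sketch below assumes the response shape shown above (a `data` array whose items carry `id`, `name`, and `flow.id`); the function name `find_recipe` and the returned key names are hypothetical.

```python
import json


def find_recipe(response_body, name):
    """Return the wrangledDataset id and flow id for the recipe with this name."""
    payload = json.loads(response_body)
    for item in payload["data"]:
        if item["name"] == name:
            return {"wrangledDatasetId": item["id"], "flowId": item["flow"]["id"]}
    raise LookupError(f"no wrangled dataset named {name!r}")


# Trimmed copy of the response above, keeping only the fields that are read.
sample = """
{
    "data": [
        {"id": 11, "name": "POS-r01", "flow": {"id": 4}},
        {"id": 1, "name": "member_info", "flow": {"id": 6}}
    ]
}
"""

print(find_recipe(sample, "POS-r01"))
```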

For more information, see API WrangledDatasets Get List v4.

Step - Create OutputObject

Create the outputobject and associate it with the recipe identifier. In the following request, the wrangledDataset identifier that you retrieved in the previous call is applied as the id value of the flowNode object.

The following example includes an embedded writesettings object, which generates an Avro file output. You can remove this embedded object if desired, but you must create a writesettings object before you can generate an output.

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/outputobjects
Authentication Required
Method POST
Request Body
{
    "execution": "photon",
    "profiler": true,
    "isAdhoc": true,
    "writeSettings": {
        "data": [
            {
                "delim": ",",
                "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_01.avro",
                "action": "create",
                "format": "avro",
                "compression": "none",
                "header": false,
                "asSingleFile": false,
                "prefix": null,
                "suffix": "_increment",
                "hasQuotes": false
            }
        ]
    },
    "flowNode": {
        "id": 11
    }
}

Response:

Status Code 201 - Created
Response Body
{
    "id": 4,
    "execution": "photon",
    "profiler": true,
    "isAdhoc": true,
    "updatedAt": "2018-11-13T00:20:49.258Z",
    "createdAt": "2018-11-13T00:20:49.258Z",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "flowNode": {
        "id": 11
    }
}

cURL example:

curl -X POST \
  http://www.wrangle-dev.example.com:3005/v4/outputobjects \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
    "execution": "photon",
    "profiler": true,
    "isAdhoc": true,
    "writeSettings": {
        "data": [
            {
                "delim": ",",
                "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_01.avro",
                "action": "create",
                "format": "avro",
                "compression": "none",
                "header": false,
                "asSingleFile": false,
                "prefix": null,
                "suffix": "_increment",
                "hasQuotes": false
            }
        ]
    },
    "flowNode": {
        "id": 11
    }
}'

Checkpoint: You've created an outputobject (id=4) with an embedded writesettings object and have associated them with the appropriate recipe (flowNode id=11). You can now run a job for this recipe to generate the specified output.
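
The request body for this step can also be generated programmatically. The following sketch mirrors the field names from the request above; the function `build_output_object` and its defaults are illustrative, not part of the platform.

```python
import json


def build_output_object(flow_node_id, write_settings=None,
                        execution="photon", profiler=True):
    """Build an outputobjects request body for the given recipe (flowNode id)."""
    body = {
        "execution": execution,
        "profiler": profiler,
        "isAdhoc": True,
        "flowNode": {"id": flow_node_id},
    }
    if write_settings:
        # Embedded writesettings are optional and wrapped in a "data" array.
        body["writeSettings"] = {"data": list(write_settings)}
    return body


avro_settings = {
    "delim": ",",
    "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_01.avro",
    "action": "create",
    "format": "avro",
    "compression": "none",
    "header": False,
    "asSingleFile": False,
    "prefix": None,
    "suffix": "_increment",
    "hasQuotes": False,
}

body = build_output_object(11, [avro_settings])
print(json.dumps(body, indent=4))
```

Omitting `write_settings` produces a body without the embedded object, which matches the case where the writesettings object is created separately afterward.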



Step - Run a Test Job

Now that outputs have been defined for the recipe, you can execute a job on the recipe (wrangledDataset id=11):

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/jobGroups
Authentication Required
Method POST
Request Body
{
  "wrangledDataset": {
    "id": 11
  }
}

Response:

Status Code 201 - Created
Response Body
{
    "reason": "JobStarted",
    "sessionId": "4de74ab0-e6db-11e8-89d6-a98f99482612",
    "id": 2
}

NOTE: To re-run the job against its currently specified outputs, writesettings, and publications, you only need the recipe ID. If you need to make changes for purposes of a specific job run, you can add overrides to the request for the job. These overrides apply only for the current job. For more information, see API JobGroups Create v4.

To track the status of the job:

  • You can monitor the progress through the application. 
  • You can monitor progress through the status field by querying the specific job. For more information, see API JobGroups Get v4.
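
Polling the status field can be sketched as a small loop. This example makes two assumptions that should be checked against API JobGroups Get v4: the names of the terminal statuses, and that the caller supplies a `fetch_status` function that queries the job and returns its status string. The function `wait_for_job` is hypothetical.

```python
import time

# Assumed terminal status values; confirm them against API JobGroups Get v4.
TERMINAL_STATUSES = {"Complete", "Failed", "Canceled"}


def wait_for_job(fetch_status, job_group_id, interval_seconds=5.0,
                 max_polls=120, sleep=time.sleep):
    """Poll fetch_status(job_group_id) until a terminal status is returned."""
    for _ in range(max_polls):
        status = fetch_status(job_group_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"job group {job_group_id} did not finish in time")


# Demonstration with canned statuses instead of a real API call.
canned = iter(["Pending", "InProgress", "Complete"])
print(wait_for_job(lambda job_group_id: next(canned), 2, sleep=lambda seconds: None))
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any HTTP client and lets it be exercised without a running platform.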

Checkpoint: You've run a job, generating one output in Avro format.

Step - Create WriteSettings Object

Suppose you want to create another file-based output for this outputobject. You can create a second writesettings object, which publishes the results of the job run on the recipe to the specified location.

The following example creates settings for generating a Parquet-based output.

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/writesettings
Authentication Required
Method POST
Request Body
{
    "delim": ",",
    "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_r03.pqt",
    "action": "create",
    "format": "pqt",
    "compression": "none",
    "header": false,
    "asSingleFile": false,
    "prefix": null,
    "suffix": "_increment",
    "hasQuotes": false,
    "outputObjectId": 4
}

Response:

Status Code 201 - Created
Response Body
{
    "delim": ",",
    "id": 2,
    "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_r03.pqt",
    "action": "create",
    "format": "pqt",
    "compression": "none",
    "header": false,
    "asSingleFile": false,
    "prefix": null,
    "suffix": "_increment",
    "hasQuotes": false,
    "updatedAt": "2018-11-13T01:07:52.386Z",
    "createdAt": "2018-11-13T01:07:52.386Z",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "outputObject": {
        "id": 4
    }
}

cURL example:

curl -X POST \
  http://www.wrangle-dev.example.com:3005/v4/writesettings \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
    "delim": ",",
    "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_r03.pqt",
    "action": "create",
    "format": "pqt",
    "compression": "none",
    "header": false,
    "asSingleFile": false,
    "prefix": null,
    "suffix": "_increment",
    "hasQuotes": false,
    "outputObject": {
      "id": 4
    }
}'

Checkpoint: You've added a new writesettings object and associated it with your outputobject (id=4). When you run the job again, the Parquet output is also generated.
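
A standalone writesettings body can be built the same way. This sketch follows the Request Body shown above and sends the association as `outputObjectId`; the cURL example nests the same identifier under `outputObject`, so check which form your release accepts. The function `build_write_settings` and its defaults are hypothetical.

```python
import json


def build_write_settings(path, file_format, output_object_id,
                         action="create", compression="none"):
    """Build a writesettings request body attached to an existing outputobject."""
    return {
        "delim": ",",
        "path": path,
        "action": action,
        "format": file_format,
        "compression": compression,
        "header": False,
        "asSingleFile": False,
        "prefix": None,
        "suffix": "_increment",
        "hasQuotes": False,
        # Association form taken from the Request Body above.
        "outputObjectId": output_object_id,
    }


parquet_body = build_write_settings(
    "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS_r03.pqt",
    "pqt",
    4,
)
print(json.dumps(parquet_body, indent=4))
```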



Step - Get Connection ID for Publication

To generate a publication, you must identify the connection through which you are publishing the results.

Below, the request returns a single connection to Hive (id=1).

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/connections
Authentication Required
Method GET
Request Body

None.

Response:

Status Code 200 - OK
Response Body
{
    "data": [
        {
            "connectParams": {
                "vendor": "hive",
                "vendorName": "hive",
                "host": "hadoop",
                "port": "10000",
                "jdbc": "hive2",
                "defaultDatabase": "default"
            },
            "id": 1,
            "host": "hadoop",
            "port": 10000,
            "vendor": "hive",
            "params": {
                "jdbc": "hive2",
                "connectStringOptions": "",
                "defaultDatabase": "default"
            },
            "ssl": false,
            "vendorName": "hive",
            "name": "hive",
            "description": null,
            "type": "jdbc",
            "isGlobal": true,
            "credentialType": "conf",
            "credentialsShared": true,
            "uuid": "28415970-e6c4-11e8-82be-9947a31ecdd5",
            "disableTypeInference": false,
            "createdAt": "2018-11-12T21:44:39.816Z",
            "updatedAt": "2018-11-12T21:44:39.842Z",
            "credentials": [],
            "creator":  {
                "id":  1
            },
            "updater":  {
                "id":  1
            },
            "workspace":  {
                "id":  1
            }
        }
    ],
    "count": 1
}

cURL example:

curl -X GET \
  http://www.wrangle-dev.example.com:3005/v4/connections \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json'

For more information, see API Connections Get List v4.
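
When several connections are returned, the one to publish through can be selected by vendor. The sketch below assumes the response shape shown above (a `data` array whose items carry `id` and `vendor`); the function name `connection_ids_by_vendor` is hypothetical.

```python
def connection_ids_by_vendor(response, vendor):
    """Return the ids of all connections in the response for the given vendor."""
    return [
        conn["id"]
        for conn in response.get("data", [])
        if conn.get("vendor") == vendor
    ]


# Trimmed copy of the response above, keeping only the fields that are read.
connections = {
    "data": [
        {"id": 1, "vendor": "hive", "name": "hive", "type": "jdbc"},
    ],
    "count": 1,
}

print(connection_ids_by_vendor(connections, "hive"))
```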

 

Step - Create a Publication

You can create publications that publish table-based outputs through specified connections. In the following, a Hive table is written out to the default database through connectionId = 1. This publication is associated with the outputObject id=4.

Request:

Endpoint http://www.wrangle-dev.example.com:3005/v4/publications
Authentication Required
Method POST
Request Body
{
    "path": [
        "default"
    ],
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}

Response:

Status Code 201 - Created
Response Body
{
    "path": [
        "default"
    ],
    "id": 3,
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "updatedAt": "2018-11-13T01:25:39.698Z",
    "createdAt": "2018-11-13T01:25:39.698Z",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}

cURL example:

curl -X POST \
  http://www.wrangle-dev.example.com:3005/v4/publications \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
    "path": [
        "default"
    ],
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}' 
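
The publication body can be generated from the identifiers collected in the earlier steps. This is a sketch that mirrors the field names in the request above; the function `build_publication` and its defaults are illustrative only.

```python
import json


def build_publication(output_object_id, connection_id, table_name,
                      database="default", target_type="hive", action="create"):
    """Build a publications request body for a table-based output."""
    return {
        # The path is a list holding the target database name.
        "path": [database],
        "tableName": table_name,
        "targetType": target_type,
        "action": action,
        "outputObject": {"id": output_object_id},
        "connection": {"id": connection_id},
    }


publication = build_publication(4, 1, "myPublishedHiveTable")
print(json.dumps(publication, indent=4))
```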

Checkpoint: You're done.

You have done the following:

  1. Created an output object:
    1. Embedded a writesettings object to define an Avro output.
    2. Associated the outputobject with a recipe.
  2. Added another writesettings object to the outputobject.
  3. Added a table-based publication object to the outputobject.

You can now generate results for these three different outputs whenever you run a job (create a jobgroup) for the associated recipe. 
