This section describes how to run a job using the APIs available in Trifacta® Wrangler Enterprise.

A note about API URLs:

In the listed examples, URLs are referenced in the following manner:

<protocol>://<platform_base_url>/

In your product, these references map to the following:

<http or https>://<hostname>:<port_number>/

For more information, see API Reference.
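As a sketch, the mapping above can be expressed as a small helper that composes request URLs for your deployment. The function name `platform_url` and the example hostname and port are hypothetical; substitute the values for your installation.

```python
from urllib.parse import urljoin


def platform_url(protocol: str, hostname: str, port: int, path: str = "") -> str:
    """Build <protocol>://<hostname>:<port_number>/<path> for API requests."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    base = f"{protocol}://{hostname}:{port}/"
    # urljoin resolves a relative path (leading slash stripped) against the base URL.
    return urljoin(base, path.lstrip("/"))


# Hypothetical host and port, for illustration only.
print(platform_url("https", "wrangler.example.com", 3005, "v4/jobGroups"))
```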

Pre-requisites

Before you begin, you should verify the following:

  1. Get authentication credentials. As part of each request, you must pass in authentication credentials to the Trifacta platform.

    Tip: The recommended method is to use an access token, which can be generated from the Trifacta application. For more information, see Access Tokens Page.

    For more information, see API Authentication.

  2. Verify job execution. Run the desired job through the Trifacta application and verify that the output objects are properly generated.
  3. Acquire the recipe (wrangled dataset) identifier. In Flow View, click the icon for the recipe whose outputs you wish to generate, then acquire the numeric value for the recipe from the URL. In the following example URL, the recipe Id is 28629:

    https://<platform_base_url>/flows/5479?recipe=28629&tab=recipe
  4. Create output object. A recipe must have at least one output object created for it before you can run a job via APIs. For more information, see Flow View Page.
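If you are scripting against the API, the recipe Id can also be parsed out of the Flow View URL instead of being read by eye. A minimal Python sketch using only the standard library; the function name and hostname are hypothetical, and it assumes the Id appears in the `recipe` query parameter, as in the example URL above.

```python
from urllib.parse import parse_qs, urlparse


def recipe_id_from_url(url: str) -> int:
    """Return the numeric recipe (wrangled dataset) Id from a Flow View URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("recipe")
    if not values:
        raise ValueError(f"no 'recipe' parameter in {url!r}")
    return int(values[0])


# Example URL from this page, with a hypothetical hostname filled in.
print(recipe_id_from_url("https://wrangler.example.com/flows/5479?recipe=28629&tab=recipe"))
```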

If you wish to apply overrides to the inputs or outputs of the recipe, acquire those identifiers or paths now. For more information, see the "Run Job with Overrides" and "Run Job with Parameter Values" sections below.

Step - Run Job

Through the APIs, you can specify and run a job. To run a job with all default settings, construct a request like the following:

NOTE: A wrangledDataset is the internal object name for the recipe that you wish to run. See the Pre-requisites section for how to acquire its identifier.


Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST
Request Body:
{
  "wrangledDataset": {
    "id": 28629
  }
}
Response Code: 201 - Created
Response Body:
{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 961247,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}

If the 201 response code is returned, then the job has been queued for execution. 

Tip: Retain the id value in the response. In the above example, 961247 is the internal identifier for the job group. You need this value to check on the status of your job.

For more information, see API JobGroups Create v4.
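The request above can be sketched in Python with only the standard library. This is a sketch under stated assumptions: it assumes access-token authentication is sent as an `Authorization: Bearer <token>` header (confirm the scheme in API Authentication for your release), and all function names are hypothetical. `build_run_job_request` only constructs the request, so it can be inspected without a running platform; `run_job` sends it and returns the job group id to retain.

```python
import json
import urllib.request


def build_run_job_request(base_url: str, token: str, recipe_id: int) -> urllib.request.Request:
    """Build (but do not send) POST /v4/jobGroups for a recipe with default settings."""
    body = {"wrangledDataset": {"id": recipe_id}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v4/jobGroups",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: access tokens are passed as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def job_group_id(response_body: str) -> int:
    """Extract the job group id (e.g. 961247) to retain for status checks."""
    return json.loads(response_body)["id"]


def run_job(base_url: str, token: str, recipe_id: int) -> int:
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    req = build_run_job_request(base_url, token, recipe_id)
    with urllib.request.urlopen(req) as resp:
        if resp.status != 201:
            raise RuntimeError(f"unexpected status {resp.status}")
        return job_group_id(resp.read().decode("utf-8"))
```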

Checkpoint: You have queued your job for execution.


Step - Monitoring Your Job

You can monitor the status of your job through the following endpoint:

Endpoint: <protocol>://<platform_base_url>/v4/jobGroups/<id>/status
Authentication: Required
Method: GET
Request Body: None
Response Code: 200 - OK
Response Body:
{
    "id": 961247,
    "name": null,
    "description": null,
    "ranfrom": "ui",
    "ranfor": "recipe",
    "status": "Complete",
    "profilingEnabled": true,
    "runParameterReferenceDate": "2019-08-20T17:46:27.000Z",
    "createdAt": "2019-08-20T17:46:28.000Z",
    "updatedAt": "2019-08-20T17:53:17.000Z",
    "workspace": {
        "id": 22
    },
    "creator": {
        "id": 38
    },
    "updater": {
        "id": 38
    },
    "snapshot": {
        "id": 774476
    },
    "wrangledDataset": {
        "id": 28629
    },
    "flowRun": null
}

When the job has successfully completed, the returned status message includes the following:

"status": "Complete",

For more information, see API JobGroups Get v4.
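To wait for completion from a script, you can poll the status endpoint until the job reaches a final state. A minimal Python sketch, assuming access-token authentication via an `Authorization: Bearer <token>` header; the function names are hypothetical. Only "Complete" is documented on this page, so the other terminal values in `TERMINAL_STATUSES` (`Failed`, `Canceled`) are assumptions; check API JobGroups Get v4 for the values your release returns. `wait_for_job` accepts any zero-argument callable, so it can be exercised without a live platform.

```python
import json
import time
import urllib.request
from typing import Callable

# "Complete" is documented above; "Failed" and "Canceled" are assumptions.
TERMINAL_STATUSES = {"Complete", "Failed", "Canceled"}


def fetch_status(base_url: str, token: str, job_group_id: int) -> str:
    """GET /v4/jobGroups/<id>/status and return the status value."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v4/jobGroups/{job_group_id}/status",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # The example above returns an object with a "status" field;
    # this also tolerates a bare JSON string, in case your release returns one.
    return payload["status"] if isinstance(payload, dict) else payload


def wait_for_job(get_status: Callable[[], str], interval: float = 10.0,
                 timeout: float = 3600.0) -> str:
    """Poll get_status until it returns a terminal status or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(interval)
```

For example, `wait_for_job(lambda: fetch_status(base_url, token, 961247), interval=30)` polls the job group from the example every 30 seconds.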

Checkpoint: You have executed the job. Results have been delivered to the designated output locations.

Step - Re-run Job

Later, you can re-run the job using the same simple request:

Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST
Request Body:
{
  "wrangledDataset": {
    "id": 28629
  }
}

The job is re-run as it was previously specified.

For more information, see API JobGroups Create v4.

Run Job with Overrides - Files

As needed, you can specify runtime overrides for any of the settings related to the job definition or its outputs. For file-based jobs, these overrides include:

  • Execution environment
  • Profiling
  • Output file, format, and other settings

NOTE: Override values applied to a job are not validated. Invalid overrides may cause your job to fail.

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "profiler": true,
        "writesettings": [
          {
            "path": "<new_path_to_output>",
            "format": "csv",
            "header": true,
            "asSingleFile": true
          }
        ]
      },
      "ranfrom": null
    }
    
  3. In the above example, the job has been launched with the following overrides:
    1. The job is executed with profiling enabled.
    2. Output is written to a new file path.
    3. Output is written in CSV format to the designated path.
    4. Output includes a header row and is generated as a single file.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    {
    
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962221,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }
  5. Retain the id value, which is the job group identifier, for monitoring.
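If you build override bodies programmatically, a small helper keeps the structure consistent. A Python sketch; the function name and its defaults are hypothetical and simply mirror the example request above. Because override values are not validated by the platform, the helper only assembles the JSON; passing a valid path and format is up to you.

```python
import json


def file_override_body(recipe_id: int, path: str, fmt: str = "csv",
                       header: bool = True, as_single_file: bool = True,
                       profiler: bool = True) -> dict:
    """Request body for POST /v4/jobGroups with file output overrides."""
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "profiler": profiler,
            "writesettings": [
                {
                    "path": path,
                    "format": fmt,
                    "header": header,
                    "asSingleFile": as_single_file,
                }
            ],
        },
        "ranfrom": None,  # serialized as JSON null
    }


# Hypothetical output path, for illustration only.
print(json.dumps(file_override_body(28629, "/output/example/result.csv"), indent=2))
```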

Run Job with Overrides - Tables

You can also pass job definition overrides for table-based outputs. For table outputs, overrides include:

  • Path to database to which to write (must have write access)
  • Identifier of the connection used to write to the target

    Tip: This identifier is for the connection used to write to the target system. This connection must already exist. For more information on how to retrieve the identifier for a connection, see API Connections Get List v4.

  • Name of output table
  • Target table type

    Tip: You can acquire the target type from the vendor value in the connection response. For more information, see API Connections Get List v4.

  • action: the operation to perform on the target table. Supported values:

        Key value          Description
        create             Create a new table with each publication.
        createAndLoad      Append your data to the table.
        truncateAndLoad    Truncate the table and load it with your data.
        dropAndLoad        Drop the table and write the new table in its place.

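A minimal Python sketch of the tip about target types: given the JSON body returned by API Connections Get List v4, look up a connection by its identifier and return its `vendor` value. It assumes the connection objects are returned in a top-level `data` array with `id` and `vendor` fields; verify this against the response from your platform. The function name and sample body are hypothetical.

```python
import json


def target_type_for_connection(connections_body: str, connection_id: int) -> str:
    """Return the vendor value of a connection, for use as targetType."""
    payload = json.loads(connections_body)
    # Assumption: the list response wraps connection objects in a "data" array.
    for conn in payload.get("data", []):
        if conn.get("id") == connection_id:
            return conn["vendor"]
    raise KeyError(f"connection {connection_id} not found")


# Hypothetical, abbreviated response body.
sample = '{"data": [{"id": 3, "vendor": "postgres", "name": "prod"}]}'
print(target_type_for_connection(sample, 3))
```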
  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "publications": [
          {
            "path": ["prod_db"],
            "tableName": "Table_CaseFctn2",
            "action": "createAndLoad",
            "targetType": "postgres",
            "connectionId": 3
          }
        ]
      },
      "ranfrom": null
    }
    
  3. In the above example, the job has been launched with the following overrides:

    NOTE: When overrides are applied to publishing, any publications that are already attached to the recipe are ignored.

    1. Output is written to the prod_db database, using the table name Table_CaseFctn2.
    2. Output action is createAndLoad, which appends your data to the table. See above for definitions.
    3. Target table type is a PostgreSQL table.
    4. Data is written through the connection whose identifier is 3.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    {
    
    
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962222,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }
  5. Retain the id value, which is the job group identifier, for monitoring.
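A Python sketch that assembles the table publication override shown above. Since override values are not validated by the platform, it checks `action` against the four values listed in this section before the job is submitted. The helper name is hypothetical, and `path` is sent as a JSON array containing the database name, as in the example request.

```python
import json

# The four action values described in this section.
ACTIONS = {"create", "createAndLoad", "truncateAndLoad", "dropAndLoad"}


def table_override_body(recipe_id: int, database: str, table: str, action: str,
                        target_type: str, connection_id: int) -> dict:
    """Request body for POST /v4/jobGroups with table publication overrides."""
    # The platform does not validate overrides, so check the action locally.
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "publications": [
                {
                    "path": [database],  # a JSON array, e.g. ["prod_db"]
                    "tableName": table,
                    "action": action,
                    "targetType": target_type,
                    "connectionId": connection_id,
                }
            ]
        },
        "ranfrom": None,
    }


body = table_override_body(28629, "prod_db", "Table_CaseFctn2", "createAndLoad", "postgres", 3)
print(json.dumps(body, indent=2))
```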

Run Job with Parameter Values

If the imported dataset or outputs have parameters defined for them, you can pass overrides of the default parameter values as part of the job definition.

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

     

    Endpoint<protocol>://<platform_base_url>/v4/jobGroups
    AuthenticationRequired
    MethodPOST

    Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "runParameters": {
          "overrides": {
            "data": [
              {
                "key": "varRegion",
                "value": "02"
              }
            ]
          }
        }
      },
      "ranfrom": null
    }
    
  3. In the above example, the specified job has been launched for recipe 28629. The run parameter varRegion has been set to 02 for this specific job. Depending on how it is defined in the flow, this parameter could change either of the following:
    1. The source for the imported dataset.
    2. The path for the generated output.

    For more information, see Overview of Parameterization.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    {
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962223,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }
  5. Retain the id value, which is the job group identifier, for monitoring.
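A Python sketch that converts a mapping of parameter names to values into the nested runParameters structure shown above. The function name is hypothetical. Pass values as strings so that a value such as "02" keeps its leading zero.

```python
import json
from typing import Dict


def parameter_override_body(recipe_id: int, params: Dict[str, str]) -> dict:
    """Request body for POST /v4/jobGroups with run parameter value overrides."""
    return {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "runParameters": {
                "overrides": {
                    # One {"key": ..., "value": ...} entry per parameter.
                    "data": [{"key": k, "value": v} for k, v in params.items()]
                }
            }
        },
        "ranfrom": None,
    }


print(json.dumps(parameter_override_body(28629, {"varRegion": "02"}), indent=2))
```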

