
...

This example walks through the process of creating, identifying, and executing a job through automated methods. For this example, these tasks are accomplished using the following methods:

Info

NOTE: This API workflow applies to a Development instance of the Trifacta platform, which is the default platform instance type. For more information on Development and Production instances, see Overview of Deployment Manager.

  1. Locate or create a flow. The datasets that you wrangle must be contained within a flow. You can add them to an existing flow or create a new one through the APIs, as sketched after this list.
  2. Create a dataset. Through the APIs, you create an imported dataset from an asset that is accessible through one of the established connections. Then, you create the recipe object through the API.
    1. For the recipe, you must retrieve the internal identifier.
    2. Through the application, you modify the recipe for the dataset.
  3. Automate job execution. Using the APIs, you can automate execution of the job that wrangles the dataset.
    1. As needed, this job can be re-executed on a periodic basis or whenever the source files are updated.
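
For step 1, creating a new flow is a single call. A minimal sketch follows, using the same example host as the rest of this workflow; the name and description values are placeholders. The response includes the internal identifier of the new flow, which you reference when adding datasets to it.

Endpoint: http://www.example.com:3005/v4/flows
Authentication: Required
Method: POST
Request Body
Code Block
{
  "name": "POS Example Flow",
  "description": "Flow for the POS-r01 example"
}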

...


For more information on creating outputObjects, writeSettings, and publications, see API Workflow - Manage Outputs.

Step - Run Job

Through the APIs, you can specify and run a job. In the above example, you must run the job for the terminal dataset, which is POS-r01 in this case. This dataset contains references to all of the other datasets. When the job is run, the recipes for the other datasets are also applied to the terminal dataset, which ensures that the output reflects the proper integration of these other datasets into POS-r01.

Info

NOTE: In the following example, writeSettings have been specified as overrides in the job definition. These overrides are applied for this job run only. If you need to re-run the job with these settings, you must either 1) re-apply the overrides or 2) create the writeSettings objects. For more information, see API Workflow - Manage Outputs.

...

Tip

Tip: You can swap imported datasets before re-running the job. For example, if you have uploaded a new file, you can change the primary input dataset for the wrangled dataset and then use the following API call to re-run the job as specified. See API WrangledDatasets Put PrimaryInputDataset v4.
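
As a sketch, the dataset swap itself might look like the following call. The wrangledDataset identifier (23) matches the job request below; the importedDataset identifier (42) is a placeholder for the newly uploaded file.

Endpoint: http://www.example.com:3005/v4/wrangledDatasets/23/primaryInputDataset
Authentication: Required
Method: PUT
Request Body
Code Block
{
  "importedDataset": {
    "id": 42
  }
}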

Info

NOTE: In the following example, writeSettings have been specified as overrides in the job definition. To re-run the job using the wrangledDataset ID only, you must first create the writeSettings objects. See API Workflow - Manage Outputs.

Endpoint: http://www.example.com:3005/v4/jobGroups
Authentication: Required
Method: POST
Request Body
Code Block
{
  "wrangledDataset": {
    "id": 23
  },
  "overrides": {
    "execution": "photon",
    "profiler": true,
    "writesettings": [
      {
        "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
        "action": "create",
        "format": "csv",
        "compression": "none",
        "header": false,
        "asSingleFile": false
      }
    ]
  },
  "ranfrom": null}

...