Overview

This example workflow describes how to run jobs on datasets with parameters. A dataset with parameters is a dataset in which some part of the path to the data objects has been parameterized. Because one or more parts of the path can vary, you can build a dataset with parameters to capture data that spans multiple files. For example, datasets with parameters can be used to parameterize serialized data by region, date, or another variable.

NOTE: This API workflow only works with version 4 (v4) or later of the APIs.

For more information on datasets with parameters, see Overview of Parameterization.

Basic Workflow

The basic method by which you build and run a job for a dataset with parameters is very similar to the method for a non-parameterized dataset, with a few notable exceptions. The steps in this workflow follow the standard workflow; where the steps overlap, links to the non-parameterized workflow are provided. For more information, see API Workflow - Develop a Flow.

Example Datasets

This example covers three different datasets, each of which features a different type of dataset with parameters. 

Example 1 - Datetime parameter: In this example, a directory is used to store daily orders transactions. This dataset is defined with a Datetime parameter to capture the preceding 7 days of data. Jobs can be configured to process all of this data as it appears in the directory.

Example 2 - Variable: This dataset segments data into four timezones across the US. These timezones are defined using the following text values in the path: pacific, mountain, central, and eastern. In this case, you can create a parameter called region, which can be overridden at runtime to be set to one of these four values during job execution.

Example 3 - Pattern parameter: This example is a directory containing point-of-sale transactions captured into individual files for each region. Since each region is defined by a numeric value (01, 02, 03), the dataset can be defined using a pattern parameter.

Step - Create Containing Flow

You must create the flow to host your dataset with parameters. 

In the response, you must capture and retain the flow identifier. 

For more information, see API Workflow - Develop a Flow.

Step - Create Datasets with Parameters

NOTE: When you import a dataset with parameters, only the first matching dataset is used for the initial file. If you want to see data from other matching files, you must collect a new sample within the Transformer page.

Example 1 - Dataset with Datetime parameter

Suppose your files are stored in the following paths: 

MyFiles/1/Datetime/2018-04-06-orders.csv
MyFiles/1/Datetime/2018-04-05-orders.csv
MyFiles/1/Datetime/2018-04-04-orders.csv
MyFiles/1/Datetime/2018-04-03-orders.csv
MyFiles/1/Datetime/2018-04-02-orders.csv
MyFiles/1/Datetime/2018-04-01-orders.csv
MyFiles/1/Datetime/2018-03-31-orders.csv

When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.

In the window, select the date value (e.g. YYYY-MM-DD) and then click the Datetime icon.

Datetime Parameter:

  • Format: YYYY-MM-DD
  • Date Range: Date is last 7 days.
  • Click Save.

The Datetime parameter should match with all files in the directory. Import this dataset and wrangle it.
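As a sanity check, the date-window matching can be sketched in Python. This is an illustration of the matching logic only, not the product's implementation; it assumes the "last 7 days" window covers the seven dates before the reference date.

```python
from datetime import date, timedelta

def last_7_days_files(today, filenames):
    """Return the files whose YYYY-MM-DD prefix falls in the 7 days before today."""
    window = {(today - timedelta(days=n)).isoformat() for n in range(1, 8)}
    return [f for f in filenames if f[:10] in window]

files = [
    "2018-04-06-orders.csv",
    "2018-04-05-orders.csv",
    "2018-04-04-orders.csv",
    "2018-04-03-orders.csv",
    "2018-04-02-orders.csv",
    "2018-04-01-orders.csv",
    "2018-03-31-orders.csv",
]

# With a reference date of 2018-04-07, the window is 2018-03-31 .. 2018-04-06,
# so all seven files above match the Datetime parameter.
print(last_7_days_files(date(2018, 4, 7), files))
```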

After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL. 

For purposes of this example, here are some key values:

  • flowId: 35
  • recipeId: 127
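If you are scripting this step, the two identifiers can be pulled from the flow-view URL. The URL shape assumed below (/flows/<flowId>?recipe=<recipeId>) is hypothetical; adjust the pattern to match the URLs your deployment actually produces.

```python
import re

def extract_ids(url):
    """Extract (flowId, recipeId) from a flow-view URL.
    The /flows/<id>?recipe=<id> shape is an assumption for illustration."""
    m = re.search(r"/flows/(\d+)\?recipe=(\d+)", url)
    if not m:
        raise ValueError("no flow/recipe ids found in URL")
    return int(m.group(1)), int(m.group(2))

print(extract_ids("http://www.example.com:3005/flows/35?recipe=127"))  # → (35, 127)
```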

Example 2 - Dataset with Variable

Suppose your files are stored in the following paths: 

MyFiles/1/variable/census-eastern.csv
MyFiles/1/variable/census-central.csv
MyFiles/1/variable/census-mountain.csv
MyFiles/1/variable/census-pacific.csv

When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.

In the window, select the region value, which could be one of the following depending on the file: eastern, central, mountain, or pacific. Click the Variable icon.

Variable Parameter:

  • Name: region
  • Default Value: Set this default to pacific.
  • Click Save.

In this case, the variable only matches one value in the directory. However, when you apply runtime overrides to the region variable, you can set it to any value.
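The substitution behavior can be sketched as simple template resolution. The resolve_path helper and the {region} placeholder syntax below are illustrative, not the product's internal mechanism:

```python
def resolve_path(template, **values):
    """Substitute variable values into a parameterized path (illustrative sketch)."""
    return template.format(**values)

template = "MyFiles/1/variable/census-{region}.csv"
print(resolve_path(template, region="pacific"))   # default value
print(resolve_path(template, region="central"))   # a runtime override
```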

Import this dataset and wrangle it.

After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL. 

For purposes of this example, here are some key values:

  • flowId: 33
  • recipeId: 123

Example 3 - Dataset with pattern parameter

Suppose your files are stored in the following paths: 

MyFiles/1/pattern/POS-r01.csv
MyFiles/1/pattern/POS-r02.csv
MyFiles/1/pattern/POS-r03.csv

When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.

In the window, select the two numeric digits (e.g. 02). Click the Pattern icon. 

Pattern Parameter:

  • Type: Regular expression
  • Matching regular expression: [0-9][0-9]
  • Click Save.

In this case, the regular expression should match any sequence of two digits in a row. In the above example, this expression matches 01, 02, and 03, which covers all of the files in the directory.
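The matching behavior can be verified with the same regular expression in Python. This is a sketch of the matching logic, not the product's matcher:

```python
import re

# The pattern parameter replaces the two digits in the filename; equivalently,
# the full filenames match this expression.
pattern = re.compile(r"POS-r[0-9][0-9]\.csv$")

files = ["POS-r01.csv", "POS-r02.csv", "POS-r03.csv", "POS-notes.txt"]
matched = [f for f in files if pattern.search(f)]
print(matched)  # → ['POS-r01.csv', 'POS-r02.csv', 'POS-r03.csv']
```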

Import this dataset and wrangle it.

After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL. 

For purposes of this example, here are some key values:

  • flowId: 32
  • recipeId: 121

Checkpoint: You have created flows for each type of dataset with parameters.

Step - Wrangle Data

After you have created your datasets with parameters, you can wrangle them through the application. For more information, see Transformer Page.

Step - Run Job

Below, you can review the API calls to run a job for each type of dataset with parameters, including relevant information about overrides. 

Example 1 - Dataset with Datetime parameter

In the following example, a job is created for the dataset containing the Datetime parameter. The job processes all files that match the parameter's date range; note that the runParameters object in the request is empty.

NOTE: You cannot apply overrides to these types of datasets with parameters.

 

  1. Endpoint: http://www.example.com:3005/v4/jobGroups
    Authentication: Required
    Method: POST
    Request Body:

    {
      "wrangledDataset": {
        "id": 127
      },
      "overrides": {
        "execution": "photon",
        "profiler": true,
        "writesettings": [
          {
            "path": "MyFiles/queryResults/joe@example.com/2018-04-03-orders.csv",
            "action": "create",
            "format": "csv",
            "compression": "none",
            "header": false,
            "asSingleFile": false
          }
        ],
        "runParameters": {}
      }
    }
    
  2.  In the above example, the job has been launched for recipe 127 to execute on the Photon running environment with profiling enabled. 
    1. Output format is CSV to the designated path. For more information on these properties, see API JobGroups Create v4.
    2. Output is written as a new file with no overwriting of previous files.
  3. A response code of 201 - Created is returned. The response body should look like the following:

    {
        "reason": "JobStarted",
        "sessionId": "5b883530-3920-11e8-a37a-db6dae3c6e43",
        "id": 29,
        "jobs": {
            "data": [
                {
                    "id": 62
                },
                {
                    "id": 63
                }
            ]
        }
    }
  4. Retain the jobgroupId=29 value for monitoring. 
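The request above can be assembled and sent with only the Python standard library. This is a sketch: the token placeholder and header names are assumptions to adapt to your deployment, and the network call itself is left commented out.

```python
import json
from urllib import request

# Request body for creating the jobGroup, mirroring the example above.
payload = {
    "wrangledDataset": {"id": 127},
    "overrides": {
        "execution": "photon",
        "profiler": True,
        "writesettings": [{
            "path": "MyFiles/queryResults/joe@example.com/2018-04-03-orders.csv",
            "action": "create",
            "format": "csv",
            "compression": "none",
            "header": False,
            "asSingleFile": False,
        }],
        "runParameters": {},
    },
}

req = request.Request(
    "http://www.example.com:3005/v4/jobGroups",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <your-token>"},  # placeholder credential
    method="POST",
)
# response = request.urlopen(req)            # returns 201 - Created on success
# job_group_id = json.load(response)["id"]   # retain for monitoring
```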

Example 2 - Dataset with Variable

In the following example, the region variable has been overridden with the value central to execute the job on census-central.csv:

  1. Endpoint: http://www.example.com:3005/v4/jobGroups
    Authentication: Required
    Method: POST
    Request Body:

    {
      "wrangledDataset": {
        "id": 123
      },
      "overrides": {
        "execution": "photon",
        "profiler": true,
        "writesettings": [
          {
            "path": "MyFiles/queryResults/joe@example.com/region-central.csv",
            "action": "create",
            "format": "csv",
            "compression": "none",
            "header": false,
            "asSingleFile": false
          }
        ]
      },
      "runParameters": {
        "overrides": {
          "data": [
            {
              "key": "region",
              "value": "central"
            }
          ]
        }
      }
    }
  2.  In the above example, the job has been launched for recipe 123 to execute on the Photon running environment with profiling enabled. 
    1. Output format is CSV to the designated path. For more information on these properties, see API JobGroups Create v4.
    2. Output is written as a new file with no overwriting of previous files.
  3. A response code of 201 - Created is returned. The response body should look like the following:

    {
        "reason": "JobStarted",
        "sessionId": "aa0f9f00-391f-11e8-a37a-db6dae3c6e43",
        "id": 27,
        "jobs": {
            "data": [
                {
                    "id": 58
                },
                {
                    "id": 59
                }
            ]
        }
    }
  4. Retain the jobgroupId=27 value for monitoring. 
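Building the variable-override body can be wrapped in a small helper. This is an illustrative sketch; only variables defined on the dataset (here, region) can be overridden at runtime.

```python
import json

def variable_override(recipe_id, name, value):
    """Build a jobGroups request body that overrides one variable at runtime.
    Mirrors the runParameters shape shown in the example above."""
    return {
        "wrangledDataset": {"id": recipe_id},
        "runParameters": {
            "overrides": {"data": [{"key": name, "value": value}]}
        },
    }

body = variable_override(123, "region", "central")
print(json.dumps(body, indent=2))
```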

 

Example 3 - Dataset with pattern parameter

In the following example, a job is run for the dataset defined with the pattern parameter. The pattern matches all regional files (POS-r01.csv, POS-r02.csv, and POS-r03.csv), and the empty runParameters object reflects that pattern parameters cannot be overridden at runtime.

 

NOTE: You cannot apply overrides to these types of datasets with parameters.


  1. Endpoint: http://www.example.com:3005/v4/jobGroups
    Authentication: Required
    Method: POST
    Request Body:

    {
      "wrangledDataset": {
        "id": 121
      },
      "overrides": {
        "execution": "photon",
        "profiler": false,
        "writesettings": [
          {
            "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@trifacta.local/POS-r02.txt",
            "action": "create",
            "format": "csv",
            "compression": "none",
            "header": false,
            "asSingleFile": false
          }
        ],
        "runParameters": {}
      }
    }
  2. In the above example, the job has been launched for recipe 121 to execute on the Photon running environment with profiling disabled. 
    1. Output format is CSV to the designated path. For more information on these properties, see API JobGroups Create v4.
    2. Output is written as a new file with no overwriting of previous files.
  3. A response code of 201 - Created is returned. The response body should look like the following:

    {
        "reason": "JobStarted",
        "sessionId": "16424a60-3920-11e8-a37a-db6dae3c6e43",
        "id": 28,
        "jobs": {
            "data": [
                {
                    "id": 60
                },
                {
                    "id": 61
                }
            ]
        }
    }
  4. Retain the jobgroupId=28 value for monitoring. 

Step - Monitor Your Job

After the job has been created and you have captured the jobGroup Id, you can use it to monitor the status of your job. For more information, see API JobGroups Get Status v4.
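A polling loop for this step might look like the following sketch. The terminal status names and the status-fetching callable are assumptions; wire get_status to a thin wrapper around your deployment's jobGroups status endpoint.

```python
import time

def wait_for_job(get_status, job_group_id, poll_seconds=5, timeout=600):
    """Poll a jobGroup until it reaches a terminal state.

    get_status is any callable returning the current status string for the
    jobGroup; the terminal state names here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_group_id)
        if status in ("Complete", "Failed", "Canceled"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"jobGroup {job_group_id} still running after {timeout}s")

# Demo with a stubbed status source instead of a live server:
statuses = iter(["InProgress", "Complete"])
print(wait_for_job(lambda _id: next(statuses), 29, poll_seconds=0))  # → Complete
```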

Step - Re-run Job

If you need to re-run the job as specified, you can use the wrangledDataset identifier to re-run the most recent job.

Tip: When you re-run a job, you can change any variable values as part of the request.

Example request:

Endpoint: http://www.example.com:3005/v4/jobGroups
Authentication: Required
Method: POST
Request Body:

{
  "wrangledDataset": {
    "id": 123
  },
  "runParameters": {
    "overrides": {
      "data": [
        {
          "key": "region",
          "value": "central"
        }
      ]
    }
  }
}

For more information, see API Workflow - Develop a Flow.
