This example workflow describes how to run jobs on datasets with parameters through the APIs. A dataset with parameters is a dataset in which some part of the path to the data objects has been parameterized. Since one or more parts of the path can vary, you can build a dataset with parameters to capture data that spans multiple files. For example, datasets with parameters can be used to parameterize serialized data by region, date, or other variables.
NOTE: This API workflow only works with version 4 (v4) or later of the APIs.
For more information on datasets with parameters, see Overview of Parameterization.
The method by which you build and run a job for a dataset with parameters is very similar to the method for a non-parameterized dataset, with a few notable exceptions. The steps below mirror the standard workflow; where the steps overlap, links to the non-parameterized workflow are provided. For more information, see API Workflow - Develop a Flow.
This example covers three different datasets, each of which features a different type of parameter.
Example Number | Parameter Type | Description
---|---|---
1 | Datetime parameter | In this example, a directory is used to store daily orders transactions. This dataset is defined with a Datetime parameter to capture the preceding seven days of data. Jobs can be configured to process all of this data as it appears in the directory.
2 | Variable | This dataset segments data into four timezones across the US. These timezones are defined using the following text values in the path: `pacific`, `mountain`, `central`, and `eastern`. In this case, you can create a variable called `region`, which can be overridden at runtime to one of these four values during job execution.
3 | Pattern parameter | This example is a directory containing point-of-sale transactions captured into individual files for each region. Since each region is identified by a two-digit numeric value in the filename (e.g. `01`), you can create a pattern parameter to match all of the regional files.
You must create the flow to host your dataset with parameters.
In the response, you must capture and retain the flow identifier (flowId).
For more information, see API Workflow - Develop a Flow.
NOTE: When you import a dataset with parameters, only the first matching dataset is used for the initial file. If you want to see data from other matching files, you must collect a new sample within the Transformer page.
Suppose your files are stored in the following paths:
```
MyFiles/1/Datetime/2018-04-06-orders.csv
MyFiles/1/Datetime/2018-04-05-orders.csv
MyFiles/1/Datetime/2018-04-04-orders.csv
MyFiles/1/Datetime/2018-04-03-orders.csv
MyFiles/1/Datetime/2018-04-02-orders.csv
MyFiles/1/Datetime/2018-04-01-orders.csv
MyFiles/1/Datetime/2018-03-31-orders.csv
```
When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.
In the window, select the date value (e.g. `YYYY-MM-DD`) and then click the Datetime icon.

Datetime parameter: `YYYY-MM-DD`
The Datetime parameter should match all files in the directory. Import this dataset and wrangle it.
After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL.
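For example, the flow page URL might have a shape like the following, from which both values can be read. This URL format is an assumption for illustration only; the flow identifier `10` is hypothetical, and `127` is the recipe identifier used later in this example:

```
http://www.example.com:3005/flows/10?recipe=127
```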
For purposes of this example, the key value is the recipe (wrangledDataset) identifier: `127`.
Suppose your files are stored in the following paths:
```
MyFiles/1/variable/census-eastern.csv
MyFiles/1/variable/census-central.csv
MyFiles/1/variable/census-mountain.csv
MyFiles/1/variable/census-pacific.csv
```
When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.
In the window, select the region value, which could be one of the following depending on the file: `eastern`, `central`, `mountain`, or `pacific`. Click the Variable icon.

Variable parameter: `region` (selected value: `pacific`)

In this case, the variable only matches one value in the directory. However, when you apply runtime overrides to the `region` variable, you can set it to any value.
Import this dataset and wrangle it.
After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL.
For purposes of this example, the key value is the recipe (wrangledDataset) identifier: `123`.
Suppose your files are stored in the following paths:
```
MyFiles/1/pattern/POS-r01.csv
MyFiles/1/pattern/POS-r02.csv
MyFiles/1/pattern/POS-r03.csv
```
When you navigate to the directory through the application, you mouse over one of these files and select Parameterize.
In the window, select the two numeric digits (e.g. `02`). Click the Pattern icon.

Pattern parameter: `{digit}{2}`

In this case, the pattern should match any sequence of two digits in a row. In the above example, this expression matches `01`, `02`, and `03`: all of the files in the directory.
Import this dataset and wrangle it.
After you wrangle the dataset, return to its flow view and select the recipe. You should be able to extract the flowId and recipeId values from the URL.
For purposes of this example, the key value is the recipe (wrangledDataset) identifier: `121`.
Checkpoint: You have created flows for each type of dataset with parameters.
After you have created your datasets with parameters, you can wrangle them through the application. For more information, see Transformer Page.
Below, you can review the API calls to run a job for each type of dataset with parameters, including relevant information about overrides.
In the following example, the job is run on the dataset with the Datetime parameter; at execution time, the parameter is matched against the daily orders files (such as `2018-04-03-orders.csv`) in the directory.

NOTE: You cannot apply overrides to these types of datasets with parameters.
Endpoint | `http://www.example.com:3005/v4/jobGroups`
---|---
Authentication | Required
Method | POST
Request Body | See the example below.
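A minimal sketch of the request body for this call, assuming the v4 `jobGroups` request schema in which `wrangledDataset` identifies the recipe and `overrides` carries the execution settings (the exact field names are assumptions, not confirmed by this page):

```
{
  "wrangledDataset": {
    "id": 127
  },
  "overrides": {
    "execution": "photon",
    "profiler": true
  }
}
```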
In this example, the job for the recipe (wrangledDataset) `127` is launched to execute on the Photon running environment with profiling enabled. A response code of `201 - Created` is returned. The response body should look like the following:
{ "reason": "JobStarted", "sessionId": "5b883530-3920-11e8-a37a-db6dae3c6e43", "id": 29 } |
Retain the `jobgroupId=29` value for monitoring.
In the following example, the `region` variable has been overridden with the value `central` to execute the job on `census-central.csv`:
Endpoint | `http://www.example.com:3005/v4/jobGroups`
---|---
Authentication | Required
Method | POST
Request Body | See the example below.
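A minimal sketch of the request body for this call, assuming the same schema as the previous example plus a `runParameters.overrides` section that sets the `region` variable (field names are assumptions):

```
{
  "wrangledDataset": {
    "id": 123
  },
  "overrides": {
    "execution": "photon",
    "profiler": true
  },
  "runParameters": {
    "overrides": {
      "data": [
        {
          "key": "region",
          "value": "central"
        }
      ]
    }
  }
}
```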
In this example, the job for the recipe (wrangledDataset) `123` is launched to execute on the Photon running environment with profiling enabled. A response code of `201 - Created` is returned. The response body should look like the following:
{ "reason": "JobStarted", "sessionId": "aa0f9f00-391f-11e8-a37a-db6dae3c6e43", "id": 27 } |
Retain the `jobgroupId=27` value for monitoring.
In the following example, the job is run on the dataset with the pattern parameter; at execution time, the pattern is matched against the files (such as `POS-r02.csv`) in the directory.

NOTE: You cannot apply overrides to these types of datasets with parameters.
Endpoint | `http://www.example.com:3005/v4/jobGroups`
---|---
Authentication | Required
Method | POST
Request Body | See the example below.
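A minimal sketch of the request body for this call, assuming the same schema as the first example. Because pattern parameters cannot be overridden, the pattern is resolved from the dataset definition and no `runParameters` section is needed (field names are assumptions):

```
{
  "wrangledDataset": {
    "id": 121
  },
  "overrides": {
    "execution": "photon",
    "profiler": true
  }
}
```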
In this example, the job for the recipe (wrangledDataset) `121` is launched to execute on the Photon running environment with profiling enabled. A response code of `201 - Created` is returned. The response body should look like the following:
{ "reason": "JobStarted", "sessionId": "16424a60-3920-11e8-a37a-db6dae3c6e43", "id": 28 } |
Retain the `jobgroupId=28` value for monitoring.
After the job has been created and you have captured the jobGroup Id, you can use it to monitor the status of your job. For more information, see API JobGroups Get v4.
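For example, to check the status of the first job above (jobGroup `29`), the request might look like the following; see the referenced page for the response format:

Endpoint | `http://www.example.com:3005/v4/jobGroups/29`
---|---
Authentication | Required
Method | GET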
If you need to re-run the job as specified, you can use the wrangledDataset identifier to re-run the most recent job.
Tip: When you re-run a job, you can change any variable values as part of the request.
Example request:
Endpoint | `http://www.example.com:3005/v4/jobGroups`
---|---
Authentication | Required
Method | POST
Request Body | See the example below.
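A minimal sketch of the re-run request body, assuming the variable example above (`wrangledDataset` identifier `123`) and, per the Tip, a changed value for the `region` variable (the field names and the `eastern` override value are illustrative assumptions):

```
{
  "wrangledDataset": {
    "id": 123
  },
  "runParameters": {
    "overrides": {
      "data": [
        {
          "key": "region",
          "value": "eastern"
        }
      ]
    }
  }
}
```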
For more information, see API Workflow - Develop a Flow.