...
Info |
---|
Checkpoint: You have created a flow with multiple datasets and have integrated all of the relevant data into a single dataset. |
Step - Create Output Objects
Before you run a job, you must define output objects, which specify the following:
- Running environment where the job is executed
- Profiling on or off
- outputObjects have the following objects associated with them:
  - writeSettings: These objects define the file-based outputs that are produced for the output object.
  - publications: These objects define the database target, table, and other settings for publication to a relational datastore.
Info |
---|
NOTE: You can continue with this workflow without creating outputObjects yet. In this workflow, overrides are applied during the job definition, so you don't have to create the outputObjects and writeSettings at this time. |
For more information on creating outputObjects, writeSettings, and publications, see API Workflow - Manage Outputs.
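If you prefer to create persistent outputs at this point, the calls follow the pattern sketched below. This is a minimal sketch only: it assumes Python with the requests library, the deployment URL used in this workflow, and placeholder token authentication. The outputObjects and writeSettings field names shown are assumptions; verify them against API Workflow - Manage Outputs for your release.
Code Block |
---|
import requests

BASE = "http://www.example.com:3005/v4"   # deployment URL used throughout this workflow
HEADERS = {
    "Authorization": "Bearer <token>",    # placeholder; substitute your deployment's auth
    "Content-Type": "application/json",
}

# Create an outputObject for the recipe. The field names below follow the
# description above but are assumptions; verify them against
# API Workflow - Manage Outputs.
output_object = requests.post(
    f"{BASE}/outputObjects",
    headers=HEADERS,
    json={
        "execution": "photon",    # running environment where the job is executed
        "profiler": True,         # profiling on
        "flowNode": {"id": 23},   # recipe to which this output is attached
    },
).json()

# Attach a file-based writeSettings object to the new outputObject.
requests.post(
    f"{BASE}/writeSettings",
    headers=HEADERS,
    json={
        "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
        "action": "create",
        "format": "csv",
        "outputObjectId": output_object["id"],
    },
)
|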
Step - Run Job
Through the APIs, you can specify and run a job. In the above example, you must run the job for the terminal dataset, which is POS-r01 in this case. This dataset contains references to all of the other datasets. When the job is run, the recipes for the other datasets are also applied to the terminal dataset, which ensures that the output reflects the proper integration of these other datasets into POS-r01.
Info |
---|
NOTE: In the following example, writeSettings have been specified as overrides in the job definition. These overrides are applied for this job run only. If you need to re-run the job with these settings, you must either 1) re-apply the overrides or 2) create the writeSettings objects. See API Workflow - Manage Outputs. |
Steps:
- Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 23.
- Construct a request using the following:
Endpoint | http://www.example.com:3005/v4/jobGroups |
---|---|
Authentication | Required |
Method | POST |
Request Body:
Code Block |
---|
{
  "wrangledDataset": {
    "id": 23
  },
  "overrides": {
    "execution": "photon",
    "profiler": true,
    "writesettings": [
      {
        "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
        "action": "create",
        "format": "csv",
        "compression": "none",
        "header": false,
        "asSingleFile": false
      }
    ]
  },
  "ranfrom": null
}
|
- In the above example, the specified job has been launched for recipe 23 to execute on the Photon running environment with profiling enabled.
- Output format is CSV to the designated path. For more information on these properties, see API JobGroups Create v4.
- Output is written as a new file with no overwriting of previous files.
A response code of 201 - Created is returned. The response body should look like the following:
Code Block |
---|
{
  "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
  "reason": "JobStarted",
  "jobGraph": {
    "vertices": [
      21,
      22
    ],
    "edges": [
      {
        "source": 21,
        "target": 22
      }
    ]
  },
  "id": 3,
  "jobs": {
    "data": [
      {
        "id": 21
      },
      {
        "id": 22
      }
    ]
  }
}
|
Retain the id value, which is the job identifier, for monitoring.
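If you are driving this workflow from a script, the job launch above can be issued as follows. This is a minimal sketch in Python using the requests library, assuming the deployment URL from this workflow and placeholder token authentication.
Code Block |
---|
import requests

BASE = "http://www.example.com:3005/v4"  # deployment URL used throughout this workflow
HEADERS = {
    "Authorization": "Bearer <token>",   # placeholder; substitute your deployment's auth
    "Content-Type": "application/json",
}

# Launch a job for recipe 23, overriding execution, profiling, and write
# settings for this run only (no persistent outputObjects required).
payload = {
    "wrangledDataset": {"id": 23},
    "overrides": {
        "execution": "photon",
        "profiler": True,
        "writesettings": [
            {
                "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
                "action": "create",
                "format": "csv",
                "compression": "none",
                "header": False,
                "asSingleFile": False,
            }
        ],
    },
    "ranfrom": None,
}

resp = requests.post(f"{BASE}/jobGroups", headers=HEADERS, json=payload)
assert resp.status_code == 201            # 201 - Created indicates the job was queued
job_group_id = resp.json()["id"]          # retain for monitoring, e.g. 3 in the example response
|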
Step - Monitoring Your Job
You can monitor the status of your job through the following endpoint:
Endpoint | http://www.example.com:3005/v4/jobGroups/<id>/status |
---|---|
Authentication | Required |
Method | GET |
Request Body | None. |
When the job has successfully completed, the returned status message is the following:
Code Block |
---|
"Complete" |
For more information, see API JobGroups Create v4.
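In a script, you can poll this endpoint until the job finishes. A minimal sketch under the same assumptions as above (deployment URL, placeholder auth); the "Failed" status value checked here is an assumption, so verify the status values for your release.
Code Block |
---|
import time

import requests

BASE = "http://www.example.com:3005/v4"
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder; substitute your deployment's auth

def wait_for_job(job_group_id, poll_seconds=10):
    """Poll the jobGroups status endpoint until the job completes."""
    while True:
        status = requests.get(
            f"{BASE}/jobGroups/{job_group_id}/status", headers=HEADERS
        ).json()  # the endpoint returns a quoted status string, e.g. "Complete"
        if status == "Complete":
            return status
        if status == "Failed":  # assumed failure value; check your release's status list
            raise RuntimeError(f"Job {job_group_id} failed")
        time.sleep(poll_seconds)

wait_for_job(3)  # job identifier retained from the 201 response
|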
Step - Re-run Job
In the future, you can re-run the job exactly as you specified it by executing the following call:
Tip |
---|
Tip: You can swap imported datasets before re-running the job. For example, if you have uploaded a new file, you can change the primary input dataset for the wrangled dataset and then use the following API call to re-run the job as specified. See API WrangledDatasets Put PrimaryInputDataset v4. |
Info |
---|
NOTE: In the previous example, writeSettings were specified as overrides in the job definition. To re-run the job using the wrangledDataset ID only, you must first create the writeSettings objects. See API Workflow - Manage Outputs. |
Endpoint | http://www.example.com:3005/v4/jobGroups |
---|---|
Authentication | Required |
Method | POST |
Request Body:
Code Block |
---|
{
  "wrangledDataset": {
    "id": 23
  }
}
|
The job is re-run as it was previously specified. If you need to modify any job parameters, you must create a new job definition.
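Scripted, the re-run is a single POST under the same assumptions as the earlier sketches (deployment URL, placeholder auth). It presumes that writeSettings objects now exist for this output, per the note above; otherwise, re-apply the overrides from the original job definition.
Code Block |
---|
import requests

BASE = "http://www.example.com:3005/v4"
HEADERS = {
    "Authorization": "Bearer <token>",  # placeholder; substitute your deployment's auth
    "Content-Type": "application/json",
}

# Re-run the job exactly as previously specified, identifying it by the
# wrangledDataset ID only. Assumes persistent writeSettings objects exist.
resp = requests.post(
    f"{BASE}/jobGroups",
    headers=HEADERS,
    json={"wrangledDataset": {"id": 23}},
)
new_job_group_id = resp.json()["id"]  # monitor this id as in the previous step
|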