...

Info

Checkpoint: You have created a flow with multiple datasets and have integrated all of the relevant data into a single dataset.

Step - Create Output Objects


Before you run a job, you can define output objects, which specify the following:

  • Running environment where the job is executed
  • Profiling on or off
  • outputObjects have the following objects associated with them:
    • writeSettings: These objects define the file-based outputs that are produced for the output object.
    • publications: These objects define the database target, table, and other settings for publication to a relational datastore. 
Info

NOTE: You can continue with this workflow without creating outputObjects yet. In this workflow, overrides are applied during the job definition, so you don't have to create the outputObjects and writeSettings at this time.

For more information on creating outputObjects, writeSettings, and publications, see API Workflow - Manage Outputs.
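
If you script this part of the workflow, it can help to assemble the writeSettings entry programmatically before deciding whether to store it on an outputObject or to pass it as a job override, as the next step does. The following Python sketch is illustrative only: the helper function and variable names are not part of the product API, and the field values simply mirror the request bodies shown on this page.

Code Block
# Illustrative only: builds the writeSettings entry that this workflow
# later passes as a job override. Field names mirror the request bodies
# shown on this page; the helper itself is not part of the product API.
def build_write_settings(path, output_format="csv", compression="none",
                         header=False, as_single_file=False, action="create"):
    return {
        "path": path,
        "action": action,            # "create" writes a new file, no overwrite
        "format": output_format,
        "compression": compression,
        "header": header,
        "asSingleFile": as_single_file,
    }

write_settings = [
    build_write_settings(
        "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv"
    )
]
print(write_settings)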

Step - Run Job

Through the APIs, you can specify and run a job. In this example, you must run the job for the terminal dataset, which is POS-r01. This dataset contains references to all of the other datasets. When the job is run, the recipes for the other datasets are also applied to the terminal dataset, which ensures that the output reflects the proper integration of these other datasets into POS-r01.

Info

NOTE: In the following example, writeSettings have been specified as overrides in the job definition. These overrides are applied for this job run only. If you need to re-run the job with these settings, you must either 1) re-apply the overrides or 2) create the writeSettings objects. See API Workflow - Manage Outputs.

Steps:

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 23.
  2. Construct a request using the following:

     

    Endpoint: http://www.example.com:3005/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    Code Block
    {
      "wrangledDataset": {
        "id": 23
      },
      "overrides": {
        "execution": "photon",
        "profiler": true,
        "writesettings": [
          {
            "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
            "action": "create",
            "format": "csv",
            "compression": "none",
            "header": false,
            "asSingleFile": false
          }
        ]
      },
      "ranfrom": null
    }


  3. In the above example, the specified job has been launched for recipe 23 to execute on the Trifacta Photon running environment with profiling enabled.
    1. Output format is CSV, written to the designated path. For more information on these properties, see API JobGroups Create v4.
    2. Output is written as a new file with no overwriting of previous files.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    Code Block
    {
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 3,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }


  5. Retain the id value, which is the job identifier, for monitoring.
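
As a rough illustration, the call constructed in the steps above can also be submitted from a short Python script. The use of the requests library, HTTP basic authentication, and the placeholder credentials are assumptions made for this sketch; substitute whatever client and authentication your deployment actually uses.

Code Block
# Illustrative sketch: submit the jobGroups request shown in the steps above.
# The requests library, basic auth, and the credentials are assumptions here;
# adjust them to match your deployment.
import requests

BASE_URL = "http://www.example.com:3005"
AUTH = ("admin@example.com", "password")  # placeholder credentials

payload = {
    "wrangledDataset": {"id": 23},
    "overrides": {
        "execution": "photon",
        "profiler": True,
        "writesettings": [
            {
                "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
                "action": "create",
                "format": "csv",
                "compression": "none",
                "header": False,
                "asSingleFile": False,
            }
        ],
    },
    "ranfrom": None,
}

response = requests.post(f"{BASE_URL}/v4/jobGroups", json=payload, auth=AUTH)
response.raise_for_status()            # expect 201 - Created
job_group_id = response.json()["id"]   # retain this value for monitoring
print("jobGroup id:", job_group_id)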

Step - Monitoring Your Job

You can monitor the status of your job through the following endpoint:

Endpoint: http://www.example.com:3005/v4/jobGroups/<id>/status
Authentication: Required
Method: GET
Request Body: None.

When the job has successfully completed, the returned status message is the following:

Code Block
"Complete"

For more information, see API JobGroups Create v4.
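
If you prefer to poll this endpoint from a script rather than checking it by hand, a minimal Python sketch follows. It reuses the assumed host and credentials from the earlier sketch and simply waits until the returned status is "Complete"; handling of failure or cancellation statuses is omitted for brevity.

Code Block
# Illustrative polling loop for the status endpoint shown above.
# Host and credentials are assumptions carried over from the earlier sketch.
import time
import requests

BASE_URL = "http://www.example.com:3005"
AUTH = ("admin@example.com", "password")  # placeholder credentials

def wait_for_job(job_group_id, interval_seconds=10):
    """Poll until the jobGroup reports the "Complete" status."""
    while True:
        response = requests.get(
            f"{BASE_URL}/v4/jobGroups/{job_group_id}/status", auth=AUTH
        )
        response.raise_for_status()
        status = response.json()  # the endpoint returns a bare JSON string
        if status == "Complete":
            return status
        time.sleep(interval_seconds)

# Example: wait_for_job(3)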

Step - Re-run Job

In the future, you can re-run the job exactly as you specified it by executing the following call:

Tip

Tip: You can swap imported datasets before re-running the job. For example, if you have uploaded a new file, you can change the primary input dataset for the wrangled dataset and then use the following API call to re-run the job as specified. See API WrangledDatasets Put PrimaryInputDataset v4.


Info

NOTE: In the following example, writeSettings have been specified as overrides in the job definition. To re-run the job using the wrangledDataset ID only, you must first create the writeSettings objects. See API Workflow - Manage Outputs.



Endpoint: http://www.example.com:3005/v4/jobGroups
Authentication: Required
Method: POST
Request Body:


{ "wrangledDataset": { "id": 23 }
Code Block
Code Block
{
  "wrangledDataset": {
    "id": 23
  },
  "overrides": {
    "execution": "photon",
    "profiler": true,
    "writesettings": [
      {
        "path": "hdfs://hadoop:50070/trifacta/queryResults/admin@example.com/POS-r01.csv",
        "action": "create",
        "format": "csv",
        "compression": "none",
        "header": false,
        "asSingleFile": false
      }
    ]
  },
  "ranfrom": null
}


The job is re-run as it was previously specified. If you need to modify any job parameters, you must create a new job definition.
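
If you script the re-run, the simplest approach under the same assumptions as the earlier sketches is to keep the job definition in a file and resubmit it unchanged whenever a fresh run is needed. The file name below is hypothetical.

Code Block
# Illustrative re-run: store the job definition once and resubmit it as-is.
# Host, credentials, and the file name are assumptions for this sketch.
import json
import requests

BASE_URL = "http://www.example.com:3005"
AUTH = ("admin@example.com", "password")  # placeholder credentials

with open("pos_r01_job.json") as f:  # hypothetical file holding the request body shown above
    payload = json.load(f)

response = requests.post(f"{BASE_URL}/v4/jobGroups", json=payload, auth=AUTH)
response.raise_for_status()
print("re-run jobGroup id:", response.json()["id"])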