
...

This section describes how to run a job using the APIs available in 

D s product
rtrue
.

D s api baseurl

Run Job Endpoints

Depending on the type of job that you are running, you must use one of the following endpoints:

Run job

Run a job to generate the outputs from a single recipe in a flow.

Tip

Tip: This method is covered on this page.


Endpoint: /v4/jobGroups
Method: POST
Reference documentation:


D s api refdoclink
operation/runJobGroup


Run flow

Run all outputs specified in a flow. Optionally, you can run all scheduled outputs. 

Endpoint: /v4/flows/:id/run
Method: POST
Reference documentation:


D s api refdoclink
operation/runFlow


Run deployment

Run the primary flow in the active release of the specified deployment.

Deployments are available only through the Deployment Manager. For more information, see Overview of Deployment Manager.

Endpoint: /v4/deployments/:id/run
Method: POST
Reference documentation:


D s api refdoclink
operation/runDeployment


Prerequisites

Before you begin, you should verify the following:

...

Info

NOTE: A wrangledDataset is the internal object name for the recipe that you wish to run. See the previous section for how to acquire this value.


Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST
Request Body:


Code Block
{
  "wrangledDataset": {
    "id": 28629
  }
}


Response Code: 201 - Created
Response Body:


Code Block
{
    "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
    "reason": "JobStarted",
    "jobGraph": {
        "vertices": [
            21,
            22
        ],
        "edges": [
            {
                "source": 21,
                "target": 22
            }
        ]
    },
    "id": 961247,
    "jobs": {
        "data": [
            {
                "id": 21
            },
            {
                "id": 22
            }
        ]
    }
}


If the 201 response code is returned, then the job has been queued for execution. 
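
For reference, here is a minimal sketch of issuing this request from a script. The base URL and the bearer-token authentication header are assumptions and may differ in your deployment.

Code Block
import requests

# Placeholders: substitute your own <protocol>://<platform_base_url> and
# credentials. Bearer-token auth is an assumption; your deployment may differ.
BASE_URL = "https://example.com"
TOKEN = "<your_access_token>"

resp = requests.post(
    f"{BASE_URL}/v4/jobGroups",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"wrangledDataset": {"id": 28629}},
)

if resp.status_code == 201:
    job_group_id = resp.json()["id"]   # e.g. 961247 in the response above
    print(f"Job queued as jobGroup {job_group_id}")
else:
    resp.raise_for_status()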

...

You can monitor the status of your job through the following endpoint:

Endpoint: <protocol>://<platform_base_url>/v4/jobGroups/<id>/
Authentication: Required
Method: GET
Request Body: None.
Response Code: 200 - OK
Response Body:


Code Block
{
    "id": 961247,
    "name": null,
    "description": null,
    "ranfrom": "ui",
    "ranfor": "recipe",
    "status": "Complete",
    "profilingEnabled": true,
    "runParameterReferenceDate": "2019-08-20T17:46:27.000Z",
    "createdAt": "2019-08-20T17:46:28.000Z",
    "updatedAt": "2019-08-20T17:53:17.000Z",
    "workspace": {
        "id": 22
    },
    "creator": {
        "id": 38
    },
    "updater": {
        "id": 38
    },
    "snapshot": {
        "id": 774476
    },
    "wrangledDataset": {
        "id": 28629
    },
    "flowRun": null
}


When the job has successfully completed, the returned status message includes the following:

...
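
As a rough illustration, the status check can be wrapped in a simple polling loop. This is a sketch only: BASE_URL and TOKEN are the same placeholders as in the earlier sketch, and the set of terminal status values shown here is an assumption; consult the reference documentation for the full list.

Code Block
import time
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

def wait_for_job(job_group_id, interval=30):
    # "Complete" appears in the example response above; the other terminal
    # states listed here are assumptions.
    terminal = {"Complete", "Failed", "Canceled"}
    while True:
        resp = requests.get(
            f"{BASE_URL}/v4/jobGroups/{job_group_id}",
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        resp.raise_for_status()
        status = resp.json()["status"]
        if status in terminal:
            return status
        time.sleep(interval)

print(wait_for_job(961247))   # e.g. "Complete"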

In the future, you can re-run the job using the same simple request:

Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST
Request Body:


Code Block
{
  "wrangledDataset": {
    "id": 28629
  }
}


The job is re-run as it was previously specified.

...

Info

NOTE: Overrides for data sources apply only to file-based sources. File-based sources that are converted during ingestion, such as Microsoft Excel files, cannot be swapped in this manner.



Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST
Request Body:


Code Block
{
  "wrangledDataset": {
    "id": 28629
  },
  "overrides": {
    "datasources": {
      "airlines–2.csv parameterized": [
        "s3://my-new-bucket/test-override-input/airlines1.csv",
        "s3://my-new-bucket/test-override-input/airlines2.csv",
        "s3://my-new-bucket/test-override-input/airlines3.csv"
      ],
      "airlines–4.csv": [
        "s3://my-new-bucket/test-override-input/airlines1.csv",
        "s3://my-new-bucket/test-override-input/airlines2.csv"
      ]
    }
  }
}


The job specified for recipe 28629 is re-run using the new data sources.
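
If you swap sources programmatically, the overrides block can be assembled from a plain mapping of dataset names to replacement paths. The sketch below mirrors the request above; the dataset name and bucket paths are examples, and BASE_URL and TOKEN remain deployment-specific placeholders.

Code Block
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

def run_with_new_sources(recipe_id, new_sources):
    # new_sources: {dataset name: [replacement file paths]}
    body = {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {"datasources": new_sources},
    }
    resp = requests.post(
        f"{BASE_URL}/v4/jobGroups",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()["id"]

job_group_id = run_with_new_sources(28629, {
    "airlines-4.csv": [
        "s3://my-new-bucket/test-override-input/airlines1.csv",
        "s3://my-new-bucket/test-override-input/airlines2.csv",
    ],
})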

...

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    Code Block
    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "profiler": true,
        "execution": "spark",
        "writesettings": [
          {
            "path": "<new_path_to_output>",
            "format": "csv",
            "header": true,
            "asSingleFile": true
          }
        ]
      },
      "ranfrom": null
    }
    


  3.  In the above example, the job has been launched with the following overrides:
    1. Job will be executed on the Spark cluster. Other supported values depend on your deployment:

      Value for overrides.execution and the corresponding running environment:

      photon: Running environment on the D s node.
      spark: Spark on the integrated cluster, with some exceptions.
      databricksSpark: Spark on Azure Databricks.
      emrSpark: Spark on AWS EMR.


    2. Job will be executed with profiling enabled.
    3. Output is written to a new file path.
    4. Output format is CSV, written to the designated path.
    5. Output has a header and is generated as a single file.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    Code Block
    {
    
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962221,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }


  5. Retain the id value, which is the job identifier, for monitoring.
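
The request from the steps above can also be issued from a script. The following is a sketch only: it mirrors the request body shown in step 2, keeps the <new_path_to_output> placeholder, and assumes the same placeholder base URL and bearer token as the earlier sketches.

Code Block
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

body = {
    "wrangledDataset": {"id": 28629},
    "overrides": {
        "profiler": True,
        "execution": "spark",
        "writesettings": [{
            "path": "<new_path_to_output>",   # substitute your output path
            "format": "csv",
            "header": True,
            "asSingleFile": True,
        }],
    },
    "ranfrom": None,
}

resp = requests.post(
    f"{BASE_URL}/v4/jobGroups",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
print(resp.status_code, resp.json().get("id"))   # expect 201 and the jobGroup id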

...

  • Path to database to which to write (must have write access)
  • Connection to write to the target.

    Tip

    Tip: This identifier is for the connection used to write to the target system. This connection must already exist. For more information on how to retrieve the identifier for a connection, see

    D s api refdoclink
    operation/listConnections



  • Name of output table
  • Target table type

    Tip

    Tip: You can acquire the target type from the vendor value in the connection response. For more information, see

    D s api refdoclink
    operation/listConnections



  • action:

    Key value and description:

    create: Create a new table with each publication.
    createAndLoad: Append your data to the table.
    truncateAndLoad: Truncate the table and load it with your data.
    dropAndLoad: Drop the table and write the new table in its place.


  • Identifier of connection to use to write data.
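
As the tips above note, both the connection identifier and the target type (the vendor value) come from the connections list. The following sketch shows one way to look them up; it assumes the listConnections operation is exposed at GET /v4/connections and that each entry carries id, name, and vendor fields, so verify the path and response shape against the reference documentation. The connection name in the usage line is hypothetical.

Code Block
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

def find_connection(name):
    # Assumes listConnections is GET /v4/connections and returns entries
    # under "data" with "id", "name", and "vendor" fields.
    resp = requests.get(
        f"{BASE_URL}/v4/connections",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    for conn in resp.json().get("data", []):
        if conn.get("name") == name:
            return conn["id"], conn["vendor"]   # connectionId, targetType
    raise ValueError(f"No connection named {name!r}")

connection_id, target_type = find_connection("my-postgres-connection")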

...

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    Code Block
    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "publications": [
          {
            "path": "["prod_db"]",
            "tableName": "Table_CaseFctn2",
            "action": "createAndLoad",
            "targetType": "postgres",
            "connectionId": 3,
          }
        ]
      },
      "ranfrom": null
    }
    


  3.  In the above example, the job has been launched with the following overrides:

    Info

    NOTE: When overrides are applied to publishing, any publications that are already attached to the recipe are ignored.

    1. Output path is to the prod_db database, using table name Table_CaseFctn2.
    2. Output action is "create and load." See above for definitions. 
    3. Target table type is a PostgreSQL table.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    Code Block
    {
    
    
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962222,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }


  5. Retain the id value, which is the job identifier, for monitoring.

...

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    Code Block
    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "webhooks": [{
          "name": "webhook override",
          "url": "http://example.com",
          "method": "post",
          "triggerEvent": "onJobFailure",
          "body": {
            "text": "override" 
           },
          "headers": {
            "testHeader": "val1" 
           },
          "sslVerification": true,
          "secretKey": "123",
      }]
     }
    }


  3.  In the above example, the job has been launched with the following overrides:

    Override setting and description:

    name: Name of the webhook.
    url: URL to which to send the webhook message.
    method: The HTTP method to use. Supported values: POST, PUT, PATCH, GET, or DELETE. Body is ignored for GET and DELETE methods.
    triggerEvent: Supported values: onJobFailure - send the webhook message if the job fails; onJobSuccess - send the webhook message if the job completes successfully; onJobDone - send the webhook message when the job fails or finishes successfully.
    body: (optional) The value of the text field is the message that is sent.

    Info

    NOTE: Some special token values are supported. See Create Flow Webhook Task.

    headers: (optional) Key-value pairs of headers to include in the HTTP request.
    sslVerification: (optional) Set to true if SSL verification should be completed. If not specified, the value is true.
    secretKey: (optional) If enabled, this value should be set to the secret key to use.


  4. A response code of 201 - Created is returned. The response body should look like the following:

    Code Block
    {
    
    
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962222,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }


  5. Retain the id value, which is the job identifier, for monitoring.
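
Because webhooks is an array in the request, it appears that more than one notification can be attached to a single run; the following sketch assumes that is supported. The receiver URL and message text are placeholders, as are BASE_URL and TOKEN.

Code Block
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

def webhook(name, trigger, text):
    # Build one webhook override entry; see the settings table above.
    return {
        "name": name,
        "url": "http://example.com",   # placeholder receiver URL
        "method": "post",
        "triggerEvent": trigger,
        "body": {"text": text},
        "sslVerification": True,
    }

body = {
    "wrangledDataset": {"id": 28629},
    "overrides": {
        "webhooks": [
            webhook("failure notice", "onJobFailure", "job failed"),
            webhook("success notice", "onJobSuccess", "job succeeded"),
        ]
    },
}

resp = requests.post(
    f"{BASE_URL}/v4/jobGroups",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
print(resp.status_code)   # expect 201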

...

  1. Acquire the internal identifier for the recipe for which you wish to execute a job. In the previous example, this identifier was 28629.
  2. Construct a request using the following:

     

    Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
    Authentication: Required
    Method: POST

    Request Body:

    Code Block
    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "runParameters": {
          "overrides": {
            "data": [{
              "key": "varRegion",
              "value": "02"
            }
          ]}
        },
      },
      "ranfrom": null
    }
    


  3.  In the above example, the specified job has been launched for recipe 28629. The run parameter varRegion has been set to 02 for this specific job. Depending on how it is defined in the flow, this parameter could change any of the following:
    1. The source for the imported dataset.
    2. The path for the generated output.
    3. A flow parameter reference in the recipe.

    For more information, see Overview of Parameterization.
  4. A response code of 201 - Created is returned. The response body should look like the following:

    Code Block
    {
        "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
        "reason": "JobStarted",
        "jobGraph": {
            "vertices": [
                21,
                22
            ],
            "edges": [
                {
                    "source": 21,
                    "target": 22
                }
            ]
        },
        "id": 962223,
        "jobs": {
            "data": [
                {
                    "id": 21
                },
                {
                    "id": 22
                }
            ]
        }
    }


  5. Retain the id value, which is the job identifier, for monitoring.
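
Because the run parameter value is supplied per request, a script can launch one job per value. The sketch below is illustrative only; the list of region codes is hypothetical, and BASE_URL and TOKEN are the usual placeholders.

Code Block
import requests

BASE_URL = "https://example.com"     # placeholder platform base URL
TOKEN = "<your_access_token>"        # placeholder credential

def run_for_region(recipe_id, region):
    # Override the varRegion run parameter for this job only.
    body = {
        "wrangledDataset": {"id": recipe_id},
        "overrides": {
            "runParameters": {
                "overrides": {
                    "data": [{"key": "varRegion", "value": region}]
                }
            }
        },
    }
    resp = requests.post(
        f"{BASE_URL}/v4/jobGroups",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()["id"]

# Hypothetical region codes; one jobGroup is created per value.
job_ids = {region: run_for_region(28629, region) for region in ["01", "02", "03"]}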

...

This feature and the Spark properties to override must be enabled. For more information on enabling this feature, see Enable Spark Job Overrides.

Tip

Tip: This example applies the overrides to a specific job execution. To apply these overrides to every job for the output object, see API Workflow - Manage Outputs.

The following example shows how to run a job for a specified recipe with Spark property overrides applied to it. This example assumes that the job has already been configured to be executed on Spark ("execution": "spark"):

Endpoint: <protocol>://<platform_base_url>/v4/jobGroups
Authentication: Required
Method: POST

Request Body:

Code Block
{
  "wrangledDataset": {
    "id": 28629
  },
  "sparkOptions": [
    {
      "key": "spark.executor.cores",
      "value": "2"
    },
    {
      "key": "spark.executor.memory",
      "value": "4GB"
    }
  ]
}
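
If you keep Spark properties in a plain dictionary, they can be converted into the key/value list shown above before the request is sent. A minimal sketch; the property values are examples only.

Code Block
def to_spark_options(props):
    # Convert {"spark.executor.cores": 2, ...} into the sparkOptions
    # key/value list used in the request body above.
    return [{"key": k, "value": str(v)} for k, v in props.items()]

body = {
    "wrangledDataset": {"id": 28629},
    "sparkOptions": to_spark_options({
        "spark.executor.cores": 2,
        "spark.executor.memory": "4GB",
    }),
}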

...