This section describes how to run a job using the available APIs.
Depending on the type of job that you are running, you must use one of the following endpoints:
Run a job to generate the outputs from a single recipe in a flow.
Tip: This method is covered on this page.
Endpoint | /v4/jobGroups
---|---
Method | POST
Reference documentation |
Run all outputs specified in a flow. Optionally, you can run all scheduled outputs.
Endpoint | /v4/flows/:id/run
---|---
Method | POST
Reference documentation |
Run the primary flow in the active release of the specified deployment.
Deployments are available only through the Deployment Manager. For more information, see Overview of Deployment Manager.
Endpoint | /v4/deployments/:id/run
---|---
Method | POST
Reference documentation |
Before you begin, you should verify the following:
Get authentication credentials. As part of each request, you must pass in authentication credentials to the platform. For more information, see Manage API Access Tokens and section/Authentication in the reference documentation.
Acquire the recipe (wrangled dataset) identifier. In Flow View, click the icon for the recipe whose outputs you wish to generate, and acquire the numeric value for the recipe from the URL. In the following example, the recipe ID is 28629:
http://<platform_base_url>/flows/5479?recipe=28629&tab=recipe
If you wish to apply overrides to the inputs or outputs of the recipe, you should acquire those identifiers or paths now. For more information, see "Run Job with Parameter Overrides" below.
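The steps on this page can also be scripted directly against the REST endpoints. Where code sketches appear below, they use Python with the requests library; the base URL, the environment variable holding the access token, and the Bearer authorization scheme are assumptions to adapt to your deployment.

```python
import os

import requests

# Assumptions: the platform base URL and an API access token supplied via
# environment variables. Adjust the authorization scheme if your deployment
# expects something other than a Bearer token.
BASE_URL = os.environ.get("PLATFORM_BASE_URL", "http://<platform_base_url>")
API_TOKEN = os.environ["API_TOKEN"]

session = requests.Session()
session.headers.update({
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
})
```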
Through the APIs, you can specify and run a job. To run a job with all default settings, construct a request like the following:
Endpoint | /v4/jobGroups
---|---
Authentication | Required
Method | POST
Request Body |
Response Code | 201 - Created
Response Body |
If the 201 response code is returned, then the job has been queued for execution.
Tip: Retain the id value in the response, which is the job identifier, for monitoring.
For more information, see operation/runJobGroup.
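As a rough sketch, the request above can be submitted as follows. The minimal body mirrors the wrangledDataset object used in the override examples later on this page; the base URL and token placeholders are assumptions.

```python
import requests

BASE_URL = "http://<platform_base_url>"            # assumption: your deployment URL
HEADERS = {"Authorization": "Bearer <api_token>"}  # assumption: Bearer token auth

# Minimal request body: identify the recipe (wrangled dataset) to run.
payload = {"wrangledDataset": {"id": 28629}}

resp = requests.post(f"{BASE_URL}/v4/jobGroups", json=payload, headers=HEADERS)
resp.raise_for_status()          # expect 201 - Created
job_group_id = resp.json()["id"] # retain this value for monitoring
print(f"Queued job group {job_group_id}")
```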
Checkpoint: You have queued your job for execution.
You can monitor the status of your job through the following endpoint:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups/<id>/
---|---
Authentication | Required
Method | GET
Request Body | None.
Response Code | 200 - Ok
Response Body |
When the job has successfully completed, the returned status message includes the following:

    "status": "Complete",
For more information, see operation/getJobGroup.
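A simple polling loop along these lines can watch the endpoint above until the job finishes. The Complete status comes from this page; the in-flight status names and the decision to treat anything else as terminal are assumptions you may need to adjust.

```python
import time

import requests

BASE_URL = "http://<platform_base_url>"            # assumption
HEADERS = {"Authorization": "Bearer <api_token>"}  # assumption


def wait_for_job(job_group_id, poll_seconds=30):
    """Poll GET /v4/jobGroups/<id>/ until the job reports a terminal status."""
    url = f"{BASE_URL}/v4/jobGroups/{job_group_id}/"
    while True:
        resp = requests.get(url, headers=HEADERS)
        resp.raise_for_status()
        status = resp.json().get("status")
        if status == "Complete":
            return status
        # Assumption: statuses other than these in-flight values are terminal failures.
        if status not in ("Pending", "InProgress"):
            raise RuntimeError(f"Job {job_group_id} ended with status {status}")
        time.sleep(poll_seconds)
```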
Tip: You have executed the job. Results have been delivered to the designated output locations.
In the future, you can re-run the job using the same simple request:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST
Request Body |
The job is re-run as it was previously specified.
For more information, see operation/createJobGroup.
As needed, you can specify runtime overrides for any of the settings related to the job definition or its outputs. For file-based jobs, these overrides include the data sources, the running environment and profiling options, and the output write settings.
You can override the file-based data sources for your job run. In the following example, two parameterized datasets are overridden with new files.
NOTE: Overrides for data sources apply only to file-based sources. File-based sources that are converted during ingestion, such as Microsoft Excel files, cannot be swapped in this manner.
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST
Request Body |
The job specified for recipe 28629 is re-run using the new data sources.
For more information, see operation/createJobGroup.
You can also override the output write settings for the job. For more information on these settings, see operation/getWriteSetting. In this example, the job is run for the recipe with ID 28629 with new write settings applied to its output. Construct a request using the following:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "profiler": true,
        "execution": "spark",
        "writesettings": [
          {
            "path": "<new_path_to_output>",
            "format": "csv",
            "header": true,
            "asSingleFile": true
          }
        ]
      },
      "ranfrom": null
    }
With "execution": "spark", the job is executed on the Spark cluster. Other supported values depend on your product edition and the available running environments:
Value for overrides.execution | Description
---|---
photon | Running environment hosted on the platform node.
spark | Spark on the integrated cluster.
databricksSpark | Spark on Azure Databricks.
emrSpark | Spark on AWS EMR.
dataflow | Google Cloud Dataflow.
A response code of 201 - Created is returned. The response body should look like the following:

    {
      "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
      "reason": "JobStarted",
      "jobGraph": {
        "vertices": [21, 22],
        "edges": [{"source": 21, "target": 22}]
      },
      "id": 962221,
      "jobs": {
        "data": [{"id": 21}, {"id": 22}]
      }
    }
Retain the id value, which is the job identifier, for monitoring.
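For illustration, the request above can be wrapped in a small helper; the run_job function below is a sketch, not part of the API, and the output path remains a placeholder.

```python
import requests

BASE_URL = "http://<platform_base_url>"            # assumption
HEADERS = {"Authorization": "Bearer <api_token>"}  # assumption


def run_job(recipe_id, overrides=None):
    """POST /v4/jobGroups for a recipe, optionally with an overrides block."""
    payload = {"wrangledDataset": {"id": recipe_id}}
    if overrides:
        payload["overrides"] = overrides
    resp = requests.post(f"{BASE_URL}/v4/jobGroups", json=payload, headers=HEADERS)
    resp.raise_for_status()   # expect 201 - Created
    return resp.json()["id"]  # job group id, retained for monitoring


# Overrides taken from the request body above: profiling on, Spark execution,
# and a single CSV output written as one file. The path is a placeholder.
writesettings_overrides = {
    "profiler": True,
    "execution": "spark",
    "writesettings": [
        {
            "path": "<new_path_to_output>",
            "format": "csv",
            "header": True,
            "asSingleFile": True,
        }
    ],
}
job_group_id = run_job(28629, writesettings_overrides)
```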
You can also pass job definition overrides for table-based outputs. For table outputs, overrides include:
* Connection to write to the target. Tip: This identifier is for the connection used to write to the target system. This connection must already exist. For more information on how to retrieve the identifier for a connection, see the API reference documentation.
* Target table type. Tip: You can acquire the target type from the connection to which you are publishing.
* Write action to apply to the target table. Supported values for action:
Key value | Description
---|---
create | Create a new table with each publication.
createAndLoad | Append your data to the table.
truncateAndLoad | Truncate the table and load it with your data.
dropAndLoad | Drop the table and write the new table in its place.
For more information, see operation/getPublication. In this example, the job is run for the recipe with ID 28629. Construct a request using the following:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "publications": [
          {
            "path": ["prod_db"],
            "tableName": "Table_CaseFctn2",
            "action": "createAndLoad",
            "targetType": "postgres",
            "connectionId": 3
          }
        ]
      },
      "ranfrom": null
    }
In the above example, the job has been launched with the following overrides:
NOTE: When overrides are applied to publishing, any publications that are already attached to the recipe are ignored.
* Results are published to the prod_db database, using the table name Table_CaseFctn2.
* Data is appended to the table (createAndLoad) in a postgres target through connection 3.
A response code of 201 - Created is returned. The response body should look like the following:
{ "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1", "reason": "JobStarted", "jobGraph": { "vertices": [ 21, 22 ], "edges": [ { "source": 21, "target": 22 } ] }, "id": 962222, "jobs": { "data": [ { "id": 21 }, { "id": 22 } ] } } |
Retain the id value, which is the job identifier, for monitoring.
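The publications override from the example above can be assembled in code as follows; the values shown are the documented sample values (database, table, action, target type, and connection ID), not settings to reuse as-is.

```python
# Overrides block for a table-based output, mirroring the request body above.
publication_overrides = {
    "publications": [
        {
            "path": ["prod_db"],             # database (path) to publish into
            "tableName": "Table_CaseFctn2",  # target table name
            "action": "createAndLoad",       # append data to the table
            "targetType": "postgres",        # target table type
            "connectionId": 3,               # existing connection used to write to the target
        }
    ]
}
# POST this as {"wrangledDataset": {"id": 28629}, "overrides": publication_overrides}
# to /v4/jobGroups, as shown in the earlier sketches.
```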
When you execute a job, you can pass in a set of parameters as overrides to generate a webhook message to a third-party application, based on the success or failure of the job.
For more information on webhooks, see Create Flow Webhook Task.
In this example, the job is run for the recipe with ID 28629. Construct a request using the following:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "webhooks": [
          {
            "name": "webhook override",
            "url": "http://example.com",
            "method": "post",
            "triggerEvent": "onJobFailure",
            "body": {
              "text": "override"
            },
            "headers": {
              "testHeader": "val1"
            },
            "sslVerification": true,
            "secretKey": "123"
          }
        ]
      }
    }
In the above example, the job has been launched with the following overrides:
Override setting | Description
---|---
name | Name of the webhook.
url | URL to which to send the webhook message.
method | The HTTP method to use. Supported values: POST, PUT, PATCH, GET, or DELETE. Body is ignored for GET and DELETE methods.
triggerEvent | Event that triggers the webhook message. The above example uses onJobFailure.
body | (optional) Body of the webhook message to send. In the above example, a text value is passed.
headers | (optional) Key-value pairs of headers to include in the HTTP request.
sslVerification | (optional) Set to true if SSL verification should be completed. If not specified, the value is true.
secretKey | (optional) If enabled, this value should be set to the secret key to use.
A response code of 201 - Created is returned. The response body should look like the following:

    {
      "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
      "reason": "JobStarted",
      "jobGraph": {
        "vertices": [21, 22],
        "edges": [{"source": 21, "target": 22}]
      },
      "id": 962222,
      "jobs": {
        "data": [{"id": 21}, {"id": 22}]
      }
    }
Retain the id value, which is the job identifier, for monitoring.
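The webhook override can be built the same way; this sketch simply restates the documented example as a commented structure, with field meanings taken from the table above.

```python
# Webhook override mirroring the request body above.
webhook_overrides = {
    "webhooks": [
        {
            "name": "webhook override",         # display name of the webhook
            "url": "http://example.com",        # where the message is sent
            "method": "post",                   # HTTP method (body ignored for GET/DELETE)
            "triggerEvent": "onJobFailure",     # fire the webhook when the job fails
            "body": {"text": "override"},       # message body
            "headers": {"testHeader": "val1"},  # extra HTTP headers
            "sslVerification": True,            # verify SSL (defaults to true if omitted)
            "secretKey": "123",                 # secret key, if enabled
        }
    ]
}
# POST this as {"wrangledDataset": {"id": 28629}, "overrides": webhook_overrides}
# to /v4/jobGroups, as in the earlier sketches.
```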
You can pass overrides of the default parameter values as part of the job definition. The following mechanism can be used to pass in overrides for the supported parameter types; the syntax is the same for each type. In this example, the job is run for the recipe with ID 28629.
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "runParameters": {
          "overrides": {
            "data": [
              {
                "key": "varRegion",
                "value": "02"
              }
            ]
          }
        }
      },
      "ranfrom": null
    }
In the above example, the job has been launched for recipe 28629. The run parameter varRegion has been set to 02 for this specific job. Depending on how the parameter is defined in the flow, this override can change the job's inputs or its outputs.
A response code of 201 - Created is returned. The response body should look like the following:
{ "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1", "reason": "JobStarted", "jobGraph": { "vertices": [ 21, 22 ], "edges": [ { "source": 21, "target": 22 } ] }, "id": 962223, "jobs": { "data": [ { "id": 21 }, { "id": 22 } ] } } |
Retain the id value, which is the job identifier, for monitoring.
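Run parameter overrides are convenient for launching the same recipe against several variable values. The sketch below queues one job per value of varRegion; the value 02 comes from the example above, while the other region codes and the overall loop are hypothetical.

```python
import requests

BASE_URL = "http://<platform_base_url>"            # assumption
HEADERS = {"Authorization": "Bearer <api_token>"}  # assumption

# Hypothetical set of values for the varRegion run parameter; "02" comes from
# the documented example, the rest are placeholders.
regions = ["01", "02", "03"]

job_group_ids = []
for region in regions:
    payload = {
        "wrangledDataset": {"id": 28629},
        "overrides": {
            "runParameters": {
                "overrides": {"data": [{"key": "varRegion", "value": region}]}
            }
        },
    }
    resp = requests.post(f"{BASE_URL}/v4/jobGroups", json=payload, headers=HEADERS)
    resp.raise_for_status()
    job_group_ids.append(resp.json()["id"])  # retain each id for monitoring
```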
When this feature is enabled, you can submit overrides for a specific set of Spark properties for your job.
This feature and the Spark properties to override must be enabled. For more information on enabling this feature, see Enable Spark Job Overrides.
The following example shows how to run a job for a specified recipe with Spark property overrides applied to it. This example assumes that the job has already been configured to be executed on Spark ("execution": "spark"):
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 28629
      },
      "overrides": {
        "sparkOptions": [
          {
            "key": "spark.executor.cores",
            "value": "2"
          },
          {
            "key": "spark.executor.memory",
            "value": "4GB"
          }
        ]
      }
    }
You can submit overrides to a specific set of Databricks properties for your job execution. These overrides can be applied to AWS Databricks or Azure Databricks.
The following example shows how to run a job on Databricks for a specified recipe with several property overrides applied to it:
Endpoint | <protocol>://<platform_base_url>/v4/jobGroups
---|---
Authentication | Required
Method | POST

Request Body:

    {
      "wrangledDataset": {
        "id": 60
      },
      "overrides": {
        "execution": "databricksSpark",
        "profiler": true,
        "databricksOptions": [
          {"key": "maxWorkers", "value": 8},
          {"key": "poolId", "value": "pool-123456789"},
          {"key": "enableLocalDiskEncryption", "value": true}
        ]
      }
    }
The above overrides do the following:
* Set the maximum number of workers to 8. Databricks is permitted to adjust the number of nodes for job execution up to this limit.
* Use the instance pool pool-123456789 for the job.
* Enable local disk encryption for the job.
The following properties can be overridden for AWS Databricks and Azure Databricks jobs:
{ "wrangledDataset": {"id": 60}, "overrides": { "databricksOptions": [ "autoterminationMinutes" : <integer_override_value>, "awsAttributes.availability" : "<string_override_value>", "awsAttributes.availabilityZone" : "<string_override_value>", "awsAttributes.ebsVolume.count" : <integer_override_value>, "awsAttributes.ebsVolume.size" : <integer_override_value>, "awsAttributes.ebsVolume.type" : "<string_override_value>", "awsAttributes.firstOnDemandInstances" : <integer_override_value>, "awsAttributes.instanceProfileArn" : "<string_override_value>", "awsAttributes.spotBidPricePercent" : <decimal_override_value>, "clusterMode" : "<string_override_value>", "clusterPolicyId" : "<string_override_value>", "driverNodeType" : "<string_override_value>", "enableAutotermination" : <boolean_override_value>, "enableLocalDiskEncryption" : <boolean_override_value>, "logsDestination" : "<string_override_value>", "maxWorkers" : <integer_override_value>, "minWorkers" : <integer_override_value>, "poolId" : "<string_override_value>", "poolName" : "<string_override_value>", "driverPoolId" : "<string_override_value>", "driverPoolName" : "<string_override_value>", "serviceUrl" : "<string_override_value>", "sparkVersion" : "<string_override_value>", "workerNodeType" : "<string_override_value>", ] } } |
NOTE: Overrides that begin with awsAttributes apply to AWS Databricks only.
NOTE: If a Databricks cluster policy is used, some of these job-level overrides may not be applied.
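Note that the template above lists property names and value types; in an actual request, each property is passed as a key/value object inside databricksOptions, as in the earlier Databricks example. A small sketch of that conversion, using placeholder values:

```python
# Plain mapping of Databricks properties to override (placeholder values).
databricks_properties = {
    "maxWorkers": 8,
    "poolId": "pool-123456789",
    "enableLocalDiskEncryption": True,
}

# Convert to the key/value objects expected in overrides.databricksOptions.
databricks_options = [
    {"key": name, "value": value} for name, value in databricks_properties.items()
]

overrides = {"execution": "databricksSpark", "databricksOptions": databricks_options}
# POST as {"wrangledDataset": {"id": 60}, "overrides": overrides} to /v4/jobGroups.
```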
For more information, see operation/createJobGroup.