Comment: Published by Scroll Versions from space DEV and version r080

Step - Create a Publication

You can create publications that publish table-based outputs through specified connections. In the following example, a Hive table is written to the default database through the connection with id=1. This publication is associated with the outputObject with id=4.

Request:

Endpoint: http://www.wrangle-dev.example.com:3005/v4/publications
Authentication: Required
Method: POST
Request Body:
Code Block
{
    "path": [
        "default"
    ],
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}

Response:

Status Code: 201 - Created
Response Body:
Code Block
{
    "path": [
        "default"
    ],
    "id": 3,
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "updatedAt": "2018-11-13T01:25:39.698Z",
    "createdAt": "2018-11-13T01:25:39.698Z",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}

cURL example:

Code Block
curl -X POST \
  http://www.wrangle-dev.example.com:3005/v4/publications \
  -H 'authorization: Basic <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
    "path": [
        "default"
    ],
    "tableName": "myPublishedHiveTable",
    "targetType": "hive",
    "action": "create",
    "outputObject": {
        "id": 4
    },
    "connection": {
        "id": 1
    }
}' 

For more information, see the API reference documentation for operation/createPublication.
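
The same request can be prepared programmatically. The following is a minimal sketch in Python using only the standard library. The helper names `build_publication_payload` and `build_publication_request` are hypothetical and are not part of the product API. The base URL and the `<auth_token>` placeholder are taken from the example above. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder base URL taken from the Endpoint value above.
BASE_URL = "http://www.wrangle-dev.example.com:3005"


def build_publication_payload(table_name, output_object_id, connection_id,
                              database="default", target_type="hive",
                              action="create"):
    """Return a request body shaped like the example for POST /v4/publications."""
    return {
        "path": [database],
        "tableName": table_name,
        "targetType": target_type,
        "action": action,
        "outputObject": {"id": output_object_id},
        "connection": {"id": connection_id},
    }


def build_publication_request(payload, auth_header, base_url=BASE_URL):
    """Prepare (but do not send) the POST request for the payload."""
    return urllib.request.Request(
        url=f"{base_url}/v4/publications",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_publication_payload("myPublishedHiveTable", 4, 1)
request = build_publication_request(payload, "Basic <auth_token>")
```

To send the request, pass it to `urllib.request.urlopen` and read the `id` field from the JSON response body to get the identifier of the new publication.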

Checkpoint: You're done.

You have done the following:

  1. Created an output object:
    1. Embedded a writeSettings object to define an Avro output.
    2. Associated the outputObject with a recipe.
  2. Added another writeSettings object to the outputObject.
  3. Added a table-based publication object to the outputObject.

You can now generate results for these three different outputs whenever you run a job (create a jobgroup) for the associated recipe. 

Step - Apply Overrides

When you are publishing results to a relational source, you can optionally apply overrides to the job to redirect the output or change the action applied to the target table. For more information, see API Workflow - Run Job.


Step - Apply Dataflow Job Overrides

NOTE: Overrides applied to the output objects are merged with any overrides specified as part of the jobGroup at the time of execution. For more information, see API Workflow - Run Job.

If neither object has a specified override for a Dataflow property, the applicable project setting is used. See Project Execution Settings Page.
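
The fallback behavior in the note above can be pictured as a layered merge. The following sketch is illustrative only: the function name `resolve_dataflow_options` and the sample values are hypothetical, and the order of precedence between outputObject and jobGroup overrides is an assumption, since the note states only that the two are merged.

```python
def resolve_dataflow_options(project_settings, output_object_overrides,
                             job_group_overrides):
    """Merge three layers of settings into one effective dictionary.

    Assumption: when both objects set the same property, the jobGroup
    override wins. The documentation only says the overrides are merged.
    """
    effective = dict(project_settings)         # lowest priority: project settings
    effective.update(output_object_overrides)  # overrides stored on the outputObject
    effective.update(job_group_overrides)      # overrides submitted with the jobGroup
    return effective


# Illustrative values only.
project = {"region": "us-central1", "machineType": "n1-standard-4", "numWorkers": "1"}
output_object = {"machineType": "n1-standard-64"}
job_group = {"region": "second-region"}

effective = resolve_dataflow_options(project, output_object, job_group)
```

In this sketch, `numWorkers` falls back to the project setting because neither object overrides it.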

You can optionally submit override values for a predefined set of Dataflow properties on the output object. These overrides are applied each time that the outputObject is used to generate a set of results.

NOTE: If you are using automatic VPC network mode, then network, subnetwork, and usePublicIps do not apply.
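
A client can check which VPC mode a set of options implies before submitting them. The sketch below is a hypothetical helper, not product behavior: it treats a non-empty network or subnetwork value as custom mode, as described in the notes on properties later in this section, and otherwise drops the three properties that do not apply. Property names are spelled as in the request payload examples. Treating an empty value as "not specified" is an assumption.

```python
# Properties that do not apply in automatic VPC network mode.
VPC_ONLY_PROPERTIES = ("network", "subnetwork", "usePublicIps")


def vpc_mode(options):
    """Return "custom" if a network or subnetwork value is present, else "auto".

    Assumption: empty strings and None count as "not specified".
    """
    if options.get("network") or options.get("subnetwork"):
        return "custom"
    return "auto"


def drop_vpc_properties(options):
    """Return a copy of options without the VPC-only properties."""
    return {k: v for k, v in options.items() if k not in VPC_ONLY_PROPERTIES}


options = {"region": "us-central1", "usePublicIps": "true"}
if vpc_mode(options) == "auto":
    options = drop_vpc_properties(options)
```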


Tip: You can apply job overrides to the job itself, instead of applying overrides to the outputObject. For more information, see API Workflow - Run Job.

Example - Apply labels to output object

In the following example, an existing outputObject (id=4) is modified to include override values for the labels of the job. Each property and its value is specified as a key-value pair in the request:

Request:

Endpoint: https://www.api.clouddataprep.com/v4/outputObjects/4
Authentication: Required
Method: PATCH
Request Body:
Code Block
{
  "execution": "dataflow",
  "profiler": true,
  "outputObjectDataflowOptions": {
    "region": "us-central1",
    "zone": "us-central1-a", 
    "machineType": "n1-standard-64",
    "network": "my-network-name",
    "subnetwork": "regions/us-central1/subnetworks/my-subnetwork",
    "autoscalingAlgorithm": "THROUGHPUT_BASED",
    "serviceAccount": "my-service-account-name@<project-id>.iam.gserviceaccount.com",
    "numWorkers": "1",
    "maxNumWorkers": "1000",
    "usePublicIps": "true",
    "labels": [
       {
         "key": "my-billing-label-key",
         "value": "my-billing-label-value"      
       }
     ]
   }
}

Response:

Status Code: 200 - OK
Response Body:
Code Block
{
    "id": 4,
    "updater": {
        "id": 1
    },
    "updatedAt": "2020-03-21T00:27:00.937Z",
    "createdAt": "2020-03-20T23:30:42.991Z"
}

cURL example:

Code Block
curl -X PATCH \
  https://www.api.clouddataprep.com/v4/outputObjects/4 \
  -H 'authorization: Bearer <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "execution": "dataflow",
  "profiler": true,
  "outputObjectDataflowOptions": {
    "region": "us-central1",
    "zone": "us-central1-a", 
    "machineType": "n1-standard-64",
    "network": "my-network-name",
    "subnetwork": "regions/us-central1/subnetworks/my-subnetwork",
    "autoscalingAlgorithm": "THROUGHPUT_BASED",
    "serviceAccount": "my-service-account-name@<project-id>.iam.gserviceaccount.com",
    "numWorkers": "1",
    "maxNumWorkers": "1000",
    "usePublicIps": "true",
    "labels": [
       {
         "key": "my-billing-label-key",
         "value": "my-billing-label-value"      
       }
     ]
   }
}'

Notes on properties:

  • If a network value, a subnetwork value, or both are specified, then the VPC mode is custom. This setting is available in the UI for convenience.

  • You can submit empty or null values for properties in the payload. These values are submitted as provided.

  • If you are not using auto-scaling on your job:
    • "autoscalingAlgorithm": "NONE",
    • Use "numWorkers" instead to specify the number of compute nodes to use for the job.
  • If you are using auto-scaling on your job:
    • "autoscalingAlgorithm": "THROUGHPUT_BASED",
    • Use "maxNumWorkers" and "numWorkers" to specify the number of compute nodes to use for the job.
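
The two auto-scaling cases above can be sketched as a small helper that returns the matching fragment of `outputObjectDataflowOptions`. The function name `autoscaling_options` is hypothetical. Worker counts are emitted as strings and the algorithm name in uppercase, following the request payload example. The check that the maximum is not below `numWorkers` is an assumption added for safety, not a documented rule.

```python
def autoscaling_options(num_workers, max_num_workers=None):
    """Build the auto-scaling part of outputObjectDataflowOptions.

    Pass only num_workers to disable auto-scaling; pass max_num_workers
    as well to enable throughput-based auto-scaling.
    """
    if max_num_workers is None:
        # No auto-scaling: a fixed number of compute nodes.
        return {
            "autoscalingAlgorithm": "NONE",
            "numWorkers": str(num_workers),
        }
    if max_num_workers < num_workers:
        # Assumption: an upper bound below numWorkers is a mistake.
        raise ValueError("max_num_workers must be >= num_workers")
    return {
        "autoscalingAlgorithm": "THROUGHPUT_BASED",
        "numWorkers": str(num_workers),
        "maxNumWorkers": str(max_num_workers),
    }


fixed = autoscaling_options(8)
scaled = autoscaling_options(1, max_num_workers=1000)
```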

Notes on labels:

You can use labels to assign billing information for the job in your project.

You can apply up to 64 labels for a job. For more information on the available properties, see Dataflow Execution Settings.
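
As a minimal sketch, the label list can be generated from a dictionary and checked against the 64-label limit before the PATCH request is prepared. The helper names `build_labels` and `build_labels_patch` are hypothetical. The base URL is the placeholder from the Endpoint value in the example above. Sending only the `labels` property inside `outputObjectDataflowOptions` is an assumption; the documented example sends the full set of options. The request is built but not sent.

```python
import json
import urllib.request

MAX_LABELS = 64  # limit stated in the notes on labels
BASE_URL = "https://www.api.clouddataprep.com"  # from the Endpoint value above


def build_labels(label_map):
    """Convert {"key": "value"} pairs into the list format used in the payload."""
    if len(label_map) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels can be applied to a job")
    return [{"key": key, "value": value} for key, value in label_map.items()]


def build_labels_patch(output_object_id, label_map, auth_header,
                       base_url=BASE_URL):
    """Prepare (but do not send) a PATCH request that sets job labels."""
    body = {
        "execution": "dataflow",
        "outputObjectDataflowOptions": {"labels": build_labels(label_map)},
    }
    return urllib.request.Request(
        url=f"{base_url}/v4/outputObjects/{output_object_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_labels_patch(
    4,
    {"my-billing-label-key": "my-billing-label-value"},
    "Bearer <auth_token>",
)
```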

Example - Override VPC settings

In the following example, an existing outputObject (id=4) is modified to override the VPC settings to use a non-local VPC:

Request:

Endpoint: https://www.api.clouddataprep.com/v4/outputObjects/4
Authentication: Required
Method: PATCH
Request Body:
Code Block
{
  "execution": "dataflow",
  "outputObjectDataflowOptions": {
    "region": "second-region",
    "zone": "us-central1-a", 
    "network": "my-other-network",
    "subnetwork": "regions/second-region/subnetworks/my-other-subnetwork"
  }
}

Response:

Status Code: 200 - OK
Response Body:
Code Block
{
    "id": 4,
    "updater": {
        "id": 1
    },
    "updatedAt": "2020-03-21T00:27:00.937Z",
    "createdAt": "2020-03-20T23:30:42.991Z"
}

cURL example:

Code Block
curl -X PATCH \
  https://www.api.clouddataprep.com/v4/outputObjects/4 \
  -H 'authorization: Bearer <auth_token>' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "execution": "dataflow",
  "outputObjectDataflowOptions": {
    "region": "second-region",
    "zone": "us-central1-a", 
    "network": "my-other-network",
    "subnetwork": "regions/second-region/subnetworks/my-other-subnetwork"
  }
}'

Notes on properties:

  • If a network value, a subnetwork value, or both are specified, then the VPC mode is custom. This setting is available in the UI for convenience.

  • Subnetwork values must be specified as a short URL or a full URL.

    • To specify the VPC associated with a different project to which you have access, use the full URL pattern for the subnetwork value:

      Code Block
      https://www.googleapis.com/compute/v1/projects/<HOST_PROJECT_ID>/regions/<REGION>/subnetworks/<SUBNETWORK>

      <HOST_PROJECT_ID> corresponds to the project identifier. This value must be between 6 and 30 characters. The value can contain only lowercase letters, digits, or hyphens. It must start with a letter. Trailing hyphens are prohibited.

    • To specify a different VPC subnetwork, you can also use a short URL pattern for the subnetwork value:

      Code Block
      regions/<REGION>/subnetworks/<SUBNETWORK>
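
The two subnetwork URL patterns and the project identifier rules above can be sketched as follows. The function names are hypothetical. The regular expression encodes only the rules listed above: 6 to 30 characters, lowercase letters, digits, or hyphens, a leading letter, and no trailing hyphen.

```python
import re

# One leading letter, 4 to 28 middle characters, one final letter or digit:
# 6 to 30 characters in total, with no trailing hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id):
    """Return True if project_id satisfies the rules listed above."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


def short_subnetwork_url(region, subnetwork):
    """Build the short URL form of a subnetwork value."""
    return f"regions/{region}/subnetworks/{subnetwork}"


def full_subnetwork_url(host_project_id, region, subnetwork):
    """Build the full URL form, for a VPC in a different project."""
    if not is_valid_project_id(host_project_id):
        raise ValueError(f"invalid project identifier: {host_project_id!r}")
    return (
        "https://www.googleapis.com/compute/v1/projects/"
        f"{host_project_id}/{short_subnetwork_url(region, subnetwork)}"
    )


short_url = short_subnetwork_url("us-central1", "my-subnetwork")
full_url = full_subnetwork_url("my-host-project", "us-central1", "my-subnetwork")
```

Either value can then be used for the "subnetwork" property in the request body.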

For more information on these properties, see Dataflow Execution Settings.