Overview
After you have run a job to generate results, you can publish those results to different targets as needed. This section describes how to automate those publishing steps through the APIs.
NOTE: This workflow applies to re-publishing job results after you have already generated them.
NOTE: After you have generated results and written them to a target, you cannot publish the same results to that target again. You must configure the outputs to specify a different format and location and then run a new job.
In the application, you can publish after generating results. See Publishing Dialog.
Basic Workflow
- Create connections to each target to which you wish to publish. Connections must support write operations.
- Specify a job whose output meets the requirements for the target.
- Run the job.
- When the job completes, publish the results to the target(s).
Step - Create Connections
For each target, you must be able to create a connection to it. After a connection has been created, it can be reused, so you may find it easier to create your connections through the application.
- Some connections can be created via API. For more information, see API Connections Create v4. A sketch of such a request is shown after the note below.
- Other connections must be created through the application. Links to instructions are provided below.
NOTE: Connections created through the application must be created through the Connections page, which is used for creating read/write connections. Do not create these connections through the Import Data page. See Connections Page.
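For connections that can be created via API, you submit the connection definition to the endpoint described in API Connections Create v4. The following is a minimal sketch of a Hive connection definition; the property names and values shown (vendor, host, port, params, and so on) are illustrative assumptions for this workflow, so verify the exact schema against API Connections Create v4 and Create Hive Connections before using it.

```json
{
  "vendor": "hive",
  "vendorName": "hive",
  "type": "jdbc",
  "name": "hive_connection",
  "description": "Write connection used for publishing job results",
  "isGlobal": true,
  "credentialType": "conf",
  "host": "hadoop.example.com",
  "port": 10000,
  "params": {
    "jdbc": "hive2",
    "defaultDatabase": "default",
    "connectStringOptions": ""
  }
}
```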
| Connection Type | Required Output Format | Example connectionId | Create via API | Doc Link | Other Requirements |
|---|---|---|---|---|---|
| Hive | Avro | 1 | Y | Create Hive Connections | Requires integration with a Hadoop cluster. |
| Tableau Server | TDE | 3 | Y | Create Tableau Server Connections | None. |
| Redshift | Avro | 2 | N | Create Redshift Connections | Requires S3 set as the base storage layer. See Set Base Storage Layer. |
| SQL DW | Parquet | 4 | N | Create SQL DW Connections | Available only on Azure deployments. See Configure for Azure. |
Step - Run Job
Before you publish results to a different datastore, you must generate results and store them in HDFS.
NOTE: To produce some output formats, you must run the job on the Spark running environment.
In the examples below, the following example data is assumed:
| Identifier | Value |
|---|---|
| jobId | 2 |
| flowId | 3 |
| wrangledDatasetId (also flowNodeId) | 10 |
For more information on running a job, see API JobGroups Create v4.
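As a minimal sketch, a job can be launched by POSTing the wrangled dataset (flow node) to execute to the job groups endpoint. The overrides block shown here, which forces the Spark running environment and disables profiling, is an assumption about the supported properties; see API JobGroups Create v4 for the definitive request schema.

```json
{
  "wrangledDataset": {
    "id": 10
  },
  "overrides": {
    "execution": "spark",
    "profiler": false
  }
}
```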
For more information on the publishing endpoint, see API JobGroups Put Publish v4.
Step - Publish Results to Hive
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table table in the default Hive schema through connectionId=1.
NOTE: To publish to Hive, the targeted database is predefined in the connection object. For this example, results are written to the default database defined for connectionId=1.
Request:
| Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
|---|---|
| Authentication | Required |
| Method | PUT |
| Request Body | See the sketch below. |
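The following request body is a sketch only. It assumes that the publish endpoint accepts the target connection id, the database path, the table name, the write action, and the input format of the generated results; the exact property names and accepted values are defined in API JobGroups Put Publish v4.

```json
{
  "connection": {
    "id": 1
  },
  "path": ["default"],
  "table": "test_table",
  "action": "create",
  "inputFormat": "avro",
  "flowNodeId": 10
}
```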
Response:
| Status Code | 200 - OK |
|---|---|
| Response Body | See the sketch below. |
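An illustrative response sketch, assuming the endpoint acknowledges the publishing job that it starts; the actual response fields are documented in API JobGroups Put Publish v4.

```json
{
  "jobgroupId": 2,
  "reason": "JobStarted"
}
```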
Step - Publish Results to Tableau Server
The following uses the TDE results from the specified job (jobId = 2) to publish the results to the test_table3 table in the default Tableau Server database through connectionId=3.
Request:
| Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
|---|---|
| Authentication | Required |
| Method | PUT |
| Request Body | See the sketch below. |
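Under the same assumptions as the Hive example, the following is a sketch of the request body for the Tableau Server target; the inputFormat value for TDE results is also an assumption.

```json
{
  "connection": {
    "id": 3
  },
  "path": ["default"],
  "table": "test_table3",
  "action": "create",
  "inputFormat": "tde",
  "flowNodeId": 10
}
```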
Response:
| Status Code | 200 - OK |
|---|---|
| Response Body | Similar to the Hive response sketch above. |
Step - Publish Results to Redshift
The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table2 table in the public Redshift schema through connectionId=2.
NOTE: To publish to Redshift, the targeted database is predefined in the connection object. For this example, results are written to the public schema of the database defined for connectionId=2.
Request:
| Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
|---|---|
| Authentication | Required |
| Method | PUT |
| Request Body | See the sketch below. |
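Under the same assumptions as the Hive example, the following is a sketch of the request body for the Redshift target, pointing at the public schema.

```json
{
  "connection": {
    "id": 2
  },
  "path": ["public"],
  "table": "test_table2",
  "action": "create",
  "inputFormat": "avro",
  "flowNodeId": 10
}
```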
Response:
| Status Code | 200 - OK |
|---|---|
| Response Body | Similar to the Hive response sketch above. |
Step - Publish Results to SQL DW
The following uses the Parquet results from the specified job (jobId = 2) to publish the results to the test_table4 table in the dbo SQL DW database through connectionId=4.
Request:
| Endpoint | http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish |
|---|---|
| Authentication | Required |
| Method | PUT |
| Request Body | See the sketch below. |
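Under the same assumptions as the Hive example, the following is a sketch of the request body for the SQL DW target; the inputFormat identifier for Parquet results is an assumption.

```json
{
  "connection": {
    "id": 4
  },
  "path": ["dbo"],
  "table": "test_table4",
  "action": "create",
  "inputFormat": "parquet",
  "flowNodeId": 10
}
```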
Response:
| Status Code | 200 - OK |
|---|---|
| Response Body | Similar to the Hive response sketch above. |