
Outdated release! Latest docs are Release 8.2: API Workflow - Publish Results

   

For the latest updates on available API endpoints and documentation, see api.trifacta.com.

Overview

After you have run a job to generate results, you can publish those results to different targets as needed. This section describes how to automate those publishing steps through the APIs.

NOTE: This workflow applies to re-publishing job results after you have already generated them.

NOTE: After you have generated results and written them to one target, you cannot publish to the same target. You must configure the outputs to specify a different format and location and then run a new job.

In the application, you can publish after generating results. See Publishing Dialog.

Basic Workflow

  1. Create connections to each target to which you wish to publish. Connections must support write operations. 
  2. Specify a job whose output meets the requirements for the target. 
  3. Run the job.
  4. When the job completes, publish the results to the target(s). 
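Steps 3 and 4 above can be sketched as a small wait-then-publish loop. This is only an illustration: `get_status` and `publish` are hypothetical callables that you would supply, and the status strings `Complete`, `Failed`, and `Canceled` are assumptions, not values confirmed by this page.

```python
import time
from typing import Callable, Iterable


def publish_when_complete(
    get_status: Callable[[], str],
    publish: Callable[[dict], dict],
    targets: Iterable[dict],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> list:
    """Poll until the job finishes, then publish to each target in order."""
    for _ in range(max_polls):
        status = get_status()
        if status == "Complete":  # ASSUMPTION: success value
            break
        if status in ("Failed", "Canceled"):  # ASSUMPTION: failure values
            raise RuntimeError(f"job ended with status {status!r}")
        time.sleep(poll_seconds)
    else:
        # The loop ran out of polls without seeing a terminal status.
        raise TimeoutError("job did not complete in time")
    return [publish(target) for target in targets]
```

Passing the two callables in keeps the loop independent of any particular HTTP client or endpoint, so it can be exercised with stubs before it is pointed at a real deployment.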

Step - Create Connections

For each target, you must have permission to create a connection to it. A connection can be reused after it is created, so you may find it easier to create connections through the application.

  • Some connections can be created through the API. See the Create via API column in the table below.
  • Other connections must be created through the application. Links to instructions are provided below.

NOTE: Connections created through the application must be created through the Connections page, which is used for creating read/write connections. Do not create these connections through the Import Data page. See Connections Page.

| Connection | Required Output Format | Example Id | Create via API | Doc Link | Other Requirements |
| --- | --- | --- | --- | --- | --- |
| Hive | Avro | 1 | Y | Create Hive Connections | Requires integration with a Hadoop cluster. |
| Redshift | Avro | 2 | N | Create Redshift Connections | Requires S3 set as the base storage layer. See Set Base Storage Layer. |
| Tableau Server | TDE | 3 | Y | Create Tableau Server Connections | |
| SQL DW | Parquet | 4 | N | Create SQL DW Connections | Available only on Azure deployments. See Configure for Azure. |
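The format requirements in the table can be checked before publishing. The following is a minimal sketch: the dictionary copies the Required Output Format column, while the function name and the idea of passing in a list of available formats are illustrative assumptions.

```python
# Required output format per connection type, copied from the table above.
REQUIRED_FORMAT = {
    "hive": "avro",
    "redshift": "avro",
    "tableau server": "tde",
    "sql dw": "parquet",
}


def check_output_format(connection_type: str, available_formats) -> str:
    """Return the required format, or raise if the job did not produce it."""
    required = REQUIRED_FORMAT[connection_type.strip().lower()]
    available = {fmt.strip().lower() for fmt in available_formats}
    if required not in available:
        raise ValueError(
            f"{connection_type} requires {required} output, "
            f"but only {sorted(available)} is available"
        )
    return required
```

For example, `check_output_format("Hive", ["CSV", "Avro"])` returns `"avro"`, while `check_output_format("Tableau Server", ["avro"])` raises `ValueError`.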

Step - Run Job

Before you publish results to a different datastore, you must generate results and store them in HDFS.

NOTE: To produce some output formats, you must run the job on the Spark running environment.

In the examples below, the following example data is assumed:

| Identifier | Value |
| --- | --- |
| jobId | 2 |
| flowId | 3 |
| wrangledDatasetId (also flowNodeId) | 10 |

For more information on running a job, see API JobGroups Create v4.

For more information on the publishing endpoint, see API JobGroups Put Publish v4.
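As a rough sketch of the run step, the request below is built from the example identifiers. The `POST /v4/jobGroups` path and the `wrangledDataset` body shape are assumptions that are not shown on this page; confirm them against API JobGroups Create v4 before use. The request is constructed but not sent.

```python
import json
import urllib.request

# Host taken from the examples on this page.
BASE_URL = "http://www.wrangle-dev.example.com:3005"


def build_run_job_request(base_url: str, wrangled_dataset_id: int) -> urllib.request.Request:
    """Build (but do not send) a request that starts a job for one dataset."""
    # ASSUMPTION: endpoint path and body shape; see API JobGroups Create v4.
    body = {"wrangledDataset": {"id": wrangled_dataset_id}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v4/jobGroups",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# wrangledDatasetId = 10 in the example data above.
request = build_run_job_request(BASE_URL, 10)
```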

Step - Publish Results to Hive

The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table table in the default Hive schema through connectionId=1.

NOTE: When you publish to Hive, the target database is predefined in the connection object. For the path value in the request body, you must specify the schema to use in this database. Schema information is not available through the API. To explore the available schemas, click the Hive icon in the Import Data page. The schemas are the first level of listed objects. For more information, see Import Data Page.

Request:

Endpoint: http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication: Required
Method: PUT
Request Body:
{
  "connection": {
    "id": 1
  },
  "path": ["default"],
  "table": "test_table",
  "action": "create",
  "inputFormat": "avro",
  "flowNodeId": 10
}
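The Hive request above could be assembled in Python roughly as follows, using only the standard library. The helper name and the `extra_headers` parameter are hypothetical; this page states only that authentication is required, so supply whatever credentials your deployment expects. The request is built but not sent.

```python
import json
import urllib.request


def build_publish_request(base_url, job_group_id, body, extra_headers=None):
    """Build (but do not send) a PUT request to the publish endpoint."""
    url = f"{base_url.rstrip('/')}/v4/jobGroups/{job_group_id}/publish"
    headers = {"Content-Type": "application/json"}
    # Hypothetical hook for authentication headers; the scheme is not given here.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PUT"
    )


hive_body = {
    "connection": {"id": 1},
    "path": ["default"],
    "table": "test_table",
    "action": "create",
    "inputFormat": "avro",
    "flowNodeId": 10,
}
hive_request = build_publish_request(
    "http://www.wrangle-dev.example.com:3005", 2, hive_body
)
# To send it: urllib.request.urlopen(hive_request)
```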

Response:

Status Code: 200 - OK
Response Body:
{
    "jobgroupId":2,
    "reason":"JobStarted",
    "sessionId":"24862060-4fcd-11e8-8622-fda0fbf6f550"
}
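A caller might check the response body before moving on. This sketch parses the JSON shown above and returns the session identifier; treating any `reason` other than `JobStarted` as an error is an assumption, since this page documents only the success case.

```python
import json


def publish_session_id(response_text: str) -> str:
    """Return the sessionId from a publish response, or raise if it did not start."""
    payload = json.loads(response_text)
    # ASSUMPTION: any reason other than "JobStarted" means publishing did not begin.
    if payload.get("reason") != "JobStarted":
        raise RuntimeError(f"unexpected publish response: {payload!r}")
    return payload["sessionId"]


sample = (
    '{"jobgroupId":2,"reason":"JobStarted",'
    '"sessionId":"24862060-4fcd-11e8-8622-fda0fbf6f550"}'
)
session_id = publish_session_id(sample)
```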

 

Step - Publish Results to Redshift

The following uses the Avro results from the specified job (jobId = 2) to publish the results to the test_table2 table in the public Redshift schema through connectionId=2.

NOTE: When you publish to Redshift, the target database is predefined in the connection object. For the path value in the request body, you must specify the schema to use in this database. Schema information is not available through the API. To explore the available schemas, click the Redshift icon in the Import Data page. The schemas are the first level of listed objects. For more information, see Import Data Page.


Request:

Endpoint: http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication: Required
Method: PUT
Request Body:
{
  "connection": {
    "id": 2
  },
  "path": ["public"],
  "table": "test_table2",
  "action": "create",
  "inputFormat": "avro",
  "flowNodeId": 10
}

Response:

Status Code: 200 - OK
Response Body:
{
    "jobgroupId":2,
    "reason":"JobStarted",
    "sessionId":"fae64760-4fc4-11e8-8cba-0987061e4e16"
}

 

Step - Publish Results to Tableau Server

The following uses the TDE results from the specified job (jobId = 2) to publish the results to the test_table3 table in the default Tableau Server database through connectionId=3.

Request:

Endpoint: http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication: Required
Method: PUT
Request Body:
{
  "connection": {
    "id": 3
  },
  "path": ["default"],
  "table": "test_table3",
  "action": "createAndLoad",
  "inputFormat": "tde",
  "flowNodeId": 10
}

Response:

Status Code: 200 - OK
Response Body:
{
    "jobgroupId":2,
    "reason":"JobStarted",
    "sessionId":"24862060-4fcd-11e8-8622-fda0fbf6f552"
}

Step - Publish Results to SQL DW

The following uses the Parquet results from the specified job (jobId = 2) to publish the results to the test_table4 table in the dbo SQL DW schema through connectionId=4.

Request:

Endpoint: http://www.wrangle-dev.example.com:3005/v4/jobGroups/2/publish
Authentication: Required
Method: PUT
Request Body:
{
  "connection": {
    "id": 4
  },
  "path": ["dbo"],
  "table": "test_table4",
  "action": "createAndLoad",
  "inputFormat": "pqt",
  "flowNodeId": 10
}
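The four request bodies on this page share one shape and differ only in a handful of values. A small builder, sketched below with a hypothetical function name, makes that explicit; the values in `TARGETS` are copied from the examples on this page.

```python
def build_publish_body(connection_id, schema, table, action, input_format, flow_node_id):
    """Assemble a publish request body in the shape used by the examples."""
    return {
        "connection": {"id": connection_id},
        "path": [schema],
        "table": table,
        "action": action,
        "inputFormat": input_format,
        "flowNodeId": flow_node_id,
    }


# (connection id, schema, table, action, inputFormat) from each example.
TARGETS = [
    (1, "default", "test_table", "create", "avro"),         # Hive
    (2, "public", "test_table2", "create", "avro"),         # Redshift
    (3, "default", "test_table3", "createAndLoad", "tde"),  # Tableau Server
    (4, "dbo", "test_table4", "createAndLoad", "pqt"),      # SQL DW
]

# flowNodeId = 10 in every example.
bodies = [build_publish_body(c, s, t, a, f, 10) for c, s, t, a, f in TARGETS]
```

Note that the examples pass `pqt`, not `parquet`, as the `inputFormat` value for Parquet results.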

Response:

Status Code: 200 - OK
Response Body:
{
    "jobgroupId": 2,
    "jobIds": 22,
    "reason": "JobStarted",
    "sessionId": "855f83a0-dc94-11e8-bd1a-f998d808020d"
}
