When a job has successfully completed, you can publish your job results to one of your connected datastores. In the Job Details page, click the Output Destinations tab. Then, click Publish.

NOTE: You cannot publish ad-hoc results for a job while another publishing job for the same job is in progress, whether it was started through the application or the command line interface. Wait until the earlier publishing job has finished before you retry the failed publication. This is a known issue.


NOTE: If you run a job and then export the results to a relational datastore, Datetime columns are written to the relational table as String values. By contrast, direct publication of Datetime columns writes the output in the designated target data type. For more information, see Type Conversions.


NOTE: If you run a job with a single relational target and it fails at the publication step, you cannot publish the transformation job through the Export Results window.


NOTE: JSON-formatted files that are generated by Trifacta Wrangler Enterprise are rendered in JSON Lines format, a variant of JSON in which each record occupies a single line. For more information, see http://jsonlines.org.
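As an illustration of the JSON Lines layout described in the note above, the following minimal Python sketch writes and reads records one per line. The function names and the sample records are hypothetical; they are not part of the product.

```python
import json


def to_json_lines(records):
    """Serialize an iterable of dicts as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_json_lines(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample records, for illustration only.
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
text = to_json_lines(records)
```

Because each line is a complete JSON object, a consumer can process the file record by record instead of parsing it as a single JSON document.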



Figure: Export Results window


Publish to Hive

NOTE: If you created a publishing action to deliver results to Hive as part of this job definition, the Hive tab identifies the database and table where the results were written. Any available options are for ad-hoc publishing of results to Hive.

If you have enabled publishing to Hive, you can specify the database and table to which you would like to publish results.

NOTE: When launching the job, you must choose to generate results in Avro or Parquet format to publish to Hive. If you are publishing a wide dataset to Hive, you should generate results using Parquet.

NOTE: Some Trifacta data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.

Administrators can connect the Trifacta platform to an available instance of Hive. For more information, see Configure for Hive.

Hive publishing options:

  • Database: Name of Hive database. This value is case-insensitive.

    NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.

  • Table: Name of Hive table in database. A default value is generated based on your dataset name. This value is case-insensitive.
  • Format: Publishing format. Choose avro or pqt (Parquet).

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which must already exist in the Hive database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.

    Tip: Optionally, users can be permitted to publish to Hive staging schemas to which they do not have full create and drop permissions. This feature must be enabled. For more information, see Configure for Hive.

    When this feature is enabled, the name of the staging database must be entered in your user profile. See User Profile Page.

  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
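The data options above can be summarized as a small decision function. This is a sketch of the documented behavior only, not the platform's implementation. The option identifiers (`create`, `append`, `truncate`, `drop_recreate`) and the function name are hypothetical. For the truncate and drop options, this page describes only the case where the target table exists, so the missing-table case is reported as unspecified rather than guessed.

```python
def publish_outcome(option, table_exists):
    """Return the documented outcome of a data option for a target table.

    option: one of the hypothetical identifiers
        "create", "append", "truncate", "drop_recreate".
    table_exists: whether the target table already exists.
    Returns "ok", "fail", or "unspecified" (behavior not stated on this page).
    """
    if option == "create":
        # Create new table & load data fails if the table already exists.
        return "fail" if table_exists else "ok"
    if option == "append":
        # Append data to existing table fails if the table does not exist.
        return "ok" if table_exists else "fail"
    if option in ("truncate", "drop_recreate"):
        # Only the existing-table case is documented for these options.
        return "ok" if table_exists else "unspecified"
    raise ValueError(f"unknown data option: {option!r}")
```

For example, `publish_outcome("create", True)` reports a failure, which matches the error that is written to the log when you try to create a table that already exists.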

To export the job results to the designated Hive table, click Publish. Publication happens in the background as a Trifacta job. You can track status in the Jobs page. See Jobs Page.

Publish to SQL DW

To publish to Microsoft SQL DW, specify the following information.

NOTE: Publishing to Microsoft SQL DW requires deployment of the Trifacta platform on Azure and a base storage layer of WASB. For more information, see Configure for Azure.

NOTE: Results must be in Parquet format to publish to SQL DW.

Options:

  • Database: Name of SQL DW database. This value is case-insensitive.

    NOTE: You cannot publish to a SQL DW database that is empty. The database must contain at least one table.

  • Schema: Name of the schema to use to publish. Schema and results must match in terms of column names, order, and data type.
  • Table: Name of table in database. A default value is generated based on your dataset name. This value is case-insensitive.
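The requirement that the schema and results match on column names, order, and data type can be checked before publishing with a sketch like the following. It assumes each schema is represented as an ordered list of (column name, data type) pairs and compares names and types exactly; the function name, the representation, and the type labels are all hypothetical.

```python
def schema_mismatches(target_columns, result_columns):
    """Return a list of differences between two ordered column lists.

    Each argument is a list of (column_name, data_type) tuples.
    An empty list means names, order, and types all match.
    """
    problems = []
    if len(target_columns) != len(result_columns):
        problems.append(
            f"column count differs: {len(target_columns)} vs {len(result_columns)}"
        )
    # Compare position by position so that ordering differences are reported.
    for position, (target, result) in enumerate(zip(target_columns, result_columns)):
        if target[0] != result[0]:
            problems.append(f"position {position}: name {target[0]!r} vs {result[0]!r}")
        elif target[1] != result[1]:
            problems.append(f"position {position}: type {target[1]!r} vs {result[1]!r}")
    return problems


# Hypothetical column definitions, for illustration only.
target = [("id", "int"), ("name", "varchar")]
results = [("name", "varchar"), ("id", "int")]
differences = schema_mismatches(target, results)
```

Here the two lists contain the same columns in a different order, so a difference is reported at each position.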

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which must already exist in the SQL DW database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.

Publish to Redshift

If you have enabled publishing to Redshift, you can specify the database, schema, and table to which you would like to publish results.

Notes:

  • When launching the job, your output must be delivered to S3 with results in Avro, CSV, or JSON format.

    A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.
  • When publishing output results to Redshift, you cannot publish CSV files with headers. You can publish single-file CSV without headers and multi-file CSV outputs, which have no headers by default.
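If you already have a single-file CSV that contains a header row, one way to produce a headerless copy outside the platform is sketched below. This assumes the header is exactly the first row of the file; the function name is hypothetical, and the sketch does not describe how the platform itself generates its outputs.

```python
import csv
import io


def strip_header(csv_text):
    """Return the CSV text without its first row, assumed to be the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    output = io.StringIO()
    # Write every row except the first; use "\n" so line endings are predictable.
    csv.writer(output, lineterminator="\n").writerows(rows[1:])
    return output.getvalue()


# Hypothetical sample data, for illustration only.
headerless = strip_header("id,name\n1,a\n2,b\n")
```

Using the csv module rather than splitting on newlines keeps quoted fields that contain line breaks intact.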

Administrators can connect the Trifacta platform to an available instance of Redshift. For more information, see Create Redshift Connections.

 

Publish to Tableau

If you have created a Tableau Server connection, you can export results that have been generated in TDE format to the connected server.

NOTE: Generated results must be in TDE format for export.

NOTE: If you encounter errors generating results in TDE format, additional configuration may be required. See Supported File Formats.

Options:

  • Site Name: Name of the Tableau Server site.
  • Project Name: Name of the Tableau Server project.
  • Datasource Name: Name of the Tableau Server datasource. This value is displayed for selection in Tableau Server.

Data Option:

If you are publishing to a pre-existing datasource, schema validation is automatically performed.

  • Create new datasource: The platform creates the datasource and then loads it with the results from this job. If you attempt to use this option on a source that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing datasource: The results from this job are appended to the data that is already stored in Tableau Server. If you attempt to append to a source that does not exist, the publishing job fails, and an error is generated in the log. Append operations also fail if you publish to a target with a different schema.
  • Replace contents of existing datasource: Target datasource is dropped. A new datasource is created using the schema of the generated output and filled with the job results.



