When a job has successfully completed, you can export the results through one of the following methods:

  • In the Job Results page, click Export Results.
  • In the Jobs tab in the Flow View page, select Export Results.

NOTE: Transforms that use the group parameter can result in non-deterministic re-ordering in the data grid. However, if you are running your job on the Spark running environment, you should apply the group parameter, or your job may run out of memory and fail. To avoid the re-ordering and to enforce row ordering, use the sort transform. For more information, see Sort Transform.

NOTE: You cannot publish ad-hoc results for a job while another publishing job for the same job is in progress, whether it was started through the application or the command line interface. Wait until the earlier publishing job has completed before you retry the failed publication. This is a known issue.


NOTE: If you run a job and then attempt to export the results to a relational source, Datetime columns are written in the relational table as String values. Direct publication of Datetime columns publishes the output in the designated target data type. For more information, see Type Conversions.

NOTE: If you run a job with a single relational target and it fails at the publication step, you cannot publish the transformation job through the Export Results window.

Figure: Export Results window

There are multiple ways to export the results.

NOTE: You can publish only to destinations that are applicable to your base storage layer. For example, results written to HDFS can be published to Hive but cannot be published to Redshift. The base storage layer cannot be modified after installation. See Set Base Storage Layer.

Direct file download

NOTE: If none of these options is available, data download may have been disabled by an administrator. See Admin Settings Page.

  • Click one of the provided links to download the file through your browser to your local desktop.
  • Avro output enabled: Jobs that were executed with Avro output enabled include the option to publish the results to Hive, as long as the platform has been integrated with a supported instance of Hive.
  • TDE:

    If you have generated output in TDE format and have configured a connection to Tableau Server, you can publish directly to the server. You can also download TDE-formatted outputs to your desktop.

Publish to HDFS

When a job is run on Hadoop, results are published to the specified locations on HDFS.

Tip: These locations are available through the job results stored in the Jobs page. If these results are used by another analytics tool, you may want to copy these locations from this window.

Create Dataset

Optionally, you can turn your generated results into new datasets for immediate use in Trifacta Wrangler Enterprise. Select the format of the new dataset and click Create.

NOTE: If you generated results in Parquet format only, you cannot create a dataset from it, even if the Create button is present. This is a known issue.

NOTE: When you create a new dataset as part of your job results, the file or files are written to the designated output location for your user account. Depending on how your backend datastore permissions are configured, this location may not be accessible to other users.

After the new output has been written, you can create new recipes from it. See Build Sequence of Datasets.

Publish to Hive

NOTE: If you created a publishing action to deliver results to Hive as part of this job definition, the Hive tab identifies the database and table where the results were written. Any available options here are for ad-hoc publishing of results to Hive.

If you have enabled publishing to Hive, you can specify the database and table to which you would like to publish results.

NOTE: When launching the job, you must choose to generate results in Avro or Parquet format to publish to Hive. If you are publishing a wide dataset to Hive, you should generate results using Parquet.

NOTE: Some Trifacta data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.

Administrators can connect the Trifacta platform to an available instance of Hive. For more information, see Configure for Hive.

Figure: Hive Publishing Options

Options:

  • Database: Name of Hive database. This value is case-insensitive.

    NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.

  • Table: Name of Hive table in database. A default value is generated based on your dataset name. This value is case-insensitive.
  • Format: Choose publishing format: avro or pqt (parquet).

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the Hive database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
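
The failure rules for these options can be summarized in a short sketch. The Python below is illustrative only and is not part of the Trifacta platform: the names DataOption and plan_publish are hypothetical, and nothing here connects to Hive. This page does not state what the truncate and drop options do when the target table is missing, so the sketch does not model that case.

```python
from enum import Enum


class DataOption(Enum):
    """Hypothetical labels for the four data options listed above."""
    CREATE = "Create new table & load data"
    APPEND = "Append data to existing table"
    TRUNCATE = "Truncate table & load data"
    DROP_RECREATE = "Drop, recreate table & load data"


def plan_publish(option: DataOption, table_exists: bool) -> str:
    """Describe what a publishing job would do, or raise if it would fail.

    Models only the outcomes documented on this page.
    """
    if option is DataOption.CREATE:
        if table_exists:
            # Documented: creating a table that already exists fails.
            raise ValueError("table already exists; publishing job fails")
        return "create table, then load results"
    if option is DataOption.APPEND:
        if not table_exists:
            # Documented: appending to a missing table fails.
            raise ValueError("table does not exist; publishing job fails")
        return "append results to existing data"
    if option is DataOption.TRUNCATE:
        # Missing-table behavior is not documented here; not modeled.
        return "clear existing data, keep schema, load results"
    return "drop table, recreate from output schema, load results"
```

For example, plan_publish(DataOption.APPEND, table_exists=False) raises an error, which mirrors the failed publishing job and log entry described for that option.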

To export the job results to the designated Hive table, click Publish. Publication happens in the background as a Trifacta job. You can track status in the Jobs page. See Jobs Page.

Publish to Redshift

If you have enabled publishing to Redshift, you can specify the database, schema, and table to which you would like to publish results.

Notes:

  • When launching the job, your output must be delivered to S3 with results in Avro, CSV, or JSON format. A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.
  • When publishing output results to Redshift, you cannot publish CSV files with headers. You can publish single-file CSV without headers and multi-file CSV outputs, which have no headers by default.
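
The two notes above amount to a pre-publication checklist. As a rough sketch, they could be checked as follows. The function name redshift_publish_problems and its parameters are hypothetical and do not correspond to any platform API. The sketch encodes only the rules stated in the notes.

```python
from typing import List

# Output formats that the notes above list as publishable to Redshift.
REDSHIFT_FORMATS = {"avro", "csv", "json"}


def redshift_publish_problems(base_storage: str, output_format: str,
                              csv_has_header: bool = False) -> List[str]:
    """Return the documented reasons a result set cannot go to Redshift.

    An empty list means none of the rules in the notes is violated.
    """
    problems: List[str] = []
    if base_storage.lower() != "s3":
        problems.append("base storage layer must be S3")
    fmt = output_format.lower()
    if fmt not in REDSHIFT_FORMATS:
        problems.append(f"format {fmt!r} is not Avro, CSV, or JSON")
    elif fmt == "csv" and csv_has_header:
        # CSV is accepted only when the files carry no header row.
        problems.append("CSV output with headers cannot be published")
    return problems
```

Returning a list rather than a single true or false value lets a caller report every violated rule at once.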

Administrators can connect the Trifacta platform to an available instance of Redshift. For more information, see Create Redshift Connections.

Publish to SQL DW

To publish to Microsoft SQL DW storage, please specify the following information.

NOTE: Publishing to Microsoft SQL DW requires deployment of the Trifacta platform on Azure and a base storage layer of WASB. For more information, see Configure for Azure.

NOTE: Results must be in Parquet format to publish to SQL DW.

Options:

  • Database: Name of SQL DW database. This value is case-insensitive.

    NOTE: You cannot publish to a SQL DW database that is empty. The database must contain at least one table.

  • Schema: Name of the schema to use to publish. Schema and results must match in terms of column names, order, and data type.
  • Table: Name of table in database. A default value is generated based on your dataset name. This value is case-insensitive.
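
The matching requirement on Schema (column names, order, and data type) can be pictured as a position-by-position comparison. The following sketch is an assumption about how such a check might look, not the platform's actual validation: the function schema_mismatches is hypothetical, the type names are placeholders, and column names are compared exactly because this page does not say whether they are case-sensitive.

```python
from typing import List, Tuple

# A column is modeled as a (name, data type) pair; both are plain strings.
Column = Tuple[str, str]


def schema_mismatches(results: List[Column], target: List[Column]) -> List[str]:
    """List the differences between result columns and target columns.

    Columns are compared by position, so a reordered schema is reported
    as a mismatch even when the same names are present.
    """
    issues: List[str] = []
    if len(results) != len(target):
        issues.append(f"column count differs: {len(results)} vs {len(target)}")
    for position, (res, tgt) in enumerate(zip(results, target)):
        if res[0] != tgt[0]:
            issues.append(f"position {position}: name {res[0]!r} != {tgt[0]!r}")
        elif res[1] != tgt[1]:
            issues.append(
                f"position {position}: column {res[0]!r} has type {res[1]!r}, "
                f"target expects {tgt[1]!r}"
            )
    return issues
```

An empty list means the two column lists agree in names, order, and data type.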

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the SQL DW database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.

Publish to Tableau

If you have created a Tableau Server connection, you can export results that have been generated in TDE format to the connected server.

NOTE: Generated results must be in TDE format for export.

NOTE: If you encounter errors generating results in TDE format, additional configuration may be required. See Supported File Formats.

Options:

  • Site Name: Name of the Tableau Server site.
  • Project Name: Name of the Tableau Server project.
  • Datasource Name: Name of the Tableau Server datasource. This value is displayed for selection in Tableau Server.

Data Option:

If you are publishing to a pre-existing datasource, schema validation is automatically performed.

  • Create new datasource: The platform creates the datasource and then loads it with the results from this job. If you attempt to use this option on a source that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing datasource: The results from this job are appended to the data that is already stored in Tableau Server. If you attempt to append to a source that does not exist, the publishing job fails, and an error is generated in the log. Append operations also fail if you publish to a target with a different schema.
  • Replace contents of existing datasource: Target datasource is dropped. A new datasource is created using the schema of the generated output and filled with the job results.

 


