When a job has successfully completed, you can publish your job results to one of your connected datastores. In the Job Details page, click the Output Destinations tab. Then, click Publish.

NOTE: You cannot publish ad-hoc results for a job while another publishing job for the same job is in progress through the application. Wait until the previous publishing job has completed before attempting to publish again. This is a known issue.


NOTE: If you run a job and then attempt to export the results to a relational target, Datetime columns are written to the relational table as String values. Direct publication of Datetime columns publishes the output in the designated target data type. For more information, see Type Conversions.


NOTE: If you run a job with a single relational target and it fails at the publication step, you cannot publish the transformation job through the Export Results window.


NOTE: JSON-formatted files that are generated by Designer Cloud Enterprise Edition are rendered in JSON Lines format, a variant of JSON in which each record occupies a single line. For more information, see http://jsonlines.org.
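
For illustration, a JSON Lines file can be consumed line by line with standard tooling. The following minimal Python sketch reads a results file (the filename is hypothetical) and parses each line as one record:

    import json

    # Each line of a JSON Lines file is one complete JSON record.
    with open("job_results.json") as f:  # hypothetical output filename
        records = [json.loads(line) for line in f if line.strip()]

    print(records[0])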




Figure: Publishing dialog


Publish to Cloudera Navigator

NOTE: This feature must be enabled in your environment. For more information, see Configure Publishing to Cloudera Navigator.

If you have enabled the Designer Cloud Powered by Trifacta® platform to integrate with Cloudera Navigator, metadata information is automatically published to Cloudera Navigator.

NOTE: When Cloudera Navigator publishing is enabled, the Designer Cloud Powered by Trifacta platform automatically attempts to publish to Navigator when the job completes. If this publication succeeds, no additional publishing is necessary. These links may not be immediately available in Cloudera Navigator, which refreshes on a predefined polling interval.

Locating your job metadata in Cloudera Navigator:

  1. In the Job Details page, acquire the job ID from the Job summary in the Overview tab.
  2. Log in to Navigator.
  3. Search for trifacta.<jobId>.
  4. If the job completed successfully and Navigator has been able to poll the platform for new job results, you should see individual entries for each sub-job of the job that completed:

    Sub-job identifier                        Description
    trifacta.<jobId>.wrangle.<subJobId1>      Link to metadata on transformation job that was executed on the running environment.
    trifacta.<jobId>.filewriter.<subJobId2>   Link to metadata on job that generated the results in the targeted datastore.
  5. Click any of these links to review metadata details about the job.
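
If you prefer to script this lookup, Cloudera Navigator also exposes a REST API. The following Python sketch is illustrative only; the host, port, API version, query syntax, and credentials are assumptions that you should verify against your Navigator deployment:

    import requests

    # Hypothetical values: adjust the host, port, API version, and
    # credentials for your Cloudera Navigator deployment.
    NAV_URL = "http://navigator-host:7187/api/v13/entities"
    JOB_ID = "12345"

    resp = requests.get(
        NAV_URL,
        params={"query": "originalName:trifacta.{}*".format(JOB_ID)},
        auth=("nav_user", "nav_password"),
    )
    resp.raise_for_status()
    for entity in resp.json():
        print(entity.get("originalName"), entity.get("type"))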

Publish to Hive

NOTE: If you created a publishing action to deliver results to Hive as part of this job definition, the Hive tab identifies the database and table where the results were written. Any available options are for ad-hoc publishing of results to Hive.

If you have enabled publishing to Hive, you can specify the database and table to which you would like to publish results.

NOTE: When launching the job, you must choose to generate results in Avro or Parquet format to publish to Hive. If you are publishing a wide dataset to Hive, you should generate results using Parquet.

NOTE: Some Alteryx data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.

Administrators can connect the Designer Cloud Powered by Trifacta platform to an available instance of Hive. For more information, see Configure for Hive.

Hive publishing options:

  • Database: Name of Hive database. This value is case-insensitive.

    NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.

  • Table: Name of Hive table in database. A default value is generated based on your dataset name. This value is case-insensitive.
  • Format: Choose the publishing format: avro or pqt (Parquet).

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the Hive database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.

    Tip: Optionally, users can be permitted to publish to Hive staging schemas to which they do not have full create and drop permissions. This feature must be enabled. For more information, see Configure for Hive.

    When enabled, the name of the staging database must be entered in your user profile. See User Profile Page.

  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.

To export the job results to the designated Hive table, click Publish. Publication happens in the background as an Alteryx job. You can track status in the Jobs page. See Jobs Page.
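
After publication completes, you can optionally spot-check the results from outside the application. The following minimal Python sketch uses the PyHive package; the host, port, database, and table names are hypothetical:

    from pyhive import hive

    # Hypothetical connection details; requires the PyHive package and
    # network access to HiveServer2.
    conn = hive.Connection(host="hive-host", port=10000, database="my_database")
    cursor = conn.cursor()

    # Confirm the published table exists, then spot-check the row count.
    cursor.execute("SHOW TABLES LIKE 'my_table'")
    print(cursor.fetchall())
    cursor.execute("SELECT COUNT(*) FROM my_table")
    print(cursor.fetchone())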

Publish to SQL DW

To publish to Microsoft SQL DW storage, please specify the following information.

NOTE: Publishing to Microsoft SQL DW requires deployment of the Designer Cloud Powered by Trifacta platform on Azure and a base storage layer of WASB. For more information, see Configure for Azure.

NOTE: Results must be in Parquet format to publish to SQL DW.

Options:

  • Database: Name of SQL DW database. This value is case-insensitive.

    NOTE: You cannot publish to a SQL DW database that is empty. The database must contain at least one table.

  • Schema: Name of the schema to which to publish. The schema and the results must match in terms of column names, order, and data types.
  • Table: Name of table in database. A default value is generated based on your dataset name. This value is case-insensitive.

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the SQL DW database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
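
As with Hive, you can optionally verify the published results from outside the application. The following minimal Python sketch uses the pyodbc package; the server, database, schema, table, and credentials are hypothetical:

    import pyodbc

    # Hypothetical connection string; requires the Microsoft ODBC driver.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"
        "DATABASE=my_dw;UID=my_user;PWD=my_password"
    )
    cursor = conn.cursor()

    # Verify that the published table landed in the expected schema.
    cursor.execute(
        "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?",
        ("dbo", "my_table"),
    )
    print(cursor.fetchone()[0])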

Publish to Redshift

If you have enabled publishing to Redshift, you can specify the database, schema, and table to which you would like to publish results.

Notes:

  • When launching the job, your output must be delivered to S3 with results in Avro, CSV, or JSON format.

    A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.
  • When publishing output results to Redshift, you cannot publish CSV files with headers. You can publish single-file CSV without headers and multi-file CSV outputs, which have no headers by default.

Administrators can connect the Designer Cloud Powered by Trifacta platform to an available instance of Redshift. For more information, see Create Redshift Connections.
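
Conceptually, Redshift ingests result files from S3 with its COPY command, which is why S3 must be your base storage layer and why CSV outputs cannot include headers. The platform performs this load for you; the following Python sketch is purely illustrative, and the cluster endpoint, credentials, table, S3 path, and IAM role are all hypothetical:

    import psycopg2

    # Conceptual sketch only: the platform performs this load for you.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="dev",
        user="my_user",
        password="my_password",
    )
    with conn, conn.cursor() as cursor:
        # Redshift ingests S3 files with COPY; headerless CSV matches
        # the requirement described above.
        cursor.execute(
            "COPY public.my_table "
            "FROM 's3://my-bucket/job-output/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' "
            "FORMAT AS CSV"
        )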

Publish to Tableau

If you have created a Tableau Server connection, you can export results that have been generated in TDE format to the connected server.

NOTE: Generated results must be in TDE format for export.

NOTE: If you encounter errors generating results in TDE format, additional configuration may be required. See Supported File Formats.

Options:

  • Connection: If you have created multiple connections to Tableau Server, please select the connection to use from the list.
  • Project Name: Name of the Tableau Server project.
  • Datasource Name: Name of the Tableau Server datasource. This value is displayed for selection in Tableau Server.

Data Option:

If you are publishing to a pre-existing datasource, schema validation is automatically performed.

  • Create new datasource: The platform creates the datasource and then loads it with the results from this job. If you attempt to use this option on a source that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing datasource: The results from this job are appended to the data that is already stored in Tableau Server. If you attempt to append to a source that does not exist, the publishing job fails, and an error is generated in the log. Append operations also fail if you publish to a target with a different schema.
  • Replace contents of existing datasource: Target datasource is dropped. A new datasource is created using the schema of the generated output and filled with the job results.
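
These three options correspond to the publish modes exposed by Tableau's own tooling. As an illustrative sketch only, not how the platform performs the export, the following Python example publishes a TDE file with the tableauserverclient package; the server URL, site, project ID, and credentials are hypothetical:

    import tableauserverclient as TSC

    # Hypothetical server, site, project, and credential values;
    # requires the tableauserverclient package.
    auth = TSC.TableauAuth("my_user", "my_password", site_id="my_site")
    server = TSC.Server("https://tableau.example.com", use_server_version=True)

    with server.auth.sign_in(auth):
        project_id = "abc123"  # hypothetical; look up via server.projects
        datasource = TSC.DatasourceItem(project_id, name="my_datasource")
        # Publish modes mirror the options above:
        # CreateNew, Append, and Overwrite.
        server.datasources.publish(
            datasource, "results.tde", mode=TSC.Server.PublishMode.CreateNew
        )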


