When a job has successfully completed, you can export the results through one of the following methods:
- In the Job Results page, click Export Results.
- In the Jobs tab in the Flow View page, select Export Results.
The group parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the group parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the sort transform. For more information, see Sort Transform.
NOTE: If you run a job with a single relational target and it fails at the publication step, you cannot publish the transformation job through the Export Results window.
Figure: Export Results window
There are multiple ways to export the results.
NOTE: You can only publish to destinations that are applicable to your base storage layer. For example, results written to HDFS can be published to Hive and cannot be published to Redshift. Base storage layer cannot be modified after install. See Set Base Storage Layer.
Direct file download
NOTE: If none of these options is available, data download may have been disabled by an administrator. See Admin Settings Page.
- Click one of the provided links to download the file through your browser to your local desktop.
- Avro output enabled: Jobs that were executed with Avro output enabled include the option to publish the results to Hive, as long as the platform has been integrated with a supported instance of Hive.
- TDE: If you have generated output in TDE format and have configured a connection to Tableau Server, you can publish directly to the server. You can also download TDE-formatted outputs to your desktop.
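If you download Avro results for use outside the platform, you can inspect them locally with any generic Avro reader. The following is a minimal sketch using the Python fastavro package; the file name job_results.avro is a placeholder for whatever file your browser downloaded.

```python
# Minimal sketch: inspect a downloaded Avro results file locally.
# Assumes the fastavro package is installed; "job_results.avro" is a placeholder name.
from fastavro import reader

with open("job_results.avro", "rb") as f:
    for record in reader(f):
        print(record)  # each record is a dict keyed by column name
```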
Publish to HDFS
When a job is run on Hadoop, results are published to the specified locations on HDFS.
Tip: These locations are also listed with the job results in the Jobs page. If these results are used by another analytics tool, you may want to copy the locations from this window.
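If another process needs a copy of the published files, you can retrieve them from the HDFS locations shown in this window using standard Hadoop tooling. The following is a minimal sketch that invokes the hdfs dfs CLI from Python; the HDFS path is a placeholder that you would replace with a location copied from this window.

```python
# Minimal sketch: copy published results from HDFS to the local filesystem.
# Assumes the Hadoop client (hdfs CLI) is installed and configured on this machine.
import subprocess

hdfs_path = "/path/to/job/output.avro"  # placeholder: copy the real location from this window

subprocess.run(["hdfs", "dfs", "-get", hdfs_path, "./output.avro"], check=True)
```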
Create Dataset
Optionally, you can turn your generated results into new datasets for immediate use in Designer Cloud Enterprise Edition. Select the format of the new dataset and click Create.
NOTE: When you create a new dataset as part of your job results, the file or files are written to the designated output location for your user account. Depending on how your Hadoop permissions are configured, this location may not be accessible to other users.
After the new output has been written, you can create new recipes from it. See Build Sequence of Datasets.
Publish to Hive
NOTE: If you created a publishing action to deliver results to Hive as part of this job definition, the Hive tab identifies the database and table where the results were written. Any available options here are for ad-hoc publishing of results to Hive.
If you have enabled publishing to Hive, you can specify the database and table to which you would like to publish results.
NOTE: When launching the job, you must choose to generate results in Avro or Parquet format to publish to Hive. If you are publishing a wide dataset to Hive, you should generate results using Parquet.
NOTE: Some Alteryx data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.
Administrators can connect the Designer Cloud Powered by Trifacta platform to an available instance of Hive. For more information, see Configure for Hive.
Figure: Hive Publishing Options
Options:
- Database: Name of Hive database. This value is case-insensitive.
NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.
- Table: Name of Hive table in database. A default value is generated based on your dataset name. This value is case-insensitive.
- Format: Choose publishing format: avro or pqt (parquet).
Data Option:
If you are publishing to a pre-existing table, schema validation is automatically performed.
- Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
- Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the Hive database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
- Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
- Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
To export the job results to the designated Hive table, click Publish. Publication happens in the background as an Alteryx job. You can track status in the Jobs page. See Jobs Page.
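After the publishing job completes, you may want to confirm that the table is queryable from outside the platform. The following is a minimal sketch using the Python PyHive package against a HiveServer2 endpoint; the host, credentials, database, and table names are placeholders.

```python
# Minimal sketch: verify that a table published to Hive is queryable.
# Assumes the PyHive package and a reachable HiveServer2 endpoint; all names are placeholders.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000,
                       username="analyst", database="default")
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM my_published_table")
print(cursor.fetchone()[0])
```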
Publish to Redshift
If you have enabled publishing to Redshift, you can specify the database, schema, and table to which you would like to publish results.
Notes:
- When launching the job, your output must be delivered to S3 with results in Avro, CSV, or JSON format. A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.
- When publishing output results to Redshift, you cannot publish CSV files with headers. You can publish single-file CSV without headers and multi-file CSV outputs, which have no headers by default.
Administrators can connect the Designer Cloud Powered by Trifacta platform to an available instance of Redshift. For more information, see Create Redshift Connections.
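As with Hive, you can verify published results from outside the platform. Because Redshift uses the PostgreSQL wire protocol, a standard PostgreSQL client works; the sketch below uses the Python psycopg2 package, with placeholder cluster endpoint, credentials, schema, and table names.

```python
# Minimal sketch: verify results published to Redshift.
# Assumes the psycopg2 package; endpoint, credentials, and table names are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="analyst", password="***")
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM my_schema.my_published_table")
    print(cur.fetchone()[0])
```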
Publish to SQL DW
To publish to Microsoft SQL DW storage, specify the following information.
NOTE: Publishing to Microsoft SQL DW requires deployment of the Designer Cloud Powered by Trifacta platform on Azure and a base storage layer of WASB. For more information, see Configure for Azure.
NOTE: Results must be in Parquet format to publish to SQL DW.
Options:
- Database: Name of SQL DW database. This value is case-insensitive.
NOTE: You cannot publish to a SQL DW database that is empty. The database must contain at least one table.
- Schema: Name of the schema to use to publish. Schema and results must match in terms of column names, order, and data type.
- Table: Name of table in database. A default value is generated based on your dataset name. This value is case-insensitive.
Data Option:
If you are publishing to a pre-existing table, schema validation is automatically performed.
- Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
- Append data to existing table: The results from this job are appended to the data that is already stored in the table, which already exists in the SQL DW database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
- Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
- Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
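If you want to confirm the published SQL DW table independently of the platform, any SQL Server client can be used. The sketch below uses the Python pyodbc package with the Microsoft ODBC driver; the server, database, credentials, and table names are placeholders.

```python
# Minimal sketch: verify results published to SQL DW.
# Assumes pyodbc and the "ODBC Driver 17 for SQL Server" are installed; all names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydw.database.windows.net;"
    "DATABASE=my_database;UID=analyst;PWD=***"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM my_schema.my_published_table")
print(cursor.fetchone()[0])
```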
Publish to Tableau
If you have created a Tableau Server connection, you can export results that have been generated in TDE format to the connected server.
NOTE: Generated results must be in TDE format for export.
Options:
- Site Name: Name of the Tableau Server site.
- Project Name: Name of the Tableau Server project.
- Datasource Name: Name of the Tableau Server datasource. This value is displayed for selection in Tableau Server.
Data Option:
If you are publishing to a pre-existing datasource, schema validation is automatically performed.
- Create new datasource: The platform creates the datasource and then loads it with the results from this job. If you attempt to use this option on a source that already exists, the publishing job fails, and an error is generated in the log.
- Append data to existing datasource: The results from this job are appended to the data that is already stored in Tableau Server. If you attempt to append to a source that does not exist, the publishing job fails, and an error is generated in the log. Append operations also fail if you publish to a target with a different schema.
- Replace contents of existing datasource: Target datasource is dropped. A new datasource is created using the schema of the generated output and filled with the job results.
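After publishing, you can confirm that the datasource appears on Tableau Server under the expected site and project. The sketch below uses the tableauserverclient Python package; the server URL, credentials, and site name are placeholders.

```python
# Minimal sketch: list datasources on Tableau Server to confirm the publish.
# Assumes the tableauserverclient package; URL, credentials, and site are placeholders.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth("analyst", "***", site_id="my_site")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    datasources, _ = server.datasources.get()
    for ds in datasources:
        print(ds.project_name, ds.name)
```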