
When a job has successfully completed, you can export the results through one of the following methods:

  • In the Job Results page, click Export Results.
  • In the Jobs tab in the Flow View page, select Export Results.


NOTE: If you run a job with a single relational target and the job fails at the publication step, you cannot publish the results through the Export Results window.


Figure: Export Results window

 

There are multiple ways to export the results.

NOTE: You can only publish to destinations that are applicable to your base storage layer. For example, results written to HDFS can be published to Hive but cannot be published to Redshift. The base storage layer cannot be modified after installation. See Set Base Storage Layer.

Direct file download


NOTE: If none of these options is available, data download may have been disabled by an administrator. See Admin Settings Page.

  • Click one of the provided links to download the file through your browser to your local desktop.
  • Avro output enabled: Jobs that were executed with Avro output enabled include the option to publish the results to Hive, as long as the platform has been integrated with a supported instance of Hive.
  • TDE: If you have generated output in TDE format and have configured a connection to Tableau Server, you can publish directly to the server. You can also download TDE-formatted outputs to your desktop.
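If you download Avro output to your desktop, you can quickly verify the records before handing the file to another tool. The following is a minimal sketch using the open-source fastavro package; the file name is illustrative, not a name the platform produces.

```python
# Minimal local check of a downloaded Avro result file.
# Assumes the open-source fastavro package; "job_results.avro"
# is a placeholder for whatever file you downloaded.
from fastavro import reader

with open("job_results.avro", "rb") as f:
    avro_reader = reader(f)
    print(avro_reader.writer_schema)      # column names and types
    for i, record in enumerate(avro_reader):
        print(record)                     # preview the first five rows
        if i >= 4:
            break
```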

Publish to HDFS

When a job is run on Hadoop, results are published to the specified locations on HDFS.

Tip: These locations are listed with the job results in the Jobs page. If these results are used by another analytics tool, you may want to copy the locations from this window.
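As a sketch of how another tool might pick up those published locations, the following uses the open-source pyarrow library to list and read Parquet output from HDFS. The host, port, and output path are placeholders; copy the actual location from the Jobs page.

```python
# Illustrative downstream read of results published to HDFS.
# Host, port, and path are placeholders; this is not part of the platform.
import pyarrow.fs as pafs
import pyarrow.parquet as pq

hdfs = pafs.HadoopFileSystem(host="namenode.example.com", port=8020)
output_dir = "/user/someuser/jobrun/output"   # copy from the Jobs page

# List the files the job wrote to the output location.
for info in hdfs.get_file_info(pafs.FileSelector(output_dir)):
    print(info.path, info.size)

# Load Parquet output directly into an Arrow table for further analysis.
table = pq.read_table(output_dir, filesystem=hdfs)
print(table.num_rows, table.schema)
```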

Create Dataset

Optionally, you can turn your generated results into new datasets for immediate use in the application. Select the format of the new dataset and click Create.

NOTE: When you create a new dataset as part of your job results, the file or files are written to the designated output location for your user account. Depending on how your Hadoop permissions are configured, this location may not be accessible to other users.

After the new output has been written, you can create new recipes from it. See Build Sequence of Datasets.

Publish to Hive


NOTE: If you created a publishing action to deliver results to Hive as part of this job definition, the Hive tab identifies the database and table where the results were written. Any available options here are for ad-hoc publishing of results to Hive.

If you have enabled publishing to Hive, you can specify the database and table to which you would like to publish results.


NOTE: When launching the job, you must choose to generate results in Avro or Parquet format to publish to Hive. If you are publishing a wide dataset to Hive, you should generate results using Parquet.

NOTE: Some data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.

Administrators can connect the platform to an available instance of Hive. For more information, see Configure for Hive.

Figure: Hive Publishing Options

Options:

  • Database: Name of Hive database. This value is case-insensitive.


    NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.

  • Table: Name of Hive table in database. A default value is generated based on your dataset name. This value is case-insensitive.
  • Format: Choose publishing format: avro or pqt (parquet).

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data already stored in the table, which must already exist in the Hive database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.

To export the job results to the designated Hive table, click Publish. Publication happens in the background as a job. You can track status in the Jobs page. See Jobs Page.
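To see roughly what the data options above amount to in HiveQL, and to verify the published table afterward, the following sketch uses the open-source PyHive client. The host, database, and table names are placeholders; the platform performs the actual publish when you click Publish.

```python
# Hedged sketch: verify a published Hive table and note the HiveQL that
# roughly corresponds to each data option. All names are placeholders.
from pyhive import hive

conn = hive.connect(host="hive.example.com", port=10000, database="analytics")
cur = conn.cursor()

# After "Create new table & load data" or "Append data to existing table",
# a simple count confirms that rows arrived.
cur.execute("SELECT COUNT(*) FROM wrangled_orders")
print("rows after publish:", cur.fetchone()[0])

# "Truncate table & load data" keeps the schema but clears existing rows,
# roughly: TRUNCATE TABLE wrangled_orders;
# "Drop, recreate table & load data" replaces the schema as well, roughly:
#   DROP TABLE wrangled_orders;
#   CREATE TABLE wrangled_orders (...) STORED AS PARQUET;

cur.close()
conn.close()
```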

Publish to Redshift

If you have enabled publishing to Redshift, you can specify the database, schema, and table to which you would like to publish results.

Notes:

  • When launching the job, your output must be delivered to S3 with results in Avro, CSV, or JSON format. A Redshift connection requires S3 as your base storage layer. See Set Base Storage Layer.
  • When publishing output results to Redshift, you cannot publish CSV files with headers. You can publish single-file CSV without headers and multi-file CSV outputs, which have no headers by default.

Administrators can connect the platform to an available instance of Redshift. For more information, see Create Redshift Connections.
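For context on the headerless-CSV requirement above, the following sketch shows what an equivalent manual load of the published S3 output into Redshift looks like with a standard COPY command, issued here through the open-source psycopg2 driver. The cluster, credentials, table, bucket, and IAM role are placeholders; the platform issues the equivalent load for you.

```python
# Illustrative manual COPY of published S3 results into Redshift.
# All connection details and object names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="redshift-cluster.example.com", port=5439,
    dbname="analytics", user="dbuser", password="dbpass",
)
with conn, conn.cursor() as cur:
    # Redshift's COPY expects CSV files without header rows, which is why
    # headers must be disabled when publishing CSV output.
    cur.execute("""
        COPY public.wrangled_orders
        FROM 's3://example-bucket/jobrun/output/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        FORMAT AS CSV;
    """)
```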

Publish to SQL DW

To publish to Microsoft SQL DW, specify the following information.

NOTE: Publishing to Microsoft SQL DW requires deployment of the platform on Azure and a base storage layer of WASB. For more information, see Configure for Azure.


NOTE: Results must be in Parquet format to publish to SQL DW.

Options:

  • Database: Name of SQL DW database. This value is case-insensitive.


    NOTE: You cannot publish to a SQL DW database that is empty. The database must contain at least one table.

  • Schema: Name of the schema to which to publish. The schema and the results must match in terms of column names, order, and data type.
  • Table: Name of table in database. A default value is generated based on your dataset name. This value is case-insensitive.

Data Option:

If you are publishing to a pre-existing table, schema validation is automatically performed.

  • Create new table & load data: The platform creates the table and then loads it with the results from this job. If you attempt to use this option on a table that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing table: The results from this job are appended to the data already stored in the table, which must already exist in the SQL DW database. If you attempt to append to a table that does not exist, the publishing job fails, and an error is generated in the log.
  • Truncate table & load data: Data is cleared from the target table, and new data is added to the existing schema.
  • Drop, recreate table & load data: Target table is dropped. A new table is created using the schema of the generated output and filled with the job results.
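As an optional post-publish check, you can query the target table directly. The sketch below uses the open-source pyodbc driver against SQL DW; the server, database, schema, and table names are placeholders, and the publish itself is handled by the platform.

```python
# Hedged sketch: confirm row counts in the SQL DW table after publishing.
# Connection string values and object names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=analytics;UID=dbuser;PWD=dbpass"
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM dbo.wrangled_orders")
print("rows in published table:", cur.fetchone()[0])
conn.close()
```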

Publish to Tableau

If you have created a Tableau Server connection, you can export results that have been generated in TDE format to the connected server.


NOTE: Generated results must be in TDE format for export.

Options:

  • Site Name: Name of the Tableau Server site.
  • Project Name: Name of the Tableau Server project.
  • Datasource Name: Name of the Tableau Server datasource. This value is displayed for selection in Tableau Server.

Data Option:

If you are publishing to a pre-existing datasource, schema validation is automatically performed.

  • Create new datasource: The platform creates the datasource and then loads it with the results from this job. If you attempt to use this option on a source that already exists, the publishing job fails, and an error is generated in the log.
  • Append data to existing datasource: The results from this job are appended to the data that is already stored in Tableau Server. If you attempt to append to a source that does not exist, the publishing job fails, and an error is generated in the log. Append operations also fail if you publish to a target with a different schema.
  • Replace contents of existing datasource: Target datasource is dropped. A new datasource is created using the schema of the generated output and filled with the job results.
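Outside the application, a comparable publish can be scripted with the open-source Tableau Server Client (tableauserverclient) package, whose publish modes map roughly to the three data options above. The server URL, credentials, site, project, and file names below are placeholders, not values supplied by the platform.

```python
# Hedged sketch: publish a TDE extract with Tableau Server Client (TSC).
# All names and credentials are placeholders.
import tableauserverclient as TSC

auth = TSC.TableauAuth("tabuser", "tabpass", site_id="analytics")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # Find the target project by name.
    all_projects, _ = server.projects.get()
    project = next(p for p in all_projects if p.name == "Wrangled Output")

    datasource = TSC.DatasourceItem(project.id, name="wrangled_orders")
    server.datasources.publish(
        datasource,
        "job_results.tde",
        mode=TSC.Server.PublishMode.CreateNew,  # or .Append / .Overwrite
    )
```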