...
- Create a new file: Enter the filename to create. A filename extension is automatically added for you, so you should omit the extension from the filename.
- Output directory: Read-only value showing the current directory. To change it, navigate to the proper directory.
- Data Storage Format: Select the output format you want to generate for the job.
- Avro: This format supports data serialization within a Hadoop environment.
- CSV and JSON: These formats are supported for all types of imported datasets and all running environments.
NOTE: Generated JSON-formatted files are rendered in JSON Lines format, which is a single-line-per-record variant of JSON. For more information, see http://jsonlines.org. A short sample appears after this list.
- Parquet: This format is a columnar storage format.
- TDE: Choose TDE (Tableau Data Extract) to generate results that can be imported into Tableau. If you have created a Tableau Server connection, you can publish the results directly into Tableau Server after they have been generated.
NOTE: If you encounter errors generating results in TDE format, additional configuration may be required. See Supported File Formats.
- For more information, see Supported File Formats.
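As a point of reference, the following is a minimal sketch of what JSON Lines output looks like, using Python's standard library. The records and filename are hypothetical and are only meant to illustrate the one-JSON-object-per-line layout.

```python
import json

# Hypothetical records standing in for rows of a published result.
records = [
    {"id": 1, "name": "Alice", "score": 98.5},
    {"id": 2, "name": "Bob", "score": 87.0},
]

# JSON Lines: one complete JSON object per line, with no enclosing array.
with open("output.json", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Each line parses independently, so the file can be streamed record by record.
with open("output.json", encoding="utf-8") as f:
    for line in f:
        print(json.loads(line))
```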
- Publishing action: Select one of the following:
NOTE: If multiple jobs are attempting to publish to the same filename, a numeric suffix (_N) is added to the end of subsequent filenames (e.g. filename_1.csv).
- Create new file every run: For each job run with the selected publishing destination, a new file is created with the same base name with the job number appended to it (e.g. myOutput_2.csv, myOutput_3.csv, and so on).
- Append to this file every run: For each job run with the selected publishing destination, the same file is appended to, which means that the file grows until it is purged or trimmed.
NOTE: When publishing single files to S3 or WASB, the append action is not supported.
NOTE: When appending data into a Hive table, the columns displayed in the Transformer page must match the order and data types of the columns in the Hive table.
NOTE: This option is not available for outputs in TDE format.
NOTE: Compression of published files is not supported for an append action.
- Replace this file every run: For each job run with the selected publishing destination, the existing file is overwritten by the contents of the new results. These three publishing actions are sketched in code after this list.
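To make the three publishing actions concrete, here is a minimal sketch of their semantics using Python file modes. The publish function, filenames, and suffixing logic are hypothetical illustrations of the behavior described above (including the numeric _N suffix), not the product's implementation.

```python
import os

def publish(path, data, action):
    """Illustrative model of the three publishing actions (hypothetical)."""
    if action == "create":
        # Create new file every run: add a numeric suffix (_N) instead of
        # overwriting an existing file.
        base, ext = os.path.splitext(path)
        n, candidate = 1, path
        while os.path.exists(candidate):
            candidate = f"{base}_{n}{ext}"
            n += 1
        path, mode = candidate, "w"
    elif action == "append":
        # Append to this file every run: the file grows on every run.
        mode = "a"
    elif action == "replace":
        # Replace this file every run: prior contents are overwritten.
        mode = "w"
    else:
        raise ValueError(f"unknown action: {action}")
    with open(path, mode, encoding="utf-8") as f:
        f.write(data)
    return path

print(publish("myOutput.csv", "id,name\n1,Alice\n", "create"))  # myOutput.csv
print(publish("myOutput.csv", "id,name\n2,Bob\n", "create"))    # myOutput_1.csv
```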
- More Options:
- Include headers as first row on creation: For CSV outputs, you can choose to include the column headers as the first row in the output. For other formats, these headers are included automatically.
NOTE: Headers cannot be applied to compressed outputs.
- Include quotes: For CSV outputs, you can choose to include double quote marks around all values, including headers.
- Delimiter: For CSV outputs, you can enter the delimiter that is used to separate fields in the output. The default value is the global delimiter, which you can override on a per-job basis in this field.
Tip: If needed for your job, you can enter Unicode characters in the following format: \uXXXX.
NOTE: The Spark running environment does not support use of multi-character delimiters for CSV outputs. You can switch your job to a different running environment or use single-character delimiters. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.
- Single File: Output is written to a single file.
- Multiple Files: Output is written to multiple files.
- Compression: For text-based outputs, compression can be applied to significantly reduce the size of the output. Select a preferred compression format for each format you want to compress. The CSV-related options above are illustrated in a short sketch after this list.
NOTE: If you encounter errors generating results using Snappy, additional configuration may be required. See Supported File Formats.
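The sketch below shows how the CSV options above (headers, quotes, a Unicode delimiter, and compression) interact, using Python's standard library. The column names, rows, and choice of \u0001 as delimiter are hypothetical; note that the compressed variant omits the header row, per the note above.

```python
import csv
import gzip

header = ["id", "name"]                # hypothetical columns
rows = [["1", "Alice"], ["2", "Bob"]]  # hypothetical output rows
delimiter = "\u0001"                   # single-character Unicode delimiter (\uXXXX form)

# Include headers + include quotes: header as first row, all values quoted.
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=delimiter, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)

# Compression: the same output gzip-compressed. No header row is written here,
# since headers cannot be applied to compressed outputs.
with gzip.open("output.csv.gz", "wt", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=delimiter, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```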
- To save the publishing action, click Add.
...