...

In the Run Job page, you can specify transformation and profiling jobs for the currently loaded dataset. Available options include output formats and output destinations. You can also configure the environment in which the job is executed.

Tip: Jobs can be scheduled for periodic execution through the Flow View page. For more information, see Add Schedule Dialog.



Tip: Columns that have been hidden in the Transformer page still appear in the generated output. Before you run a job, you should verify that all currently hidden columns are acceptable to include in the output.


Figure: Run Job Page

...

From the available datastores in the left column, select the target for your publication. 

Figure: Add Publishing Action

...

NOTE: Do not create separate publishing actions that apply to the same file or database table.

New/Edit: You can create new connections or modify existing ones. See Create Connection Window.

Steps:

  1. Select the publishing target. Click an icon in the left column.
    1. If Hive publishing is enabled, you must select or specify a database table to which to publish.

      Depending on the running environment, results are generated in Avro or Parquet format. See below for details on specifying the action and the target table.

      If you are publishing a wide dataset to Hive, you should generate results using Parquet.

      For more information on how data is written to Hive, see Hive Data Type Conversions.

  2. Locate a publishing destination: Do one of the following.

    1. Explore: 

      NOTE: The publishing location must already exist before you can publish to it, and the publishing user must have write permissions to the location. A sketch for verifying these requirements appears after these steps.


      NOTE: If your HDFS environment is encrypted, the default output home directory for your user and the output directory where you choose to generate results must be in the same encryption zone. Otherwise, writing the job results fails with a Publish Job Failed error. For more information on your default output home directory, see Storage Config Page.

      1. To sort the listings in the current directory, click the carets next to any column name.
      2. For larger directories, browse using the paging controls.
      3. Use the breadcrumb trail to explore the target datastore. Navigate folders as needed.
    2. Search: Use the search bar to search for specific locations in the current folder only.
    3. Manual entry: Click the Edit icon to manually edit or paste in a destination.
  3. Choose an existing file or folder: When the location is found, select the file to overwrite or the folder into which to write the results.

    NOTE: You must have write permissions to the folder or file that you select.

    1. To write to a new file, click Create a new file. See the output file settings below.

  4. Create Folder: Depending on the storage destination, you can click Create Folder to create a new folder for the job inside the currently selected one. Do not include spaces in your folder name.
  5. As needed, you can parameterize the outputs that you are creating. Click Parameterize destination in the right panel. See Parameterize destination settings below.

  6. To save the publishing destination, click Add.

To update a publishing action, hover over its entry. Then, click Edit.

To delete a publishing action, select Delete from its context menu.
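
Because the publishing location must already exist and be writable, it can be useful to verify the destination before running a job. The following is a minimal sketch using Python's standard library. It assumes the destination is visible as a local or mounted path; the path /data/output and the function name are illustrative only, and an HDFS or S3 destination would require the corresponding client library instead.

```python
import os

def check_publish_location(path):
    """Illustrative pre-flight check for a publishing destination.

    Mirrors the requirements noted in the steps above: the location
    must already exist, and the publishing user must have write
    permission to it.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Publishing location does not exist: {path}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"No write permission for: {path}")

# Hypothetical destination; substitute your own output directory.
check_publish_location("/data/output")
```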

...

When you generate file-based results, you can configure the filename, storage format, compression, number of files, and the updating actions in the right-hand panel.

Figure: Output File Settings

...

  1. Create a new file: Enter the filename to create. A filename extension is automatically added for you, so you should omit the extension from the filename.
  2. Output directory: Read-only value for the current directory. 
    1. To change it, navigate to the proper directory.

  3. Data Storage Format: Select the output format you want to generate for the job.
    1. Avro: This format is used to support data serialization within a Hadoop environment.
    2. CSV and JSON: These formats are supported for all types of imported datasets and all running environments. 

      NOTE: JSON-formatted files that are generated by the product are rendered in JSON Lines format, which is a single-line-per-record variant of JSON. For more information, see http://jsonlines.org. An example of reading this format appears after this list.


    3. Parquet: This format uses columnar storage.
    4. TDE: Choose TDE (Tableau Data Extract) to generate results that can be imported into Tableau.

      If you have created a Tableau Server connection, you can write results to Tableau Server or publish them after they have been generated in TDE format.

      NOTE: If you encounter errors generating results in TDE format, additional configuration may be required. See Supported File Formats.


    5. For more information, see Supported File Formats.
  4. Publishing action: Select one of the following:

    NOTE: If multiple jobs are attempting to publish to the same filename, a numeric suffix (_N) is added to the end of subsequent filenames (e.g. filename_1.csv).

    1. Create new file every run: For each job run with the selected publishing destination, a new file is created with the same base name and the job number appended to it (e.g. myOutput_2.csv, myOutput_3.csv, and so on). 
    2. Append to this file every run: For each job run with the selected publishing destination, results are appended to the same file, which means that the file grows until it is purged or trimmed.

      NOTE: When publishing single files to S3 or WASB, the append action is not supported.


      NOTE: When appending data into a Hive table, the columns displayed in the Transformer page must match the order and data type of the columns in the Hive table.


      NOTE: This option is not available for outputs in TDE format.


      NOTE: Compression of published files is not supported for an append action.


    3. Replace this file every run: For each job run with the selected publishing destination, the existing file is overwritten by the contents of the new results.
  5. More Options:

    1. Include headers as first row on creation: For CSV outputs, you can choose to include the column headers as the first row in the output. For other formats, these headers are included automatically.

      NOTE: Headers cannot be applied to compressed outputs.


    2. Include quotes: For CSV outputs, you can choose to include double quote marks around all values, including headers.

    3. Delimiter: For CSV outputs, you can enter the delimiter that is used to separate fields in the output. The default value is the global delimiter, which you can override on a per-job basis in this field.

      Tip: If needed for your job, you can enter Unicode characters in the following format: \uXXXX. A sketch of reading delimited output appears after this list.


      NOTE: The Spark running environment does not support use of multi-character delimiters for CSV outputs. You can switch your job to a different running environment or use single-character delimiters. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.


    4. Single File: Output is written to a single file. This is the default setting for smaller, file-based jobs or for jobs executed on the server.

    5. Multiple Files: Output is written to multiple files. This is the default setting for larger, file-based jobs or for jobs executed in a remote, cluster-based running environment.

  6. Compression: For text-based outputs, compression can be applied to significantly reduce the size of the output. Select a preferred compression format for each output format that you want to compress.

    NOTE: If you encounter errors generating results using Snappy, additional configuration may be required. See Supported File Formats.


  7. To save the publishing action, click Add.
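
As noted above, JSON results are written in JSON Lines format, with one complete JSON record per line. The following is a minimal sketch of consuming such output with Python's standard library; the filename myOutput.json is illustrative.

```python
import json

records = []
# JSON Lines: each non-empty line is a self-contained JSON record.
with open("myOutput.json", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))

print(f"Loaded {len(records)} records")
```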
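For CSV results, the header, quoting, delimiter, and compression settings above determine how downstream consumers must read the files. The following is a minimal sketch, assuming a job configured with headers as the first row, quoted values, and a tab delimiter (entered as \u0009); the filenames are illustrative.

```python
import csv
import gzip

# Assumed job settings: headers as first row, quoted values, and a
# tab delimiter (entered in the Delimiter field as \u0009).
with open("myOutput.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\u0009", quotechar='"'):
        print(row)

# Per the note above, headers cannot be applied to compressed outputs,
# so a Gzip-compressed result is read with csv.reader instead.
with gzip.open("myOutput.csv.gz", mode="rt", encoding="utf-8", newline="") as f:
    for row in csv.reader(f, delimiter="\u0009", quotechar='"'):
        print(row)
```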

...