Release 5.0.1



In the Run Job page, you can specify transformation and profiling jobs for the currently loaded dataset. Available options include output formats and output destinations.

You can also configure the environment where the job is to be executed.

Tip: Columns that have been hidden in the Transformer page still appear in the generated output. Before you run a job, verify that all currently hidden columns are acceptable to include in the output.

Figure: Run Job Page

Running Environment

Select the environment where you wish to execute the job. Some of the following environments may not be available to you. These options appear only if there are multiple accessible running environments.

NOTE: Running a job executes the transformations on the entire dataset and saves the transformed data to the specified location. Depending on the size of the dataset and available processing resources, this process can take a while.

Tip: The application attempts to identify the best running environment for you. You should choose the default option, which factors in the available environments and the size of your dataset to identify the most efficient processing environment.

Alteryx Server: Executes the job on the running environment hosted on the same server as Designer Cloud Enterprise Edition.

Hadoop: Executes the job using the configured running environment for your Hadoop cluster.

Options

Profile Results: Optionally, you can disable profiling of your output, which can improve the speed of overall job execution. When the profiling job finishes, details are available through the Job Results page, including links to download results.

NOTE: Percentages for valid, missing, or mismatched column values may not add up to 100% due to rounding. This issue applies to the Photon running environment.

See Job Results Page.
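The rounding note above can be illustrated with a small arithmetic sketch. It assumes that each percentage is rounded independently to a whole number; the precision that the application actually uses is not stated on this page, and the column counts are hypothetical.

```python
def rounded_percentages(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Hypothetical column with 1,000 values split across the three categories.
counts = {"valid": 994, "missing": 3, "mismatched": 3}
percentages = rounded_percentages(counts)

# 99.4% rounds to 99 and each 0.3% rounds to 0, so the displayed
# percentages total 99 rather than 100.
total_displayed = sum(percentages.values())
```

Because each category is rounded on its own, small categories can round down to zero while the largest category also rounds down, which leaves the displayed total short of 100%.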

Publishing Actions

 

You can add, remove, or edit the outputs generated from this job. By default, a CSV output for your home directory on the selected datastore is included in the list of destinations, which can be removed if needed. You must include at least one output destination. 

Columns:

  • Actions: Lists the action and the format for the output.
  • Location: The directory and filename or table information where the output is to be written.
  • Settings: Identifies the output format and any compression, if applicable, for the publication.

Actions:

  • To change format, location, and settings of an output, click the Edit icon.
  • To delete an output, click the X icon.

Add Publishing Action

From the available datastores in the left column, select the target for your publication. 

Figure: Add Publishing Action

NOTE: Do not create separate publishing actions that apply to the same file or database table.
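If you maintain a list of planned outputs outside the application, a check like the following sketch can flag publishing actions that point at the same target before you configure them. The dictionary layout and the example paths are hypothetical and do not reflect how the application stores publishing actions.

```python
from collections import Counter


def duplicate_targets(actions):
    """Return the locations that more than one publishing action writes to."""
    counts = Counter(action["location"] for action in actions)
    return sorted(location for location, n in counts.items() if n > 1)


# Hypothetical publishing actions; two of them target the same file.
actions = [
    {"format": "csv", "location": "/user/example/out/results.csv"},
    {"format": "json", "location": "/user/example/out/results.json"},
    {"format": "csv", "location": "/user/example/out/results.csv"},
]
conflicts = duplicate_targets(actions)
```

An empty result means that every publishing action in the list writes to a distinct file or table.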

New/Edit: You can create new or modify existing connections. See Create Connection Window.

 

Steps:

  1. Select the publishing target. Click an icon in the left column.
    1. If Hive publishing is enabled, you must select or specify a database table to which to publish.

      Depending on the running environment, results are generated in Avro or Parquet format. See below for details on specifying the action and the target table.

      If you are publishing a wide dataset to Hive, you should generate results using Parquet.

      For more information on how data is written to Hive, see Hive Data Type Conversions.

  2. Locate a publishing destination: Do one of the following.

    1. Explore: 

      NOTE: The publishing location must already exist before you can publish to it. The publishing user must have write permissions to the location.

      NOTE: If your HDFS environment is encrypted, the default output home directory for your user and the output directory where you choose to generate results must be in the same encryption zone. Otherwise, writing the job results fails with a Publish Job Failed error. For more information on your default output home directory, see User Profile Page.

       

      1. To sort the listings in the current directory, click the carets next to any column name.
      2. For larger directories, browse using the paging controls.
      3. Use the breadcrumb trail to explore the target datastore. Navigate folders as needed.
    2. Search: Use the search bar to search for specific locations in the current folder only.
    3. Manual entry: Click the Edit icon to manually edit or paste in a destination.
  3. Choose an existing file or folder: When the location is found, select the file to overwrite or the folder into which to write the results.

    NOTE: You must have write permissions to the folder or file that you select.

    1. To write to a new file, click Create a new file. For details, see File Settings below.

  4. Create Folder: Depending on the storage destination, you can click Create Folder to create a new folder for the job inside the currently selected folder. Do not include spaces in your folder name.
  5. To save the publishing destination, click Save Settings.

 

Variable Overrides

If the underlying datasets use variable parameters, you can apply overrides to the default values of the variables. Click the listed default value and insert a new value.

NOTE: Override values applied to a job are not validated. Invalid overrides may cause your job to fail.

NOTE: Variable overrides apply only to this job. Subsequent jobs use the default variable values unless you specify the overrides again.

For more information on variables, see Overview of Parameterization.
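The scoping of overrides described above can be sketched as a per-job merge of dictionaries. This is only an illustration of the behavior, not the application's implementation; the variable names and values are hypothetical.

```python
def effective_values(defaults, overrides=None):
    """Return the variable values for one job without changing the defaults."""
    values = dict(defaults)          # copy, so the defaults stay untouched
    values.update(overrides or {})   # overrides are applied as-is, unvalidated
    return values


# Hypothetical dataset variables and their default values.
defaults = {"region": "us-east", "year": "2018"}

first_job = effective_values(defaults, {"year": "2017"})  # override for this job only
second_job = effective_values(defaults)                   # next job: defaults again
```

As in the application, nothing in this sketch checks whether an override value is valid, so a bad value surfaces only when the job runs.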

File Settings

When you generate file-based results, you can configure the filename, storage format, compression, number of files, and the updating actions in the right-hand panel.

Figure: Output File Settings

Configure the following settings.

  1. Create a new file: Enter the filename to create. A filename extension is automatically added for you, so you should omit the extension from the filename.
  2. Output directory: Read-only value for the current directory. 
    1. To change it, navigate to the proper directory.

       

  3. Data Storage Format: Select the output format you want to generate for the job.
    1. Avro: This format is used to support data serialization within a Hadoop environment.
    2. CSV and JSON: These formats are supported for all types of imported datasets and all running environments. 

    3. Parquet: This format is a columnar storage format primarily available through Hadoop, although you can also use the Photon running environment for processing Parquet sources.
    4. TDE: Choose TDE (Tableau Data Extract) to generate results that can be imported into Tableau. If you have created a Tableau Server connection, you can publish the results directly into Tableau Server after they have been generated.
    5. For more information, see Supported File Formats.
  4. Publishing action: Select one of the following:

    1. Create new file every run: For each job run with the selected publishing destination, a new file is created with the same base name and the job number appended to it (e.g. myOutput_2.csv, myOutput_3.csv, and so on).
    2. Append to this file every run: For each job run with the selected publishing destination, the same file is appended, which means that the file grows until it is purged or trimmed.

      NOTE: When publishing single files to S3, the append action is not supported.

      NOTE: When appending data into a Hive table, the columns displayed in the Transformer page must match the order and data type of the columns in the Hive table.

      NOTE: This action is not available for outputs in TDE format.

      NOTE: Compression of published files is not supported for an append action.

    3. Replace this file every run: For each job run with the selected publishing destination, the existing file is overwritten by the contents of the new results.
  5. More Options:

    1. Include headers as first row: For CSV outputs, you can choose to include the column headers as the first row in the output. For other formats, these headers are included automatically.

      NOTE: Headers cannot be applied to compressed outputs.

    2. Single File: Output is written to a single file.

    3. Multiple Files: Output is written to multiple files.
  6. Compression: For text-based outputs, compression can be applied to significantly reduce the size of the output. Select a preferred compression format for each format you want to compress.
  7. To save the publishing action, click Save Settings.
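The three file publishing actions listed above can be compared with a minimal in-memory sketch. It models the destination as a dictionary of filenames and rows; the function name, action keywords, and sample rows are hypothetical, and only the job-number naming pattern is taken from this page.

```python
def publish_file(store, base, ext, rows, action, job_number):
    """Apply one file publishing action to an in-memory store {filename: rows}."""
    if action == "create":
        # New file every run: the job number is appended to the base name.
        name = f"{base}_{job_number}.{ext}"
        store[name] = list(rows)
    elif action == "append":
        # Same file every run: it grows until it is purged or trimmed.
        name = f"{base}.{ext}"
        store.setdefault(name, []).extend(rows)
    elif action == "replace":
        # Same file every run: earlier contents are overwritten.
        name = f"{base}.{ext}"
        store[name] = list(rows)
    else:
        raise ValueError(f"unknown publishing action: {action}")
    return name


store = {}
first = publish_file(store, "myOutput", "csv", ["row1"], "create", job_number=2)
second = publish_file(store, "myOutput", "csv", ["row2"], "create", job_number=3)
publish_file(store, "log", "csv", ["row1"], "append", job_number=4)
publish_file(store, "log", "csv", ["row2"], "append", job_number=5)
publish_file(store, "latest", "csv", ["row1"], "replace", job_number=6)
publish_file(store, "latest", "csv", ["row2"], "replace", job_number=7)
```

After these runs, the store holds myOutput_2.csv and myOutput_3.csv as separate files, log.csv with both rows, and latest.csv with only the most recent row.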

Hive Table Settings

When publishing to Hive, complete the following steps to configure the table and the settings to apply to the publishing action.

NOTE: Some Alteryx data types may be exported to Hive using different data types. For more information on how types are exported to Hive, see Hive Data Type Conversions.

Steps:

  1. Select location: Navigate the Hive browser to select the database and table to which to publish.
    1. To create a new table, click Create a new table.
  2. Select table options:
    1. Table name:
      1. New table: enter a name for it. You may use a pre-existing table name, and schema checks are performed against it.
      2. Existing table: you cannot modify the name.
    2. Output database: To change the database to which you are publishing, click the Hive icon in the sidebar. Select a different database.

      NOTE: You cannot publish to a Hive database that is empty. The database must contain at least one table.

    3. Publish actions: Select one of the following.
      1. Create new table every run: Each run generates a new table with a timestamp appended to the name.
      2. Append to this table every run: Each run adds any new results to the end of the table.
      3. Truncate the table every run: With each run, all data in the table is truncated and replaced with any new results.
      4. Drop the table every run: With each run, the table is dropped (deleted), and all data is deleted. A new table with the same name is created, and any new results are added to it.
  3. To save the publishing action, click Save Settings.

Redshift Table Settings

If you are creating a publishing action for a Redshift database table, you must provide the following information.

NOTE: Some Alteryx data types may be exported to Redshift using different data types. For more information, see Redshift Data Type Conversions.

Steps:

  1. Select location: Navigate the Redshift browser to select the schema and table to which to publish.
    1. To create a new table, click Create a new table.
  2. Select table options:
    1. Table name:
      1. New table: enter a name for it. You may use a pre-existing table name, and schema checks are performed against it.
      2. Existing table: you cannot modify the name.
    2. Output database: To change the database to which you are publishing, click the Redshift icon in the sidebar. Select a different database.

    3. Publish actions: Select one of the following.
      1. Create new table every run: Each run generates a new table with a timestamp appended to the name.
      2. Append to this table every run: Each run adds any new results to the end of the table.
      3. Truncate the table every run: With each run, all data in the table is truncated and replaced with any new results.
      4. Drop the table every run: With each run, the table is dropped (deleted), and all data is deleted. A new table with the same name is created, and any new results are added to it.
  3. To save the publishing action, click Save Settings.
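The four table publish actions, which are the same for Hive and Redshift, can be contrasted with a small in-memory sketch. It is an illustration only: the separator and format of the appended timestamp are assumptions, as is the idea that a dropped table is re-created with the columns of the new results. All names and values are hypothetical.

```python
def publish_table(db, name, columns, rows, action, timestamp=""):
    """Apply one table publish action to an in-memory db {table: definition}."""
    if action == "create":
        # New table every run; "_<timestamp>" suffix format is an assumption.
        target = f"{name}_{timestamp}"
        db[target] = {"columns": list(columns), "rows": list(rows)}
    elif action == "append":
        # Existing table keeps its rows; new results go at the end.
        target = name
        db[target]["rows"].extend(rows)
    elif action == "truncate":
        # Table definition is kept; only the data is replaced.
        target = name
        db[target]["rows"] = list(rows)
    elif action == "drop":
        # Table is deleted and re-created (assumed: with the new columns).
        target = name
        db.pop(target, None)
        db[target] = {"columns": list(columns), "rows": list(rows)}
    else:
        raise ValueError(f"unknown publish action: {action}")
    return target


db = {"sales": {"columns": ["id"], "rows": [1]}}
publish_table(db, "sales", ["id"], [2], "append")
after_append = list(db["sales"]["rows"])
publish_table(db, "sales", ["id", "amount"], [3], "truncate")
after_truncate = dict(db["sales"])
publish_table(db, "sales", ["id", "amount"], [4], "drop")
created = publish_table(db, "sales", ["id"], [5], "create", timestamp="20180101_120000")
```

The practical difference between the last two existing-table actions is that truncating preserves the table itself, while dropping removes it before the new results are written.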

 

Run Job

To execute the job as configured, click Run Job. The job is queued for execution.

After a job has been queued, you can track its progress toward completion. See Dataset Details Page.

Automation

Run jobs via API

You can use the available REST APIs to execute jobs for known datasets. For more information, see API JobGroups Create v3.

For more information on the entire API workflow, see API Workflow - Develop a Flow.
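As a rough sketch of what an API-driven job run might look like, the following builds (but does not send) an HTTP request. The endpoint path /v3/jobGroups, the wrangledDataset payload shape, the host, and the dataset ID are all assumptions made for illustration; authentication is omitted. Confirm the actual request format in API JobGroups Create v3 before using it.

```python
import json
from urllib.request import Request


def build_run_job_request(base_url, dataset_id):
    """Build a POST request that asks the platform to run a job for a dataset.

    The path and payload below are assumptions, not confirmed by this page.
    """
    payload = {"wrangledDataset": {"id": dataset_id}}  # assumed payload shape
    return Request(
        url=f"{base_url.rstrip('/')}/v3/jobGroups",    # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and dataset ID; nothing is sent over the network here.
request = build_run_job_request("http://example.com:3005/", 28629)
```

To submit the request, you would add the credentials that your deployment requires and pass the object to urllib.request.urlopen.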

Run jobs from the Command Line

Designer Cloud Enterprise Edition provides a command line interface that enables administrators to execute and monitor jobs from the command line. As needed, completed jobs can be published to other datastores.

For more information, see CLI for Jobs.
