...

Info

NOTE: If you chose to generate a profile of your job results, the transformation and profiling tasks may be combined into a single task, depending on your environment. If they are combined and profiling fails, any publishing tasks defined in the job are not launched. You may be able to publish the generated results on an ad-hoc basis. See below.


  • If present, you can click the Show Warnings link to see any warnings pertaining to recipe errors, including the relevant step number. To review the recipe and dependencies in your job, click View steps and dependencies. See the Dependencies tab below.

  • If you chose to profile results of your job, click View profile to review. See Profile tab below.
    • A visual profile provides a graphical snapshot of the results of a successful transformation job for the entire dataset and individual columns in the dataset. 
    • For more information on enabling a visual profile job, see Run Job Page.
    • For more information, see Overview of Visual Profiling.

...

  • If your job output specified SQL scripts to run before or after job execution, you can track their progress in the following stages:
    • Pre-ingest SQL: Script that is configured to run before the source data is ingested to the platform.
    • Post-publish SQL: Script that is configured to run after the output data has been published.
    • For additional details, see the SQL scripts tab below.
    • For more information on SQL scripts in job execution, see Create Output SQL Scripts.

...

Job Monitoring:

You can hover over the status of each stage of a job to review breakdowns for individual phases of each stage:

Tip

Tip: Depending on the operation, you may be able to monitor the data transfer rate for larger datasets.


  • Connect: The platform is attempting to connect to the datastore hosting the asset sources for the datasets.

  • Request: The platform is requesting the set of assets to deliver.

  • Ingesting: Depending on the type of source data, some jobs ingest data to the base storage layer in a converted format before processing begins. This ingested data is purged after job completion.

  • Prepare: (Publishing only) Depending on the destination, the Prepare phase includes the creation of temporary tables, generation of manifest files, and the fetching of extra connections for parallel data transfer.

  • Transfer: Assets are transferred to the target, which can be the platform or the output datastore.

  • Process: Cleanup after data transfer, including the dropping of temporary tables or copying data within the instance.

For more information, see Overview of Job Monitoring.

Publish:

You can also review the outputs generated as a result of your job. To review and export any of the generated results, click View all. See the Output Destinations tab below.

Job summary:

  • Job ID: Unique identifier for the job

    Tip

    Tip: If you are using the REST APIs, this value can be used to retrieve and modify specifics related to this job. For more information, see API Reference. A minimal retrieval sketch appears after this list.

  • Job status: Current status of the job:
    • Queued:  Job has been queued for execution.
    • Running: Job is in progress.
    • Completed: Job has successfully executed.

      Info

      NOTE:  Invalid steps in a recipe are skipped, and it's still possible for the job to be executed successfully.

    • Failed: Job failed to complete. 

      Info

      NOTE:  You can re-run a failed job from the Transformer page. If you have since modified the recipe, those changes are applied during the second run. See Transformer Page.

    • Canceled: Job was canceled by the user.

  • Flow: Name of the flow from which the job was executed. Click the link to open the flow. See Flow View Page.
  • Output: Name of the output object that was used to define the generated results. Click the link to open the output. See Flow View Page.
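
If you are working with the REST APIs, the Job ID above is the identifier you pass to look up the job programmatically. The following Python sketch shows the general pattern only; the base URL, endpoint path, and response field names are assumptions for illustration, so consult the API Reference for the actual endpoints and authentication scheme.

```python
import requests

# Illustrative sketch only. The base URL, endpoint path, and the "status"
# response field are assumptions for demonstration; see the API Reference
# for the actual REST endpoints and authentication scheme.
BASE_URL = "https://example.com/api"   # hypothetical API base URL
TOKEN = "<access-token>"               # hypothetical access token

def get_job_status(job_id: int) -> str:
    """Fetch a job by its Job ID and return its reported status."""
    resp = requests.get(
        f"{BASE_URL}/jobs/{job_id}",                     # hypothetical endpoint
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json().get("status", "unknown")

if __name__ == "__main__":
    # Made-up Job ID for demonstration; use the Job ID shown on this page.
    print(get_job_status(12345))
```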

Execution summary:

  • Job type: The method by which the job was executed:
    • Manual - Job was executed through the application interface.

    • Scheduled - Job was executed according to a predefined schedule. See Add Schedule Dialog.

  • User: The user who launched the job
  • Environment: Where applicable, the running environment in which the job was executed is displayed.
  • Start time: Timestamp for when processing began on the job. This value may not correspond to when the job was queued for execution.
  • Finish time: Timestamp for when processing ended on the job, successful or not
  • Last update: Timestamp for when the job was last updated
  • Duration: Elapsed time of job execution

  • vCPU usage: Total vCPU hours used to run the job. For more information, see Usage Metrics.

...

Optimization summary:

For jobs sourced from relational datasets, you can optionally enable SQL-based optimizations, which push some of the steps specified in your recipe back to the datasource, where they are executed before the data is transferred to the running environment. These optimizations improve performance by reducing the volume of data that must be transferred. A conceptual sketch appears at the end of this section.

  • Workspace administrators must enable the optimization feature for the workspace. For more information, see Workspace Settings Page.
  • When the feature is enabled, optimizations must be enabled for each flow. You can also select the optimizations to apply. For more information, see Flow Optimization Settings Dialog.

When optimizations have been applied to your flow, they are listed on the Overview tab:

  • Optimization: This setting is displayed if flow optimizations have been enabled for this flow.
  • Columns pruned: If one or more unused columns have been pruned in the datasource via SQL, the count of columns is listed here.
  • Filters pushed down: If one or more row filters have been applied in the datasource via SQL, the count of filters is listed here.

If an optimization is disabled or was not applied to the job run, it is not listed.
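
To make column pruning and filter pushdown concrete, the sketch below contrasts an unoptimized source query with a pushed-down equivalent. The table, column, and filter names are hypothetical, and the product generates its own SQL when these optimizations run; this is only meant to show why less data leaves the datasource.

```python
# Conceptual illustration only. The table, columns, and filter below are
# hypothetical; the product generates its own SQL when pushdown is enabled.

# Recipe steps (conceptual):
#   1. Keep rows where region == 'EMEA'         -> candidate for filter pushdown
#   2. Only order_id and amount are used later  -> candidate for column pruning

# Without optimization, the entire table is transferred to the running
# environment, and the filter and column selection happen after transfer:
UNOPTIMIZED_QUERY = "SELECT * FROM sales"

# With pushdown enabled, the same work happens inside the datasource, so only
# the needed rows and columns are transferred:
OPTIMIZED_QUERY = (
    "SELECT order_id, amount "   # Columns pruned: unused columns dropped at source
    "FROM sales "
    "WHERE region = 'EMEA'"      # Filters pushed down: one row filter applied at source
)

print(OPTIMIZED_QUERY)
```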

Output Destinations Tab

If the job has successfully completed, you can review the set of generated outputs and export results.

Figure: Output Destinations tab

Actions:

For each output, you can do the following:

  • View details: View details about the generated output in the side bar.

  • Download result: Download the generated output to your local desktop. 

    Info

    NOTE: Some file formats may not be downloadable to your desktop. See below.

  • Create imported dataset: Use the generated output to create a new imported dataset for use in your flows. See below.

    Info

    NOTE: This option is not available for all file formats.

Direct file download

Click one of the provided links to download the file through your browser to your local desktop.

Info

NOTE: If these options are not available, data download may have been disabled by an administrator.


HYPER: You can download HYPER-formatted outputs to your desktop.

If you have generated output in a Tableau format and have configured a connection to Tableau Server, you can publish directly to the server. See Publishing Dialog.

Create imported dataset

Optionally, you can turn your generated results into new datasets for immediate use in the product. For the generated output, select Create imported dataset from its context menu.


Info

NOTE: When you create a new dataset from your job results, the file or files that were written to the designated output location are used as the source. Depending on how your backend datastore permissions are configured, this location may not be accessible to other users.

After the new output has been written, you can create new recipes from it. See Build Sequence of Datasets.

Publish

If the product is connected to an external storage system, you may publish your job results to it. Requirements:

  • Your version of the product supports publishing.
  • Your connection to the storage system includes write permissions.
  • Your results are generated in a format that the target system supports for writing.
  • All sub-jobs, including profiling, successfully completed.

For more information, see Publishing Dialog.

SQL scripts Tab

If the output for your job included one or more pre- or post-job SQL scripts, you can review the status of their execution during the job.

Info

NOTE: If a SQL script fails to execute, all downstream phases of the job fail to execute.

Tip

Tip: If the SQL script execution for this job encountered errors, you can review those errors through this tab. For more detailed information, click Download logs.


Figure: SQL scripts tab

Columns:

  • Connection: Name of the connection through which the script was executed.
  • SQL statement: The first part of the SQL script that was executed.
  • Settings:
    • Run before data ingest - script was executed pre-job.
    • Run after data publish - script was executed post-job, after the job results had been written.
  • Status: Current status and execution duration of the SQL script.

    Info

    NOTE: If you have multiple SQL scripts for the same setting, they may execute in parallel. For example, if you created three pre-job SQL scripts, there is no guarantee that they executed in the order in which they are listed.

View details:

Hover over a SQL script entry and click View details.

In the SQL script details window, you can review:

  • Connection and SQL of the executed script.
  • Any error messages that occurred during execution.

    Tip

    Tip: To review log information for any error messages, click Download logs.

For more information on these types of SQL scripts, see Create Output SQL Scripts.
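
As a rough illustration of what these scripts can be used for, the sketch below shows hypothetical pre-ingest and post-publish statements. The table names, statements, and role are made up, and supported syntax depends on the connection type; see Create Output SQL Scripts for specifics.

```python
# Hypothetical examples only. The tables, statements, and role name below are
# illustrations of what pre-ingest and post-publish SQL scripts might contain;
# supported syntax depends on the connection. See Create Output SQL Scripts.

# Pre-ingest SQL: configured to run before the source data is ingested,
# for example to trim or refresh a staging table that the job reads from.
PRE_INGEST_SQL = """
DELETE FROM staging_orders WHERE load_date < CURRENT_DATE - 30;
"""

# Post-publish SQL: configured to run after the output data has been published,
# for example to expose the published results to downstream consumers.
POST_PUBLISH_SQL = """
GRANT SELECT ON published_orders TO reporting_role;
"""
```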

...

Profile Tab

Review the visual profile of your generated results in the Profile tab. Visual profiling can assist in identifying issues in your dataset that require further attention, including outlier values. 

...