...

  • Connect: The platform is attempting to connect to the datastore hosting the asset sources for the datasets.
  • Schema validation: When enabled, the schemas of a job's datasources are checked as the first step of job execution.
    • Datasets with schema changes are reported at the top of the list. Click View all to see schema validation results for all of the datasets used in the job in the Data sources tab.
    • Optionally, the job can be halted if the schema that is read differs from the schema stored from the previous job run. This option can prevent data corruption; a conceptual sketch of the check appears after this list.
    • If no errors are detected, then the job is completed as normal.
    • For more information on schema validation, see Overview of Schema Management.
  • Request: The platform is requesting the set of assets to deliver.
  • Ingest: Depending on the type of source data, some jobs ingest data to the base storage layer in a converted format before processing begins. This ingested data is purged after job completion.
  • Prepare: (Publishing only) Depending on the destination, the Prepare phase includes the creation of temporary tables, generation of manifest files, and the fetching of extra connections for parallel data transfer.
  • Transfer: Assets are transferred to the target, which can be the platform or the output datastore.
  • Transform: This stage covers the execution of your recipe steps to transform the source data.

  • Profile: If you chose to profile your output data, this stage runs after the transformation is complete. Results are available in the Profile tab.

    NOTE: If you chose to generate a profile of your job results, the transformation and profiling tasks may be combined into a single task, depending on your environment. If they are combined and profiling fails, any publishing tasks defined in the job are not launched. You may be able to publish the generated results ad hoc. See below.

  • Publish: This stage covers the writing of the outputs of the transformed data. These outputs are available through the Output destinations tab.

  • Process: Cleanup after data transfer, including the dropping of temporary tables or copying data within the instance.
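
The schema validation step described above amounts to comparing the schema stored from the previous job run against the schema read at execution time, and optionally halting the job on a mismatch. The following is a minimal conceptual sketch in Python; the schema representation, function, and halt flag are illustrative assumptions, not the platform's implementation.

    # Conceptual sketch of schema validation: compare the schema stored
    # from the previous run against the schema read now, and optionally
    # halt the job if they differ. All names here are illustrative.
    class SchemaMismatchError(Exception):
        """Raised when validation is configured to halt on schema changes."""

    def validate_schema(stored, current, halt_on_change=True):
        """Compare two schemas, each a dict of {column_name: column_type},
        and return a list of human-readable differences."""
        diffs = []
        for col, col_type in stored.items():
            if col not in current:
                diffs.append(f"column dropped: {col}")
            elif current[col] != col_type:
                diffs.append(f"type changed: {col} {col_type} -> {current[col]}")
        for col in current:
            if col not in stored:
                diffs.append(f"column added: {col}")
        if diffs and halt_on_change:
            # Halting here is what prevents a changed source from silently
            # corrupting downstream outputs.
            raise SchemaMismatchError("; ".join(diffs))
        return diffs

    # Example: a column was renamed in the source since the last run.
    previous = {"id": "int", "amount": "decimal", "region": "string"}
    observed = {"id": "int", "amount": "decimal", "area": "string"}
    print(validate_schema(previous, observed, halt_on_change=False))
    # ['column dropped: region', 'column added: area']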

...

  • Job ID: Unique identifier for the job

    Tip: If you are using the REST APIs, this value can be used to retrieve and modify specifics related to this job. See the polling sketch after this list.

  • Job status: Current status of the job:
    • Queued: Job has been queued for execution.
    • Running: Job is in progress.
    • Completed: Job has successfully executed.

      NOTE: Invalid steps in a recipe are skipped, so it is still possible for the job to execute successfully.

    • Failed: Job failed to complete.

      NOTE: You can re-run a failed job from the Transformer page. If you have since modified the recipe, those changes are applied during the second run.
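
The Job ID tip above can be made concrete with a small polling loop: given the Job ID, poll the job's status until it leaves the Queued and Running states. This is a minimal sketch; the endpoint path, host, authentication scheme, and response shape are assumptions for illustration, so consult your deployment's REST API reference for the exact routes and status values.

    # Minimal polling loop keyed by Job ID. The endpoint path, host,
    # token, and response shape are assumptions for illustration only.
    import time
    import requests

    BASE_URL = "https://example.platform.local"   # hypothetical host
    TOKEN = "YOUR_ACCESS_TOKEN"                   # hypothetical credential

    def wait_for_job(job_id, poll_seconds=10):
        headers = {"Authorization": f"Bearer {TOKEN}"}
        while True:
            resp = requests.get(f"{BASE_URL}/v4/jobGroups/{job_id}/status",
                                headers=headers, timeout=30)
            resp.raise_for_status()
            status = resp.json()  # assumed to be a string such as "Completed"
            if status not in ("Queued", "Running"):
                return status
            time.sleep(poll_seconds)

    # Example: block until job 1234 finishes, then report the outcome.
    print("Job finished with status:", wait_for_job(1234))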

...

For jobs sourced from relational datasets, you can optionally enable SQL-based optimizations, which push some of the steps specified in your recipe back to the datasource, where they are executed before the data is transferred to the running environment. Because less data is transferred, these optimizations yield faster performance.
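
To make the optimization concrete: recipe steps that only keep certain columns or filter rows can be folded into the SQL query issued against the relational source, so the pruned and filtered result is what crosses the wire. The helper below is hypothetical, purely to illustrate the rewrite; the platform generates its own SQL.

    # Illustration of SQL pushdown: fold column pruning and row filtering
    # into the query sent to the source so less data is transferred.
    # This helper is hypothetical, for illustration only.
    def pushdown_query(table, keep_columns, row_filter=None):
        cols = ", ".join(keep_columns) if keep_columns else "*"
        sql = f"SELECT {cols} FROM {table}"
        if row_filter:
            sql += f" WHERE {row_filter}"
        return sql

    # Without optimization, the source is effectively read as
    # SELECT * FROM orders, and pruning/filtering happen after transfer.
    # With pushdown, two recipe steps execute in the source instead:
    print(pushdown_query("orders",
                         ["order_id", "amount", "region"],
                         row_filter="status <> 'cancelled'"))
    # SELECT order_id, amount, region FROM orders WHERE status <> 'cancelled'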

...

Optimizations must be enabled for each flow. You can also select which optimizations to apply.

When optimizations have been applied to your flow, they are listed on the Overview tab:

Optimization: This setting is displayed if flow optimizations have been enabled for this flow.

Columns pruned: If one or more unused columns have been pruned in the datasource via SQL, the count of pruned columns is listed here.

...

In this tab, you can review a simplified representation of the flow that was executed for the job. This flow view displays only the assets that contributed to the generated results.

Tip: To open the source asset, you can click its name in the upper-left corner.

...

You can zoom the dependency graph canvas to display areas of interest in the flow graph.

The zoom controls are available at the top-right corner of the dependency graph canvas. The following zoom options are available:

...

Zoom out: Zoom out 10% from the canvas to see more of it.

Zoom to fit: Change the zoom level to fit all of the objects of your flow onto the screen.

25%, 50%, or 100%: Change the zoom level to one of the preset levels.

...

  • You can select only recipes in the flow graph.
  • Context controls and menus are not available.

...

Data sources tab
NOTE: If an asset is unshared with you, you cannot see or access the datasources for any jobs that you have already run on it, including any PDF profiles that you generated. You can still access the job results. This is a known issue.

...

This tab can be a good check to ensure that you have specified your dataset parameters correctly.

Parameters tab

If your job references parameters, you can review the state of the parameters at the time of job execution.

NOTE: This tab appears only if the job is sourced from a flow that contains parameter references. For more information, see Overview of Parameterization.
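
For example, a parameterized source path is resolved to concrete values when the job runs, and this tab records the values that were used. The sketch below mimics that resolution for a datetime and a variable parameter; the brace placeholder syntax and parameter names are illustrative assumptions, not the product's exact templating rules.

    # Illustrative resolution of a parameterized dataset path at run time.
    # The {run_date}/{region} placeholders are assumptions for illustration.
    from datetime import date

    def resolve_path(template, params):
        resolved = template
        for name, value in params.items():
            resolved = resolved.replace("{" + name + "}", value)
        return resolved

    template = "s3://my-bucket/logs/{run_date}/{region}/*.csv"
    params = {"run_date": date(2024, 3, 1).isoformat(), "region": "emea"}
    print(resolve_path(template, params))
    # s3://my-bucket/logs/2024-03-01/emea/*.csv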

...