Explore the assets that you create and their relationships. Flows, imported datasets, and recipes are created to transform your sampled data. After you build your output object, you can run a job to transform the entire dataset based on your recipe and deliver the results according to your output definitions.
Within the platform, the basic unit for organizing your work is the flow. The following diagram illustrates the components of a flow and how they are related:
Figure: Assets in a Flow
A flow is a container for one or more datasets, their associated recipes, and other assets. A flow packages these assets for the following types of actions:
Creating relationships between datasets, their recipes, and other datasets.
Executing pre-configured jobs.
Data that is imported to the platform is referenced as an imported dataset. An imported dataset is simply a reference to the original data; the data does not exist within the platform. An imported dataset can be a reference to a file, multiple files, database table, or other type of data.
NOTE: An imported dataset is a pointer to a source of data. It cannot be modified or stored within the platform.
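The pointer semantics described above can be sketched in Python. This is an illustrative model only, not the platform's API: the `ImportedDataset` class, its fields, and the example URI are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ImportedDataset:
    """Hypothetical model: holds only a reference to the source, never the data."""
    name: str
    source_uri: str  # assumed field; points at a file, table, or other source


orders = ImportedDataset(name="orders", source_uri="s3://example-bucket/orders.csv")

# The reference itself cannot be modified after creation.
try:
    orders.source_uri = "s3://example-bucket/other.csv"
    modified = True
except FrozenInstanceError:
    modified = False
```

A frozen dataclass is used here only to mirror the "cannot be modified" rule; how the platform enforces this is not stated in this document.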
An imported dataset becomes usable once it has been added to a flow. You can add it as part of the import process or later.
A recipe is a user-defined sequence of steps that can be applied to transform a dataset.
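As a rough illustration of this definition, a recipe can be thought of as an ordered list of functions applied one after another. The step functions, the `apply_recipe` helper, and the row format below are assumptions made for the sketch and do not reflect the platform's own recipe language.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]  # hypothetical step signature


def trim_names(rows: List[Row]) -> List[Row]:
    """Step 1: strip surrounding whitespace from the 'name' column."""
    return [{**row, "name": row["name"].strip()} for row in rows]


def drop_empty_names(rows: List[Row]) -> List[Row]:
    """Step 2: remove rows whose 'name' column is empty."""
    return [row for row in rows if row["name"]]


def apply_recipe(steps: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order; the input rows are left untouched."""
    for step in steps:
        rows = step(rows)
    return rows


recipe = [trim_names, drop_empty_names]
sample = [{"name": " Ada "}, {"name": "   "}, {"name": "Grace"}]
result = apply_recipe(recipe, sample)
```

Because each step returns new rows, the same recipe can be built against a sample and later applied unchanged to the full dataset.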
In a flow, the following asset types, described below, are associated with each recipe:
Outputs contain one or more publishing destinations, which define the output format, location, and other publishing options that are applied to the results generated from a job run on the recipe.
When you select a recipe's output objects in a flow, you can:
When you select a recipe's reference, you can add it to another flow. This asset is then added as a reference dataset in the target flow. A reference dataset is a read-only version of the output data generated from the execution of a recipe's steps.
The following diagram illustrates the flexibility of relationships between assets within a flow.
Figure: Flow Example
Type | Datasets | Description |
---|---|---|
Standard job execution | Recipe 1/Job 1 | Results of the job are used to create a new imported dataset (I-Dataset 2) from the Job Details page. |
Create dataset from generated results | Recipe 2/Job 2 | Recipe 2 is created off of I-Dataset 2 and then modified. A job has been specified for it, but the results of the job are unused. |
Chaining datasets | Recipe 3/Job 3 | Recipe 3 is chained off of Recipe 2. The results of running jobs off of Recipe 3 include all of the upstream changes as specified in I-Dataset 1/Recipe 1 and I-Dataset 2/Recipe 2. |
Reference dataset | Recipe 4/Job 4 | I-Dataset 4 is created as a reference off of Recipe 3. It can have its own recipe, job, destinations, and results. |
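The chaining behavior in the table can be sketched as follows. This is a simplified model under stated assumptions: the `Recipe` class, its `upstream` field, and the numeric steps are hypothetical and stand in for real recipe steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Recipe:
    """Hypothetical recipe that may be chained off another recipe."""
    name: str
    steps: List[Callable[[List[int]], List[int]]]
    upstream: Optional["Recipe"] = None

    def run(self, source: List[int]) -> List[int]:
        # Upstream recipes run first, so their changes are always included.
        data = self.upstream.run(source) if self.upstream else list(source)
        for step in self.steps:
            data = step(data)
        return data


recipe_2 = Recipe("Recipe 2", [lambda xs: [x * 2 for x in xs]])
recipe_3 = Recipe("Recipe 3", [lambda xs: [x + 1 for x in xs]], upstream=recipe_2)

source = [1, 2, 3]
results_2 = recipe_2.run(source)  # only Recipe 2's step
results_3 = recipe_3.run(source)  # Recipe 2's step, then Recipe 3's step
```

Running the downstream recipe always re-applies the upstream steps, which is why a job on a chained recipe reflects every change made earlier in the chain.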
Flows are created in the Flows page.
Recipes are edited in the Transformer page, which provides multiple methods for quickly selecting and building recipe steps.
Samples: Within the Transformer page, you build the steps of your recipe against a sample of the dataset.
Macros: As needed, you can create reusable sequences of steps that can be parameterized for use in other recipes.
Run Jobs: When you are satisfied with the recipe that you have created in the Transformer page, you can execute a job. A job may be composed of one or more of the following job types:
A connection is a configuration object that provides a personal or global integration to an external datastore. Reading data from remote sources and writing results are managed through connections.
You can associate a schedule with a flow. A schedule is a combination of one or more triggers and the outputs that are generated from them.
NOTE: A flow can have only one schedule associated with it.
Below, you can see the hierarchy within a schedule.
+ schedule for Flow 1
  + trigger 1
  + trigger 2
  + scheduled destination a
  + scheduled destination b
+ schedule for Flow 2
  + trigger 3
  + scheduled destination c
  + scheduled destination d
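The hierarchy above, together with the one-schedule-per-flow rule, can be modeled with a small sketch. The `Schedule` and `ScheduleRegistry` classes are hypothetical names used only for illustration; they are not part of the platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Schedule:
    """Hypothetical schedule: one or more triggers plus scheduled destinations."""
    triggers: List[str] = field(default_factory=list)
    destinations: List[str] = field(default_factory=list)


class ScheduleRegistry:
    """Tracks at most one schedule per flow."""

    def __init__(self) -> None:
        self._by_flow: Dict[str, Schedule] = {}

    def attach(self, flow: str, schedule: Schedule) -> None:
        if flow in self._by_flow:
            raise ValueError(f"{flow} already has a schedule")
        self._by_flow[flow] = schedule

    def render(self) -> str:
        """Return the hierarchy as indented text."""
        lines: List[str] = []
        for flow, schedule in self._by_flow.items():
            lines.append(f"+ schedule for {flow}")
            lines.extend(f"  + {trigger}" for trigger in schedule.triggers)
            lines.extend(f"  + {dest}" for dest in schedule.destinations)
        return "\n".join(lines)


registry = ScheduleRegistry()
registry.attach("Flow 1", Schedule(
    ["trigger 1", "trigger 2"],
    ["scheduled destination a", "scheduled destination b"],
))
registry.attach("Flow 2", Schedule(
    ["trigger 3"],
    ["scheduled destination c", "scheduled destination d"],
))
print(registry.render())
```

Attaching a second schedule to a flow that already has one raises `ValueError` in this sketch, mirroring the note that a flow can have only one schedule.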
Schedules are created for a flow through the Flow View page.
A plan is a sequence of triggers and tasks that can be executed across multiple flows. A plan is executed on a snapshot of all assets at the time that the plan is triggered.
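The snapshot behavior can be illustrated with a minimal sketch. It assumes that a snapshot behaves like a deep copy taken at trigger time; the platform's actual mechanism is not described here, and `trigger_plan` and the asset dictionary are hypothetical.

```python
import copy
from typing import Dict, List

Assets = Dict[str, Dict[str, List[str]]]

# Hypothetical flow assets: each flow maps to its recipe steps.
flow_assets: Assets = {"Flow 1": {"recipe_steps": ["trim", "dedupe"]}}


def trigger_plan(assets: Assets) -> Assets:
    """Capture an independent copy of all assets at the moment of triggering."""
    return copy.deepcopy(assets)


snapshot = trigger_plan(flow_assets)

# An edit made after the trigger does not affect the running plan's view.
flow_assets["Flow 1"]["recipe_steps"].append("filter")
```

In this model, the plan continues to see `["trim", "dedupe"]` even though the live flow now has a third step.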
Plans are created through the Plans page.