For more information on the objects available in the platform, see Object Overview.
This release introduces macros, which are reusable sequences of parameterized steps. These sequences can be saved independently and referenced in other recipes in other flows. See Overview of Macros.
Datasets with parameters
Beginning in Release 5.0, imported datasets can be augmented with parameters, which enables operationalizing sampling and jobs based on date ranges, wildcards, or variables applied to the input path. For more information, see Overview of Parameterization.
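As an illustration only, the following Python sketch shows the general idea of resolving a parameterized input path into concrete file paths. It does not use the platform's parameter syntax: the `{date}` placeholder, the function names, and the example paths are all assumptions made for this sketch.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from typing import List


def expand_date_range(template: str, start: date, end: date) -> List[str]:
    """Return one path per day from start to end (inclusive).

    The "{date}" placeholder is a hypothetical convention for this sketch.
    """
    day_count = (end - start).days + 1
    return [
        template.format(date=(start + timedelta(days=offset)).isoformat())
        for offset in range(day_count)
    ]


def match_wildcard(pattern: str, paths: List[str]) -> List[str]:
    """Return the paths that match a shell-style wildcard pattern."""
    return [path for path in paths if fnmatchcase(path, pattern)]
```

A date-range parameter corresponds to `expand_date_range`, and a wildcard parameter corresponds to `match_wildcard`. In both cases a single dataset definition resolves to a changing set of source files, which is what makes repeated sampling and job runs practical.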
In Release 4.2, the object model has undergone the following revisions to improve flexibility and control over the objects you create in the platform.
Wrangled datasets are removed
In Release 3.2, the object model introduced the concepts of imported datasets, recipes, and wrangled datasets. These objects represented data that you imported, steps that were applied to that data, and data that was modified by those steps.
In Release 4.2, the wrangled dataset object has been replaced by the objects listed below. All of the functionality associated with a wrangled dataset remains, including the following actions. Each action is listed next to the new object with which it is now associated.
|Wrangled Dataset action|Release 4.2 object|
|Run or schedule a job|Output object|
|Preview data|Recipe object|
|Reference to the dataset|Reference object|
NOTE: At the API level, the wrangledDataset endpoint continues to be in use. In a future release, separate endpoints will be available for recipes, outputs, and references. For more information, see API Reference.
These objects are described below.
Recipes can be reused and chained
Since recipes are no longer tied to a specific wrangled dataset, you can now reuse recipes in your flow. Create a copy with or without inputs and move it to a new flow if needed. Some cleanup may be required.
This flexibility allows you to create, for example, recipes that are applicable to all of your datasets for initial cleanup or other common wrangling tasks.
Additionally, recipes can be created from recipes, which allows you to create chains of recipes. This sequencing allows for more effective management of common steps within a flow.
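The reuse and chaining described above can be pictured with a small Python model. This is a conceptual sketch, not the platform's implementation: the `Recipe` class, its `source` attribute, the step functions, and the sample column names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Recipe:
    """A named list of steps, optionally built on top of another recipe."""

    def __init__(self, name: str, steps: List[Step], source: Optional["Recipe"] = None):
        self.name = name
        self.steps = steps
        self.source = source  # upstream recipe in the chain, if any

    def run(self, rows: List[Row]) -> List[Row]:
        # Run the upstream recipe first, then apply this recipe's own steps.
        if self.source is not None:
            rows = self.source.run(rows)
        for step in self.steps:
            rows = step(rows)
        return rows


def strip_whitespace(rows: List[Row]) -> List[Row]:
    # Build new rows so the caller's data is left unchanged.
    return [
        {key: value.strip() if isinstance(value, str) else value for key, value in row.items()}
        for row in rows
    ]


def drop_missing_sku(rows: List[Row]) -> List[Row]:
    return [row for row in rows if row.get("sku")]


def quantity_to_int(rows: List[Row]) -> List[Row]:
    return [{**row, "qty": int(row["qty"])} for row in rows]


# A general-purpose cleanup recipe that can be applied to any dataset.
cleanup = Recipe("initial-cleanup", [strip_whitespace, drop_missing_sku])

# A recipe created from another recipe: its steps run after the cleanup steps.
typed = Recipe("typed-quantities", [quantity_to_int], source=cleanup)
```

Because `cleanup` holds no reference to a particular dataset, the same object can be run against several inputs, and `typed` inherits any later change to the cleanup steps.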
Before Release 4.2, reference datasets existed and were represented in the user interface. However, these objects existed in the downstream flow that consumes the source. If you had adequate permissions to reference a dataset from outside of your flow, you could pull it in as a reference dataset for use.
In Release 4.2, a reference is a link from a recipe in your flow to other flows. This object allows you to expose your flow's recipe for use outside of the flow. From the source flow, you can therefore control whether your recipe is available for use.
This object allows you to have finer-grained control over the availability of data in other flows. It is a dependent object of a recipe.
NOTE: For multi-dataset operations such as union or join, you must now explicitly create a reference from the source flow and then union or join to that object. In previous releases, you could directly join or union to any object to which you had access.
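The rule in the note above can be expressed as a small check. The following Python sketch is a simplified model under stated assumptions: the `Recipe` class, the `has_reference` flag, and the flow names are hypothetical, and the sketch assumes that a recipe within the same flow can still be joined directly.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    name: str
    flow: str
    has_reference: bool = False  # set once the owner creates a reference object


def can_join(consumer_flow: str, source: Recipe) -> bool:
    """Return True if a recipe in consumer_flow may union or join to source.

    Assumption for this sketch: sources in the same flow are always usable,
    while sources in another flow must expose a reference first.
    """
    return source.flow == consumer_flow or source.has_reference
```

In this model, the owner of the source flow decides availability by creating the reference, which matches the shift in control from the consuming flow to the source flow.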
In Release 4.1, outputs became a configurable object that was part of the wrangled dataset. For each wrangled dataset, you could define one or more publishing actions, each with its own output types, locations, and other parameters. For scheduled executions, you defined a separate set of publishing actions. These publishing actions were attached to the wrangled dataset.
In Release 4.2, an output is a defined set of scheduled or ad-hoc publishing actions. With the removal of the wrangled dataset object, outputs are now top-level objects attached to recipes. Each output is a dependent object of a recipe.
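To make the relationship concrete, the sketch below models an output as a set of publishing actions attached to a recipe. It is an illustration rather than the platform's data model: the class names, field names, and file locations are hypothetical, and a `schedule` value of `None` is this sketch's way of marking an ad-hoc action.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublishingAction:
    file_format: str
    location: str
    schedule: Optional[str] = None  # None means the action is ad-hoc


@dataclass
class Output:
    recipe_name: str  # an output depends on exactly one recipe
    actions: List[PublishingAction] = field(default_factory=list)

    def scheduled_actions(self) -> List[PublishingAction]:
        return [action for action in self.actions if action.schedule is not None]

    def ad_hoc_actions(self) -> List[PublishingAction]:
        return [action for action in self.actions if action.schedule is None]


# An output with an ad-hoc CSV action and a JSON action scheduled weekly.
pos_output = Output(
    recipe_name="POS-r01",
    actions=[
        PublishingAction("csv", "/results/POS-r01.csv"),
        PublishingAction("json", "/results/POS-r01.json", schedule="weekly"),
    ],
)
```

Keeping the publishing actions on the output, rather than on the recipe itself, lets the recipe be copied or chained without carrying job configuration along with it.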
Flow View Differences
Below, you can see the same flow as it appears in Release 4.1 and Release 4.2. In each Flow View:
- The same datasets have been imported.
- POS-r01 has been unioned to POS-r02 and POS-r03.
- POS-r01 has been joined to REF-PROD, and the column containing the duplicate join key in the result has been dropped.
- In addition to the default CSV publishing action (output), a scheduled one has been created in JSON format and scheduled for weekly execution.
Release 4.1 Flow View
Release 4.2 Flow View
Flow View differences
- Wrangled dataset no longer exists.
- In Release 4.1, scheduling is managed through the wrangled dataset. In Release 4.2, it is managed through the new output object.
- Outputs are configured in a very similar manner, although in Release 4.2 the tab is labeled "Destinations".
- No changes to scheduling UI.
- Like the output object, the reference object is an externally visible link to a recipe in Flow View. This object just enables referencing the recipe object in other flows.
- See Flow View Page.
- In application pages where you can select tabs to view object types, the available selections are typically: All, Imported Dataset, Recipe, and Reference.
- Wrangled datasets have been removed from the Dataset Details page, which means that the job cards for your dataset runs have been removed.
- These cards are still available in the Jobs page when you click the drop-down next to the job entry.
- The list of jobs for a recipe is now available through the output object in Flow View. Select the object and review the job details through the right panel.
- In Flow View and the Transformer page, context menu items have changed.
Connections as a first-class object
In Release 4.1.1 and earlier, connections appeared as objects to be created or explored in the Import Data page. Through the left navigation bar, you could create or edit connections for which you had the required permissions. Connections were also selections in the Run Job page.
- Only administrators could create public connections.
- End-users could create private connections.
In Release 4.2, the Connections Manager enables you to manage your personal connections and (if you're an administrator) global connections. Key features:
- Connections can be managed like other objects.
- Connections can be shared, much like flows.
- When a flow with a connection is shared, its connection is automatically shared.
- For more information, see Overview of Sharing.
- Release 4.2 introduces a much wider range of connectivity options.
- Multiple Redshift connections can be created through this interface. In prior releases, you could only create a single Redshift connection, and it had to be created through the command line interface (CLI).
NOTE: Beginning in Release 4.2, all connections are initially created as private connections, accessible only to the user who created them. Connections that are available to all users of the platform are called public connections. You can make connections public through the Connections page.
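The visibility rules for connections can be summarized in a few lines of Python. This is a conceptual sketch only: the `Connection` class, its fields, and the user names are hypothetical and do not reflect how the platform stores connections.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Connection:
    name: str
    owner: str
    public: bool = False  # every connection starts out private
    shared_with: Set[str] = field(default_factory=set)

    def is_visible_to(self, user: str) -> bool:
        """Public connections are visible to everyone; private ones only
        to the owner and to users with whom they have been shared."""
        return self.public or user == self.owner or user in self.shared_with
```

In this model, sharing a connection adds a user to `shared_with`, and making it public sets a single flag, so a connection never needs to be recreated to widen its audience.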
For more information, see Connections Page.