Beginning in Release 3.2, changes are being applied to the platform.
These changes may affect how you access features, although most features perform as they did in previous releases. In some cases:
- Features may behave differently.
- Features may be temporarily disabled in the current release, in favor of a new and improved implementation in a future release.
- Features may be removed altogether.
These changes are described in detail below.
For more information, see Object Overview.
Datasets with parameters
Beginning in Release 5.0, imported datasets can be augmented with parameters, which enables operationalizing sampling and jobs based on date ranges, wildcards, or variables applied to the input path. For more information, see Overview of Parameterization.
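As a rough illustration of what a parameterized input path means, the sketch below expands a date-range placeholder and filters paths by a wildcard. It is plain Python, not the platform's parameter syntax: the `{date}` placeholder, the function names, and the example paths are all assumptions made for illustration.

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase


def expand_date_range(template: str, start: date, days: int) -> list[str]:
    """Fill a hypothetical {date} placeholder with each day in the range."""
    return [
        template.format(date=(start + timedelta(days=i)).isoformat())
        for i in range(days)
    ]


def match_wildcard(pattern: str, available: list[str]) -> list[str]:
    """Keep only the paths that match a shell-style wildcard pattern."""
    return [path for path in available if fnmatchcase(path, pattern)]


# Hypothetical input paths; none of these come from the platform itself.
paths = expand_date_range("/data/pos/{date}/sales.csv", date(2018, 1, 1), 3)
matches = match_wildcard(
    "/data/pos/POS-r0*.csv",
    [
        "/data/pos/POS-r01.csv",
        "/data/pos/POS-r02.csv",
        "/data/ref/REF-PROD.csv",
    ],
)
```

One dataset definition then stands in for many concrete files, which is what makes repeated sampling and job runs over changing inputs practical.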
In Release 4.2, the object model has undergone the following revisions to improve flexibility and control over the objects you create in the platform.
Wrangled datasets are removed
In Release 3.2, the object model introduced the concepts of imported datasets, recipes, and wrangled datasets. These objects represented data that you imported, steps that were applied to that data, and data that was modified by those steps.
In Release 4.2, the wrangled dataset object has been removed and replaced by the objects listed below. All of the functionality associated with a wrangled dataset remains, including the following actions. Next to each action is the new object with which it is associated.
|Wrangled Dataset action||Release 4.2 object|
|Run or schedule a job||Output object|
|Preview data||Recipe object|
|Reference to the dataset||Reference object|
NOTE: At the API level, the
These objects are described below.
Recipes can be reused and chained
Since recipes are no longer tied to a specific wrangled dataset, you can now reuse recipes in your flow. Create a copy with or without inputs and move it to a new flow if needed. Some cleanup may be required.
This flexibility allows you to create, for example, recipes that are applicable to all of your datasets for initial cleanup or other common wrangling tasks.
Additionally, recipes can be created from recipes, which allows you to create chains of recipes. This sequencing allows for more effective management of common steps within a flow.
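The idea of chaining can be sketched in a few lines of Python. This is a conceptual model only, assuming a recipe is an ordered list of steps with an optional upstream recipe. The `Recipe` class, its `parent` field, and the step functions are hypothetical and do not reflect the platform's internal representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Row = dict
Step = Callable[[list[Row]], list[Row]]


@dataclass
class Recipe:
    name: str
    steps: list[Step] = field(default_factory=list)
    parent: Optional["Recipe"] = None  # upstream recipe when chained

    def run(self, rows: list[Row]) -> list[Row]:
        # Apply the upstream recipe first, then this recipe's own steps.
        if self.parent is not None:
            rows = self.parent.run(rows)
        for step in self.steps:
            rows = step(rows)
        return rows


def drop_empty_sku(rows: list[Row]) -> list[Row]:
    """Common cleanup step: discard rows with no SKU value."""
    return [row for row in rows if row.get("sku")]


def uppercase_sku(rows: list[Row]) -> list[Row]:
    """Dataset-specific step: normalize SKU casing."""
    return [{**row, "sku": row["sku"].upper()} for row in rows]


cleanup = Recipe("initial-cleanup", [drop_empty_sku])
normalize = Recipe("normalize", [uppercase_sku], parent=cleanup)
result = normalize.run([{"sku": "a1"}, {"sku": ""}, {"sku": "b2"}])
```

Because `cleanup` is its own object, it can sit upstream of any number of other recipes, which is the reuse described above.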
Before Release 4.2, reference datasets existed and were represented in the user interface. However, these objects existed in the downstream flow that consumes the source. If you had adequate permissions to reference a dataset from outside of your flow, you could pull it in as a reference dataset for use.
In Release 4.2, a reference is a link from a recipe in your flow to other flows. This object allows you to expose your flow's recipe for use outside of the flow. So, from the source flow, you can control whether your recipe is available for use.
This object allows you to have finer-grained control over the availability of data in other flows. It is a dependent object of a recipe.
NOTE: For multi-dataset operations such as union or join, you must now explicitly create a reference from the source flow and then union or join to that object. In previous releases, you could directly join or union to any object to which you had access.
In Release 4.1, outputs became a configurable object that was part of the wrangled dataset. For each wrangled dataset, you could define one or more publishing actions, each with its own output types, locations, and other parameters. For scheduled executions, you defined a separate set of publishing actions. These publishing actions were attached to the wrangled dataset.
In Release 4.2, an output is a defined set of scheduled or ad-hoc publishing actions. With the removal of the wrangled dataset object, outputs are now top-level objects attached to recipes. Each output is a dependent object of a recipe.
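The Release 4.2 relationships described above (outputs and references as dependent objects of a recipe) can be pictured with a small data model. This is a sketch under stated assumptions: the class names, fields, and file locations are hypothetical and are not the platform's API or schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PublishingAction:
    file_format: str
    location: str  # hypothetical target path


@dataclass
class Output:
    actions: list[PublishingAction]
    scheduled: bool = False  # False means ad-hoc execution


@dataclass
class Reference:
    recipe_name: str  # the recipe exposed to other flows


@dataclass
class Recipe:
    name: str
    outputs: list[Output] = field(default_factory=list)
    reference: Optional[Reference] = None

    def expose(self) -> Reference:
        """Create the reference once; the source flow controls availability."""
        if self.reference is None:
            self.reference = Reference(self.name)
        return self.reference


recipe = Recipe("POS-r01")
recipe.outputs.append(Output([PublishingAction("csv", "/out/pos.csv")]))
recipe.outputs.append(
    Output([PublishingAction("json", "/out/pos.json")], scheduled=True)
)
ref = recipe.expose()
```

In this model, both the outputs and the reference hang off the recipe, so they have no existence apart from it.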
Flow View Differences
Below, you can see the same flow as it appears in Release 4.1 and Release 4.2. In each Flow View:
- The same datasets have been imported.
- POS-r01 has been unioned to POS-r02 and POS-r03.
- POS-r01 has been joined to REF-PROD, and the column containing the duplicate join key in the result has been dropped.
- In addition to the default CSV publishing action (output), a scheduled one has been created in JSON format and scheduled for weekly execution.
Release 4.1 Flow View
Release 4.2 Flow View
Flow View differences
- Wrangled dataset no longer exists.
- In Release 4.1, scheduling is managed off of the wrangled dataset. In Release 4.2, it is managed through the new output object.
- Outputs are configured in a very similar manner, although in Release 4.2, the tab is labeled "Destinations."
- No changes to scheduling UI.
- Like the output object, the reference object is an externally visible link to a recipe in Flow View. This object just enables referencing the recipe object in other flows.
- See Flow View Page.
- In application pages where you can select tabs to view object types, the available selections are typically: All, Imported Dataset, Recipe, and Reference.
- Wrangled datasets have been removed from the Dataset Details page, which means that the job cards for your dataset runs have been removed.
- These cards are still available in the Jobs page when you click the drop-down next to the job entry.
- The list of jobs for a recipe is now available through the output object in Flow View. Select the object and review the job details through the right panel.
- In Flow View and the Transformer page, context menu items have changed.
Connections as a first-class object
In Release 4.1.1 and earlier, connections appeared as objects to be created or explored in the Import Data page. Through the left navigation bar, you could create or edit any connection for which you had the required permissions. Connections were also selections in the Run Job page.
- Only administrators could create public connections.
- End-users could create private connections.
In Release 4.2, the Connections Manager enables you to manage your personal connections and (if you're an administrator) global connections. Key features:
- Connections can be managed like other objects.
- Connections can be shared, much like flows.
- When a flow with a connection is shared, its connection is automatically shared.
- For more information, see Overview of Sharing.
- Release 4.2 introduces a much wider range of connectivity options.
- Multiple Redshift connections can be created through this interface. In prior releases, you could only create a single Redshift connection, and it had to be created through the command line interface (CLI).
NOTE: Beginning in Release 4.2, all connections are initially created as private connections, accessible only to the user who created them. Connections that are available to all users of the platform are called public connections. You can make connections public through the Connections page.
For more information, see Connections Page.
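The visibility rules described above (private by default, optionally public, and shared along with a flow) can be sketched as follows. This is a conceptual model only: the `Connection` class, its fields, and `share_flow` are hypothetical names and do not correspond to the platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    name: str
    owner: str
    public: bool = False  # every connection starts out private
    shared_with: set[str] = field(default_factory=set)

    def can_access(self, user: str) -> bool:
        """Public connections are open to all; private ones to owner and sharees."""
        return self.public or user == self.owner or user in self.shared_with


def share_flow(flow_connections: list[Connection], user: str) -> None:
    """Sharing a flow also shares the connections that the flow uses."""
    for connection in flow_connections:
        connection.shared_with.add(user)


redshift = Connection("redshift-sales", owner="alice")
before = redshift.can_access("bob")
share_flow([redshift], "bob")
after = redshift.can_access("bob")
```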
In Release 3.2, the object model has been moved from a dataset-oriented structure to a flow-based structure. Previously, datasets created in the application represented the central data objects. In the new flow-based model, all datasets that have been touched in the application are contained in a new object, called a flow. A flow is essentially a replacement of the project object with a different set of behaviors, including automatic change propagation between datasets. In the future, it will support even greater flexibility and connectivity.
Similarly, scripts created in the Transformer page are now called recipes, which will become much more flexible and reusable objects in the future.
In prior releases, datasources were references to source data that existed outside of the application and were controlled by
|D s item|
|Old Term||New Term||Notes|
|Project||Flow||A flow is a more generalized container for datasets, which will enable greater reuse of assets. See Flows Page.|
|Script||Recipe||A recipe contains all of the transform steps of a script, as well as interfaces for their reuse in other scripts and datasets. See Recipe Panel.|
|Datasource||Imported dataset||An imported dataset is one or more files imported from outside of the platform. Functionally identical to the datasource in previous releases. Imported datasets can be associated with your flow.|
|Dataset||Wrangled dataset||To distinguish from an imported dataset, a Wrangled dataset refers to any dataset that has been opened and edited in the Transformer page. A Wrangled dataset is a separate object in your flow.|
This is a simple terminology change. When you configure your jobs, you select the appropriate running environment, where your job is executed. See Run Job Page.
For more information on these changes, see Object Overview.
|Old Feature||New Feature||Description|
|Projects page||Flows page||Projects have been replaced by flows, which will offer much broader functionality and connectivity over the course of several releases. A flow is a storage container for imported and Wrangled datasets. See Flows Page.|
|Datasets page||Datasets page||The Datasets page is used to import data from an outside source. In Release 3.2 and later, you interact with both imported datasets and Wrangled datasets through the Datasets page. The workflow has changed a little bit. See Library Page.|
Imported datasets (datasources) are now managed through the Datasets page. The separate page for datasources is no longer available in the application.
Explore the dependencies in your datasets through the Transformer page. Identify dependency issues in the target dataset and then quickly navigate to the source issue to fix it. See Recipe Navigator.
Changes to System Behavior due to Object Model Changes
Automatic change propagation: Changes in one dataset automatically propagate to dependent datasets.
Imported and Wrangled datasets can be integrated into other datasets at any time. With the changes to the object model, changes made in one dataset are automatically applied in any datasets that consume the source dataset. This applies to the following:
NOTE: This propagation does not apply to:
1. Datasets that are created from the generated output of the dataset. Since the new dataset is the product of an executed job, it no longer has any connection to the changes in the source dataset. If you wish to propagate those changes, however, you can re-run the job and write out a new dataset. See Export Results Window.
2. Copies of the dataset. Dataset copies are independent objects.
In Release 3.1.2 and earlier, multi-dataset operations, such as union, join, and lookup, were executed on a snapshot of the other dataset. For example, if dataset A performed a lookup into dataset B, the application internally took a snapshot of dataset B and used the snapshot to complete the lookup. This snapshot was maintained separately.
When the platform is upgraded to Release 3.2 and later, this snapshotting behavior is preserved. Instead of maintaining the internal snapshot, the snapshot is migrated into a wrangled dataset of the same name.
NOTE: If your upgraded datasets included multi-dataset operations, you will see additional copies of the dataset that is used in the join or union. This dataset is saved such that the pre-migration snapshot is preserved. This method maintains the pre-upgrade state of the dataset and disables change propagation on the affected dataset.
If desired, you can edit this dataset or switch to the true source dataset to enable automated change propagation.
Additional impacts of automated change propagation of specific multi-dataset operations:
Joins: In prior releases, joins were executed on a snapshot of data. With automated change propagation, snapshotting is no longer necessary. The target dataset is automatically updated with any changes to the joined-in dataset.
NOTE: Automated change propagation can cause breakages in downstream datasets. For example, if you make changes to a dataset that is used in a join, those changes can break steps in the dataset into which it is joined. The Recipe panel can be used to identify these issues, which you can navigate to fix through the new Dependencies browser in the Transformer page. See Recipe Navigator.
See Join Panel.
Lookups: Similar to joins, changes in lookup data are automatically propagated. See Lookup Wizard.
Unions: In prior releases, when a dataset that was part of a union transform was changed, an alert appeared in the Recipe panel of the target dataset to indicate that there was a change. Beginning in Release 3.2:
The data is automatically updated in the target dataset.
If the changes cause breakages, you can see the effects and the source dataset in the Recipe panel for the target dataset.
You can trace back these issues through the Dependencies browser. See Recipe Navigator.
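The difference between the earlier snapshot behavior and automated change propagation can be illustrated with a small Python sketch. It is only an analogy for the behavior described in this section, using in-memory lists. The data, the `lookup_price` helper, and the variable names are hypothetical.

```python
import copy

# Dataset B, the source that another dataset looks up into.
source = [{"sku": "A1", "price": 10}]

# Earlier behavior: the lookup runs against a snapshot taken beforehand.
snapshot = copy.deepcopy(source)


def lookup_price(table: list[dict], sku: str):
    """Return the price for a SKU, or None when the SKU is absent."""
    for row in table:
        if row["sku"] == sku:
            return row["price"]
    return None


# A later change to the source dataset.
source[0]["price"] = 12

stale = lookup_price(snapshot, "A1")  # 10: the snapshot ignores the change
live = lookup_price(source, "A1")     # 12: the change propagates
```

The same mechanism explains the breakage risk noted above: a consumer that reads the live source also sees upstream changes it was not written to handle.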
Sharing disabled: Datasets cannot be shared between users for now.
This functionality will be replaced by more robust sharing capabilities in a future release.
Tip: You can make datasets available to other users by generating results from the source and then pointing other users to the generated results to import into their flow. However, this is not sharing, as changes to the source dataset are not propagated to any datasets generated from the output.
Random samples generated on previous releases are no longer available in upgraded systems.
Due to changes in the object model and other factors, random samples that you created on your datasets in previous releases are no longer available in Release 3.2.
When you first open a dataset in the Transformer page in Release 3.2, a new random sample is automatically generated for you. Also, given the larger sample size in Release 3.2, your entire dataset may be displayed in the Transformer page. For additional details, see Release Notes 3.2.
Undo/redo of dataset swap has been removed.
Changes to the object model mean that you cannot use undo/redo controls in the Transformer page to change the dataset.
Tip: You can still select the previous dataset.