Beginning in Release 3.2, changes are being applied to the object model. These changes are intended to improve overall operationalization of the platform, enable better reuse of objects, and drive the platform toward a more flexible, workflow-based usage. These changes are to be applied over multiple releases.
These changes may affect how you access features, although most features perform as they did in previous releases. In some cases:
These changes are described in detail below.
For more information, see Object Overview.
Beginning in Release 5.0, imported datasets can be augmented with parameters, which enables operationalizing sampling and jobs based on date ranges, wildcards, or variables applied to the input path. For more information, see Overview of Parameterization.
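As an illustration only, the following Python sketch shows how a wildcard, a date range, and a variable could each resolve to input paths. It does not use the platform's API. The function names, the `{date}` and `{region}` placeholder syntax, and the sample paths are all assumptions made for this example.

```python
from datetime import date, timedelta
from fnmatch import fnmatch


def match_wildcard(pattern, available_paths):
    """Return the paths that match a shell-style wildcard pattern."""
    return [p for p in available_paths if fnmatch(p, pattern)]


def expand_date_range(template, start, end):
    """Fill a {date} placeholder with every day from start to end, inclusive."""
    days = (end - start).days
    return [
        template.format(date=(start + timedelta(days=i)).isoformat())
        for i in range(days + 1)
    ]


def apply_variables(template, **variables):
    """Fill named placeholders with values supplied at run time."""
    return template.format(**variables)


# Hypothetical file listing and path templates.
available = [
    "/sales/2018-01-01.csv",
    "/sales/2018-01-02.csv",
    "/sales/readme.txt",
]
wildcard_paths = match_wildcard("/sales/*.csv", available)
dated_paths = expand_date_range(
    "/sales/{date}.csv", date(2018, 1, 1), date(2018, 1, 3)
)
variable_path = apply_variables("/sales/{region}/latest.csv", region="emea")
```

In this sketch, one dataset definition can stand for a changing set of files, which is what makes repeated sampling and job runs practical.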
In Release 4.2, the object model has undergone the following revisions to improve flexibility and control over the objects you create in the platform.
In Release 3.2, the object model introduced the concepts of imported datasets, recipes, and wrangled datasets. These objects represented data that you imported, steps that were applied to that data, and data that was modified by those steps.
In Release 4.2, the wrangled dataset object has been replaced by two objects listed below. All of the functionality associated with a wrangled dataset remains, including the following actions. Next to each action is the new object with which the action is associated.
|Wrangled Dataset action||Release 4.2 object|
|Run or schedule a job||Output object|
|Preview data||Recipe object|
|Reference to the dataset||Reference object|
NOTE: At the API level, the
These objects are described below.
Since recipes are no longer tied to a specific wrangled dataset, you can now reuse recipes in your flow. Create a copy with or without inputs and move it to a new flow if needed. Some cleanup may be required.
This flexibility allows you to create, for example, recipes that are applicable to all of your datasets for initial cleanup or other common wrangling tasks.
Additionally, recipes can be created from recipes, which allows you to create chains of recipes. This sequencing allows for more effective management of common steps within a flow.
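The idea of a shared cleanup recipe feeding a chain of recipes can be sketched in Python as plain functions applied in order. This is a conceptual model, not the platform's recipe language. The step names and the `run_chain` helper are hypothetical.

```python
def trim_whitespace(rows):
    """Initial cleanup step: strip surrounding whitespace from string values."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_empty_names(rows):
    """Follow-on step: discard rows whose 'name' value is empty."""
    return [row for row in rows if row.get("name")]


def run_chain(recipes, rows):
    """Apply each recipe in order, feeding one recipe's output to the next."""
    for recipe in recipes:
        rows = recipe(rows)
    return rows


# The common cleanup is defined once and reused at the head of a longer chain.
common_cleanup = [trim_whitespace]
customer_chain = common_cleanup + [drop_empty_names]

customers = [{"name": "  Ada "}, {"name": "   "}]
cleaned = run_chain(customer_chain, customers)
```

Because the common steps live in one place, a fix to `trim_whitespace` reaches every chain that starts with it.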
Before Release 4.2, reference datasets existed and were represented in the user interface. However, these objects existed in the downstream flow that consumes the source. If you had adequate permissions to reference a dataset from outside of your flow, you could pull it in as a reference dataset for use.
In Release 4.2, a reference is a link from a recipe in your flow to other flows. This object allows you to expose your flow's recipe for use outside of the flow. So, from the source flow, you can control whether your recipe is available for use.
This object allows you to have finer-grained control over the availability of data in other flows. It is a dependent object of a recipe.
NOTE: For multi-dataset operations such as union or join, you must now explicitly create a reference from the source flow and then union or join to that object. In previous releases, you could directly join or union to any object to which you had access.
In Release 4.1, outputs became a configurable object that was part of the wrangled dataset. For each wrangled dataset, you could define one or more publishing actions, each with its own output types, locations, and other parameters. For scheduled executions, you defined a separate set of publishing actions. These publishing actions were attached to the wrangled dataset.
In Release 4.2, an output is a defined set of scheduled or ad-hoc publishing actions. With the removal of the wrangled dataset object, outputs are now top-level objects attached to recipes. Each output is a dependent object of a recipe.
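The relationships described above, in which outputs and references are dependent objects of a recipe, can be modeled roughly as follows. This is a minimal sketch for orientation, not the platform's internal schema or API. All class, field, and function names are assumptions, as is the choice to allow several outputs but a single reference per recipe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Output:
    """A set of publishing actions, either ad hoc or scheduled."""
    publishing_actions: List[str]
    scheduled: bool = False


@dataclass
class Reference:
    """Exposes the owning recipe for use in other flows."""
    source_recipe: str


@dataclass
class Recipe:
    name: str
    steps: List[str] = field(default_factory=list)
    outputs: List[Output] = field(default_factory=list)  # dependent objects
    reference: Optional[Reference] = None  # dependent object

    def add_output(self, publishing_actions, scheduled=False):
        output = Output(list(publishing_actions), scheduled)
        self.outputs.append(output)
        return output

    def create_reference(self):
        # The source flow decides whether the recipe is usable elsewhere.
        self.reference = Reference(source_recipe=self.name)
        return self.reference


def join_from_other_flow(recipe):
    """Allow a cross-flow join or union only through an explicit reference."""
    if recipe.reference is None:
        raise PermissionError(f"{recipe.name} has no reference object")
    return recipe.reference


cleanup = Recipe("cleanup", steps=["trim", "dedupe"])
cleanup.add_output(["csv to /out/daily"], scheduled=True)
```

Calling `join_from_other_flow(cleanup)` fails until `cleanup.create_reference()` has been called, which mirrors the requirement to create a reference in the source flow before a union or join.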
Below, you can see the same flow as it appears in Release 4.1 and Release 4.2. In each Flow View:
In Release 4.1.1 and earlier, connections appeared as objects to be created or explored in the Import Data page. Through the left navigation bar, you could create or edit the connections that you had permission to modify. Connections were also selections in the Run Job page.
In Release 4.2, the Connections Manager enables you to manage your personal connections and (if you're an administrator) global connections. Key features:
NOTE: Beginning in Release 4.2, all connections are initially created as private connections, accessible only to the user who created them. Connections that are available to all users of the platform are called public connections. You can make connections public through the Connections page.
For more information, see Connections Page.
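The private-by-default rule can be expressed as a small visibility check. This sketch is illustrative only. The `Connection` class and both helper functions are hypothetical, and the sketch ignores who is permitted to make a connection public.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    name: str
    owner: str
    public: bool = False  # new connections start out private


def make_public(connection):
    """Make a connection available to all users."""
    connection.public = True


def visible_to(user, connections):
    """Return the names of the connections that a user can see."""
    return [c.name for c in connections if c.public or c.owner == user]


# Hypothetical users and connections.
warehouse = Connection("warehouse", owner="alice")
scratch = Connection("scratch", owner="bob")
connections = [warehouse, scratch]

before = visible_to("bob", connections)
make_public(warehouse)
after = visible_to("bob", connections)
```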
In Release 3.2, the object model has been moved from a dataset-oriented structure to a flow-based structure. Previously, datasets created in the application represented the central data objects. In the new flow-based model, all datasets that have been touched in the application are contained in a new object, called a flow. A flow is essentially a replacement of the project object with a different set of behaviors, including automatic change propagation between datasets. In the future, it will support even greater flexibility and connectivity.
Similarly, scripts created in the Transformer page are now called recipes, which will become much more flexible and reusable objects in the future.
In prior releases, datasources were references to source data that existed outside of the application and were controlled by . Beginning in Release 3.2, these objects, now called imported datasets, are independent objects and are associated with the dataset that uses them. They can be managed by the user who imports the data into the application. For an overview diagram, see Object Overview.
|Old Term||New Term||Notes|
|Project||Flow||A flow is a more generalized container for datasets, which will enable greater reuse of assets. See Flows Page.|
|Script||Recipe||A recipe contains all of the transform steps of a script, as well as interfaces for their reuse in other scripts and datasets. See Recipe Panel.|
|Datasource||Imported dataset||An imported dataset is one or more files imported from outside of the platform. Functionally identical to the datasource in previous releases. Imported datasets can be associated with your flow.|
|To distinguish from an imported dataset, a Wrangled dataset refers to any dataset that has been opened and edited in the Transformer page. A Wrangled dataset is a separate object in your flow.|
This is a simple terminology change. When you configure your jobs, you select the appropriate running environment, where your job is executed. See Run Job Page.
For more information on these changes, see Object Overview.
|Old Feature||New Feature||Description|
|Projects page||Flows page||Projects have been replaced by flows, which will offer much broader functionality and connectivity over the course of several releases. A flow is a storage container for imported and Wrangled datasets. See Flows Page.|
|Datasets page||Datasets page||The Datasets page is used to import data from an outside source. In Release 3.2 and later, you interact with both imported datasets and Wrangled datasets through the Datasets page. The workflow has changed a little bit. See Library Page.|
Imported datasets (datasources) are now managed through the Datasets page. The separate page for datasources is no longer available in the application.
Explore the dependencies in your datasets through the Transformer page. Identify dependency issues in the target dataset and then quickly navigate to the source issue to fix it. See Recipe Navigator.
Imported and Wrangled datasets can be integrated into other datasets at any time. With the changes to the object model, changes in one dataset are automatically propagated to any datasets that consume it. This applies to the following:
NOTE: This propagation does not apply to:
1. Datasets that are created from the generated output of the dataset. Since the new dataset is the product of an executed job, it no longer has any connection to the changes in the source dataset. If you wish to propagate those changes, however, you can re-run the job and write out a new dataset. See Export Results Window.
2. Copies of the dataset. Dataset copies are independent objects.
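The difference between a dataset that consumes its source and an independent copy can be sketched as follows. This is a conceptual model under simple assumptions, not a description of how the platform stores data. The `Dataset` class and its methods are hypothetical.

```python
import copy


class Dataset:
    """A dataset whose rows may be derived from an upstream source."""

    def __init__(self, name, rows=None, source=None):
        self.name = name
        self._rows = rows or []
        self.source = source  # live link to an upstream Dataset, if any

    def rows(self):
        # A linked dataset reads through to its source, so upstream
        # changes appear without any extra step.
        if self.source is not None:
            return self.source.rows()
        return self._rows

    def update(self, rows):
        self._rows = rows

    def copy_as(self, name):
        # A copy, like a dataset built from generated output, holds its
        # own rows and has no link back to the source.
        return Dataset(name, rows=copy.deepcopy(self.rows()))


source = Dataset("orders", rows=[{"id": 1}])
consumer = Dataset("orders_report", source=source)  # consumes the source
independent = source.copy_as("orders_copy")  # independent object

source.update([{"id": 1}, {"id": 2}])
```

After the update, `consumer.rows()` reflects both rows, while `independent.rows()` still holds the single row that existed when the copy was made.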
In Release 3.1.2 and earlier, multi-dataset operations, such as union, join, and lookup, were executed on a snapshot of the other dataset. For example, if dataset A performed a lookup into dataset B, the application internally created a snapshot of dataset B and used the snapshot to complete the lookup. This snapshot was maintained separately.
When the platform is upgraded to Release 3.2 and later, this snapshotting behavior is preserved. Instead of maintaining the internal snapshot, the snapshot is migrated into a wrangled dataset of the same name.
NOTE: If your upgraded datasets included multi-dataset operations, you will see additional copies of the dataset that is used in the join or union. This dataset is saved such that the pre-migration snapshot is preserved. This method maintains the pre-upgrade state of the dataset and disables change propagation on the affected dataset.
If desired, you can edit this dataset or switch to the true source dataset to enable automated change propagation.
Additional impacts of automated change propagation of specific multi-dataset operations:
Joins: In prior releases, joins were executed on a snapshot of data. With automated change propagation, snapshotting is no longer necessary. The target dataset is automatically updated with any changes to the joined-in dataset.
NOTE: Automated change propagation can cause breakages in downstream datasets. For example, if you make changes to a dataset that is used in a join, those changes can break steps in the dataset into which it is joined. The Recipe panel can be used to identify these issues, which you can navigate to fix through the new Dependencies browser in the Transformer page. See Recipe Navigator.
See Join Panel.
Lookups: Similar to joins, changes in lookup data are automatically propagated. See Lookup Wizard.
Unions: In prior releases, when a dataset that was part of a union transform was changed, an alert appeared in the Recipe panel of the target dataset to indicate that there was a change. Beginning in Release 3.2:
The data is automatically updated in the target dataset.
If the changes cause breakages, you can see the effects and the source dataset in the Recipe panel for the target dataset.
You can trace back these issues through the Dependencies browser. See Recipe Navigator.
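The kind of breakage described above for joins, lookups, and unions usually comes down to a downstream step that refers to something the changed source no longer provides. The sketch below shows one simple way to detect that, by comparing the columns each step uses against the columns the source currently has. It is an illustration only and is not how the Recipe panel or Dependencies browser is implemented. The step structure and function name are assumptions.

```python
def find_broken_steps(source_columns, downstream_steps):
    """Return the names of steps that use columns missing from the source."""
    available = set(source_columns)
    return [
        step["name"]
        for step in downstream_steps
        if not set(step["uses"]) <= available
    ]


# Hypothetical steps in the target dataset.
steps = [
    {"name": "join on customer_id", "uses": ["customer_id"]},
    {"name": "derive full_name", "uses": ["first_name", "last_name"]},
]

# Source columns before and after last_name is dropped upstream.
before = find_broken_steps(["customer_id", "first_name", "last_name"], steps)
after = find_broken_steps(["customer_id", "first_name"], steps)
```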
This functionality will be replaced by more robust sharing capabilities in a future release.
Tip: You can make datasets available to other users by generating results from the source and then pointing other users to the generated results to import into their flow. However, this is not sharing, as changes to the source dataset are not propagated to any datasets generated from the output.
Due to changes in the object model and other factors, random samples that you created on your datasets in previous releases are no longer available in Release 3.2.
When you first open a dataset in the Transformer page in Release 3.2, a new random sample is automatically generated for you. Also, given the larger sample size in Release 3.2, your entire dataset may be displayed in the Transformer page. For additional details, see Release Notes 3.2.
Changes to the object model mean that you cannot use undo/redo controls in the Transformer page to change the dataset.
Tip: You can still select the previous dataset.