As needed, you can deploy flows that you have created into a separate production environment where jobs for those flows can be executed on a periodic or scheduled basis. In this manner, you can create separation between your development and production environments and their flows. Deployment management includes the tools to migrate your flows between environments, manage releases of them, and separately control access to development and production flows.
This section describes how deployment management is applied to the platform.
Key Features:
You cannot import flows that were exported from a different edition, release, or build of the product. |
In a typical environment, deployments may be segmented into Development (Dev), Testing (Test), and Production (Prod) environments. Within the platform, these deployments break down as follows:
NOTE: In some cases, Dev and Test may be the same instance. |
NOTE: Multiple browser tabs or windows open to different versions of the product is not supported. |
Platform instance | Description
---|---
Development (Dev) | New flows and recipes are created in a Development instance of the platform. Experiments can be undertaken without concern that production use of the recipe or flow is affected. Rules should be established for how flows, datasets, and recipes are organized and structured: Where are these assets stored? Where are shared versions of them made available? What are the rules by which items in Dev can be moved to Test/Prod?
Testing (Test) | In the Testing deployment, the objects in development are subjected to various stress tests.
Production (Prod) | In the Production deployment, flows and their objects are presumed to be ready for regular, read-only use. After imported flows are reconfigured for the environment, they are ready for immediate use and require no further modification. When errors are detected, you can fix them in the Dev environment and re-import the flow into Prod (see the example below).
In the platform, deployment management can be addressed in either of the following ways.
Implementation type | Description
---|---
Separate environments: Multiple instances of the platform | Dev, Test, or both environments are separate instances of the platform. Flows are migrated between environments using the export/import mechanisms.
All-in-one: Single instance of the platform, separate roles | Dev, Test, and Prod are contained in a single instance of the platform. A user can access either Dev/Test or Prod, but not both at the same time. In this scenario, a user can access Production deployments by having the Deployment account role.
Tip: Access to the Production environments should be tightly controlled to prevent inadvertent changes to Production jobs. |
A Prod environment focuses on management of the following objects. Differences from how these objects are used in a Dev environment are noted below.
NOTE: Some objects are available only in the Production environment. These objects are described later. |
Object | Differences
---|---
Flows | In a Prod environment, you can review a flow through Flow View.
Jobs | In the Prod environment, you can execute jobs against Prod flows. For the version of the flow that is active, you trigger a job for its overall deployment. Details are below. These jobs are accessible through an interface that is very similar to a Dev environment.
The following flow objects from the Dev environment must be replaced in the Prod environment:
Dev Object | Replacement
---|---
Connections | Any connections used in the Dev system must be recreated or replaced with connections in the Production system.
Output Objects | Output objects from the Dev flow must be recreated or replaced in Flow View in the Prod environment.
Imported Datasets | If the Prod environment is not using the same sources as the Dev environment, you must create import rules to remap the flow so that it points to imported datasets that are stored in a different location for the Prod environment.
In a Prod environment, you can explore the following objects, which are organized in a hierarchy:
Level | Item | Description
---|---|---
1 | deployment | A deployment is a versioned set of releases that have been uploaded to the Prod instance for use. You can think of it as a production instance of your primary flow and its dependencies.
2 | release | A release is a specific instance of a package that has been imported to the Prod instance. A new release is created within a deployment each time you import a package into it. A package is a ZIP file containing a flow definition that has been exported from an instance of the platform.
3 | flows | Within a release, you can explore the primary flow and any upstream flows that were included in the package. Each flow can be explored through a version of the Flow View page.
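The relationship among these objects is a simple containment hierarchy. The following Python sketch is illustrative only; the class and field names mirror the table above and are assumptions, not the product's actual data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Release:
    """One imported package (ZIP) within a deployment."""
    release_id: int
    package_file: str       # e.g. "orders-flow-package.zip" (hypothetical name)
    active: bool = False    # only one release per deployment is active

@dataclass
class Deployment:
    """A versioned set of releases in the Prod instance."""
    deployment_id: int
    name: str
    releases: List[Release] = field(default_factory=list)

    def active_release(self) -> Release:
        # The active release is the one triggered when the deployment is run.
        return next(r for r in self.releases if r.active)
```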
This feature must be enabled through configuration. When enabled, the user experience of the product changes significantly, and a number of features are no longer available, including the Transformer page and its ability to modify scripts.
Tip: When you initially set up a platform instance, you should decide whether it is a Dev instance, a Prod instance or both. |
For more information, see Configure Deployment Management.
For more information on how to configure user accounts for deployment management, see Configure Deployment Management.
To transfer your flows between instances, you must export the flow from one instance of the platform and import it into the other instance of the platform.
NOTE: If Dev and Prod are in the same instance, you must export the flow and import it into a deployment. These are separate processes. |
NOTE: As part of the import process, you must define rules for how objects and values contained in the imported flow definition are remapped in the Prod environment. See below. |
Through Flow View or the Flows page, you can export the flow from its context menu. The export is a ZIP file called a package.
NOTE: You must be the owner of a flow to export it. |
A package ZIP contains all objects required to reconstruct and use the flow in a new environment.
If the outputs of an exported flow require imported datasets or recipes from another flow, that entire flow is included as part of the export package. This package includes objects that may not be required to run the primary exported flow.
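Because a package is an ordinary ZIP file, you can inspect it before importing to confirm which flows and dependencies were bundled. Below is a minimal sketch, assuming only that the export is a standard ZIP archive; the file name and internal layout vary by release and are not documented here.

```python
import zipfile

# Path to a package exported from the Dev instance (hypothetical file name).
package_path = "orders-flow-package.zip"

with zipfile.ZipFile(package_path) as package:
    # List everything bundled in the export, including any upstream flows
    # that the primary flow depends on.
    for entry in package.namelist():
        print(entry)
```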
In the target instance, connections must be created prior to import. You may need to create import mapping rules to use these connections. See Connections Page.
How a flow is imported depends on the environment into which you are importing it and how you intend to use it.
NOTE: If a flow is imported into an instance that is different from the instance where it was created, you must first create remapping rules for values and objects contained in the flow definition. More information is provided below. |
For more information, see Import Flow.
When objects are moved between environments, paths and other object-related references may require updating to point to the new environment.
NOTE: Import mapping rules do not work for parameterized datasets. If the imported dataset with parameters is still accessible, you should be able to run jobs from it. |
For example, a dataset in the Dev environment may be pointing to the following location:
hdfs:///mydata-dev/1/00005a1a-81b0-4e4d-9c9b-f42ce55e1dde/Open_Order.csv |
For the Prod version, the flow may need to be changed to the following:
hdfs:///mydata-prod/1/11115z4a-92f5-9f91-7v7f-g22fk99f2rru/Open_Order.csv |
To support this kind of remapping, you can specify import rules at the level of individual deployments.
NOTE: For each deployment that you create, you must define new import remapping rules. |
These rules can be specified using literal values, , or regular expressions. For more information, see Define Import Mapping Rules.
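The effect of a regular-expression rule can be illustrated with the Dev and Prod paths shown above. This Python sketch only demonstrates the kind of substitution such a rule performs; the pattern and replacement are illustrative, and actual rules are defined per deployment as described in Define Import Mapping Rules.

```python
import re

# Dev-environment location stored in the exported flow definition (from the example above).
dev_path = "hdfs:///mydata-dev/1/00005a1a-81b0-4e4d-9c9b-f42ce55e1dde/Open_Order.csv"

# Swap the Dev directory and asset ID for their Prod equivalents.
prod_path = re.sub(
    r"hdfs:///mydata-dev/1/[0-9a-z-]+/",
    "hdfs:///mydata-prod/1/11115z4a-92f5-9f91-7v7f-g22fk99f2rru/",
    dev_path,
)

print(prod_path)
# hdfs:///mydata-prod/1/11115z4a-92f5-9f91-7v7f-g22fk99f2rru/Open_Order.csv
```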
When a user accesses a Production environment, the UI is changed to include only the following pages:
NOTE: You cannot modify recipes within a Prod instance because the Transformer page is not available. The Prod flow must be exported and re-imported into a Dev instance. |
Page | Description
---|---
Deployments Page | On this page, you create deployments, for which you manage import of packages, activation of releases, and rollback to previous releases as needed. For more information on the deployment objects, see below.
Flow View Page | Within a specific release, you can review and update the flow definition, including specification of outputs and schedules. Flow View for a Prod instance has some restrictions.
Jobs Page | Same as Dev instance. No changes.
Connections Page | Connections that have been included as part of imported packages are available for review through the Production environment.
Admin Settings Page | Same as Dev instance. No changes to the interface.
When you explore a deployment, you can see the list of releases pertaining to the deployment, with the active release listed at the top of the list. The active release is the one that is triggered for execution when a job is run.
You can roll back to a previous release by selecting Activate from the context menu for that release.
NOTE: Do not use scheduling features available through the user interface in a Production instance. If you have defined schedules through Flow View in the Prod instance and then add a new release, the schedules in the previous release are still available. You must remove them to prevent scheduled executions of outdated flows. |
In a Prod instance, you can drill into a release to review its flows through the Flow View page.
NOTE: Avoid making modifications to the flow in a Prod instance. |
For more information, see Flow View Page.
In this example, your environment contains separate Dev and Prod instances, each of which has a different set of users.
Item | Dev | Prod
---|---|---
Environment | http://wrangle-dev.example.com:3005 | http://wrangle-prod.example.com:3005
User | User1 | Admin2
Source DB | devWrangleDB | prodWrangleDB
Source Table | Dev-Orders | Prod-Orders
Connection Name | Dev Redshift Conn | Prod Redshift Conn
Example Flow:
User1 is creating a flow, which is used to wrangle weekly batches of orders for the enterprise. The flow contains:
Steps:
Admin2 chooses to fix in Dev and re-import into Prod.
NOTE: Any changes made in Production that must appear in future releases must be applied back in the Dev environment, too. You can either 1) export the flow from Prod and import back into Dev, or 2) manually apply all Prod changes back to the Dev environment and export/import into Prod when ready. |
NOTE: Running a package containing more than 5 concurrent jobs is not supported. |
You can configure jobs on-demand through the Flow View page of a Production instance. See Flow View Page.
In Dev:
When your flow is exported from a Dev instance, all scheduling-related data is removed from the export package.
In Prod:
In a Prod instance, an imported flow contains no schedules. You must configure schedules through the REST APIs to execute on the currently active release for each deployment.
NOTE: Do not schedule executions through Flow View in a Prod instance. |
Automation of deployment management is supported through the APIs.
NOTE: When you run a deployment, you run the primary flow in the active release for that deployment. Running the flow generates the output objects for all recipes in the flow. |
NOTE: Scheduled execution of jobs in a deployment environment must be managed through external tools such as cron. For more information on the endpoint to schedule, see API Deployments Run v3. |
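For example, a cron job could invoke a small script that triggers the active release of a deployment. The endpoint path, port, and authentication method below are assumptions based on the API Deployments Run v3 reference; confirm them against the API documentation for your release.

```python
import requests  # third-party: pip install requests

# Placeholder values; substitute your own instance, credentials, and deployment ID.
BASE_URL = "http://wrangle-prod.example.com:3005"
DEPLOYMENT_ID = 4
API_TOKEN = "<your-token>"

# Trigger the primary flow of the active release for the deployment.
# The endpoint path and auth header are assumptions; see API Deployments Run v3.
response = requests.post(
    f"{BASE_URL}/v3/deployments/{DEPLOYMENT_ID}/run",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()
print(response.json())

# A crontab entry could then run this script weekly, for example:
#   0 6 * * 1 /usr/bin/python3 /opt/scripts/run_deployment.py
```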
For more information on the APIs for deployment management, see API Endpoints.
For more information on an API-based method for deploying flows, see API Workflow - Deploy a Flow.