

As needed, you can deploy flows that you have created into a separate production environment, where jobs for those flows can be executed on a periodic or scheduled basis. In this manner, you can keep your development and production environments, and their flows, separate. Deployment management includes the tools to migrate your flows between environments, manage releases of them, and separately control access to development and production flows.

  • Deployment management involves the transfer of flows between development and production instances of the platform. A customer may have one or more instances of the platform.
  • For managing user access to flows within the same development instance, you can use sharing. See Overview of Sharing.

This section describes how deployment management is applied to Trifacta® Wrangler Enterprise.

Key Features:

  • Development environment:
    • Export of flows and all dependent objects
    • Import back into Development deployments for further development
  • Production environment:
    • Import of flows
      • Import global and object-level mapping rules
    • Manage releases of flows
    • Rollback to previous versions as needed
  • APIs to manage deployments

You cannot import flows that were exported from a different edition, release, or build of the product.

Dev/Test and Prod Deployments

In a typical environment, deployments may be segmented between Development (Dev), Testing (Test), and Production (Prod) environments. With respect to the Trifacta platform, these deployments break down into the following:

NOTE: In some cases, Dev and Test may be the same instance.

NOTE: Having multiple browser tabs or windows open to different versions of the product is not supported.

Platform instance | Description
Development (Dev)

New flows and recipes are created in a Development instance of the platform. Experiments can be undertaken without concern that production use of the recipe or flow is affected.

Tip: You should do all of your recipe development and testing in Dev/Test. Avoid making changes in a Prod environment.

 

Rules should be established on how flows, datasets, and recipes are organized and structured. Where are these assets stored? Where are shared versions of them made available? What are the rules by which items in Dev can be moved to Test/Prod?

Testing (Test)

In the Testing deployment, the objects in development are subjected to various stress tests. In the Trifacta platform, this testing can include load testing, malformed inputs, and changes to any parameters affecting the use of the object. For example, scheduled executions of flows should be thoroughly tested in this deployment. When errors are detected, they can be corrected in Dev or Test. Ideally, fixes are first applied in Test to address the issue at hand and then applied back into the Dev deployment, so that future versions include the fix.

Production (Prod)

In the Production deployment, flows and their objects are presumed to be ready for regular, read-only use. After imported flows are reconfigured for the environment, they are ready for immediate use and require no further modification.

  • Management of flows and jobs is typically handled via API.
  • The UI should be used to check and modify settings and to perform on-demand job executions that verify operations.

When errors are detected, you can:

  • Revert to a previous version of the flow
  • Apply any fixes in the Dev/Test instance for refinement and eventual updating back to the Prod instance.

Implementation in the platform

In the Trifacta platform, deployment management can be addressed in either of the following ways. 

Implementation type | Description
Separate environments: Multiple instances of the platform

Dev, Test, or both environments run on instances of the Trifacta platform that are separate from the Production instance.

Flows are migrated between environments using the export/import mechanisms.

NOTE: Each platform instance is configured to be either a Dev instance or a Prod instance.

All-in-one: Single instance of the platform, separate roles

Dev, Test, and Prod are contained in a single instance of the Trifacta platform. This scenario can apply to cloud-based environments as well.

A user can access either Dev/Test or Prod, but not both at the same time. In this scenario, a user can access Production deployments by having the Deployment account role.

Tip: Access to the Production environment should be tightly controlled to prevent inadvertent changes to Production jobs.

 

Licensing

NOTE: Trifacta Wrangler Enterprise is licensed on a per-node and per-user basis. If your license includes enough nodes and users to support multiple instances, including any additional cluster nodes for running jobs, no additional licensing is required.

 

Terminology

Production environment terminology changes

A Prod environment focuses on management of the following objects. Differences from how these objects are used in a Dev environment are noted below.

NOTE: Some objects are available only in the Production environment. These objects are described later.

Object | Differences
Flows

In a Prod environment, you can review a flow through Flow View.

NOTE: Avoid making changes to your flows in a Prod environment. Any changes in the Prod version should be exported and then imported to the Dev version. Otherwise, when the next release is imported as a package into the Prod environment, those changes are lost.

Jobs

In the Prod environment, you can execute jobs against Prod flows. You trigger a job for the deployment as a whole, and the job runs the version of the flow in the active release. Details are below.

These jobs are accessible through an interface that is very similar to a Dev environment.

The following flow objects from the Dev environment must be replaced in the Prod environment:

Dev object | Replacement
Connections | Any connections used in the Dev system must be recreated or replaced with connections in the Production system.
Output objects | Output objects from the Dev flow must be recreated or replaced in Flow View in the Prod environment.
Imported datasets | If the Prod environment is not using the same sources as the Dev environment, you must create import rules that remap the flow to use imported datasets stored in a different location for the Prod environment.

Deployment Objects

In a Prod environment, you can explore the following objects, which are organized in a hierarchy:

Level | Item | Description
1 | deployment | A deployment is a versioned set of releases that have been uploaded to the Prod instance for use. You can think of it as a production instance of your primary flow and its dependencies.
2 | release |

A release is a specific instance of a package that has been imported into the Prod instance. Each time you import a package into a deployment, you create a new release within that deployment.

A package is a ZIP file containing a flow definition that has been exported from an instance of the Trifacta platform.

3 | flows |

Within a release, you can explore the primary flow and any upstream flows that were included in the package. Each flow can be explored through a version of the Flow View page.

  • The primary flow is the flow that you chose to export in the Dev instance.
  • A secondary flow is any flow that is included with the package for the primary flow because the primary one depends on it.
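To make the deployment, release, and flows hierarchy concrete, the following Python sketch models it as plain data classes. The class and method names (Deployment, Release, Flow, import_package, primary_flow) are hypothetical and do not correspond to the product's internal objects or API; the sketch only shows that each import adds a new release to a deployment and that each release carries its primary flow plus any upstream (secondary) flows.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Flow:
    name: str
    primary: bool = False  # True only for the flow that was chosen for export in Dev


@dataclass
class Release:
    release_id: int
    package: str  # file name of the imported package ZIP (hypothetical)
    flows: List[Flow]  # primary flow plus any upstream (secondary) flows

    def primary_flow(self) -> Optional[Flow]:
        # Return the flow that was chosen for export, if present.
        return next((f for f in self.flows if f.primary), None)


@dataclass
class Deployment:
    name: str
    releases: List[Release] = field(default_factory=list)

    def import_package(self, package: str, flows: List[Flow]) -> Release:
        # Every import creates a new release inside this deployment.
        release = Release(len(self.releases) + 1, package, flows)
        self.releases.append(release)
        return release
```

For example, importing two packages into the same deployment yields releases 1 and 2, each with its own copy of the primary flow.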


Enable Deployment

This feature must be enabled through configuration. When enabled, the user experience of the product changes significantly, and a number of features are no longer available, including the Transformer page and its ability to modify scripts. 

Tip: When you initially set up a platform instance, you should decide whether it is a Dev instance, a Prod instance or both.

For more information, see Configure Deployment Management.

User management

For more information on how to configure user accounts for deployment management, see Configure Deployment Management.

Import/Export

To transfer your flows between instances, you must export the flow from one instance of the platform and import it into the other instance of the platform. 

NOTE: If Dev and Prod are in the same instance, you must export the flow and import it into a deployment. These are separate processes.

NOTE: As part of the import process, you must define rules for how objects and values contained in the imported flow definition are remapped in the Prod environment. See below.  

Exported Flows

From Flow View or the Flows page, you can export a flow through its context menu. The export is a ZIP file called a package.

NOTE: You must be the owner of a flow to export it.

A package ZIP contains all objects required to reconstruct and use the flow in a new environment.

  • It includes the exported flow and any flows on which it depends. 
  • It does not include data, samples, or jobs.

Upstream dependencies

If the outputs of an exported flow require imported datasets or recipes from another flow, that entire flow is included as part of the export package. As a result, the package can include objects that are not required to run the primary exported flow.
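The inclusion of upstream flows can be pictured as a walk over flow dependencies. The sketch below assumes that dependencies are followed transitively (a flow that the primary flow depends on may itself depend on other flows) and uses hypothetical flow names; it illustrates the behavior described above and is not the product's export logic.

```python
from collections import deque
from typing import Dict, List, Set


def flows_in_package(primary: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the primary flow plus every flow it depends on, directly or transitively."""
    included = {primary}
    queue = deque([primary])
    while queue:
        flow = queue.popleft()
        # Add each upstream flow once, then explore its own dependencies.
        for upstream in depends_on.get(flow, []):
            if upstream not in included:
                included.add(upstream)
                queue.append(upstream)
    return included
```

Flows that nothing in the primary flow's dependency chain refers to are left out, while a whole upstream flow is pulled in even if only one of its outputs is needed.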

Connections

In the target instance, connections must be created prior to import. You may need to create import mapping rules to use these connections. See Connections Page.

Import

How a flow is imported depends on the environment into which you are importing it and how you intend to use it. 

NOTE: If a flow is imported into an instance that is different from the instance where it was created, you must first create remapping rules for values and objects contained in the flow definition. More information is provided below. 

For more information, see Import Flow.

Value and Object Mapping Rules

When objects are moved between environments, paths and other object-related references may require updating to point to the new environment.

NOTE: Import mapping rules do not work for parameterized datasets. If the imported dataset with parameters is still accessible, you should be able to run jobs from it.

For example, a dataset in the Dev environment may be pointing to the following location: 

hdfs:///mydata-dev/1/00005a1a-81b0-4e4d-9c9b-f42ce55e1dde/Open_Order.csv

For the Prod version, the flow may need to point to the following location instead:

hdfs:///mydata-prod/1/11115z4a-92f5-9f91-7v7f-g22fk99f2rru/Open_Order.csv

 

 

To support this kind of remapping, you can specify import rules at the level of individual deployments.

 

NOTE: For each deployment that you create, you must define new import remapping rules.

These rules can be specified using literal values, Trifacta patterns, or regular expressions. For more information, see Define Import Mapping Rules.
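As a rough illustration of how literal and regular-expression rules rewrite a value, the following Python sketch applies a list of rules in order. It rests on assumptions: the (kind, match, replacement) rule format and the in-order application are hypothetical, Trifacta patterns and object-level rules are not modeled, and real rules are defined as described in Define Import Mapping Rules. Note that a prefix rule alone changes only the mydata-dev directory; reproducing the full Dev-to-Prod path change in the example above requires a rule that also matches the differing path segment, such as a literal rule for the whole path.

```python
import re
from typing import List, Tuple

# Hypothetical rule format: (kind, match, replacement), where kind is "literal" or "regex".
Rule = Tuple[str, str, str]


def remap(value: str, rules: List[Rule]) -> str:
    """Apply each value-mapping rule to the value, in order (ordering is an assumption)."""
    for kind, match, replacement in rules:
        if kind == "literal":
            value = value.replace(match, replacement)
        elif kind == "regex":
            value = re.sub(match, replacement, value)
        else:
            raise ValueError(f"unsupported rule kind: {kind}")
    return value
```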

Production Environment

When a user accesses a Production environment, the UI is changed to include only the following pages:

NOTE: You cannot modify recipes within a Prod instance because the Transformer page is not available. The Prod flow must be exported and re-imported into a Dev instance.

Page | Description
Deployments Page

On this page, you create deployments and manage the import of packages, activation of releases, and rollback to previous releases as needed.

For more information on the deployment objects, see below.

Flow View Page

Within a specific release, you can review and update the flow definition, including specification of outputs and schedules. Flow View for a Prod instance has some restrictions.

NOTE: Use of scheduling through Flow View of a Prod instance is not supported. When a new release of a flow is imported, the schedule still points to the older release and is orphaned until the old release is reactivated or the schedule or release is removed.

Jobs Page | Same as Dev instance. No changes.
Connections Page | Connections that have been included as part of imported packages are available for review through the Production environment.
Admin Settings Page

Same as Dev instance. No changes to the interface.

NOTE: In a multi-instance environment, some settings do not apply to the Prod environment.

Version Management

When you explore a deployment, you can see the list of releases pertaining to the deployment, with the active release listed at the top of the list. The active release is the one that is triggered for execution when a job is run. 

You can roll back to a previous release by selecting Activate from the context menu of that release.

NOTE: Do not use the scheduling features available through the user interface in a Production instance. If you have defined schedules through Flow View in the Prod instance and then add a new release, the schedules in the previous release remain in effect. You must remove them to prevent scheduled executions of outdated flows.
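The activation and rollback behavior described in this section can be sketched as follows. The names are hypothetical, and the ordering of non-active releases (newest first) is an assumption; the only behavior taken from this page is that one release is active, that the active release is listed first, and that activating an older release rolls the deployment back to it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    release_id: int
    active: bool = False


@dataclass
class Deployment:
    releases: List[Release] = field(default_factory=list)

    def activate(self, release_id: int) -> None:
        # Validate first so an unknown ID leaves the current activation untouched.
        if all(r.release_id != release_id for r in self.releases):
            raise KeyError(release_id)
        # Exactly one release is active after activation; this is how rollback works.
        for r in self.releases:
            r.active = r.release_id == release_id

    def active_release(self) -> Optional[Release]:
        return next((r for r in self.releases if r.active), None)

    def listing(self) -> List[Release]:
        # Active release first; remaining releases newest first (assumed ordering).
        return sorted(self.releases, key=lambda r: (not r.active, -r.release_id))
```

Rolling back in this model does not delete the newer release; it only changes which release a job run would use.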

Flow View Page

In a Prod instance, you can drill into a release to review its flows through the Flow View page.

NOTE: Avoid making modifications to the flow in a Prod instance.

For more information, see Flow View Page.

Example Workflow

In this example, your environment contains separate Dev and Prod instances, each of which has a different set of users.

Item | Dev | Prod
Environment | http://wrangle-dev.example.com:3005 | http://wrangle-prod.example.com:3005
User | User1 (NOTE: User1 has no access to Prod.) | Admin2
Source DB | devWrangleDB | prodWrangleDB
Source Table | Dev-Orders | Prod-Orders
Connection Name | Dev Redshift Conn | Prod Redshift Conn

Example Flow:

User1 is creating a flow to wrangle weekly batches of orders for the enterprise. The flow contains:

  • A single imported dataset that is created from a Redshift database table.
  • A single recipe that modifies the imported dataset.
  • A single output to a JSON file.
  • Production data is hosted in a different Redshift database, so the Prod connection is different from the Dev connection.

Steps:

  1. Build in Dev instance: User1 creates the flow and its steps.

  2. Export: When User1 is ready to push the flow to production, User1 exports the flow from the Flows page and delivers the export package ZIP to Admin2. See Export Flow.

  3. Deploy to Prod instance:
    1. Admin2 creates a new deployment in the Prod instance. See Deployments Page.
    2. Admin2 creates a new connection (Prod Redshift Conn) to the Redshift database prodWrangleDB. See Create Connection Window.
    3. Admin2 creates an import rule to map the old connection (Dev Redshift Conn) to the new one (Prod Redshift Conn). See Define Import Mapping Rules.
    4. Admin2 uploads the export ZIP package provided by User1. See Import Flow.
    5. The deployment now contains a single release.

  4. Test deployment:
    1. Through Flow View in the Prod instance, Admin2 runs a job.
    2. In reviewing the profile results of the job, Admin2 discovers a problem with the script. One column contains a number of mismatched values.
    3. Admin2 chooses to fix the problem in Dev and re-import the flow into Prod.

      NOTE: Any changes made in Production that must appear in future releases must be applied back in the Dev environment, too. You can either 1) export the flow from Prod and import back into Dev, or 2) manually apply all Prod changes back to the Dev environment and export/import into Prod when ready.

  5. Fix in development: Back in the Dev environment, Admin2 opens the recipe for the flow.
    1. Admin2 adds a step to the recipe to delete the rows containing mismatched values for the column.
    2. Admin2 runs a job and verifies that the problem is fixed: the visual profile for the dataset shows that the rows with mismatched values have been removed.

  6. Deploy again: Admin2 exports the flow and imports it again as a new release in the deployment.
    1. Because import rules have already been created for this deployment, the connection is automatically remapped for this second import.
    2. Admin2 runs a job. The results look fine.
    3. Admin2 removes profiling from the output object, since profiling takes time and is unnecessary in this production environment.

  7. Set schedule: Using cron, Admin2 sets a schedule to run the active release for this deployment once per week.
    1. Each week, the Prod-Orders table must be refreshed with data.
    2. The dataset is now operational in the Prod environment.

 

Recommended Practices

  • If possible, you should maintain separate instances of the platform for Dev and Prod. 
    • If you must use the All-in-One method of managing Dev and Prod instances, you should maintain a small number of non-admin accounts that are specifically used for deployment management.
  • Avoid scheduling Prod executions through Flow View. Although this is possible, these schedules continue to exist even after the version of the flow has been replaced by another. Consequently, schedules that were specified through the application continue to execute, even though the flow itself is outdated. Instead, specify scheduled executions through cron jobs that run each deployment, so that the currently active release is always the one executed.
  • Do not modify Flow View settings through a Prod instance. These settings are not applied back to the Dev version and are lost when the next release package is imported.

Job Execution

NOTE: Running a package containing more than 5 concurrent jobs is not supported.

On-demand jobs

You can configure and run on-demand jobs through the Flow View page of a Production instance. See Flow View Page.

Scheduled jobs

In Dev:

When your flow is exported from a Dev instance, all scheduling-related data is removed from the export package. 

In Prod:

In a Prod instance, an imported flow contains no schedules. You must configure schedules through the REST APIs to execute on the currently active release for each deployment. 

NOTE: Do not schedule executions through Flow View in a Prod instance.

  • Schedules defined in Flow View are applied to Active and Non-Active releases in Production environments.
  • If the scheduled release is deactivated, the schedule still exists, and the jobs are executed on a flow that is now out of date.
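The difference between the two scheduling approaches comes down to when the target release is resolved. In this conceptual sketch (hypothetical class names, not product code), a Flow View schedule stays bound to the release it was created in, while an external trigger that runs the deployment resolves the active release each time it fires.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    active_release: int  # ID of the release that is currently active


@dataclass
class FlowViewSchedule:
    release_id: int  # fixed when the schedule is created in a release's Flow View

    def target(self, deployment: Deployment) -> int:
        # Ignores later activations, so it can keep running an outdated release.
        return self.release_id


@dataclass
class DeploymentRunTrigger:
    # Models an external trigger, for example cron calling the deployment run API.

    def target(self, deployment: Deployment) -> int:
        # Resolves the active release at the moment the trigger fires.
        return deployment.active_release
```

After a new release is activated, the pinned schedule still targets the old release, while the deployment-level trigger picks up the new one.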

Automation

Automation of deployment management is supported through the APIs. 

NOTE: When you run a deployment, you run the primary flow in the active release for that deployment. Running the flow generates the output objects for all recipes in the flow.

NOTE: Scheduled execution of jobs in a deployment environment must be managed through external tools such as cron. For more information on the endpoint to schedule, see API Deployments Run v3.

For more information on the APIs for deployment management, see API Endpoints.

For more information on an API-based method for deploying flows, see API Workflow - Deploy a Flow.
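The following is a minimal sketch of the request that a cron-driven script could send to run a deployment, using only the Python standard library. The endpoint path (/v3/deployments/<id>/run), the empty JSON body, and bearer-token authentication are assumptions based on the API Deployments Run v3 reference named above; verify them against the API documentation for your release, because your instance may require a different authentication method. The function only builds the request; sending it is done separately with urllib.request.urlopen.

```python
import urllib.request


def build_run_request(base_url: str, deployment_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a deployment.

    ASSUMPTIONS: the /v3/deployments/<id>/run path, the empty JSON body, and
    bearer-token authentication; check API Deployments Run v3 for your release.
    """
    url = f"{base_url.rstrip('/')}/v3/deployments/{deployment_id}/run"
    return urllib.request.Request(
        url,
        data=b"{}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A script would then call urllib.request.urlopen(build_run_request(...)) with your Prod base URL, deployment ID, and token. A crontab entry such as 0 6 * * 1 /usr/bin/python3 /opt/scripts/run_deployment.py, where the script path is hypothetical, would run it every Monday at 06:00, and because the request targets the deployment rather than a release, each run uses whichever release is active at that moment.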
