This release of Trifacta® introduces scheduling of dataset execution from within your flows, as well as a number of bug fixes and system improvements.
Admin, Install, & Config:
Support for Cloudera 5.12. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for Cloudera 5.9 has been deprecated. For more information, see End of Life and Deprecated Features.
- Schedule executions of one or more wrangled datasets within a flow. See Flow View Page.
- Disable individual steps in your recipes. See Recipe Panel.
- Search for columns by name. See Data Grid Panel.
Changes to System Behavior
Single-file run_job action is deprecated for CLI
Key Bug Fixes
|TD-25615||Error in flat-aggregation generation in Spark running environment.|
|TD-25438||Deleting an upstream reference node does not propagate results correctly to the Transformer page.|
TDE files generated by the TDE download option may fail to open in Tableau if column names are more than 124 characters in length.
NOTE: When you run the job, include a publishing option to publish to TDE format. When you export the generated results, this issue no longer appears in the output.
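For files produced by the TDE download option before this fix, it can help to check column names against the 124-character limit described above. The following is a minimal sketch, assuming the column names are available as the header row of a CSV export; the function names and the sample data are hypothetical and are not part of the Trifacta platform.

```python
import csv
import io

# Limit taken from the issue description above.
MAX_TDE_COLUMN_NAME_LENGTH = 124


def read_header(csv_text):
    """Return the first row of CSV text as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def find_long_column_names(header, limit=MAX_TDE_COLUMN_NAME_LENGTH):
    """Return the column names whose length exceeds the limit."""
    return [name for name in header if len(name) > limit]


# Hypothetical sample: the second column name is 125 characters long.
sample = "id," + "x" * 125 + ",name\n1,2,3\n"
for name in find_long_column_names(read_header(sample)):
    print(f"Column name too long ({len(name)} characters): {name[:20]}...")
```

Renaming any reported columns before download is one way to stay under the limit.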
New Known Issues
When editing a schedule that was set for
Workaround: This bug is a display bug. The correct value is saved when the value is set to
This release of Trifacta Wrangler Enterprise includes the ability to share flows and a completely revamped Transformer page for a simpler, faster, and more consistent user experience. From the Transformer page, you can now collect ad-hoc samples using a wider variety of techniques. New integration and publishing options make the Trifacta platform broader in its reach throughout the enterprise. Read below for additional features and details.
Admin, Install, & Config:
Support for integration with MapR Hadoop clusters has been deprecated. The Trifacta platform continues to support Cloudera and Hortonworks. For more information on other available options, please contact your Trifacta representative.
NOTE: Support for CentOS 6.2.x and CentOS 6.3.x has been deprecated. Please upgrade to the latest CentOS 6.x release.
Support for Cloudera 5.11. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for CDH 5.8 has been deprecated. See End of Life and Deprecated Features.
Support for HDP 2.6. See Supported Deployment Scenarios for Hortonworks.
NOTE: Support for HDP 2.4 has been deprecated. See End of Life and Deprecated Features.
- Integration with Alation data catalog service. See Enable Alation Sources.
- Integration with Waterline data catalog service. See Enable Waterline Sources.
Support for large-scale relational sources when executing jobs on Hadoop. See Enable Relational Connections.
Per-file import settings, including file encoding type and automatic structure detection. See Import Data Page.
NOTE: The list of supported encoding types has changed. See Configure Global File Encoding Type.
Read/write support for Snappy compression. See Supported File Formats.
NOTE: Integration with fully compressed Hadoop clusters requires additional configuration. See Enable Integration with Compressed Clusters.
- Improved user experience with flows. See Flow View Page.
- Share a flow with one or more users, so you can collaborate on the same assets. See Flow View Page.
- New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.
- Sampling improvements:
- Highlight the recipe steps where a specific column is referenced. See Column Menus.
- Publishing to Hive:
- Trifacta Photon jobs can be automatically killed based on configurable runtime and memory consumption thresholds. See Configure Photon Running Environment.
- The Trifacta Photon running environment now supports Parquet format.
- SSO integration with AD/LDAP now supports auto-registration for users visiting the Trifacta application. See Configure SSO for AD-LDAP.
- For more information, see Changes to the Language.
Changes to System Behavior
Hadoop Pig running environment is no longer available
As of Release 4.1, the Pig running environment is no longer available for execution of jobs. Implications:
- Deployments that are connected to a Hadoop cluster must use Spark for job execution. See Configure Spark Running Environment.
- CLI scripts that reference running jobs on the pig running environment must be updated. See Changes to the Command Line Interface.
- Integration with Cloudera Navigator is not supported in this release.
- Integration with HDI/WASB is supported but may require further configuration. Please contact Trifacta Support.
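To locate CLI scripts that still reference the pig running environment, as mentioned in the list above, a simple text scan may be enough. This is a rough sketch only: it flags any line containing the whole word "pig" and does not assume any particular CLI parameter name. The function name and the sample lines are hypothetical placeholders, not real CLI syntax.

```python
import re

# Match "pig" as a whole word, in any letter case.
PIG_PATTERN = re.compile(r"\bpig\b", re.IGNORECASE)


def find_pig_references(script_text):
    """Return (line_number, line) pairs for lines that mention 'pig'."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if PIG_PATTERN.search(line)
    ]


# Placeholder script content; these lines are not real CLI commands.
sample_script = "run my job on pig\nrun my job on spark\n"
for number, line in find_pig_references(sample_script):
    print(f"line {number}: {line}")
```

Each flagged line still needs manual review against Changes to the Command Line Interface.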
Python UDFs are no longer available
With the removal of the Hadoop Pig running environment, Python user-defined functions are no longer available.
NOTE: As of Release 4.1, all user-defined functions must be migrated to or created in Java. For more information, see Java UDFs.
For more information on migrating UDFs, see Changes to the User-Defined Functions.
Transform Editor has been removed
In Release 4.0.1 and earlier, you could type in Wrangle transformation steps as plain text in the Transform Editor as well as use the Transform Builder.
In Release 4.1 and later, the Transform Editor has been removed, in favor of an enhanced version of the Transform Builder.
Tip: You can copy and paste raw Wrangle commands into the Transformation/Choose a transformation textbox of the Transform Builder. The documentation still displays example transformation steps as Wrangle text commands.
See Transform Builder.
Dependencies Browser has been replaced
In Release 4.0.1, you could explore dependencies between your datasets through the Dependencies Browser, which was accessible through a graph in the toolbar in the Transformer page.
In Release 4.1, this browser has been replaced by the Dataset Navigator. In the Transformer page, click the drop-down next to the name of the current dataset. In the Dataset Navigator, you can browse the datasets through a list or flow view to locate another wrangled dataset to load.
In Release 4.2 and later, this browser has been renamed to the Recipe Navigator. See Recipe Navigator.
Manual database installation is no longer required
Prior to Release 4.0, the databases had to be installed manually.
In Release 4.0 and later, the databases are installed for you on the local server as part of the basic install process. For more information, see Install Databases.
If you need to re-install the databases, manual steps are still available. See Install Databases for PostgreSQL.
Head sample replaced by random sample on upgrade
If your dataset used the initial rows (head) sample in the data grid in Release 4.0 or earlier, that sample is replaced by the random sample after the upgrade.
Tip: When the dataset is loaded in the Transformer page after upgrade, you can switch the sample back to the first rows sample. For more information, see Samples Panel.
- The Send a Copy feature introduced in Release 4.0 has been integrated with the general sharing capabilities. See Share Flow Dialog.
- Ad-hoc publishing to Redshift in CSV format is no longer supported. See Publishing Dialog.
Key Bug Fixes
|TD-23787||When publishing location is unavailable, spinning wheel hangs indefinitely without any error message.|
|TD-22467||Last active sample is not displayed during preview of multi-dataset operations.|
|TD-22128||Cannot read multi-file Avro stream if data is greater than 500 KB.|
|TD-20796||For date column, Spark profiling shows incorrect set of dates when source data has a single date in it.|
You cannot configure a publishing location to be a directory that does not already exist.
See Run Job Page.
New Known Issues
When a pivot transform is applied, some column histograms may not be updated.
Workaround: Refresh the page.
Cannot publish to Cloudera Navigator due to 500 - Internal Server Error.
Workaround: The Cloudera Navigator integration is not supported in this release. If it has been enabled in your deployment in a prior release, it must be disabled. To disable, please set the following property value in platform configuration. You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Circular reference in schema of Avro file causes job in Spark running environment to fail. See https://issues.apache.org/jira/browse/AVRO-1285.
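To check whether an Avro schema is affected before running a job, the schema can be inspected for record types that refer to themselves. The following is a minimal sketch, assuming that "circular reference" means a record whose fields refer back to the record by name, directly or through unions, arrays, maps, or nested records. It compares simple names only and ignores namespaces; the function name and sample schema are hypothetical.

```python
import json


def find_circular_references(schema):
    """Return names of record types referenced from inside their own definition."""
    cycles = []

    def visit(node, open_records):
        if isinstance(node, str):
            # A bare string is a primitive type or a reference to a named type.
            if node in open_records:
                cycles.append(node)
        elif isinstance(node, list):
            # A union: check every branch.
            for branch in node:
                visit(branch, open_records)
        elif isinstance(node, dict):
            kind = node.get("type")
            if kind == "record":
                nested = open_records | {node.get("name")}
                for field in node.get("fields", []):
                    visit(field.get("type"), nested)
            elif kind == "array":
                visit(node.get("items"), open_records)
            elif kind == "map":
                visit(node.get("values"), open_records)
            else:
                # Wrapper such as {"type": "string"} or {"type": {...}}.
                visit(kind, open_records)

    visit(schema, frozenset())
    return cycles


# Hypothetical linked-list schema: "next" refers back to the enclosing record.
schema = json.loads("""
{"type": "record", "name": "Node", "fields": [
    {"name": "value", "type": "int"},
    {"name": "next", "type": ["null", "Node"]}
]}
""")
print(find_circular_references(schema))  # ['Node']
```

An empty result means no self-reference was found under these assumptions.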
|TD-20882||Connectivity||Spark jobs based on relational sources fail if one or more columns are dropped from the source table.|
Values in a newly collected sample do not appear in sorted order, even though a sort transform had been previously applied.
Workaround: You can re-apply the sort transform to the new sample. Some limitations apply. For more information, see Sort Transform.