This release contains numerous bug fixes and some interesting new features.
- Specify create, append, or replace actions for your file publishing destinations. See Run Job Page.
- Import to Wrangle in one step. See Import Data Page.
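The create, append, and replace publish actions map onto familiar file-open semantics. As a generic illustration (plain Python with a hypothetical output path, not the Trifacta CLI):

```python
import os
import tempfile

# Hypothetical publishing destination, for illustration only.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "results.csv")

# "Create": fail if the target already exists (mode "x").
with open(path, "x") as f:
    f.write("id,name\n")

# "Append": add rows to the existing output (mode "a").
with open(path, "a") as f:
    f.write("1,alpha\n")

# "Replace": overwrite the output from scratch (mode "w").
with open(path, "w") as f:
    f.write("id,name\n2,beta\n")

print(open(path).read())
```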
Admin, Install, & Config:
- Support for CDH 5.9. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for CDH 5.5/CDH 5.6 has been deprecated. Please upgrade to CDH 5.8 or later.
Changes to System Behavior
Changes to the Language:
- A number of functions have been renamed to conform to common function names. See Changes to the Language.
Changes to the Command Line Interface:
- New file publishing options enable specifying create, append, and replace actions for file publishing destinations.
- output_path is now a required parameter for commands that use it.
NOTE: When specifying publishing options in the CLI, you may specify one file format only for the output.
- See Changes to the Command Line Interface.
- The setting to include headers in CSV downloads is now managed as part of the job publication workflow on a per-job basis.
- This setting is no longer available in the Admin Settings page.
- For more information, see Run Job Page.
- Access to S3 sources no longer requires the ListAllMyBuckets permission. If the permission is not granted:
- Users cannot see default buckets through the application.
- Default buckets must be explicitly configured to be displayed from within the application.
- Users can still access unlisted buckets by directly entering the full path in the S3 browser.
- See Enable S3 Access.
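Where bucket listing should remain enabled, the ListAllMyBuckets permission can be granted with a standard AWS IAM policy statement. This is a generic IAM sketch, not Trifacta-specific configuration (the `s3:ListAllMyBuckets` action must be granted against all resources):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
```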
Key Bug Fixes
| Ticket | Description |
| --- | --- |
| TD-19404 | Split transform using at parameter values out of range of cell size generates an error in Pig. |
| TD-19150 | On Photon, |
| TD-19032 | Swapping rapidly between source datasets that have already been edited may cause a |
| TD-18933 | You cannot load a dataset that utilizes another dataset via join or union three levels deep. |
| TD-18268 | If you profile a wide column (one that contains many characters of data in each cell value), the machine learning service can crash. |
| TD-18093 | Changes to a dataset that generate new columns can break any downstream lookups that use the dataset. |
New Known Issues
Publish to Redshift of single-file CSV or JSON files fails.
Workaround: Publish files to Redshift as multi-part files. See Run Job Page.
After upgrade, job card summaries in the Jobs page may fail to load for jobs executed in the pre-upgrade version whose steps contain functions that have since been renamed.
Workaround: You can re-run the job in the upgraded version. For more information on the renamed functions for Release 3.2.1, see Changes to the Language.
When publishing to S3, you cannot write to a single file in an
Workaround: You can change the publish action to recreate the object, replace the object, or save it as a multi-file output.
When switching between an
Workaround: Cancel the edit in progress. Re-edit the publishing action to apply the compression setting to the
You cannot configure a publishing location to be a directory that does not already exist.
Workaround: Create the directory on the datastore outside of the Trifacta platform. Verify that the appropriate user accounts have access to the directory.
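Pre-creating the publishing directory and verifying access can be scripted ahead of job execution. A minimal sketch with a hypothetical path, run as an account with write access to the datastore:

```python
import os
import tempfile

# Hypothetical publishing location, for illustration only.
base = tempfile.mkdtemp()
publish_dir = os.path.join(base, "jobs", "output")

# Create the directory tree ahead of time; exist_ok avoids
# failing if another process already created it.
os.makedirs(publish_dir, exist_ok=True)

# Verify the account can read and write the location.
assert os.access(publish_dir, os.R_OK | os.W_OK)
```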
Users are permitted to select compressed formats for
NOTE: For Release 3.2.1, the
Job execution fails with
Workaround: You can try to raise the soft and hard limit on number of processes available to the platform. For more information, see Miscellaneous Configuration.
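On Linux, the per-user process limits can be inspected, and the soft limit raised up to the hard limit, through the standard `resource` interface. This is a sketch of the underlying OS mechanism, not the platform's own configuration procedure:

```python
import resource

# Read the current soft and hard limits on the number of
# processes available to this user (RLIMIT_NPROC).
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("soft:", soft, "hard:", hard)

# An unprivileged process may raise its soft limit as far as
# the hard limit; raising the hard limit itself requires root
# (or an edit to /etc/security/limits.conf).
resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))
```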
Column browser does not recognize when you place a checkmark next to the last column in the list.
Workaround: You can move the column to another location and then select it.
Preview cards take a long time to load when selecting values from a Datetime column.
Workaround: For selection purposes, you can change the data type to String. Then, make your selections and build your transform steps before switching back to Datetime data type.
Workaround: Please review the variant information in the transform. Then, remove the step and re-apply the Date formatting through the Type drop-down for the column. The required type information is applied.
This release features the introduction of the following key features:
- A new and improved object model.
- A completely redesigned execution engine (codename: Photon), which enables much better performance across larger samples in the Transformer page and faster execution on the Trifacta Server.
NOTE: To interact with the Photon running environment, all desktop instances of Google Chrome must have the PNaCl component enabled and updated to the minimum supported version. See Desktop Requirements.
NOTE: If you are upgrading from Release 3.1.x, you must manually enable the Photon running environment. If you are upgrading from an earlier version or installing Release 3.2 or later, the Photon running environment is enabled by default. See Configure Photon Running Environment.
- The Transform Builder, a menu-driven interface for rapidly building transforms.
- A new publishing interface with easier, more flexible configuration.
- Numerous other features and performance enhancements.
Details are below.
Redesigned object model and related changes to the Trifacta application enable greater flexibility in asset reuse in current and future releases.
NOTE: Beginning in Release 3.2, the Trifacta platform is transitioning to an enhanced object model, which is designed to support greater re-usability of objects and improved operationalization. This new object model and its related features will be introduced over multiple releases. For more information, see Changes to the Object Model.
- A newly designed interface helps you to quickly build transform steps. See Transform Builder.
- New publishing interface with more flexible configuration options for outputs. See Run Job Page.
- Scrolling and loading improvements in the Transformer page.
- Substantial increase in the size of samples in Transformer page for better visibility into source data and more detailed profiling.
- Use the Dependencies Browser to review and resolve dependency errors between your datasets. See Recipe Navigator.
- For more information on the implications, see Changes to the Object Model.
- Explore automatically detected string patterns in column data using pattern profiling and build transforms based on these patterns. See Column Details Panel.
- Join tool now supports fuzzy join options. See Join Panel.
Admin, Install, & Config:
NOTE: The minimum system requirements for the Trifacta node have changed for this release. For more information, see System Requirements.
- Support for CentOS/RedHat Linux 7.1. See System Requirements.
- Support for Ubuntu 14.04.
- Ubuntu 12.04 is no longer supported.
- See System Requirements.
- Support for CDH 5.8, core and with security. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for CDH 5.3/CDH 5.4 has been deprecated. Please upgrade to CDH 5.8 or later.
- Support for Hortonworks 2.4 with security. See Supported Deployment Scenarios for Hortonworks.
- Configurable session duration. See Miscellaneous Configuration.
- Support for Google Chrome 51+ only. See Desktop Requirements.
- Connect to multiple deployments of the Trifacta Server through the Wrangler Enterprise desktop application. See Configure for Trifacta Enterprise Application.
Command Line Interface:
- Support for use of Kerberos credentials by the CLI.
- Support for asset transfer during user deletion.
- See Changes to the Command Line Interface.
- Support for end-to-end integration via API and CLI. For more information on content, please contact Trifacta Support.
Job Execution and Performance:
- Superior performance in job execution. Run jobs on the Trifacta Server on much larger datasets at a faster rate.
- Numerous performance improvements to the web application across many users.
- New Batch Job Runner service simplifies job monitoring and improves performance.
NOTE: The Batch Job Runner service requires a separate database for tracking jobs. New and existing customers must manually install this database. See Install Databases for PostgreSQL.
- Improved error message on job failure.
- Publish to Redshift is now generally available. See Enable S3 Access.
- SSL support for Oracle, Postgres, and Teradata relational sources. See Create Connection Window.
- Numerous security enhancements.
Changes to System Behavior
This section outlines changes to how the platform behaves that have resulted from features or bug fixes in Release 3.2.
NOTE: Due to changes in system behavior, all existing random samples for a dataset are no longer available after upgrading to this release. For any upgraded dataset, the selected sample reverts to the default sample, the first N rows of the dataset. The number of rows in the sample depends on the number of columns, data density, and other factors.
When you load your dataset into the Transformer page for the first time:
- The first N rows of the dataset are selected as a sample.
NOTE: The first N rows sample may change the data that is displayed in the data grid. In some cases, the data grid may initially display no data at all.
- A new random sample is automatically generated for you.
- The Collect New Random Sample button is available. However, until you add a script step that changes the number of rows in the dataset, this button creates a random sample that is identical to the one that is automatically created for you when you first load the dataset into the Transformer page.
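The difference between the default first-N sample and a regenerated random sample can be sketched in plain Python. This is illustrative only; the platform's sampler also weighs column count, data density, and other factors:

```python
import random

# A toy dataset of 1,000 rows.
rows = [{"id": i} for i in range(1000)]

# Default sample after upgrade: the first N rows, in order.
N = 100
first_n = rows[:N]

# The automatically generated random sample of the same size.
random.seed(0)  # fixed seed so the sketch is reproducible
random_sample = random.sample(rows, N)

print(len(first_n), len(random_sample))
```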
Changes to Wrangle
- The multisplit transform has been replaced by a more flexible version of the split transform. For more information, see Split Transform.
- Additional miscellaneous changes. See Changes to the Language.
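Wrangle syntax aside, the consolidation is analogous to one split routine covering both single- and multi-delimiter cases. A generic Python illustration (not Wrangle code, with made-up sample values):

```python
import re

# Single-delimiter split (the common case).
record = "2016-08-15|US|42"
fields = record.split("|")

# Multi-delimiter split, which previously needed a separate
# multisplit-style call, handled by the same mechanism.
mixed = "2016-08-15|US;42"
fields_mixed = re.split(r"[|;]", mixed)

print(fields, fields_mixed)
```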
Key Bug Fixes
| Ticket | Description |
| --- | --- |
| TD-18319 | Inconsistent results for |
| TD-16086 | Job list drop-down fails to enable selection of correct jobs. |
| TD-16084 | Job cards display |
| TD-15609 | Column filtering only works if filtering value is entered in lowercase. |
|  | Attempt to publish to Cloudera Navigator for a Trifacta® Server job results in a DataNotFoundException. |
| TD-15330 | Pivot transform generates "Cannot read property 'primitive' of undefined" error. |
| TD-14541 | Names for private connections can collide with names of global connections, resulting in private connections that cannot be edited by the owning user. |
| TD-14397 | Left or outer join against dataset with |
| TD-13162 | Join key selection screen and buttons are not accessible on a small desktop screen. |
New Known Issues
Swapping rapidly between source datasets that have already been edited may cause a
Workaround: Log out and log in again. Perform your dataset swap as needed.
You cannot load a dataset that utilizes another dataset via join or union three levels deep.
Example: three datasets (
Workaround: You can generate results for the lower-level datasets and then create a new wrangled dataset from these results. However, you no longer automatically inherit changes from the source dataset(s).
Workaround: Use non-negative values as inputs.
When Photon is enabled, previews in the data grid may take up to 30 seconds to dismiss.
Workaround: This issue is related to the display of suggestion cards. Although not ideal, you can experiment with disabling the display of preview cards in the data grid options menu. See Data Grid Panel.
Platform fails to start if Trifacta user for S3 access does not have the ListAllMyBuckets permission.
Workaround: Please verify that this user has the appropriate permissions.
In Release 3.1.2 and earlier, any datasource that has never been used to create a dataset is no longer available after upgrade.
Workaround: The assets remain untouched on the datastore where located. As long as the user has read permissions to the datastore area, the assets can be re-imported into the platform for Release 3.2 and later.
If you profile a wide column (one that contains many characters of data in each cell value), the machine learning service can crash.
Workaround: Restart the machine learning service. If visual profiling of the column is important, look to split the column into separate columns and then profile each one individually.
TD-18093 (Transformer Page - Tools): Changes to a dataset that generate new columns can break any downstream lookups that use the dataset.
Workaround: If the lookup breaks, you can recreate it.
Preview of Hive tables intermittently fails to show table data. When you click the Eye icon to preview Hive table data, you might see a spinner icon.
Workaround: Preview data on another Hive table, then preview the data on the first table again. If you do not have another table to preview, try previewing the Hive table three times, which might work.
References to Zookeeper remain in the platform.
Workaround: As of Release 3.2, the Trifacta platform no longer requires access to Zookeeper. However, removal of all references in the platform requires more work, which will be completed in a future release.
Workaround: Make sure you specify a valid value for
Workaround: This issue does not appear in the Photon running environment or in jobs executed in Photon or Hadoop Pig. See Configure Photon Running Environment.
TD-16419 (Transform Builder): Comparison functions added through the Builder are changed to operators in the recipe.
Importing a directory of Avro files only imports the first file when the Photon running environment is enabled.
Python and Java UDFs do not accept inputs with zero parameters.
Workaround: Insert a dummy parameter as part of the input.
Platform cannot execute jobs on Pig that are sourced from S3, if OpenJDK is installed.
Workaround: Install Oracle JDK 1.8 before installing the Trifacta platform. See System Requirements.