This release includes a number of key bug fixes and updates.
- Promote users to Trifacta Administrator role. See Create Admin Account.
Changes to System Behavior
Key Bug Fixes
|TD-31581||Editing joins in reconvergent flows fails with an error message.|
|TD-31509||Undo not persisted back to server after sample has been collected and loaded.|
|TD-31399||Join "select-all" performance is slower and can cause the browser to hang.|
|TD-31327||Unable to save dataset sourced from multi-line custom SQL on dataset with parameters.|
|TD-31305||Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.|
|TD-31165||Job results are incorrect when a sample is collected and then the last transform step is undone.|
The following security-related fixes were completed in this release.
In Apache Log4j 2.x before 2.8.2, when using the TCP socket server or UDP socket server to receive serialized log events from another application, a specially crafted binary payload can be sent that, when deserialized, can execute arbitrary code.
|TD-32712||Upgrade Apache portable runtime to latest version to address security vulnerability.|
|TD-32711||Upgrade Python version to address security vulnerability.|
Multiple integer overflows in libgfortran might allow remote attackers to execute arbitrary code or cause a denial of service (Fortran application crash) via vectors related to array allocation.
Hawk before 3.1.3 and 4.x before 4.1.1 allow remote attackers to cause a denial of service (CPU consumption or partial outage) via a long (1) header or (2) URI that is matched against an improper regular expression. Upgrade version of less to address this security vulnerability.
Spring Security (Spring Security 4.1.x before 4.1.5, 4.2.x before 4.2.4, and 5.0.x before 5.0.1; and Spring Framework 4.3.x before 4.3.14 and 5.0.x before 5.0.3) does not consider URL path parameters when processing security constraints. By adding a URL path parameter with special encodings, an attacker may be able to bypass a security constraint. The root cause of this issue is a lack of clarity regarding the handling of path parameters in the Servlet Specification. Some Servlet containers include path parameters in the value returned for getPathInfo() and some do not. Spring Security uses the value returned by getPathInfo() as part of the process of mapping requests to security constraints. In this particular attack, different character encodings used in path parameters allows secured Spring MVC static resource URLs to be bypassed.
Apache POI in versions prior to release 3.15 allows remote attackers to cause a denial of service (CPU consumption) via a specially crafted OOXML file, aka an XML Entity Expansion (XEE) attack.
If a user of Commons-Email (typically an application programmer) passes unvalidated input as the so-called "Bounce Address", and that input contains line breaks, then the email details (recipients, contents, etc.) might be manipulated. Mitigation: Users should upgrade to Commons-Email 1.5. You can mitigate this vulnerability for older versions of Commons-Email by stripping line breaks from data that will be passed to Email.setBounceAddress(String).
Apache Commons FileUpload before 1.3.3 DiskFileItem File Manipulation Remote Code Execution
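The Commons-Email mitigation listed above amounts to removing carriage returns and line feeds from a value before it reaches Email.setBounceAddress(String). The following is a minimal sketch of that stripping step only, written in Python for brevity. The function name `strip_line_breaks` and the sample input are hypothetical and are not part of Commons-Email; a real fix would apply the equivalent logic in the Java code that calls the setter, or upgrade to Commons-Email 1.5.

```python
def strip_line_breaks(value: str) -> str:
    """Remove CR and LF characters so the value cannot carry extra lines."""
    return value.replace("\r", "").replace("\n", "")


# Hypothetical untrusted input that tries to smuggle in an extra header line.
untrusted = "bounce@example.com\r\nBcc: attacker@example.com"
safe = strip_line_breaks(untrusted)
```

After stripping, the value is a single line, so it can no longer introduce additional header lines. Stricter validation of the address format may still be appropriate.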
New Known Issues
|TD-31627||Transformer Page - Tools|
Prefixes added to column names in the Join page are not propagated to subsequent recipe steps that already existed.
Workaround: Perform a batch rename of column names in a step after the join. See Rename Columns.
A transformation job on a wide dataset fails on Spark 2.2 and earlier because it exceeds a JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.
New Known External Issues
The following issues are sourced from third-party vendors and are impacting the Trifacta platform.
NOTE: For additional details and the latest status, please contact the third-party vendor listed below.
|External Ticket Number||3rd Party Vendor||Impacted Trifacta Feature|
|Cloudera Issue: OPSAPS-39589||Cloudera||Publishing to Cloudera Navigator|
Within the CDH 5.x product line, Cloudera Navigator only supports Spark 1.x. The Trifacta platform requires Spark 2.1 and later.
When Spark 2.x jobs are published to Cloudera Navigator, Navigator is unable to detect them, so they are never added to Navigator.
Release 5.0 of Trifacta® Self-Managed Enterprise Edition delivers major enhancements to the Transformer page and workspace, starting with the new Home page. Key management capabilities simplify the completion of your projects and management of scheduled job executions. This major release of the platform supports broader connectivity and integration.
Improving user adoption:
The new workspace features a more intuitive design to assist in building your wrangling workflows with a minimum of navigation. From the new Home page, you can quickly access common tasks, such as creating new datasets or flows, monitoring jobs, or revisiting recent work.
Tip: Check out the new onboarding tour, which provides an end-to-end walkthrough of the data wrangling process. The tour is available to all users on first login to the new release.
Significant improvements have been delivered to the core transformation experience. In the Transformer page, you can now search across dozens of pre-populated transformations and functions, which can be modified in the familiar Transform Builder. Use the new Transformer toolbar to build pre-designed transformations from the menu interface.
New for Release 5.0, target matching allows you to import a representation of the final target schema, against which you can compare your work in the Transformer page. Easy-to-understand visual tags show you mismatches between your current recipe and the target you have imported. Click these tags to insert steps that align your columns with their counterparts in the target.
For multi-dataset operations, the new Auto Align feature in the Union tool improves matching capabilities between datasets, and various enhancements to the Join tool improve the experience.
Over 20 new Wrangle functions deliver new Excel-like capabilities to wrangling.
Previously a beta feature, relational connectivity is now generally available, which broadens access to more diverse data. Out-of-the-box, the platform now supports more relational connections with others available through custom configuration. From the Run Jobs page, you can now publish directly to Amazon Redshift.
Build dynamic datasets with variables and parameters. Through parameters, you can apply rules to match multiple files through one platform object, a dataset with parameters. Rules can contain regular expressions, patterns, wildcards, dates, and variables, which can be overridden during runtime job execution through the UI or API. Variables can also be applied to custom SQL datasets.
Using these parameterized datasets allows schedules to pick up new data on each execution run and enables users to pass variable values through the API or UI to select different data to apply to the job.
Release 5.0 delivers broader and enhanced integration with Microsoft Azure. With a few clicks in the Azure Marketplace, you can deploy the platform into a new or existing HDI cluster. Your deployment can seamlessly integrate with either ADLS or WASB and can be configured to connect to Microsoft SQL Data Warehouse. As needed, integrate with Azure Active Directory for single sign-on simplicity.
Here's what's new in Release 5.0.
Support for CDH 5.14.
NOTE: Support for CDH 5.11 has been deprecated. See End of Life and Deprecated Features.
Support for Spark 2.2.
NOTE: By default, the Trifacta platform is configured to use Spark 2.1.0. Depending on your environment, you may be required to change the configuration to Spark 2.2, particularly if you are integrating with an EMR cluster. For more information, see Configure for Spark.
- Integrate your Microsoft Azure deployment with ADLS and WASB.
- Support for Azure Single Sign On. See Configure SSO for Azure AD.
- Integrate with domain-joined clusters using SSO. See Configure for HDInsight.
- Support for read-only and read-write connections to Microsoft SQL DW. See Configure for Azure.
- Through the application, you can now use Tricheck to check the server requirements and connectivity of the Trifacta node to the connected cluster. See Admin Settings Page.
New Home page and left nav bar allow for more streamlined access to recent flows and jobs, as well as learning resources. See Home Page.
Tip: Try the tutorial available from the Home page. See Home Page.
- Manage your datasets and references from the new Library page. See Library Page.
- In the new Jobs page, you can more easily locate and review all jobs to which you have access.
- Administrators can view and cancel jobs launched by other users.
- See Jobs Page.
- Use parameterized rules in imported datasets to allow scheduled jobs and API executions to automatically pick up the right input data. See Overview of Parameterization.
- Assign a new Target to your recipes to provide guidance during wrangling. See Overview of Target Matching.
- Search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser.
- Targets assigned to a recipe appear as a column header overlay to assist users in aligning their dataset schema with the target schema. See Data Grid Panel.
- Cancel in-progress sampling jobs. See Samples Panel.
- New toolbar provides faster access to common transformations and operations. See Transformer Toolbar.
- Better intelligence for column matching during union operations. See Union Page.
- Numerous functional improvements to the Join page. See Join Page.
Run Job Page:
- Specify Redshift publishing actions as part of the job specification. See Run Job Page.
- Delete unused connections through the application. See Connections Page.
Changes to System Behavior
NOTE: If you are upgrading an instance that was integrated with an EMR cluster, the EMR cluster ID must be applied to the Trifacta platform. See Admin Settings Page.
NOTE: If you are integrating with an EMR cluster, EMR 5.7 is no longer supported. Please create an EMR 5.11 cluster instead. See End of Life and Deprecated Features.
- The aggregate transform has been removed from the platform. Instead, you can use the pivot transform to accomplish the same tasks. For more information, see Changes to the Language.
Key Bug Fixes
|TD-28930||Delete other columns causes column lineage to be lost and reorders columns.|
|TD-28573||Photon running environment executes column splits for fixed length columns using byte length, instead of character length. In particular, this issue affects columns containing special characters.|
|TD-27784||Ubuntu 16 install for Azure: supervisord complains about "missing" Python packages.|
|TD-26069||Photon evaluates |
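The TD-28573 entry above turns on the difference between byte length and character length. As an illustration only (this is not Photon code, and the sample value and split width are made up), the Python sketch below splits a string at a fixed width of 4, once by characters and once by UTF-8 bytes.

```python
value = "naïve-text"  # "ï" is one character but two bytes in UTF-8
width = 4

# Character-based split: the first field holds exactly `width` characters.
by_chars = (value[:width], value[width:])

# Byte-based split: cut after `width` bytes, then decode each piece.
# errors="replace" guards against a cut that lands inside a multi-byte character.
encoded = value.encode("utf-8")
by_bytes = (
    encoded[:width].decode("utf-8", errors="replace"),
    encoded[width:].decode("utf-8", errors="replace"),
)
```

Here the character-based split yields `("naïv", "e-text")`, while the byte-based split yields `("naï", "ve-text")`, so any column containing multi-byte characters ends up with shifted field boundaries.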
New Known Issues
When creating Tableau Server connections, the Test Connection button is missing.
Workaround: Create the connection. Create a very simple dataset with a minimal recipe. Run a job on it. From the Export Results window, try to publish to Tableau Server. If you cannot connect to the Tableau Server, try specifying a value for the Site Name in the Export Results window.
Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.
NOTE: This issue also applies to flows that were upgraded from a previous release.
Workaround: Recreate the samples after the move or copy.
|TD-31252||Transformer Page - Tools|
Assigning a target schema through the Column Browser does not refresh the page.
Workaround: To update the page, reload the page through the browser.
Job results are incorrect when a sample is collected and then the last transform step is undone.
Workaround: Recollect a sample after undoing the transform step.
Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
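To illustrate the workaround above, the sketch below uses Python's standard `fnmatch` module on a made-up list of paths. The directory layout and both patterns are hypothetical, and the platform's own matching engine is not shown. The point is only that a pattern with a fixed top-level directory lets a matcher discard most paths with a cheap prefix check, whereas a pattern that starts with a wildcard has to be tested against every path.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing.
paths = [
    "sales/2018/01/orders.csv",
    "sales/2018/02/orders.csv",
    "marketing/2018/01/leads.csv",
    "finance/2018/01/ledger.csv",
]

broad = "*/2018/*/orders.csv"         # wildcard in the top-level directory
specific = "sales/2018/*/orders.csv"  # fixed top-level directory

# The literal text before the first wildcard can be used as a cheap filter.
prefix = specific.split("*", 1)[0]
candidates = [p for p in paths if p.startswith(prefix)]

# The broad pattern has no usable prefix, so every path must be matched.
broad_matches = [p for p in paths if fnmatchcase(p, broad)]
specific_matches = [p for p in candidates if fnmatchcase(p, specific)]
```

Both patterns select the same two files in this listing, but the specific pattern only runs the full match against the two candidates under `sales/2018/`.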
When creating a new dataset from the Export Results window from a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.
Workaround: Re-run the job with Snappy compression disabled. Then, export the new dataset.
|TD-30820||Compilation/Execution||Some string comparison functions process leading spaces differently when executed on the Photon or the Spark running environment.|
|TD-30717||Connectivity||No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.|
Spark job run on ADLS cluster fails when Snappy compression is applied to the output.
Workaround: Job execution should work if Snappy compression is installed on the cluster.
|TD-30342||Connectivity||No data validation is performed during publication to Redshift or SQL DW.|
Redshift: No support via CLI or API for:
Workaround: Please execute these tasks through the application.
Pre-import preview of Bigint values from Hive or Redshift is incorrect.
Workaround: The preview is incorrect. When the dataset is imported, the values are accurate.
In a reference dataset, a UDF from the source dataset is not executed if the new recipe contains a join or union step.
Workaround: Publish the source dataset. In the Export Results window, create a new dataset from the results. Import it as your reference data.
When the platform is restarted or an HA failover state is reached, any running jobs remain stuck in the In Progress state indefinitely.