Release 5.0.1

This release includes a number of key bug fixes and updates. 

What's New

Changes to System Behavior

None.

Key Bug Fixes

Ticket | Description
TD-31581 | Editing joins in reconvergent flows fails with an error message.
TD-31509 | Undo not persisted back to server after sample has been collected and loaded.
TD-31399 | Join "select-all" performance is slower and can cause browser to hang.
TD-31327 | Unable to save dataset sourced from multi-line custom SQL on dataset with parameters.
TD-31305 | Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.
TD-31165 | Job results are incorrect when a sample is collected and then the last transform step is undone.

Security Fixes

The following security-related fixes were completed in this release.

Ticket | Description
TD-33512

In Apache Log4j 2.x before 2.8.2, when using the TCP socket server or UDP socket server to receive serialized log events from another application, a specially crafted binary payload can be sent that, when deserialized, can execute arbitrary code.

See CVE-2017-5645.

TD-32712

Upgrade Apache portable runtime to latest version to address security vulnerability.

TD-32711

Upgrade Python version to address security vulnerability.

TD-32696

Multiple integer overflows in libgfortran might allow remote attackers to execute arbitrary code or cause a denial of service (Fortran application crash) via vectors related to array allocation.

See CVE-2014-5044.

TD-32629

Hawk before 3.1.3 and 4.x before 4.1.1 allow remote attackers to cause a denial of service (CPU consumption or partial outage) via a long (1) header or (2) URI that is matched against an improper regular expression. Upgrade version of less to address this security vulnerability.

See CVE-2016-2515.

TD-32623

Spring Security (Spring Security 4.1.x before 4.1.5, 4.2.x before 4.2.4, and 5.0.x before 5.0.1; and Spring Framework 4.3.x before 4.3.14 and 5.0.x before 5.0.3) does not consider URL path parameters when processing security constraints. By adding a URL path parameter with special encodings, an attacker may be able to bypass a security constraint. The root cause of this issue is a lack of clarity regarding the handling of path parameters in the Servlet Specification. Some Servlet containers include path parameters in the value returned for getPathInfo() and some do not. Spring Security uses the value returned by getPathInfo() as part of the process of mapping requests to security constraints. In this particular attack, different character encodings used in path parameters allows secured Spring MVC static resource URLs to be bypassed.

See CVE-2018-1199.

TD-32622

Apache POI in versions prior to release 3.15 allows remote attackers to cause a denial of service (CPU consumption) via a specially crafted OOXML file, aka an XML Entity Expansion (XEE) attack.

See CVE-2017-5644.

TD-32621

math.js before 3.17.0 had an arbitrary code execution vulnerability in the JavaScript engine. Creating a typed function with JavaScript code in the name could result in arbitrary execution.

See CVE-2017-1001002.

TD-32577

If a user of Commons-Email (typically an application programmer) passes unvalidated input as the so-called "Bounce Address", and that input contains line-breaks, then the email details (recipients, contents, etc.) might be manipulated. Mitigation: Users should upgrade to Commons-Email 1.5. You can mitigate this vulnerability for older versions of Commons-Email by stripping line-breaks from data that will be passed to Email.setBounceAddress(String).

See CVE-2018-1294.

TD-31427

 Apache Commons FileUpload before 1.3.3 DiskFileItem File Manipulation Remote Code Execution

See CVE-2016-1000031.
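
The Commons-Email mitigation described for TD-32577 above (stripping line-breaks before the value reaches Email.setBounceAddress(String)) can be illustrated with a minimal sketch. The real call is in Java. The Python below only shows the sanitization logic, and the function name and sample address are hypothetical.

```python
def strip_line_breaks(value: str) -> str:
    """Remove CR and LF characters so the value cannot start a new header line."""
    return value.replace("\r", "").replace("\n", "")


# Hypothetical untrusted input that tries to inject an extra header.
untrusted = "bounce@example.com\r\nBcc: attacker@example.com"
safe = strip_line_breaks(untrusted)

# The sanitized value is a single line and can no longer add a header.
print(repr(safe))
```

Stripping follows the mitigation as stated in the advisory text. Rejecting any input that contains a line-break is a stricter alternative, and upgrading to Commons-Email 1.5 remains the recommended fix.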

 

New Known Issues

Ticket | Component | Description
TD-31627 | Transformer Page - Tools

Prefixes added to column names in the Join page are not propagated to subsequent recipe steps that already existed.

Workaround: Perform a batch rename of column names in a step after the join. See Rename Columns.

TD-30979 | Compilation/Execution

A transformation job on a wide dataset fails on Spark 2.2 and earlier because it exceeds a JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.

New Known External Issues

The following issues are sourced from third-party vendors and affect the Trifacta platform.

NOTE: For additional details and the latest status, please contact the third-party vendor listed below.

External Ticket Number: Cloudera Issue: OPSAPS-39589

3rd Party Vendor: Cloudera

Impacted Trifacta Feature: Publishing to Cloudera Navigator

Description:

Within the CDH 5.x product line, Cloudera Navigator only supports Spark 1.x. The Trifacta platform requires Spark 2.1 or later.

When Spark 2.x jobs are published to Cloudera Navigator, Navigator is unable to detect them, so they are never added to Navigator.

For details, see https://www.cloudera.com/documentation/enterprise/release-notes/topics/cn_rn_known_issues.html#spark.

Trifacta Ticket: TD-22443

Release 5.0

Release 5.0 of Trifacta® Wrangler Enterprise delivers major enhancements to the Transformer page and workspace, starting with the new Home page. Key management capabilities simplify the completion of your projects and management of scheduled job executions. This major release of the platform supports broader connectivity and integration.

Improving user adoption:

The new workspace features a more intuitive design to assist in building your wrangling workflows with a minimum of navigation. From the new Home page, you can quickly access common tasks, such as creating new datasets or flows, monitoring jobs, or revisiting recent work. 

Tip: Check out the new onboarding tour, which provides an end-to-end walkthrough of the data wrangling process. The tour is available to all users on their first login to the new release.

Significant improvements have been delivered to the core transformation experience. In the Transformer page, you can now search across dozens of pre-populated transformations and functions, which can be modified in the familiar Transform Builder. Use the new Transformer toolbar to build pre-designed transformations from the menu interface.

New for Release 5.0, target matching allows you to import a representation of the final target schema, against which you can compare your work in the Transformer page. Easy-to-understand visual tags show you mismatches between your current recipe and the target you have imported. Click these tags to insert steps that align your columns with their counterparts in the target.

For multi-dataset operations, the new Auto Align feature in the Union tool improves matching capabilities between datasets, and various enhancements to the Join tool improve the experience.

Over 20 new Wrangle functions deliver new Excel-like capabilities to wrangling.

Enterprise operationalization:

Previously a beta feature, relational connectivity is now generally available, which broadens access to more diverse data. Out-of-the-box, the platform now supports more relational connections with others available through custom configuration. From the Run Jobs page, you can now publish directly to Amazon Redshift.

Build dynamic datasets with variables and parameters. Through parameters, you can apply rules to match multiple files through one platform object, a dataset with parameters. Rules can contain regular expressions, patterns, wildcards, dates, and variables, which can be overridden during runtime job execution through the UI or API. Variables can also be applied to custom SQL datasets.

Using these parameterized datasets allows schedules to pick up new data on each execution run and enables users to pass variable values through the API or UI to select the data to which the job applies.
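
The idea of one rule matching multiple files, with a variable that can be overridden at run time, can be sketched in a few lines. This is an illustration only and does not use the platform's rule syntax or API: the `{region}` placeholder format, the function names, and the file paths are all hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_rule(rule: str, variables: dict[str, str]) -> str:
    """Fill {name} placeholders in a path rule with variable values."""
    return rule.format(**variables)


def match_files(paths: list[str], rule: str, variables: dict[str, str]) -> list[str]:
    """Return the paths that match the rule after variable substitution."""
    pattern = resolve_rule(rule, variables)
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical directory listing.
paths = [
    "/data/sales/us/2018-03-01.csv",
    "/data/sales/us/2018-03-02.csv",
    "/data/sales/eu/2018-03-01.csv",
]

rule = "/data/sales/{region}/*.csv"  # wildcard plus one variable
defaults = {"region": "us"}          # default variable value

# Default run: picks up every file currently under the "us" directory.
print(match_files(paths, rule, defaults))

# Run-time override: the same rule now selects the "eu" data.
print(match_files(paths, rule, {**defaults, "region": "eu"}))
```

Because the rule is evaluated against the current listing at each run, files added later are matched without changing the dataset definition.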

Cloud focus:

Release 5.0 delivers broader and enhanced integration with Microsoft Azure. With a few clicks in the Azure Marketplace, you can deploy the platform into a new or existing HDI cluster. Your deployment can seamlessly integrate with either ADLS or WASB and can be configured to connect to Microsoft SQL Data Warehouse. As needed, integrate with Azure Active Directory for single sign-on simplicity.

What's New

Here's what's new in Release 5.0.

Install:

  • Support for CDH 5.14.

    NOTE: Support for CDH 5.11 has been deprecated. See End of Life and Deprecated Features.

  • Support for Spark 2.2. 

    NOTE: By default, the Trifacta platform is configured to use Spark 2.1.0. Depending on your environment, you may be required to change the configuration to Spark 2.2, particularly if you are integrating with an EMR cluster. For more information, see Configure for Spark.

Azure:

Admin:

  • Through the application, you can now use Tricheck to check the server requirements and connectivity of the Trifacta node to the connected cluster. See Admin Settings Page.

Workspace: 

  • The new Home page and left nav bar allow for more streamlined access to recent flows and jobs, as well as learning resources. See Home Page.

    Tip: Try the tutorial available from the Home page. See Home Page.

  • Manage your datasets and references from the new Library page. See Library Page.
  • In the new Jobs page, you can more easily locate and review all jobs to which you have access. 
    • Administrators can view and cancel jobs launched by other users. 
    • See Jobs Page.

Workflow:

  • Use parameterized rules in imported datasets to allow scheduled jobs and API executions to automatically pick up the right input data.  See Overview of Parameterization.
  • Assign a new Target to your recipes to provide guidance during wrangling. See Overview of RapidTarget.

Transformer Page:

  • Search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser. 
  • Targets assigned to a recipe appear as a column header overlay to help users align their dataset schema with the target schema. See Data Grid Panel.
  • Cancel in-progress sampling jobs. See Samples Panel.
  • New toolbar provides faster access to common transformations and operations. See Transformer Toolbar.
  • Better intelligence for column matching during union operations. See Union Page.
  • Numerous functional improvements to the Join page. See Join Window.

Run Job Page:

  • Specify Redshift publishing actions as part of the job specification. See Run Job Page.

Connectivity:

Changes to System Behavior

NOTE: If you are upgrading an instance that was integrated with an EMR cluster, the EMR cluster ID must be applied to the Trifacta platform. See Admin Settings Page.

NOTE: If you are integrating with an EMR cluster, EMR 5.7 is no longer supported. Please create an EMR 5.11 cluster instead. See End of Life and Deprecated Features.

Language:

  • The aggregate transform has been removed from the platform. Instead, you can use the pivot transform to accomplish the same tasks. For more information, see Changes to the Language.

 

Key Bug Fixes

Ticket | Description
TD-28930 | Delete other columns causes column lineage to be lost and reorders columns.
TD-28573 | The Trifacta Photon running environment executes column splits for fixed length columns using byte length, instead of character length. In particular, this issue affects columns containing special characters.
TD-27784 | Ubuntu 16 install for Azure: supervisord complains about "missing" Python packages.
TD-26069 | The Trifacta Photon running environment evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.
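
The difference between byte length and character length behind TD-28573 can be shown with a generic example. The sketch below assumes UTF-8 encoding and is not the Photon implementation. The function names and sample value are hypothetical.

```python
def split_by_chars(value: str, width: int) -> tuple[str, str]:
    """Split a fixed-length field after `width` characters."""
    return value[:width], value[width:]


def split_by_bytes(value: str, width: int, encoding: str = "utf-8") -> tuple[str, str]:
    """Split after `width` bytes, which can cut a multi-byte character in half."""
    raw = value.encode(encoding)
    head, tail = raw[:width], raw[width:]
    # errors="replace" makes the damage visible instead of raising an exception.
    return head.decode(encoding, errors="replace"), tail.decode(encoding, errors="replace")


value = "café123"  # "é" is one character but two bytes in UTF-8

print(len(value), len(value.encode("utf-8")))  # character count vs. byte count
print(split_by_chars(value, 4))                # splits cleanly after "café"
print(split_by_bytes(value, 4))                # splits inside the bytes of "é"
```

For plain ASCII input the two functions agree, which is why the issue shows up only in columns that contain special characters.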

New Known Issues

Ticket | Component | Description
TD-31354 | Connectivity

When creating Tableau Server connections, the Test Connection button is missing.

Workaround: Create the connection. Create a very simple dataset with minimal recipe. Run a job on it. From the Export Results window, try to publish to Tableau Server. If you cannot connect to the Tableau Server, try specifying a value for the Site Name in the Export Results window.

TD-31305 | Workspace

Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.

NOTE: This issue also applies to flows that were upgraded from a previous release.

Workaround: Recreate the samples after the move or copy.

TD-31252 | Transformer Page - Tools

Assigning a target schema through the Column Browser does not refresh the page.

Workaround: To update the page, reload the page through the browser.

TD-31165 | Compilation/Execution

Job results are incorrect when a sample is collected and then the last transform step is undone.

Workaround: Recollect a sample after undoing the transform step.

TD-30857 | Connectivity

Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.

Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

TD-30854 | Compilation/Execution

When creating a new dataset from the Export Results window from a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.

Workaround: Re-run the job with Snappy compression disabled. Then, export the new dataset.

TD-30820 | Compilation/Execution

Some string comparison functions process leading spaces differently when executed on the Trifacta Photon or the Spark running environment.

TD-30717 | Connectivity

No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.

TD-30361 | Compilation/Execution

Spark job run on ADLS cluster fails when Snappy compression is applied to the output.

Workaround: Job execution should work if Snappy compression is installed on the cluster.

TD-30342 | Connectivity

No data validation is performed during publication to Redshift or SQL DW.

TD-30139 | Connectivity

Redshift: No support via CLI or API for:

  • creating Redshift connections,
  • running jobs on data imported from Redshift,
  • publishing jobs results to Redshift

Workaround: Please execute these tasks through the application.

TD-30074 | Type System

Pre-import previews of Bigint values from Hive or Redshift are incorrect.

Workaround: The preview is incorrect. When the dataset is imported, the values are accurate.

TD-28663 | Compilation/Execution

In a reference dataset, a UDF from the source dataset is not executed if the new recipe contains a join or union step.

Workaround: Publish the source dataset. In the Export Results window, create a new dataset from the results. Import it as your reference data.

TD-27860 | Compilation/Execution

When the platform is restarted or an HA failover state is reached, any running jobs remain stuck indefinitely in the In Progress state.
