Welcome to Release 5.1 of Trifacta® Wrangler Enterprise! This release significantly expands database support and adds connectivity to more running environments, such as Azure Databricks. High availability is now supported on the Trifacta platform node itself.

Within the Transformer page, enhancements include improved toolbars and column menus. Samples can now be named.

Regarding operationalization of the platform, datasets with parameters can now accept Trifacta patterns for parameter specification, which simplifies the process of creating complex matching patterns. Additionally, you can swap out a static imported dataset for a dataset with parameters, which enables development on a simpler dataset before expanding to a more complex set of sources. Variable overrides can now be applied to scheduled job executions, and you can specify multiple variable overrides in Flow View. 

The underlying language has been improved with a number of transformations and functions, including a set of transformations designed around preparing data for machine processing. Details are below.

Tip: For a general overview of the product and its capabilities, see Product Overview.

What's New

Install:

  • Support for PostgreSQL 9.6 for Trifacta databases.

    NOTE: PostgreSQL 9.3 is no longer supported. PostgreSQL 9.3 is scheduled for end of life (EOL) in September 2018. For more information on upgrading, see Upgrade the Databases.

  • Partial support for MySQL 5.7 for hosting the Trifacta databases.

    NOTE: MySQL 5.7 is not supported for installation on Amazon RDS. See System Requirements.

  • Support for high availability on the Trifacta node. See Configure for High Availability.

    NOTE: High availability on the platform is in Beta release.

  • Support for CDH 5.15.

    NOTE: Support for CDH 5.12 has been deprecated. See End of Life and Deprecated Features.

  • Support for Spark 2.3.0 on the Hadoop cluster. See System Requirements.

  • Support for integration with EMR 5.13, EMR 5.14, and EMR 5.15. See Configure for EMR.

    NOTE: EMR 5.13 - 5.15 require Spark 2.3.0. See Configure for Spark.

  • Support for integration with Azure Databricks. See Configure for Azure Databricks.
  • Support for WebAssembly, Google Chrome's standards-compliant native client. 

    NOTE: This feature is in Beta release.

    NOTE: In a future release, use of PNaCl native client is likely to be deprecated.

    Use of WebAssembly requires Google Chrome 68+. No additional installation is required. In this release, this feature must be enabled. For more information, see Miscellaneous Configuration.

  • The Trifacta® platform defaults to using Spark 2.3.0 for Hadoop job execution. See Configure for Spark.

Connectivity:

Import:

Flow View:

  • Specify overrides for multiple variables through Flow View. See Flow View Page.
  • Variable overrides can also be applied to scheduled job executions. See Add Schedule Dialog.

Transformer Page:

  • Join tool is now integrated into the context panel in the Transformer page. See Join Panel.
    • Improved join inference key model. See Join Panel.

  • Patterns are available in the context panel for review and selection. Selecting a pattern prompts suggestions. See Pattern Details Panel.
  • Updated toolbar. See Transformer Toolbar.

  • Enhanced options in the column menu. See Column Menus.
  • Support for a broader range of characters in column names. See Rename Columns.

Sampling:

  • Samples can be named. See Samples Panel.
  • Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.

Jobs:

Language:

  • Rename columns using values across multiple rows. See Rename Columns.

Transformation Name | Description
Bin column | Place numeric values into bins of equal or custom size for the range of values.
Scale column | Scale a column's values to a fixed range or to zero mean, unit variance distributions.
One-hot encoding | Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.
Group By | Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
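
As a rough illustration of what these four transformations compute, here is a minimal Python sketch. It is not Wrangle syntax and not the product's implementation. All function names are hypothetical, and the choices made (bins anchored at the minimum value, population standard deviation for scaling, sum as the aggregate) are assumptions for the example only.

```python
from statistics import mean, pstdev


def bin_equal_width(values, bin_size):
    """Bin column: index of the fixed-width bin each value falls into.

    Assumption: bins start at the minimum value in the column.
    """
    low = min(values)
    return [int((v - low) // bin_size) for v in values]


def scale_min_max(values, new_min=0.0, new_max=1.0):
    """Scale column to a fixed range. Assumes the values are not all identical."""
    low, high = min(values), max(values)
    return [new_min + (v - low) / (high - low) * (new_max - new_min) for v in values]


def scale_z_score(values):
    """Scale column to zero mean and unit variance. Assumes non-zero variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """One-hot encoding: one 0/1 column per distinct value."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


def group_sum(keys, values):
    """Group By: sum of values for each distinct key."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return totals
```

For example, `one_hot(["a", "b", "a"])` produces one column per distinct value, with a 1 in each row where that value is present.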

Publishing:

Execution:

  • UDFs are now supported for execution on HDInsight clusters. See Java UDFs.

Admin:

Changes to System Behavior

Diagnostic Server removed from product

The Diagnostic Server and its application page have been removed from the product. This feature has been superseded by Tricheck, which is available to administrators through the application. For more information, see Admin Settings Page.

Wrangle now supports nested expressions

Wrangle now supports nested expressions within expressions. For more information, see Changes to the Language.

Language changes

  • The RAND function without parameters now generates true random numbers. 
  • When the source information is not available, the SOURCEROWNUMBER function can still be used. It returns null values in all cases.
  • New functions.

  • See Changes to the Language.

Key Bug Fixes

Ticket | Description
TD-36332 | Data grid can display wrong results if a sample is collected and dataset is unioned.
TD-36192 | Canceling a step in recipe panel can result in column menus disappearing in the data grid.
TD-36011 | User can import modified exports or exports from a different version, which do not work.
TD-35916 | Cannot log out via SSO.
TD-35899 | A deployment user can see all deployments in the instance.
TD-35780 | Upgrade: Duplicate metadata in separate publications causes DB migration failure.
TD-35746 | /v4/importedDatasets GET method is failing.
TD-35644 | Extractpatterns with "HTTP Query strings" option doesn't work.
TD-35504 | Cancel job throws 405 status code error. Clicking Yes repeatedly pops up Cancel Job dialog.
TD-35481 | After upgrade, recipe is malformed at splitrows step.
TD-35177 | Login screen pops up repeatedly when access permission is denied for a connection.
TD-34822 | Case-sensitive variations in date range values are not matched when creating a dataset with parameters. NOTE: Date range parameters are now case-insensitive.
TD-33428 | Job execution fails on recipe with high limit in split transformation due to Java Null Pointer Error during profiling. NOTE: Avoid creating datasets that are wider than 1000 columns. Performance can degrade significantly on even a much narrower dataset. You should limit yourself to under 500 columns in your dataset.
TD-31327 | Unable to save dataset sourced from multi-line custom SQL on dataset with parameters.
TD-31252 | Assigning a target schema through the Column Browser does not refresh the page.
TD-31165 | Job results are incorrect when a sample is collected and then the last transform step is undone.
TD-30979 | Transformation job on wide dataset fails on Spark 2.2 and earlier due to exceeding Java JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.
TD-30857 | Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters. NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
TD-30854 | When creating a new dataset from the Export Results window from a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.
TD-30820 | Some string comparison functions process leading spaces differently when executed on the Photon or the Spark running environment.
TD-30717 | No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.
TD-27860 | When the platform is restarted or an HA failover state is reached, any running jobs are stuck forever In Progress.

New Known Issues

Ticket | Component | Description

TD-40348 | Transformer Page

When loading a recipe in an imported flow that references an imported Excel dataset, the Transformer page displays an "Input validation failed: (Cannot read property 'filter' of undefined)" error, and the screen is blank.

Workaround: In Flow View, select an output object, and run a job. Then, load the recipe in the Transformer page and generate a new sample. For more information, see Import Flow.

TD-35714 | Installer/Upgrader/Utilities

After installing on Ubuntu 16.04 (Xenial), the platform may fail to start with an "ImportError: No module named pkg_resources" error.

Workaround: Verify that the python-setuptools package is installed. Install it if missing.
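
A quick way to confirm whether the module named in the error is importable is sketched below. This is an illustrative check only, not a Trifacta utility. The helper name `module_available` is hypothetical, and the script is only meaningful when run with the same Python interpreter that the platform uses, which is an assumption you need to verify for your installation.

```python
def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # pkg_resources is the module named in the startup error above.
    if module_available("pkg_resources"):
        print("pkg_resources is importable")
    else:
        print("pkg_resources is missing; install the python-setuptools package")
```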

TD-35644 | Compilation/Execution | Extractpatterns for "HTTP Query strings" option doesn't work.
TD-35562 | Compilation/Execution

When executing Spark 2.3.0 jobs on S3-based datasets, jobs may fail due to a known incompatibility between HTTPClient:4.5.x and aws-java-sdk:1.10.xx. For details, see https://github.com/apache/incubator-druid/issues/4456.

Workaround: Use Spark 2.1.0 instead. In the Admin Settings page, set the spark.version property to 2.1.0. For more information, see Admin Settings Page.

For additional details on Spark versions, see Configure for Spark.

TD-35504 | Compilation/Execution

Clicking the Cancel Job button generates a 405 status code error. Clicking the Yes button fails to close the dialog.

Workaround: After you have clicked the Yes button once, you can click the No button. The job is removed from the page.

TD-35486 | Compilation/Execution

Spark jobs fail on LCM function that uses negative numbers as inputs.

Workaround: If you wrap the negative number input in the ABS function, the LCM function may be computed. You may have to manually check if a negative value for the LCM output is applicable.
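
The arithmetic behind this workaround can be sketched in Python. This is not Wrangle code and does not reproduce how either running environment computes LCM. The function name `lcm_abs` is hypothetical. It shows that taking absolute values of the inputs yields the conventional non-negative least common multiple, which is why the sign of the result may need a manual check afterward.

```python
from math import gcd


def lcm_abs(a, b):
    """Least common multiple computed on the absolute values of the inputs."""
    if a == 0 or b == 0:
        return 0  # by convention, the LCM involving zero is zero
    a, b = abs(a), abs(b)
    return a * b // gcd(a, b)
```

With this sketch, `lcm_abs(-4, 6)` and `lcm_abs(4, 6)` give the same result.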

TD-35483 | Compilation/Execution

The WEEKNUM function is calculated differently in the Photon and Spark running environments, due to the underlying frameworks on which the environments are built.

  • Photon week 1 of the year: The week that contains January 1.
  • Spark week 1 of the year: The week that contains at least four days in the specified year.

For more information, see WEEKNUM Function.
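
The two week-numbering rules described above can be compared with a short Python sketch. It is an illustration of the rules only, not the code of either running environment. The function names are hypothetical, and the choice of Monday as the first day of the week is an assumption, since the week start day is not stated here. Under that assumption, the four-day rule coincides with ISO 8601 week numbering.

```python
from datetime import date


def weeknum_jan1_rule(d, first_weekday=0):
    """Week 1 is the week that contains January 1.

    first_weekday uses Python's convention (Monday=0). Monday is an assumption.
    """
    jan1 = date(d.year, 1, 1)
    # Number of days between the start of Jan 1's week and Jan 1 itself.
    offset = (jan1.weekday() - first_weekday) % 7
    return (d.timetuple().tm_yday - 1 + offset) // 7 + 1


def weeknum_four_day_rule(d):
    """Week 1 is the first week with at least four days in the year (ISO 8601)."""
    return d.isocalendar()[1]


if __name__ == "__main__":
    for d in (date(2016, 1, 1), date(2016, 1, 4)):
        print(d, weeknum_jan1_rule(d), weeknum_four_day_rule(d))
```

January 1, 2016 fell on a Friday, so the first rule places it in week 1, while the four-day rule assigns it to the last week of 2015.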

TD-35478 | Compilation/Execution

The Spark running environment does not support use of multi-character delimiters for CSV outputs. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.

Workaround: You can switch your job to a different running environment or use single-character delimiters.

TD-34840 | Transformer Page | Platform fails to provide suggestions for transformations when selecting keys from an object with many of them.
TD-34119 | Compilation/Execution | WASB job fails when publishing two successive appends.
TD-30855 | Publish

Creating dataset from Parquet-only output results in "Dataset creation failed" error.

NOTE: If you generate results in Parquet format only, you cannot create a dataset from it, even if the Create button is present.

TD-30828 | Publish

You cannot publish ad-hoc results for a job when another publishing job is in progress for the same job.

Workaround: Please wait until the previous job has been published before retrying to publish the failing job.

TD-27933 | Connectivity

For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the Photon running environment.

Workaround: Verify that you have inserted a new line at the end of every file-based source.
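
For locally accessible files, this check could be scripted before import. The following Python sketch is one possible approach, not a Trifacta tool. The function names are hypothetical, and it assumes that records are terminated with a line feed (`\n`) and that you are allowed to modify the source files in place.

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a line feed."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # empty file: there is no final record to merge
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ensure_trailing_newline(path):
    """Append a line feed if the final record is not terminated."""
    if not ends_with_newline(path):
        with open(path, "ab") as f:
            f.write(b"\n")
```

Running `ensure_trailing_newline` over every file in a multi-file import leaves already terminated files unchanged.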

