Welcome to Release 5.1 of Trifacta® Wrangler Enterprise! This release significantly expands database support and connectivity and adds support for more running environments and versions, such as Azure Databricks. High availability is now supported on the Trifacta platform node itself.
Within the Transformer page, you should see a number of enhancements, including improved toolbars and column menus. Samples can be named.
Regarding operationalization of the platform, datasets with parameters can now accept Trifacta patterns for parameter specification, which simplifies the process of creating complex matching patterns. Additionally, you can swap out a static imported dataset for a dataset with parameters, which enables development on a simpler dataset before expanding to a more complex set of sources. Variable overrides can now be applied to scheduled job executions, and you can specify multiple variable overrides in Flow View.
The underlying language has been improved with a number of transformations and functions, including a set of transformations designed around preparing data for machine processing. Details are below.
Tip: For a general overview of the product and its capabilities, see Product Overview.
Support for PostgreSQL 9.6 for Trifacta databases.
NOTE: PostgreSQL 9.3 is no longer supported. PostgreSQL 9.3 is scheduled for end of life (EOL) in September 2018. For more information on upgrading, see Upgrade Databases for PostgreSQL.
Partial support for MySQL 5.7 for hosting the Trifacta databases.
NOTE: MySQL 5.7 is not supported for installation on Amazon RDS. See System Requirements.
Support for high availability on the Trifacta node. See Configure for High Availability.
NOTE: High availability on the platform is in Beta release.
Support for CDH 5.15.
NOTE: Support for CDH 5.12 has been deprecated. See End of Life and Deprecated Features.
Support for Spark 2.3.0 on the Hadoop cluster. See System Requirements.
Support for integration with EMR 5.13, EMR 5.14, and EMR 5.15. See Configure for EMR.
NOTE: EMR 5.13 - 5.15 require Spark 2.3.0. See Configure for Spark.
- Support for integration with Azure Databricks. See Configure for Azure Databricks.
Support for WebAssembly, Google Chrome's standards-compliant native client.
NOTE: This feature is in Beta release.
NOTE: In a future release, use of PNaCl native client is likely to be deprecated.
Use of WebAssembly requires Google Chrome 68+. No additional installation is required. In this release, this feature must be enabled. For more information, see Miscellaneous Configuration.
- The Trifacta® platform defaults to using Spark 2.3.0 for Hadoop job execution. See Configure for Spark.
Enhanced import process for Excel files, including support for import from backend file systems. See Import Excel Data.
Support for DB2 connections. See Connection Types.
Support for HiveServer2 Interactive (Hive 2.x) on HDP 2.6. See Configure for Hive.
Support for Kerberos-delegated relational connections. See Enable SSO for Relational Connections.
NOTE: In this release, only SQL Server connections can use SSO. See Create SQL Server Connections.
- Performance caching for JDBC ingestion. See Configure JDBC Ingestion.
- Enable access to multiple WASB datastores. See Enable WASB Access.
- Support for use of Trifacta patterns in creating datasets with parameters. See Create Dataset with Parameters.
- Swap a static imported dataset with a dataset with parameters in Flow View. See Flow View Page.
- Specify overrides for multiple variables through Flow View. See Flow View Page.
- Variable overrides can also be applied to scheduled job executions. See Add Schedule Dialog.
- Join tool is now integrated into the context panel in the Transformer page. See Join Panel.
Improved join inference key model. See Join Panel.
- Patterns are available for review and selection, prompting suggestions, in the context panel.
Updated toolbar. See Transformer Toolbar.
- Enhanced options in the column menu. See Column Menus.
Support for a broader range of characters in column names. See Rename Columns.
- Samples can be named. See Samples Panel.
Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.
- Filter list of jobs by date. See Jobs Page.
- Rename columns using values across multiple rows. See Rename Columns.
|Bin column||Place numeric values into bins of equal or custom size for the range of values.|
|Scale column||Scale a column's values to a fixed range or to zero mean, unit variance distributions.|
|One-hot encoding||Encode values from one column into separate columns containing |
|Group By||Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.|
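As a rough illustration of what the machine-processing transformations listed above compute, here is a minimal Python sketch of equal-width binning, scaling, and one-hot encoding. It is not the platform's implementation. The function names are hypothetical, and the use of 0/1 indicator values for one-hot encoding is an assumption based on the usual definition of the technique.

```python
from statistics import mean, pstdev


def bin_equal_width(values, bin_count):
    """Assign each value the index of an equal-width bin over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count or 1.0  # fall back to 1.0 for a constant column
    # The maximum value is placed in the last bin instead of opening a new one.
    return [min(int((v - lo) / width), bin_count - 1) for v in values]


def scale_min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly to the fixed range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def scale_standard(values):
    """Rescale values to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values) or 1.0
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Return one 0/1 indicator column per distinct value (assumed encoding)."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}
```

For example, `one_hot(["a", "b", "a"])` yields one indicator column for `a` and one for `b`, each with one entry per input row.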
- Export dependencies of a job as a flow. See Flow View Page.
- Add quotes as CSV file publishing options. See Run Job Page.
- Specify CSV field delimiters for publishing. See Miscellaneous Configuration.
- Support for publishing Datetime values to Redshift as timestamps. See Redshift Data Type Conversions.
- UDFs are now supported for execution on HDInsight clusters. See Java UDFs.
- Enable deletion of jobs. See Miscellaneous Configuration.
- Upload an updated license file through the application. See Admin Settings Page.
Changes to System Behavior
Diagnostic Server removed from product
The Diagnostic Server and its application page have been removed from the product. This feature has been superseded by Tricheck, which is available to administrators through the application. For more information, see Admin Settings Page.
Wrangle now supports nested expressions
Wrangle now supports nested expressions within expressions. For more information, see Changes to the Language.
- The RAND function without parameters now generates true random numbers.
- When the source information is not available, the SOURCEROWNUMBER function can still be used. It returns null values in all cases.
- See Changes to the Language.
Key Bug Fixes
|TD-36332||Data grid can display wrong results if a sample is collected and dataset is unioned.|
|TD-36192||Canceling a step in recipe panel can result in column menus disappearing in the data grid.|
|TD-36011||User can import modified exports or exports from a different version, which do not work.|
|TD-35916||Cannot log out via SSO.|
|TD-35899||A deployment user can see all deployments in the instance.|
|TD-35780||Upgrade: Duplicate metadata in separate publications causes DB migration failure.|
|TD-35644||Extractpatterns with "HTTP Query strings" option doesn't work|
|TD-35504||Cancel job throws 405 status code error. Clicking Yes repeatedly pops up Cancel Job dialog.|
|TD-35481||After upgrade, recipe is malformed at splitrows step.|
|TD-35177||Login screen pops up repeatedly when access permission is denied for a connection.|
Case-sensitive variations in date range values are not matched when creating a dataset with parameters.
NOTE: Date range parameters are now case-insensitive.
Job execution fails on a recipe with a high limit in the split transformation due to a Java Null Pointer Error during profiling.
NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.
|TD-31327||Unable to save dataset sourced from multi-line custom SQL on dataset with parameters.|
|TD-31252||Assigning a target schema through the Column Browser does not refresh the page.|
|TD-31165||Job results are incorrect when a sample is collected and then the last transform step is undone.|
|TD-30979||Transformation job on wide dataset fails on Spark 2.2 and earlier due to exceeding Java JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.|
Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
|TD-30854||When creating a new dataset from the Export Results window from a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.|
Some string comparison functions process leading spaces differently when executed on the Trifacta Photon or the Spark running environment.
|TD-30717||No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.|
|TD-27860||When the platform is restarted or an HA failover state is reached, any running jobs are stuck forever In Progress.|
New Known Issues
After installing on Ubuntu 16.04 (Xenial), platform may fail to start with "ImportError: No module named pkg_resources" error.
Workaround: Verify installation of
|TD-35644||Compilation/Execution||Extractpatterns for "HTTP Query strings" option doesn't work.|
When executing Spark 2.3.0 jobs on S3-based datasets, jobs may fail due to a known incompatibility between HTTPClient:4.5.x and aws-java-sdk:1.10.xx. For details, see https://github.com/apache/incubator-druid/issues/4456.
Workaround: Use Spark 2.1.0 instead. In Admin Settings page, configure the
For additional details on Spark versions, see Configure for Spark.
Clicking the Cancel Job button generates a 405 status code error. Clicking the Yes button fails to close the dialog.
Workaround: After you have clicked the Yes button once, you can click the No button. The job is removed from the page.
Spark jobs fail on LCM function that uses negative numbers as inputs.
Workaround: If you wrap the negative number input in the ABS function, the LCM function may be computed. You may have to manually check if a negative value for the LCM output is applicable.
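The ABS-wrapping workaround above can be sketched in Python as follows. This only illustrates the arithmetic and is not how either running environment computes LCM. The helper name `lcm_abs` is hypothetical.

```python
from math import gcd


def lcm_abs(a, b):
    """Least common multiple computed on absolute values of the inputs."""
    a, b = abs(a), abs(b)  # mirrors wrapping each input in ABS
    if a == 0 or b == 0:
        return 0
    return a * b // gcd(a, b)
```

Because the inputs are made non-negative first, the result is never negative. If your use case expects a negative LCM for negative inputs, the sign must be checked and reapplied separately.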
Differences in how WEEKNUM function is calculated in Trifacta Photon and Spark running environments, due to the underlying frameworks on which the environments are created.
For more information, see WEEKNUM Function.
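To show how week numbers can legitimately differ between frameworks, the sketch below compares two common conventions using Python's standard library: ISO 8601 weeks and Sunday-based weeks. This is an illustration only. It is an assumption that these two conventions are representative, and the sketch makes no claim about which convention Trifacta Photon or Spark uses.

```python
from datetime import date


def iso_week(d):
    """ISO 8601 week number: weeks start on Monday, week 1 contains the first Thursday."""
    return d.isocalendar()[1]


def sunday_based_week(d):
    """Week number with Sunday as the first day; days before the first Sunday are week 0."""
    return int(d.strftime("%U"))


# The same dates receive different week numbers under the two conventions.
for d in (date(2017, 1, 1), date(2018, 1, 1)):
    print(d.isoformat(), iso_week(d), sunday_based_week(d))
```

Dates near the start or end of a year are where such conventions most often disagree, so those are the values worth checking when comparing results across running environments.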
The Spark running environment does not support use of multi-character delimiters for CSV outputs. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.
Workaround: You can switch your job to a different running environment or use single-character delimiters.
|TD-34840||Transformer Page||Platform fails to provide suggestions for transformations when selecting keys from an object with many of them.|
|TD-34119||Compilation/Execution||WASB job fails when publishing two successive appends.|
Creating dataset from Parquet-only output results in "Dataset creation failed" error.
NOTE: If you generate results in Parquet format only, you cannot create a dataset from it, even if the Create button is present.
You cannot publish ad-hoc results for a job when another publishing job is in progress for the same job.
Workaround: Please wait until the previous job has been published before retrying to publish the failing job.
For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the Trifacta Photon running environment.
Workaround: Verify that you have inserted a new line at the end of every file-based source.
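One way to apply the workaround above before importing is to check each source file for a trailing newline and append one where it is missing. The following is a minimal sketch that assumes local, uncompressed text files. The function names are hypothetical and are not part of the product.

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # empty file: there is no final record to merge
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ensure_trailing_newline(path):
    """Append a newline if one is missing. Return True if the file was modified."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

Running `ensure_trailing_newline` over every file in a multi-file import leaves already well-formed files untouched, so it is safe to apply repeatedly.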