Welcome to Release 5.1! This release includes a significant expansion in database support and connectivity, with support for more running environment versions, such as Azure Databricks. High availability is now available on the platform node itself.
Within the Transformer page, you should see a number of enhancements, including improved toolbars and column menus. Samples can now be named.
Regarding operationalization of the platform, datasets with parameters can now accept patterns for parameter specification, which simplifies the process of creating complex matching patterns; a conceptual sketch appears below. Additionally, you can swap out a static imported dataset for a dataset with parameters, which enables development on a simpler dataset before expanding to a more complex set of sources. Variable overrides can now be applied to scheduled job executions, and you can specify multiple variable overrides in Flow View.
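To illustrate the idea only (this is not the platform's syntax), here is a minimal Python sketch of pattern-based source selection, where a variable in the matching pattern can be overridden at execution time. All paths, names, and the pattern below are hypothetical.

```python
import re

# Hypothetical source paths; names are illustrative only.
paths = [
    "sales/2018-07-01/orders.csv",
    "sales/2018-07-02/orders.csv",
    "sales/archive/old-orders.csv",
]

# A datetime-style parameter in the matching pattern; a variable
# override at execution time simply substitutes a different value.
run_date = "2018-07-02"  # e.g., supplied by a scheduled-run override
pattern = re.compile(rf"sales/{run_date}/.*\.csv$")

selected = [p for p in paths if pattern.search(p)]
print(selected)  # ['sales/2018-07-02/orders.csv']
```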
The underlying language has been improved with a number of transformations and functions, including a set of transformations designed around preparing data for machine processing. Details are below.
Tip: For a general overview of the product and its capabilities, see Product Overview.
Support for PostgreSQL 9.6 for the platform databases.
NOTE: PostgreSQL 9.3 is no longer supported. PostgreSQL 9.3 is scheduled for end of life (EOL) in September 2018. For more information on upgrading, see Upgrade Databases for PostgreSQL.
Partial support for MySQL 5.7 for hosting the platform databases.
NOTE: MySQL 5.7 is not supported for installation on Amazon RDS. See System Requirements.
Support for high availability on the platform node. See Configure for High Availability.
Support for CDH 5.15.
NOTE: Support for CDH 5.12 has been deprecated. See End of Life and Deprecated Features.
Support for Spark 2.3.0 on the Hadoop cluster. See System Requirements.
Support for integration with EMR 5.13, EMR 5.14, and EMR 5.15. See Configure for EMR.
NOTE: EMR 5.13 - 5.15 require Spark 2.3.0. See Configure for Spark.
Support for WebAssembly, Google Chrome's standards-compliant native client.
NOTE: In a future release, use of the PNaCl native client is likely to be deprecated.
Use of WebAssembly requires Google Chrome 68+. No additional installation is required. In this release, this feature must be enabled. For more information, see Miscellaneous Configuration.
Enhanced import process for Excel files, including support for import from backend file systems. See Import Excel Data.
Support for DB2 connections. See Connection Types.
Support for HiveServer2 Interactive (Hive 2.x) on HDP 2.6. See Configure for Hive.
Support for Kerberos-delegated relational connections. See Enable SSO for Relational Connections.
NOTE: In this release, only SQL Server connections can use SSO. See Create SQL Server Connections.
Improved join inference key model. See Join Panel.
Updated toolbar. See Transformer Toolbar.
Support for a broader range of characters in column names. See Rename Columns.
Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.
Bin column: Place numeric values into bins of equal or custom size across the range of values.
Scale column: Scale a column's values to a fixed range or to a zero-mean, unit-variance distribution.
One-hot encoding: Encode the values of one column into separate columns, each containing a 0 or 1 to indicate the absence or presence of that value in the corresponding row.
Group By: Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
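To make these concrete, here is a minimal pandas sketch that approximates each transformation on a toy table. The column names and data are invented for illustration; this shows the general technique, not the platform's implementation.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east", "south"],
    "amount": [10.0, 250.0, 40.0, 120.0],
})

# Bin column: place numeric values into 3 equal-width bins over the range.
df["amount_bin"] = pd.cut(df["amount"], bins=3)

# Scale column: min-max scaling to a fixed 0..1 range.
amt = df["amount"]
df["amount_scaled"] = (amt - amt.min()) / (amt.max() - amt.min())

# One-hot encoding: one 0/1 indicator column per distinct value.
df = pd.concat(
    [df, pd.get_dummies(df["region"], prefix="region", dtype=int)], axis=1
)

# Group By: a replacement table of aggregates over grouped values.
summary = df.groupby("region", as_index=False)["amount"].sum()

print(df)
print(summary)
```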
The Diagnostic Server and its application page have been removed from the product. This feature has been superseded by Tricheck, which is available to administrators through the application. For more information, see Admin Settings Page.
The underlying language now supports nested expressions within other expressions. For more information, see Changes to the Language.
The RAND function without parameters now generates true random numbers.
The SOURCEROWNUMBER function can still be used, but it returns null values in all cases.
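As a rough analogy for the RAND change, the Python snippet below contrasts a seeded generator, which reproduces the same sequence on every run, with an unseeded one, whose output varies from run to run. It illustrates the behavioral difference only; it is not the platform function.

```python
import random

# Seeded generator: reproduces the same value on every execution,
# analogous to the older parameterized/seeded behavior.
seeded = random.Random(42)
print(seeded.random())   # identical on every run

# Unseeded generator: seeded from OS entropy at startup, so the
# value differs from run to run, analogous to RAND() in this release.
print(random.random())
```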
TD-36332: Data grid can display wrong results if a sample is collected and the dataset is unioned.
TD-36192: Canceling a step in the recipe panel can cause column menus to disappear in the data grid.
TD-36011: Users can import modified exports or exports from a different version, which do not work.
TD-35916: Cannot log out via SSO.
TD-35899: A deployment user can see all deployments in the instance.
TD-35780: Upgrade: duplicate metadata in separate publications causes database migration failure.
TD-35644: Extractpatterns with the "HTTP Query strings" option doesn't work.
TD-35504: Canceling a job throws a 405 status code error, and clicking Yes repeatedly reopens the Cancel Job dialog.
TD-35481: After upgrade, the recipe is malformed at the splitrows step.
TD-35177: The login screen pops up repeatedly when access permission is denied for a connection.
Case-sensitive variations in date range values are not matched when creating a dataset with parameters.
Job execution fails on a recipe with a high limit in a split transformation, due to a Java Null Pointer Error during profiling.
TD-31327: Unable to save a dataset sourced from multi-line custom SQL on a dataset with parameters.
TD-31252: Assigning a target schema through the Column Browser does not refresh the page.
TD-31165: Job results are incorrect when a sample is collected and then the last transform step is undone.
TD-30979: Transformation jobs on wide datasets fail on Spark 2.2 and earlier due to exceeding a Java JVM limit. For details, see https://issues.apache.org/jira/browse/SPARK-18016.
Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
TD-30854: When creating a new dataset from the Export Results window for a CSV dataset with Snappy compression, the resulting dataset is empty when loaded in the Transformer page.
Some string comparison functions process leading spaces differently when executed on the platform's native running environment versus Spark; see the sketch after this list.
TD-30717: No validation is performed for Redshift or SQL DW connections or permissions prior to job execution. Jobs are queued and then fail.
TD-27860: When the platform is restarted or an HA failover state is reached, any running jobs are stuck in the In Progress state indefinitely.
After installing on Ubuntu 16.04 (Xenial), the platform may fail to start with an "ImportError: No module named pkg_resources" error.
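A small Python illustration of the leading-space issue noted above: whether a comparison trims leading whitespace first changes its result, which is the kind of divergence that can appear between running environments. This is a generic sketch, not the platform's comparison logic.

```python
a = "  apple"
b = "apple"

print(a == b)           # False: leading spaces make the strings differ
print(a.lstrip() == b)  # True: trimming leading spaces first makes them equal
```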
TD-35644 (Compilation/Execution): Extractpatterns with the "HTTP Query strings" option doesn't work.
When executing Spark 2.3.0 jobs on S3-based datasets, jobs may fail due to a known incompatibility between HTTPClient 4.5.x and aws-java-sdk 1.10.xx. For details, see https://github.com/apache/incubator-druid/issues/4456.
For additional details on Spark versions, see Configure for Spark.
Clicking the Cancel Job button generates a 405 status code error, and clicking the Yes button fails to close the dialog.
Spark jobs fail on the LCM function when negative numbers are used as inputs; see the LCM sketch below.
The WEEKNUM function is calculated differently in the platform's native and Spark running environments, due to the underlying frameworks on which the environments are built; see the week-numbering sketch below.
For more information, see WEEKNUM Function.
The Spark running environment does not support use of multi-character delimiters for CSV outputs. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.
TD-34840 (Transformer Page): The platform fails to provide transformation suggestions when selecting keys from an object that contains many of them.
TD-34119 (Compilation/Execution): WASB job fails when publishing two successive appends.
Creating a dataset from Parquet-only output results in a "Dataset creation failed" error.
You cannot publish ad-hoc results for a job when another publishing job is in progress for the same job.
For multi-file imports lacking a newline in the final record of a file, that final record may be merged with the first record of the next file and then dropped in the running environment.
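For the LCM failure noted above, the usual convention is that the least common multiple is non-negative even for negative inputs. A minimal Python sketch of that sign-safe definition, for illustration only:

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    # LCM is conventionally non-negative, even for negative inputs.
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // gcd(a, b)

print(lcm(-4, 6))  # 12
```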
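The WEEKNUM discrepancy noted above comes down to differing week-numbering conventions in the underlying frameworks. The Python sketch below shows how the same date can land in different week numbers under ISO-8601 versus a Sunday-start convention; it illustrates the general phenomenon, not the exact behavior of either running environment.

```python
from datetime import date

d = date(2016, 1, 1)  # a Friday

# ISO-8601 numbering: this date belongs to week 53 of 2015.
print(d.isocalendar()[1])  # 53

# Sunday-start numbering (strftime %U): days before the year's
# first Sunday fall into week 0.
print(d.strftime("%U"))    # 00
```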
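The multi-file merge issue above is easy to reproduce conceptually: if file contents are concatenated byte-for-byte and a file lacks a trailing newline, its last record fuses with the next file's first record. A minimal Python sketch, including the obvious guard:

```python
part1 = b"1,alpha\n2,beta"  # last record has no trailing newline
part2 = b"3,gamma\n"

# Naive concatenation fuses "2,beta" and "3,gamma" into one record:
print((part1 + part2).decode())

# Guard: ensure each part ends with a newline before joining.
safe = b"".join(p if p.endswith(b"\n") else p + b"\n" for p in (part1, part2))
print(safe.decode())
```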