This release introduces scheduling of dataset execution from within your flows, as well as a number of bug fixes and system improvements.
Admin, Install, & Config:
Support for Cloudera 5.12. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for Cloudera 5.9 has been deprecated. For more information, see End of Life and Deprecated Features.
The single-file run_job action is deprecated for the CLI.
See Changes to the Command Line Interface.
| TD-25615 | Error in flat-aggregation generation in Spark running environment. |
| TD-25438 | Deleting an upstream reference node does not propagate results correctly to the Transformer page. |
TDE files generated by the TDE download option may fail to open in Tableau if column names are more than 124 characters in length.
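As a rough pre-export check for the TDE issue above, the following sketch flags column names longer than 124 characters. The function name and the sample column names are hypothetical. The 124-character threshold is taken from the issue description, not from Tableau documentation.

```python
# Threshold taken from the known issue above (assumption: names longer
# than 124 characters are the ones at risk).
MAX_TDE_COLUMN_NAME_LENGTH = 124


def find_long_column_names(column_names, limit=MAX_TDE_COLUMN_NAME_LENGTH):
    """Return the column names whose length exceeds the limit."""
    return [name for name in column_names if len(name) > limit]


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    columns = ["order_id", "x" * 124, "y" * 125]
    for name in find_long_column_names(columns):
        print(f"Column name too long ({len(name)} characters): {name[:30]}...")
```

Renaming any flagged columns before generating the TDE file should avoid the condition described in the issue.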
When editing a schedule that was set for
This release includes the ability to share flows and a completely revamped Transformer page for a simpler, faster, and more consistent user experience. From the Transformer page, you can now collect ad-hoc samples using a wider variety of techniques. New integration and publishing options extend the product's reach throughout the enterprise. Read below for additional features and details.
Admin, Install, & Config:
Support for integration with MapR Hadoop clusters has been deprecated. Cloudera and Hortonworks continue to be supported. For more information on other available options, please contact your representative.
NOTE: Support for CentOS 6.2.x and CentOS 6.3.x has been deprecated. Please upgrade to the latest CentOS 6.x release.
Support for Cloudera 5.11. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for CDH 5.8 has been deprecated. See End of Life and Deprecated Features.
Support for HDP 2.6. See Supported Deployment Scenarios for Hortonworks.
NOTE: Support for HDP 2.4 has been deprecated. See End of Life and Deprecated Features.
Support for large-scale relational sources when executing jobs on Hadoop. See Enable Relational Connections.
Per-file import settings, including file encoding type and automatic structure detection. See Import Data Page.
NOTE: The list of supported encoding types has changed. See Configure Global File Encoding Type.
Read/write support for Snappy compression. See Supported File Formats.
NOTE: Integration with fully compressed Hadoop clusters requires additional configuration. See Enable Integration with Compressed Clusters.
As of Release 4.1, the Pig running environment is no longer available for execution of jobs. Implications:
Scripts that reference the pig running environment must be updated. See Changes to the Command Line Interface.
With the removal of the Hadoop Pig running environment, Python user-defined functions are no longer available.
NOTE: As of Release 4.1, all user-defined functions must be migrated to or created in Java. For more information, see Java UDFs.
For more information on migrating UDFs, see Changes to the User-Defined Functions.
In Release 4.0.1 and earlier, you could type in transformation steps as plain text in the Transform Editor as well as use the Transform Builder.
In Release 4.1 and later, the Transform Editor has been removed in favor of an enhanced version of the Transform Builder.
Tip: You can copy and paste raw commands into the Transformation/Choose a transformation textbox of the Transform Builder. The documentation still displays example transformation steps as text commands.
See Transform Builder.
In Release 4.0.1, you could explore dependencies between your datasets through the Dependencies Browser, which was accessible through a graph in the toolbar in the Transformer page.
In Release 4.1, this browser has been replaced by the Dataset Navigator. In the Transformer page, click the drop-down next to the name of the current dataset. In the Dataset Navigator, you can browse the datasets through a list or flow view to locate another wrangled dataset to load.
In Release 4.2 and later, this browser has been renamed to the Recipe Navigator. See Recipe Navigator.
Prior to Release 4.0, the databases had to be installed manually.
In Release 4.0 and later, the databases are installed for you on the local server as part of the basic install process. For more information, see Install Databases.
If you need to re-install the databases, manual steps are still available. See Install Databases for PostgreSQL.
If your dataset used the initial rows (head) sample in the data grid in Release 4.0 or earlier, that sample is replaced by the random sample after the upgrade.
Tip: When the dataset is loaded in the Transformer page after upgrade, you can switch the sample back to the first rows sample. For more information, see Samples Panel.
| TD-23787 | When publishing location is unavailable, spinning wheel hangs indefinitely without any error message. |
| TD-22467 | Last active sample is not displayed during preview of multi-dataset operations. |
| TD-22128 | Cannot read multi-file Avro stream if data is greater than 500 KB. |
| TD-20796 | For date column, Spark profiling shows incorrect set of dates when source data has a single date in it. |
You cannot configure a publishing location to be a directory that does not already exist.
See Run Job Page.
When a pivot transform is applied, some column histograms may not be updated.
Cannot publish to Cloudera Navigator due to 500 - Internal Server Error.
Circular reference in schema of Avro file causes job in Spark running environment to fail. See https://issues.apache.org/jira/browse/AVRO-1285.
| TD-20882 | Connectivity | Spark jobs based on relational sources fail if one or more columns are dropped from the source table. |
Values in a newly collected sample do not appear in sorted order, even though a sort transform had been previously applied.
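For the Avro circular-reference issue listed above, the following sketch shows one rough way to check whether an Avro schema refers to a record from inside that record's own definition. The function name and the sample schema are hypothetical. The check handles only simple and explicitly namespaced record names, and it may not match the exact condition that causes the Spark job to fail.

```python
import json

# Avro primitive type names; any other bare string is treated as a
# reference to a named type.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def find_circular_references(schema, enclosing=()):
    """Return the record names that are referenced from inside their own definition."""
    found = set()
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to a named type.
        if schema not in PRIMITIVES and schema in enclosing:
            found.add(schema)
    elif isinstance(schema, list):
        # A JSON array is a union: check every branch.
        for branch in schema:
            found |= find_circular_references(branch, enclosing)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            # Track the short name and, when a namespace is given, the full name.
            names = {schema["name"]}
            if "namespace" in schema:
                names.add(schema["namespace"] + "." + schema["name"])
            for field in schema.get("fields", []):
                found |= find_circular_references(field["type"], enclosing + tuple(names))
        elif kind == "array":
            found |= find_circular_references(schema["items"], enclosing)
        elif kind == "map":
            found |= find_circular_references(schema["values"], enclosing)
        elif kind is not None:
            # Covers wrapped types such as {"type": "Node"} or {"type": {...}}.
            found |= find_circular_references(kind, enclosing)
    return found


if __name__ == "__main__":
    # Hypothetical linked-list schema in which Node refers to itself.
    schema = json.loads("""
    {"type": "record", "name": "Node", "fields": [
        {"name": "value", "type": "int"},
        {"name": "next", "type": ["null", "Node"]}
    ]}
    """)
    print(find_circular_references(schema))
```

A non-empty result means the schema is recursive and may be worth flattening before running the job in the Spark running environment.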