Release 4.1.1

This release introduces scheduling of dataset execution from within your flows, as well as a number of bug fixes and system improvements.

What's New

Admin, Install, & Config:


Transformer Page:

Changes to System Behavior

Single-file run_job action is deprecated for CLI

See Changes to the Command Line Interface.


Key Bug Fixes

TD-25615: Error in flat-aggregation generation in Spark running environment.
TD-25438: Deleting an upstream reference node does not propagate results correctly to the Transformer page.

TDE files generated by the TDE download option may fail to open in Tableau if column names are more than 124 characters in length.

NOTE: When you run the job, include a publishing option to publish to TDE format. When you export the generated results, this issue no longer appears in the output.
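If you prefer to avoid the issue at the source, overly long column names can be shortened before generating TDE output. A minimal sketch in Python; the function name and the 124-character limit guard are illustrative, not part of the product API:

```python
# Hypothetical pre-export guard: TDE downloads may fail to open in
# Tableau when column names exceed 124 characters. This sketch
# truncates long names and de-duplicates any resulting collisions.

TDE_NAME_LIMIT = 124

def safe_column_names(columns, limit=TDE_NAME_LIMIT):
    """Truncate names to `limit` chars, appending a counter on collisions."""
    seen = {}
    result = []
    for name in columns:
        short = name[:limit]
        if short in seen:
            seen[short] += 1
            suffix = f"_{seen[short]}"
            # Re-truncate so name plus suffix still fits the limit.
            short = short[: limit - len(suffix)] + suffix
        else:
            seen[short] = 0
        result.append(short)
    return result
```

Applying a renaming step like this before the export keeps the generated TDE openable regardless of the source column names.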

New Known Issues


When editing a schedule that was set for 0 minutes after the hour, the schedule is displayed to execute at 15 minutes after the hour.

Workaround: This is a display issue only. When the schedule value is set to 0, the correct value is saved.

Release 4.1

This release includes the ability to share flows and a completely revamped Transformer page for a simpler, faster, and more consistent user experience. From the Transformer page, you can now collect ad-hoc samples using a wider variety of techniques. New integration and publishing options broaden the product's reach throughout the enterprise. Read below for additional features and details.

What's New

Admin, Install, & Config:

Support for integration with MapR Hadoop clusters has been deprecated. Cloudera and Hortonworks clusters continue to be supported. For more information on other available options, please contact your representative.

NOTE: Support for CentOS 6.2.x and CentOS 6.3.x has been deprecated. Please upgrade to the latest CentOS 6.x release.




Transformer Page:




Changes to System Behavior

Hadoop Pig running environment is no longer available

As of Release 4.1, the Pig running environment is no longer available for execution of jobs. Implications:

Python UDFs are no longer available

With the removal of the Hadoop Pig running environment, Python user-defined functions are no longer available. 

NOTE: As of Release 4.1, all user-defined functions must be migrated to or created in Java. For more information, see Java UDFs.

For more information on migrating UDFs, see Changes to the User-Defined Functions.

Transform Editor has been removed

In Release 4.0.1 and earlier, you could type transformation steps as plain text in the Transform Editor, as well as use the Transform Builder.

In Release 4.1 and later, the Transform Editor has been removed, in favor of an enhanced version of the Transform Builder.

Tip: You can copy and paste raw commands into the Transformation/Choose a transformation textbox of the Transform Builder. The documentation still displays example transformation steps as text commands.

See Transform Builder.

Dependencies Browser has been replaced

In Release 4.0.1, you could explore dependencies between your datasets through the Dependencies Browser, which was accessible through a graph in the toolbar in the Transformer page. 

In Release 4.1, this browser has been replaced by the Dataset Navigator. In the Transformer page, click the drop-down next to the name of the current dataset. In the Dataset Navigator, you can browse the datasets through a list or flow view to locate another wrangled dataset to load.

In Release 4.2 and later, this browser has been renamed the Recipe Navigator. See Recipe Navigator.

Manual database installation is no longer required

Prior to Release 4.0, the databases had to be installed manually. 

In Release 4.0 and later, the databases are installed for you on the local server as part of the basic install process. For more information, see Install Databases.

If you need to re-install the databases, manual steps are still available. See Install Databases for PostgreSQL.

Head sample replaced by random sample on upgrade

If your dataset used the initial rows (head) sample in the data grid in Release 4.0 and earlier, that sample is replaced by a random sample after the upgrade.

Tip: When the dataset is loaded in the Transformer page after upgrade, you can switch the sample back to the first rows sample. For more information, see Samples Panel.


Key Bug Fixes

TD-23787: When publishing location is unavailable, spinning wheel hangs indefinitely without any error message.
TD-22467: Last active sample is not displayed during preview of multi-dataset operations.
TD-22128: Cannot read multi-file Avro stream if data is greater than 500 KB.
TD-20796: For date column, Spark profiling shows incorrect set of dates when source data has a single date in it.

You cannot configure a publishing location to be a directory that does not already exist.

See Run Job Page.


The splitrows transform allowed splitting even if the required on parameter was set to an empty value.
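The fix tightens parameter validation: a transform that splits rows on a delimiter should treat an empty required parameter as an error rather than proceed. A minimal sketch of the corrected behavior in Python; the function is illustrative and not the product's implementation:

```python
def splitrows(text, on):
    """Split `text` into rows on the required delimiter `on`.

    Mirrors the corrected behavior: an empty `on` value is a
    parameter error, not a silent no-op split.
    """
    if not on:
        raise ValueError("splitrows: required parameter 'on' must be non-empty")
    return text.split(on)
```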

New Known Issues


When a pivot transform is applied, some column histograms may not be updated.

Workaround: Refresh the page.


Cannot publish to Cloudera Navigator due to 500 - Internal Server Error.

Workaround: The Cloudera Navigator integration is not supported in this release. If it has been enabled in your deployment in a prior release, it must be disabled. To disable, please set the following property value in platform configuration.

"clouderaNavigator.enabled": false,

Circular reference in the schema of an Avro file causes a job in the Spark running environment to fail.

TD-20882: Connectivity: Spark jobs based on relational sources fail if one or more columns is dropped from the source table.
TD-21836: Transformer Page

Values in a newly collected sample do not appear in sorted order, even though a sort transform had been previously applied.

Workaround: You can re-apply the sort transform to the new sample. Some limitations apply. For more information, see Sort Transform.