Release 4.0.2

This release contains key bug fixes for issues found in Release 4.0.1.

What's New

No new features have been introduced.

Changes to System Behavior


Key Bug Fixes



Update Node.js to 6.11.1


Spark job gets stuck for flow with header filter and multiple map transform expressions


Spark job OOM error when failing over frequently on a Resource Manager High Availability cluster


Dictionary URL is incorrect in CDF for Spark jobs


Spark jobs with timestamp source columns yield empty columns


Job fails with "Unary operator LexiconCheck not supported" in Spark


Corrupted DotZlib.chm file in 4.0.1 RPM


Nginx Request URI length default is too low.
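
For context on the item above: in Nginx deployments the request-line (URI) length is typically bounded by the large_client_header_buffers directive. A configuration raising it might look like the following sketch (the values shown are illustrative, not the platform's shipped defaults):

```nginx
# Illustrative only: raise the buffers that bound the request line (URI) length.
# Four buffers of 16k each; the common Nginx default is 4 x 8k.
http {
    large_client_header_buffers 4 16k;
}
```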


'Python Error' when opening recipe with large number of columns and a nest


ArrayIndexOutOfBoundsException when UDF iterator reaches premature end


Nest transform creates a map with duplicate keys.


In shared Hadoop cluster on Edge environment, valid relational connections do not appear in the GUI.


Support for equals sign (=) in output path.


Results of Spark job show missing values, even though recipe step replaces them with a value.


Type registry fails to initialize when webapp process is relaunched.


Spark PyMultiStringReplaceUdf UDF code throws NPE when processing nested fields.


Unexpected dates appear in CSV output on job execution.


umask settings on output directories are not being respected for single-file output.
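
For context on the umask fix above: a process's umask masks permission bits out of newly created files, so single-file output written under a umask of 027 should come out group-readable but not world-accessible. A generic sketch (standard Python, not the platform's code):

```python
# Generic illustration (not the platform's code) of how umask should shape
# the permissions of a newly created output file.
import os
import stat
import tempfile

old_mask = os.umask(0o027)                 # mask group-write and all "other" bits
path = os.path.join(tempfile.mkdtemp(), "part-00000")
with open(path, "w") as f:
    f.write("output\n")
mode = stat.S_IMODE(os.stat(path).st_mode)
os.umask(old_mask)                         # restore the previous mask
print(oct(mode))                           # 0o666 & ~0o027 -> 0o640 (rw-r-----)
```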


Adding a specific comment appears to invalidate earlier edit.


Spark unable to read recursive folders


Spark error doing split


No rows in random samples on CSM cluster.


Recipe upgraded from 3.1 to 3.2 becomes corrupted when new lookup is added.



Webapp, batch-job-runner scaling issues


Flow with many dependencies hangs for 6 hours and then fails when executed in Spark on AWS


Generating large CLI script blocks client access


Long latency when loading complex flow views


Recipe showing MISSING for some Lookups after upgrade


View Results button is missing on Job Cards even with profiling enabled


Spark yarn-app log dump feature requires read/execute permissions on the log aggregation folder.


Extremely slow UI performance for some actions


Java UDFs must support the initSchema method in addition to initArgs.


Use Node.js cluster module for easy scaling of webapp and VFS services


Columns created from UDFs do not work with the column browser or column menus, and they cannot be shown or hidden.

New Known Issues


Release 4.0.1

This release adds a few new features and addresses some known issues with the platform. 

What's New

Admin, Install, & Config:

NOTE: Integration with MapR is not supported for this release.


Changes to System Behavior

Application timeout behavior more consistent

In Release 4.0, the web application session timeout was set to 60 minutes by default, which caused inconsistent behaviors. See TD-22675 below. 

In Release 4.0.1 and later, this session timeout was set to one month by default. This change returns the web application to the same setting as Release 3.2.1 and earlier.

NOTE: Beginning in Release 4.0, this setting is configurable. For more information on changing the session timeout, see Configure Application Limits.

Key Bug Fixes

TD-22675: Session timeout behavior is inconsistent. Application seems to have some functionality after timeout.
TD-22570: After upgrade, some pre-upgrade jobs appear to point to deleted datasets.
TD-22388: S3 authorization mechanism does not support Signature Version 2 in Asia-Pacific and EU.
TD-22220: Dataset suddenly fails to load after upgrade from Release 3.2 because of type checking on an invalid recipe line.
TD-19830: Editing a Join or Union transform that includes a reference dataset (not in the same flow) may result in the unintentional removal of that reference dataset from the flow.

splitrows transform does not work after a backslash.

This issue is fixed with the new quoteEscapeChar parameter for the splitrows transform. See Changes to the Language.

TD-5783: Prevent two-finger scroll in data grid from stepping back in the browser's history on Mac OS.

New Known Issues


Connection for Redshift publishing uses its own AWS access key and secret, which may be different from the per-user or system credentials. If the Redshift connection does not have read access to the data, publication fails.

Workaround: Verify that the access key and secret for the Redshift connection have access to any source data that you wish to publish to Redshift.


Release 4.0

This release features a single page for managing your flows, a faster Spark-based running environment, and a number of new functions and capabilities. Details are below.

NOTE: Integration with MapR is not supported for this release.

What's New


Transformer Page:







Admin, Install, & Config:

Changes in System Behavior

Changes to the Language:

Changes to the CLI:

End of Life Features:

Key Bug Fixes

TD-21006: Photon fails to compress output file and is forced to restart on download.
TD-20736: Publish to Redshift fails for single-file outputs.
TD-20524: Join tool hangs due to mismatched data types.
TD-20344: When Photon is enabled, no sample data is displayed when joins yield a data mismatch.
TD-20176: After Release 3.2.1 upgrade, data grid in the Transformer Page no longer displays any data in the sample, even though data is present in the pre-upgrade environment.
TD-20173: NUMFORMAT string #.#0 fails to be converted to supported string format on upgrade, and recipe step fails validation. For more information, see Changes to the Language.
TD-19899: Failed first job of jobgroup prevents datasets from showing up in flow.
TD-19852: User can accept compressed formats for append publish action.
TD-19678: Column browser does not recognize when you place a checkmark next to the last column in the list.
TD-18836: find function accepts negative values for the start index. These values are consumed but produce unexpected results.
TD-18746: When Photon is enabled, previews in the data grid may take up to 30 seconds to dismiss.
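
For comparison with the find fix above (TD-18836): Python's built-in str.find resolves a negative start index relative to the end of the string and clamps far-negative values to 0, which is the kind of well-defined convention such a fix typically adopts (analogy only; the platform's find is its own function):

```python
# Python's str.find resolves a negative start index from the end of the
# string and clamps far-negative values to 0 -- one well-defined way to
# handle the edge case described in TD-18836 (illustrative analogy only).
s = "abcdef"
print(s.find("c", -4))    # start -4 resolves to index 2; "c" found at 2
print(s.find("a", -100))  # start clamps to 0; "a" found at 0
print(s.find("a", -4))    # search starts at index 2; "a" not found -> -1
```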

Platform fails to start if the credentials used for S3 access do not have the ListAllMyBuckets permission.

TD-18340: When writing CSV outputs, the Spark running environment fails to recognize the defined escape character.
TD-17677: Remove references to Zookeeper in the platform.
TD-16419: Comparison functions added through Builder are changed to operators in recipe.
TD-12283: Platform cannot execute jobs on Pig that are sourced from S3, if OpenJDK is installed.

New Known Issues


Cannot read multi-file Avro stream if data is greater than 500 KB.

Workaround: Load files as independent datasets and union them together, or concatenate the files outside of the platform.

TD-21737: Transformer Page

Cannot transform downstream datasets if an upstream dataset does not contain a splitrows transform.

Workaround: Add a splitrows transform to the upstream dataset. See Splitrows Transform.

TD-20796: Job Results Page

For a date column, Spark profiling shows an incorrect set of dates when the source data contains a single date.


Merge function does not work with double-escaped values, and job fails in Pig. Example:

set col: column4 value: merge(['ms\\',column4])

Workaround: Add a dummy character to the original transform and then remove it. Example:

set col: column4 value: merge(['ms\\℗',column4])
replace col: column4 on: '℗' with: ''

Alternatively, you can execute the job in the Spark running environment.