Release 4.0.2

This release contains key bug fixes from Release 4.0.1.

What's New

No new features have been introduced.

Changes to System Behavior

None.

Key Bug Fixes

Ticket     Description
TD-25182   Update NodeJS to 6.11.1.
TD-25143   Spark job gets stuck for a flow with a header filter and multiple map transform expressions.
TD-25090   Spark job OOM error when failing over frequently on a Resource Manager High Availability cluster.
TD-25087   Dictionary URL is incorrect in CDF for Spark jobs.
TD-25080   Spark jobs with timestamp source columns yield empty columns.
TD-24965   Job fails with "Unary operator LexiconCheck not supported" in Spark.
TD-24869   Corrupted DotZlib.chm file in the 4.0.1 RPM.
TD-24669   Nginx request URI length default is too low.
TD-24464   'Python Error' when opening a recipe with a large number of columns and a nest.
TD-24409   ArrayIndexOutOfBoundsException when UDF iterator reaches a premature end.
TD-24322   Nest transform creates a map with duplicate keys.
TD-23921   In a shared Hadoop cluster on an Edge environment, valid relational connections do not appear in the GUI.
TD-23920   Support for equals sign (=) in output path.
TD-23904   Results of a Spark job show missing values, even though a recipe step replaces them with a value.
TD-23857   Type registry fails to initialize when the webapp process is relaunched.
TD-23791   Spark PyMultiStringReplaceUdf UDF code throws an NPE when processing nested fields.
TD-23780   Unexpected dates appear in CSV output on job execution.
TD-23722   umask settings on output directories are not respected for single-file output.
TD-23646   Adding a specific comment appears to invalidate an earlier edit.
TD-23645   Spark is unable to read recursive folders.
TD-23578   Spark error when performing a split.
TD-23507   No rows in random samples on a CSM cluster.
TD-23459   Recipe upgraded from 3.1 to 3.2 becomes corrupted when a new lookup is added.
TD-23457   Webapp and batch-job-runner scaling issues.
TD-23358   Flow with many dependencies hangs for six hours and then fails when executed in Spark on AWS.
TD-23276   Generating a large CLI script blocks client access.
TD-23111   Long latency when loading complex flow views.
TD-23102   Recipe shows MISSING for some Lookups after upgrade.
TD-23099   View Results button is missing on Job Cards even with profiling enabled.
TD-22907   Spark yarn-app log dump feature requires read/execute permissions on the log aggregation folder.
TD-22889   Extremely slow UI performance for some actions.
TD-22796   Java UDFs must support the initSchema method in addition to initArgs.
TD-22313   Use the Node.js cluster module for easier scaling of the webapp and VFS services.
TD-22291   Columns created from UDFs do not work with the column browser or column menus and cannot be shown or hidden.
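For context on TD-24669: in stock nginx, the maximum accepted request-URI size is bounded by the large_client_header_buffers directive, and requests that exceed it are rejected with HTTP 414. A typical override looks like the following; the values shown are illustrative only, and the platform's actual proxy configuration may use different settings:

http {
    # Allow up to 4 buffers of 64k each for long request URIs and headers
    large_client_header_buffers 4 64k;
}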

New Known Issues

None.

Release 4.0.1

This release adds a few new features and addresses some known issues with the platform. 

What's New

Admin, Install, & Config:

NOTE: Integration with MapR is not supported for this release.

Language:

Changes to System Behavior

Application timeout behavior more consistent

In Release 4.0, the web application session timeout was set to 60 minutes by default, which caused inconsistent behaviors. See TD-22675 below. 

In Release 4.0.1 and later, this session timeout was set to one month by default. This change returns the web application to the same setting as Release 3.2.1 and earlier.

NOTE: Beginning in Release 4.0, this setting is configurable. For more information on changing the session timeout, see Configure Application Limits.
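As an illustration only, a session timeout override in the platform configuration file might look like the following. The key name and value are assumptions, not taken from product documentation; consult Configure Application Limits for the actual setting. A value of 43200 minutes corresponds to the 30-day (one-month) default described above:

"webapp.session.durationInMins": 43200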

Key Bug Fixes

Ticket     Description
TD-22675   Session timeout behavior is inconsistent. Application seems to have some functionality after timeout.
TD-22570   After upgrade, some pre-upgrade jobs appear to point to deleted datasets.
TD-22388   S3 authorization mechanism does not support Signature Version 2 in Asia-Pacific and EU.
TD-22220   Dataset suddenly fails to load after upgrade from Release 3.2 because of type checking on an invalid recipe line.
TD-19830   Editing a Join or Union transform that includes a reference dataset (not in the same flow) may result in the unintentional removal of that reference dataset from the flow.
TD-14131   splitrows transform does not work after a backslash. This issue is fixed with the new quoteEscapeChar parameter for the splitrows transform. See Changes to the Language.
TD-5783    Prevent two-finger scroll in data grid from stepping back in the browser's history on Mac OS.
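The quoteEscapeChar parameter noted for TD-14131 can be supplied directly in a splitrows step. The following is a hypothetical sketch in the transform syntax used elsewhere in these notes; the parameter placement and escape value are illustrative, so verify the exact syntax in Changes to the Language:

splitrows col: column1 on: '\n' quoteEscapeChar: '\\'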

New Known Issues

Ticket     Component                Description
TD-22864   Compilation/Execution    Connection for Redshift publishing uses its own AWS access key and secret, which may be different from the per-user or system credentials. If the Redshift connection does not have read access to the data, publication fails.

Workaround: Verify that the access key and secret for the Redshift connection have access to any source data that you wish to publish to Redshift.

Release 4.0

This release features a single page for managing your flows, a faster Spark-based running environment, and a number of new functions and capabilities. Details are below.

NOTE: Integration with MapR is not supported for this release.

What's New

Workspace:

Transformer Page:

Profiling:

Ingestion:

Compilation/Execution:

Language:

APIs:

CLI:

Admin, Install, & Config:

Changes to System Behavior

Changes to the Language:

Changes to the CLI:

End of Life Features:

Key Bug Fixes

Ticket     Description
TD-21006   Photon fails to compress output file and is forced to restart on download.
TD-20736   Publish to Redshift fails for single-file outputs.
TD-20524   Join tool hangs due to mismatched data types.
TD-20344   When Photon is enabled, no sample data is displayed when joins yield a data mismatch.
TD-20176   After Release 3.2.1 upgrade, data grid in the Transformer Page no longer displays any data in the sample, even though data is present in the pre-upgrade environment.
TD-20173   NUMFORMAT string #.#0 fails to be converted to a supported string format on upgrade, and the recipe step fails validation. For more information, see Changes to the Language.
TD-19899   Failed first job of a jobgroup prevents datasets from showing up in the flow.
TD-19852   User can accept compressed formats for append publish action.
TD-19678   Column browser does not recognize when you place a checkmark next to the last column in the list.
TD-18836   find function accepts negative values for the start index. These values are consumed but produce unexpected results.
TD-18746   When Photon is enabled, previews in the data grid may take up to 30 seconds to dismiss.
TD-18538   Platform fails to start if S3 access does not have the ListAllMyBuckets permission.
TD-18340   When writing CSV outputs, the Spark running environment fails to recognize the defined escape character.
TD-17677   Remove references to Zookeeper in the platform.
TD-16419   Comparison functions added through Builder are changed to operators in the recipe.
TD-12283   Platform cannot execute jobs on Pig that are sourced from S3, if OpenJDK is installed.

New Known Issues

Ticket     Component                Description
TD-22128   Compilation/Execution    Cannot read a multi-file Avro stream if the data is greater than 500 KB.

Workaround: Load the files as independent datasets and union them together, or concatenate the files outside of the platform.

TD-21737   Transformer Page         Cannot transform downstream datasets if an upstream dataset does not contain a splitrows transform.

Workaround: Add a splitrows transform to the upstream dataset. See Splitrows Transform.

TD-20796   Job Results Page         For a date column, Spark profiling shows an incorrect set of dates when the source data contains a single date.

TD-19183   Workspace                Merge function does not work with double-escaped values, and the job fails in Pig. Example:

set col: column4 value: merge(['ms\\',column4])

Workaround: Add a dummy character to the original transform and then remove it. Example:

set col: column4 value: merge(['ms\\℗',column4])
replace col: column4 on: '℗' with: ''

Alternatively, you can execute the job in the Spark running environment.