Release 4.0.2

This release contains key bug fixes from Release 4.0.1.

What's New

No new features have been introduced.

Changes to System Behavior

None.

Key Bug Fixes

Ticket  Description

TD-25182  Update NodeJS to 6.11.1
TD-25143  Spark job gets stuck for flow with header filter and multiple map transform expressions
TD-25090  Spark job OOM error when failing over frequently on a Resource Manager High Availability cluster
TD-25087  Dictionary URL is incorrect in CDF for Spark jobs
TD-25080  Spark jobs with timestamp source columns yield empty columns
TD-24965  Job fails with "Unary operator LexiconCheck not supported" in Spark
TD-24869  Corrupted DotZlib.chm file in 4.0.1 RPM
TD-24669  Nginx request URI length default is too low
TD-24464  'Python Error' when opening recipe with large number of columns and a nest
TD-24409  ArrayIndexOutOfBoundsException when UDF iterator reaches premature end
TD-24322  Nest transform creates a map with duplicate keys
TD-23921  In shared Hadoop cluster on Edge environment, valid relational connections do not appear in the GUI
TD-23920  Support for equals sign (=) in output path
TD-23904  Results of Spark job show missing values, even though recipe step replaces them with a value
TD-23857  Type registry fails to initialize when webapp process is relaunched
TD-23791  Spark PyMultiStringReplaceUdf UDF code throws NPE when processing nested fields
TD-23780  Unexpected dates appear in CSV output on Trifacta Server job execution
TD-23722  umask settings on output directories are not respected for single-file output
TD-23646  Adding a specific comment appears to invalidate earlier edit
TD-23645  Spark unable to read recursive folders
TD-23578  Spark error when performing a split
TD-23507  No rows in random samples on CSM cluster
TD-23459  Recipe upgraded from 3.1 to 3.2 becomes corrupted when new lookup is added
TD-23457  Webapp and batch-job-runner scaling issues
TD-23358  Flow with many dependencies hangs for 6 hours and then fails when executed in Spark on AWS
TD-23276  Generating large CLI script blocks client access
TD-23111  Long latency when loading complex flow views
TD-23102  Recipe shows MISSING for some lookups after upgrade
TD-23099  View Results button is missing on job cards even with profiling enabled
TD-22907  Spark yarn-app log dump feature requires read/execute permissions on the log aggregation folder
TD-22889  Extremely slow UI performance for some actions
TD-22796  Java UDFs must support initSchema method in addition to initArgs
TD-22313  Use Node.js cluster module for easy scaling of webapp and VFS services
TD-22291  Columns created from UDFs do not work with the column browser or column menus, and they cannot be shown or hidden

New Known Issues

None.

Release 4.0.1

This release adds a few new features and addresses some known issues with the platform. 

What's New

Admin, Install, & Config:

NOTE: Integration with MapR is not supported for this release.

Language:

  • Apply optional quoteEscapeChar to identify escaped quote characters when splitting rows. 
  • See Changes to the Language.
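To make the quoteEscapeChar behavior concrete, here is a minimal Python sketch (not Trifacta code; the function name and parameters are hypothetical) of how an optional quote-escape character changes row splitting: an escaped quote stays part of the value instead of toggling the quoted state.

```python
def split_rows(text, delim="\n", quote='"', quote_escape=None):
    """Split text on delim, but not inside quoted sections.

    If quote_escape is set, an escaped quote (e.g. \\") does not
    toggle the in-quotes state, so the delimiter check still applies.
    Hypothetical illustration of the behavior described above.
    """
    rows, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        # Escaped quote: keep both characters, do not toggle quote state
        if quote_escape and ch == quote_escape and i + 1 < len(text) and text[i + 1] == quote:
            buf.append(ch)
            buf.append(quote)
            i += 2
            continue
        if ch == quote:
            in_quotes = not in_quotes
        if ch == delim and not in_quotes:
            rows.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    rows.append("".join(buf))
    return rows
```

Without the escape character, a backslash before a quote leaves the parser stuck "inside" quotes and the newline is never treated as a row break; with it, splitting proceeds correctly.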

Changes to System Behavior

Application timeout behavior more consistent

In Release 4.0, the web application session timeout was set to 60 minutes by default, which caused inconsistent behaviors. See TD-22675 below. 

In Release 4.0.1 and later, the session timeout is set to one month by default, which returns the web application to the same setting as Release 3.2.1 and earlier.

NOTE: Beginning in Release 4.0, this setting is configurable. For more information on changing the session timeout, see Configure Application Limits.

Key Bug Fixes

Ticket  Description

TD-22675  Session timeout behavior is inconsistent. Application seems to have some functionality after timeout.
TD-22570  After upgrade, some pre-upgrade jobs appear to point to deleted datasets.
TD-22388  S3 authorization mechanism does not support Signature Version 2 in Asia-Pacific and EU.
TD-22220  Dataset suddenly fails to load after upgrade from Release 3.2 because of type checking on an invalid recipe line.
TD-19830  Editing a Join or Union transform that includes a reference dataset (not in the same flow) may result in the unintentional removal of that reference dataset from the flow.
TD-14131  splitrows transform does not work after a backslash. This issue is fixed with the new quoteEscapeChar parameter for the splitrows transform. See Changes to the Language.
TD-5783  Prevent two-finger scroll in data grid from stepping back in the browser's history on Mac OS.

New Known Issues

Ticket  Component  Description

TD-22864  Compilation/Execution

Connection for Redshift publishing uses its own AWS access key and secret, which may be different from the per-user or system credentials. If the Redshift connection does not have read access to the data, publication fails.

Workaround: Verify that the access key and secret for the Redshift connection have access to any source data that you wish to publish to Redshift.

 

Release 4.0

This release features a single page for managing your flows, a faster Spark-based running environment on the Trifacta node, and a number of new Wrangle functions and capabilities. Details are below. 

NOTE: Integration with MapR is not supported for this release.

What's New

Workspace:

  • The new flow detail page includes a visual representation of your flow and detailed information about its datasets and recipes. From the Flow View page, users can swap datasets and run jobs, too. See Flow View Page.
  • Send a copy of a flow to another user. See Send a Copy of a Flow. 

Transformer Page:

  • Column width settings now persist across transform steps, other actions, and user sessions. See Transformer Page.
  • Users can now perform joins and unions directly against imported datasets that contain schema information, such as Hive, JDBC, and Avro. 
  • Wrangle steps can now be displayed in natural language. See Data Grid Panel.
  • New column menu shortcuts allow you to quickly assemble recipe steps from menu selections, based on a column's data type. See Column Menus.
  • New column browser streamlines interactions involving multiple columns. See Column Browser Panel.
  • Default quick scan samples are now collected over more of the data source, the first 1 GB. Administrators can now modify this size. See Configure Application Limits.
  • For the Spark running environment, you can enable generation of random samples across the entire dataset. See Configure for Spark.

Profiling:

Ingestion:

  • New Custom SQL query options for Hive and relational sources enable pre-filtering of rows and columns by executing the SQL logic within the database, which reduces data transfer time for faster overall performance. See Enable Custom SQL Query.
  • Users can now import Hive views to be used as a source.  See Hive Browser.
  • Expand the list of file extensions that are permitted for upload. See Miscellaneous Configuration.

Compilation/Execution:

  • New Spark v2.1.0-based running environment leverages in-memory speed to deliver overall faster execution times on jobs. See Configure Spark Running Environment.

    NOTE: As of Release 4.0, for new installs and upgrades, Spark is the default running environment for execution on the Hadoop cluster. Support for the Hadoop Pig running environment is deprecated and will reach end-of-life in a future release. For more information, see Running Environment Options.

    NOTE: Python UDFs are not supported in the Spark running environment. Support for Python UDFs is deprecated and will reach end-of-life in a future release. For more information on migrating to Java UDFs, see Changes to the User-Defined Functions.

  • You can disable the ability to run jobs on the Trifacta Server. See Running Environment Options.
  • User-specific properties can be passed to Pig or Spark for use during job execution. See Configure user-specific props for cluster jobs.
  • Default file publishing setting for CSV output is multiple output files when using a Hadoop running environment, resulting in better performance over large data volumes.

Language:

  • Window transform now supports use of aggregation functions. See Window Transform.
  • New NOW and TODAY functions.
  • New ROLLINGSUM function computes the rolling sum over a specified number of rows before and after the current row. See ROLLINGSUM Function.
  • New ROLLINGAVERAGE function computes rolling average over a specified window. See ROLLINGAVERAGE Function.
  • New ROWNUMBER function computes the row number for each row, based on order and optional grouping parameters. See ROWNUMBER Function.
  • New COUNTA function can be used to count the number of non-null values in a column based on order and grouping parameters. See COUNTA Function.
  • New COUNTDISTINCT function counts distinct number of values in a specified column. See COUNTDISTINCT Function.
  • Four new functions for testing conditional data validation: IFNULL, IFMISMATCHED, IFMISSING, and IFVALID. See Type Functions.
  • New *IF functions for each available aggregation function. See Aggregate Functions.
  • For more information, see Changes to the Language.
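As an illustration of the window semantics described above, here is a minimal Python sketch of a rolling sum over a specified number of rows before and after the current row. This is a hypothetical illustration only, not the Wrangle implementation; the function name and parameter order are assumptions.

```python
def rolling_sum(values, before, after):
    """For each row, sum the values in a window spanning `before` rows
    back through `after` rows ahead, clipped at the ends of the column.
    None entries (null values) are skipped, not counted as zero-vs-error.
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - before)                 # window start (clipped)
        hi = min(len(values), i + after + 1)    # window end (clipped)
        window = [v for v in values[lo:hi] if v is not None]
        out.append(sum(window))
    return out
```

A rolling-average analogue would divide each window sum by the count of non-null values in that window rather than by the nominal window size.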

APIs:

  • First release of publicly available APIs, which enable end-to-end operationalization of processing your datasets. See API Reference.

CLI:

Admin, Install, & Config:

Changes to System Behavior

Changes to the Language:

  • set and settype transforms now work on multiple columns.
  • Recipe steps are now displayed in natural language format by default in the recipe panel and suggestion cards. 
  • Some functions have been renamed to conform to common function names. 
  • For more information, see Changes to the Language.

Changes to the CLI:

  • The Jobs command line interface now supports job execution on the Spark running environment. See CLI for Jobs.

End of Life Features:

Key Bug Fixes

Ticket  Description

TD-21006  Photon fails to compress output file and is forced to restart on download.
TD-20736  Publish to Redshift fails for single-file outputs.
TD-20524  Join tool hangs due to mismatched data types.
TD-20344  When Photon is enabled, no sample data is displayed when joins yield a data mismatch.
TD-20176  After Release 3.2.1 upgrade, data grid in the Transformer Page no longer displays any data in the sample, even though data is present in the pre-upgrade environment.
TD-20173  NUMFORMAT string #.#0 fails to be converted to supported string format on upgrade, and recipe step fails validation. For more information, see Changes to the Language.
TD-19899  Failed first job of jobgroup prevents datasets from showing up in flow.
TD-19852  User can accept compressed formats for append publish action.
TD-19678  Column browser does not recognize when you place a checkmark next to the last column in the list.
TD-18836  find function accepts negative values for the start index. These values are consumed but produce unexpected results.
TD-18746  When Photon is enabled, previews in the data grid may take up to 30 seconds to dismiss.
TD-18538  Platform fails to start if Trifacta user for S3 access does not have the ListAllMyBuckets permission.
TD-18340  When writing CSV outputs, the Spark running environment fails to recognize the defined escape character.
TD-17677  Remove references to Zookeeper in the platform.
TD-16419  Comparison functions added through Builder are changed to operators in recipe.
TD-12283  Platform cannot execute jobs on Pig that are sourced from S3, if OpenJDK is installed.

New Known Issues

Ticket  Component  Description

TD-22128  Compilation/Execution

Cannot read multi-file Avro stream if data is greater than 500 KB.

Workaround: Load files as independent datasets and union them together, or concatenate the files outside of the platform.

TD-21737  Transformer Page

Cannot transform downstream datasets if an upstream dataset does not contain a splitrows transform.

Workaround: Add a splitrows transform to the upstream dataset. See Splitrows Transform.

TD-20796  Job Results Page

For a date column, Spark profiling shows incorrect set of dates when source data has a single date in it.

TD-19183  Workspace

Merge function does not work with double-escaped values, and job fails in Pig. Example:

set col: column4 value: merge(['ms\\',column4])

Workaround: Add a dummy character to the original transform and then remove it. Example:

set col: column4 value: merge(['ms\\℗',column4])
replace col: column4 on: '℗' with: ''

As another alternative, you can execute the job in the Spark running environment.
