This release includes numerous bug fixes, support for new distributions, and new capabilities, such as the option to disable initial type inference on schematized sources.
- Enable or disable initial type inference for schematized sources at global or individual connection level, or for individual dataset sources. See Configure Type Inference.
- Support for publishing Datetime data to Hive Datetime or Timestamp data types. See Hive Data Type Conversions.
Install, Config & Admin:
- Support for Ubuntu 16.04. See System Requirements.
- Support for Cloudera 5.13. See Supported Deployment Scenarios for Cloudera.
NOTE: Support for CDH 5.10 has been deprecated. Please upgrade your Hadoop cluster. For more information, see End of Life and Deprecated Features.
Changes to System Behavior
Key Bug Fixes
|TD-27799||DATEDIF function does not work for inputs that are functions returning date values.|
|TD-27703||Spark job fails with scala.MatchError|
|TD-24121||When publishing multi-part files, different permissions are written to the parent directory when job was executed on Hadoop or Photon.|
New Known Issues
|TD-27950||Transformer Page - Tools|
When you join with an imported dataset not in your flow and it takes longer than expected to collect its initial sample, you may encounter the following error:
Workaround: Create a recipe from the imported dataset and then join to the recipe, which is the preferred method of joining. For more information, see Join Page.
Ubuntu 16 install for Azure: supervisord complains about "missing" Python packages.
Workaround: These packages are present but lack appropriate permissions. A workaround is documented as part of the installation and configuration process. For more information, see "Workaround for missing Python packages" in Configure for Azure.
This release introduces deployment management, which enables separation of development and production flows and their related jobs. Develop your flows in a Dev environment and, when ready, push to Prod, where they can be versioned and triggered for production execution. Additionally, you can create and manage all of your connections through the new Connections page. A revamped flow view streamlines object interactions and now supports starting and stopping of jobs without leaving flow view.
- Release 4.2 also supports installation of the platform on Amazon EC2 instances and integration with EMR as well as installation for Microsoft Azure.
Details are below.
- Manage the lifecycle process of flows across multiple platform instances, building in Dev and publishing to Prod. See Overview of Deployment Management.
- Manage versions deployed into Production. See Deployments Page.
- New objects in Flow View and better organization of them. See Flow View Page.
NOTE: Wrangled datasets are no longer objects in the Trifacta platform. Their functionality has been moved to other and new objects. For more information, see Changes to the Object Model.
See Object Overview.
- Create, manage, and share connections through the new Connections page. See Connections Page.
- Sharing of connections and flows is enabled by default. See Configure Sharing.
- Import and export flows from your platform instance.
- Cancel jobs in progress.
- Perform cross joins between datasets. See Join Page.
- Cut, copy, and paste columns and column values. See Column Browser Panel.
- Rename multiple columns in a single transformation step. See Rename Columns.
- In Column Details, you can select a phone number or date pattern to generate suggestions for standardizing the values in the column to a single format. See Column Details Panel.
- Personalized suggestions presented based on your previous usage.
- Browse and select patterns for re-use from your recent history. See Pattern History Panel.
- Upload your own avatar image. See User Profile Page.
NOTE: This feature may need to be enabled. See Miscellaneous Configuration.
- Install from Amazon Marketplace via AMI into a deployed EC2 instance.
- Leverage IAM roles to manage permissions for the Trifacta platform deployed on an EC2 instance. See Configure for EC2 Role-Based Authentication.
- Install and integrate with Amazon Elastic MapReduce (EMR). See Configure for EMR.
- Install for Microsoft Azure and integrate with HDInsight. See Install from Azure Marketplace.
- Redshift improvements:
- Publish directly to Tableau Server. See Run Job Page.
- For more information on creating the connection, see Create Tableau Server Connections.
- New string comparison functions.
- New SUBSTITUTE function replaces string literals or patterns with a new literal or column value.
- See Changes to the Language.
- Expanded set of encoding types supported for file import. See Configure Global File Encoding Type.
- Improved performance when initializing jobs and in Flow View for complex flows.
Changes to System Behavior
New session duration parameter and default value
For technical reasons, the name and default value of the following parameter have been changed in Release 4.2.
|Affected Releases||Parameter Name||Default Value||Max Value|
|Release 4.2 and later|
|Release 4.1.1 and earlier|
NOTE: Upgrading customers have the new configuration setting automatically set to the default: 10080 minutes (one week). You must make adjustments as needed.
For more information on changing this parameter value, see Configure Application Limits.
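As a quick sanity check on the default above, 10080 minutes works out to exactly seven days. The following is a minimal sketch using only the Python standard library; the variable name is illustrative and is not the platform's parameter name.

```python
from datetime import timedelta

# Default session duration stated in these release notes, in minutes.
# The variable name is hypothetical; it is not a platform setting.
default_session_minutes = 10080

duration = timedelta(minutes=default_session_minutes)

print(duration.days)                   # 7
print(duration == timedelta(weeks=1))  # True
```

The same conversion can be used in reverse when choosing a different value, for example `timedelta(days=1) // timedelta(minutes=1)` gives 1440 minutes for a one-day session.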
/docs endpoint is removed
In Release 4.0, the /docs endpoint was deprecated. This endpoint displayed a documentation page containing information on the Wrangle language, the command line interface, and Trifacta patterns.
In Release 4.2, this endpoint has been removed from the platform. Content has been superseded by the following content:
For more information on features that have been deprecated or removed, see End of Life and Deprecated Features.
s3n is no longer supported
If you are integrating with S3 sources, the platform now requires use of the s3a protocol. The s3n protocol is no longer supported.
No configuration changes in the Trifacta platform are needed. See Enable S3 Access.
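The platform itself needs no changes, but scripts or stored paths maintained outside the platform may still use the old scheme. The sketch below shows one way such URIs could be rewritten. It is not a platform utility: the function name `to_s3a` and the example bucket paths are hypothetical.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3n:// URI to the s3a:// scheme; leave other URIs unchanged."""
    old_prefix = "s3n://"
    if uri.startswith(old_prefix):
        # Keep the bucket and key exactly as they were; only the scheme changes.
        return "s3a://" + uri[len(old_prefix):]
    return uri


# Hypothetical paths, for illustration only.
print(to_s3a("s3n://example-bucket/data/input.csv"))
print(to_s3a("hdfs:///user/example/input.csv"))
```

Only the scheme is replaced, so bucket names and object keys pass through untouched, and URIs that already use `s3a://` or another scheme are returned as-is.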
Key Bug Fixes
|TD-27748||Direct publish to Hive fails on wide datasets due to Avro limitations.|
SQL Server Database timing out with long load times.
|TD-27197||Column histogram does not update after adding |
|TD-27127||Send a Copy tab in Flow View sharing does not include all available users.|
|TD-27055||Job run on flow with complex recipes fails on Hadoop but succeeds on Photon.|
|TD-26837||Creating custom dictionaries fails on S3 backend datastore.|
|TD-26388||Orphaned bzip2 processes owned by the platform user accumulate on the node.|
|TD-26041||When editing a schedule that was set for 0 minutes after the hour, the schedule is displayed to execute at 15 minutes after the hour.|
|TD-25903||Overflow error when ROUND function is applied to large values.|
|TD-25733||Attempting a union of 12 datasets crashes UI.|
|TD-25709||Spark jobs fail if HDFS path includes commas.|
New Known Issues
DATEDIF function does not work for inputs that are functions returning date values.
Workaround: Write the output of the function returning your date values to a new column. Then, apply the DATEDIF function using that column as the input.
|TD-27703||Compilation/Execution||Spark job fails with scala.MatchError|
|TD-26069||Compilation/Execution||Photon evaluates |
|TD-24121||Compilation/Execution||When publishing multi-part files, different permissions are written to the parent directory when job was executed on Hadoop or Photon.|