These release notes apply to the following product tiers of Dataprep by Trifacta®:

  • Dataprep by Trifacta Enterprise Edition
  • Dataprep by Trifacta Professional Edition
  • Dataprep by Trifacta Starter Edition
  • Dataprep by Trifacta Premium
  • Dataprep by Trifacta Standard
  • Dataprep by Trifacta Legacy

Tip: You can see your product tier in the Trifacta application. Select Help menu > About Cloud Dataprep.

For more information, see Product Editions.

For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.

June 21, 2022

Release 9.3

What's New

Run Trifacta Photon jobs in your VPC:

You can execute Trifacta Photon jobs within your enterprise's virtual private cloud (VPC). 

Feature Availability: This feature may not be available in all product editions.

In-VPC execution must be enabled by an administrator. 

NOTE: This feature requires additional configuration of the Google Cloud Platform through the gcloud command line tools.

For more information, see Run Dataprep in Your VPC.

Run Connectivity jobs in your VPC:

Jobs related to ingesting, sampling, and publishing data for relational databases can now be executed within your enterprise's virtual private cloud (VPC).

NOTE: This feature is in Beta release.

Feature Availability: This feature may not be available in all product editions.

In-VPC execution must be enabled by an administrator.

NOTE: This feature requires additional configuration of the Google Cloud Platform through the gcloud command line tools.

For more information, see Run Dataprep in Your VPC.

Expandable left nav bar:

The new left navigation bar can be expanded to display full-text options for each menu item. Collapse it to reclaim the screen area. Available options remain consistent. See Home Page.

Configure range joins:

Specify ranges of key values in your joins.

NOTE: This feature may need to be enabled by an administrator. See Dataprep Project Settings Page.

For more information, see Configure Range Join.
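
As a conceptual illustration only, the following Python sketch shows the general idea behind a range join: a row matches when its key value falls within a range defined by the other dataset. The field names (`key`, `low`, `high`, `band`), the inclusive bounds, and the inner-join behavior are assumptions made for this example and do not describe how the Trifacta application implements the feature.

```python
def range_join(left_rows, right_rows):
    """Pair each left row with every right row whose [low, high] range
    contains the left row's key (inclusive bounds, inner-join semantics)."""
    matches = []
    for left in left_rows:
        for right in right_rows:
            if right["low"] <= left["key"] <= right["high"]:
                # Merge the two rows into a single output row.
                matches.append({**left, **right})
    return matches


# Hypothetical sample data: values to classify and the ranges to match against.
orders = [{"key": 5}, {"key": 15}, {"key": 42}]
bands = [
    {"low": 0, "high": 9, "band": "small"},
    {"low": 10, "high": 19, "band": "medium"},
]

joined = range_join(orders, bands)
for row in joined:
    print(row["key"], row["band"])
```

In this sketch, the row with key 42 falls outside every range and is therefore dropped from the output.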

Billing:

Edit credit card and billing information and review billing history and invoices.

Feature Availability: This feature may not be available in all product editions.

For more information, see Plans and Billing Page.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Transformer page:

Improved performance of the Transformer page through asynchronous loading of initial samples.

Changes in System Behavior

Generate an initial sample:

When generating an initial sample from a set of files in a directory, the maximum number of files that can be read is now limited to 10 files by default. For more information on changing the maximum number, see Dataprep Project Settings Page.

QuickBooks Online connections are disabled:

This feature has been disabled due to technical issues. It will be re-enabled when these issues are resolved in a future release.

Deprecated

None.

Key Bug Fixes

None.

New Known Issues

None.

May 13, 2022

Release 9.2 - push 2

Changes in System Behavior

Intelligent caching:

Due to technical issues, intelligent caching of recipe steps, a feature intended to improve performance, has been disabled.

NOTE: This feature is in Beta release.

When the technical issues are addressed, this feature will be re-enabled.

April 20, 2022

Release 9.2

What's New

Lock/unlock column data type:

You can now lock or unlock a column's data type. When the data type is locked, the Trifacta application no longer attempts to infer the column's data type when subsequent recipe steps are applied. 

Tip: You can unlock an individual column's data type through the column menu. To the left of the column name, click the icon and select Automatically update to change the column's data type. For more information, see Column Menus.

Tip: As an early step in your recipe, you can use the Advanced column selector in the Change column data type transformation to specify locking of the data types for all columns.

For more information, see Change Column Data Type.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Connectivity:

  • Google Analytics connections are now generally available and supported on Dataprep by Trifacta.

Publish Array data type as arrays to BigQuery:

You can now publish Dataprep by Trifacta® Array data type columns as BigQuery arrays.

Parameterize data in hidden folders:

Feature Availability: This feature may not be available in all product editions.

Optionally, you can scan hidden folders for wildcard- or pattern-based matches when building your parameterized imported datasets. 

Tip: This capability can be useful for creating imported datasets from profiles generated as part of job runs. These profiles are stored in the .profiler hidden directory where the job results are published.

NOTE: This feature is disabled by default. It can be enabled by an administrator.


NOTE: Scanning hidden folders may impact performance. For existing imported datasets with parameters, you should enable the inclusion of hidden folders on individual datasets and run a test job to evaluate impact.

For more information on including hidden files, see Dataprep Project Settings Page.

For more information on creating datasets with parameters from files, see Parameterize Files for Import.
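
The effect of including hidden folders in a wildcard match can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's matching logic: a folder is treated as hidden when its name begins with a dot (as with the .profiler directory mentioned above), the wildcard is applied to the file name only, and the function name, the `include_hidden` flag, and the sample paths are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def match_paths(paths, pattern, include_hidden=False):
    """Return the paths whose file name matches the wildcard pattern.

    Files located under a folder whose name starts with "." are skipped
    unless include_hidden is True.
    """
    selected = []
    for path in paths:
        parts = PurePosixPath(path).parts
        folders, file_name = parts[:-1], parts[-1]
        in_hidden_folder = any(folder.startswith(".") for folder in folders)
        if in_hidden_folder and not include_hidden:
            continue
        if fnmatch(file_name, pattern):
            selected.append(path)
    return selected


# Hypothetical job output layout.
paths = [
    "jobs/run1/results.json",
    "jobs/run1/.profiler/profile.json",
    "jobs/run1/results.csv",
]

default_matches = match_paths(paths, "*.json")
hidden_matches = match_paths(paths, "*.json", include_hidden=True)
print(default_matches)
print(hidden_matches)
```

Because the hidden-folder scan visits more locations, a sketch like this also hints at why enabling the option can affect performance on large directory trees.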

Simplified permissions for publishing to BigQuery:

By default, Dataprep by Trifacta requires that the bigquery.datasets.create permission is enabled for each user of the product to run Dataflow jobs on BigQuery data sources. In some environments, this permission cannot be provided to users, and these Dataflow jobs fail.

As a workaround, you can provide to Dataprep by Trifacta a pre-existing BigQuery dataset, in which intermediate query results can be stored. When this dataset is provided to the Trifacta application, temporary tables are created within it as part of Dataflow job execution, and the bigquery.datasets.create permission is not required.

NOTE: This BigQuery dataset must be created outside of Dataprep by Trifacta by your BigQuery administrator and must be located in the same region as your BigQuery source tables.

For more information on configuring the BigQuery temp dataset for the Trifacta application, see Dataprep Project Settings Page.

Documentation:

Published a documented solution for integrating Dataprep by Trifacta with your Virtual Private Cloud Service Controls (VPC SC). For more information on this integration, see Configure VPC-SC Perimeter.

Changes in System Behavior

Set column data type transformation locks the column's type by default:

Starting in this release, the column data type is locked by default when you change the column data type.

NOTE: This change in behavior does not affect recipe steps that were defined before this release. Column data types continue to be re-inferred after those recipe steps. For those steps, you can edit them and mark them as locking the data type, if preferred.

If required, you can unlock the column's data type. For more information, see Change Column Data Type.


Connectivity:

  • The Google Analytics connection type now supports the UniversalAnalytics schema.

    NOTE: Previously, this schema was called GoogleAnalytics by the driver vendor. You may need to update your custom SQL queries to reference this new schema name.

Generate an initial sample:

When generating an initial sample from a set of files in a directory, the maximum number of files that can be read is now limited to 50.

  • Previously, the Trifacta application read files until either 10MB of data or all matching files had been scanned.
  • This change is to limit the number of files that must be read for various operations in the Transformer page. It only applies to generating the initial sample type. Other sampling types, such as random sample, can scan the full set of files.

As needed, an administrator can change this maximum limit.

Performance:

The intelligent caching of recipe steps feature for performance improvements has been made available again. The issues that required removing it from the platform have been addressed.

NOTE: This feature is in Beta release.

This feature can be enabled by an administrator.

For more information, see Dataprep Project Settings Page.

Email notifications:

In a future release, the setting for email notifications based on job success will default to Default (Any Jobs) at the project or workspace level and at the flow level. This change means that the user who executes a job and others who have access to the flow receive, by default, an email notification whenever a job executes for flows where email notification settings have never been modified. As part of this change, each email will contain a richer set of information about the job that was executed. 

If needed, this new default setting can be modified.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-70522 | Cannot import converted files such as Excel, PDF, or JSON through SFTP connections.
TD-69279 | Test Connection button fails with a ValidationFailed error when editing a working connection configured with SSH tunneling.
TD-66185 | Flatten transformation cannot handle multi-character delimiters.

New Known Issues

Ticket | Description
TD-70326 | The warning "A newer version of the SDK family exists and updating is recommended" appears for Apache Beam in the Dataflow job screen.

Workaround: The Apache Beam upgrade to address this issue is in active planning and execution. This issue has no impact on the execution of Dataflow jobs. When the upgrade is complete, the message will no longer appear.

TD-69813 | Dataprep by Trifacta Array type columns in datasets that were imported before Release 9.2 are still published as String type.

Workaround: You can create a new imported dataset from the same source to publish those columns as BigQuery arrays.



March 15, 2022

Release 9.1

What's New

Encryption:

  • Support for use of customer-managed encryption keys (CMEK) during Dataflow job execution. The Trifacta application can also check for use of CMEKs before writing results to BigQuery or Cloud Storage.

    Private Preview: This feature is disabled by default. For more information on enabling this feature in your project, please contact Alteryx Support.

    Feature Availability: This feature may not be available in all product editions.

JavaScript User Defined Functions:

  • Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.

    NOTE: This feature is in Beta release.

    Feature Availability: This feature may not be available in all product editions.

Connectivity:

  • Enable connectivity between the Trifacta application and your cloud databases using SSH tunneling.

    Tip: This feature is now generally available.

    NOTE: For this release, SSH tunneling can be enabled on the following connection types: Oracle Database, PostgreSQL, MySQL, and Microsoft SQL Server.

    For more information, see Configure SSH Tunnel Connectivity.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Job execution:

The Trifacta application can check for changes to your dataset's schemas before jobs are executed and optionally halt job execution to prevent data corruption.

  • These options can be configured by a project administrator.
    Feature Availability: This feature may not be available in all product editions.

    For more information, see Dataprep Project Settings Page.

Tip: Schema validation can be overridden for individual jobs. For more information, see Run Job Page.
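
The kind of pre-execution check described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Trifacta implementation: schemas are modeled as plain mappings of column name to data type, and the names `diff_schema`, `check_before_run`, `halt_on_change`, and `SchemaChangedError` are hypothetical.

```python
class SchemaChangedError(Exception):
    """Raised when the source schema no longer matches the expected schema."""


def diff_schema(expected, current):
    """Compare two {column name: data type} mappings and list the differences."""
    shared = set(expected) & set(current)
    return {
        "added": sorted(set(current) - set(expected)),
        "removed": sorted(set(expected) - set(current)),
        "retyped": sorted(col for col in shared if expected[col] != current[col]),
    }


def check_before_run(expected, current, halt_on_change=True):
    """Report schema changes; optionally stop before the job would execute."""
    changes = diff_schema(expected, current)
    if halt_on_change and any(changes.values()):
        raise SchemaChangedError(f"schema changed: {changes}")
    return changes


# Hypothetical schemas: the one captured at import time and the one found now.
expected = {"id": "Integer", "name": "String"}
current = {"id": "String", "name": "String", "email": "String"}

report = check_before_run(expected, current, halt_on_change=False)
print(report)
```

Passing `halt_on_change=False` in this sketch plays the role of overriding validation for a single run: the differences are still reported, but execution is not stopped.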

Dataset configuration:

For an imported dataset, you can configure settings through a new interface, including column names and column data types to use in the Trifacta application.

NOTE: This experimental feature is intended for demonstration purposes only. This feature may be modified or removed from the Google Cloud without warning in a future release. It should not be deployed in a production environment.

NOTE: This feature is part of a larger effort to improve how data is imported into the Trifacta application. This feature must be enabled by a workspace administrator.

Sample Job IDs:

When a sample is collected, a job ID is generated and displayed in the Trifacta application. These job IDs enable you to identify the sample jobs.

Import:

For long-loading Parquet datasets, you can monitor the ingest process as you continue your work.

NOTE: This feature is in Beta release.

For more information, see Flow View Page.

Changes in System Behavior

Publishing:

Beginning in this release, you can publish Dataprep by Trifacta Array type columns to BigQuery as BigQuery arrays for Trifacta primitive data types. Arrays containing non-primitive data types continue to be published as String values.

Performance:

A recent release introduced improved performance through intelligent caching of recipe steps.

  • This feature was released as a Beta feature.
  • Due to some recently discovered issues, this feature has been disabled for the time being. It cannot be enabled by a workspace administrator at this time.

    NOTE: If this Beta feature had been enabled in your environment, you may experience a reduction in performance when moving between recipe steps in the Transformer page.

  • The feature will be re-enabled in a future release.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-60881 | For ADLS datasets, parameter indicators in Flow View are shifted by one character.

New Known Issues

None.

February 9, 2022

Release 9.0

What's New

JavaScript User Defined Functions:

Create user-defined functions (UDFs) in JavaScript and upload them to your project for use in your recipe steps. JavaScript UDFs enable users to create customized and consistent functions to meet their specific requirements.

Feature Availability: This feature may not be available in all product editions.

This feature is disabled by default. For more information on enabling JavaScript UDFs in your project, please contact Alteryx Support.

For more information, see JavaScript UDFs.

When enabled, JavaScript UDFs are defined through the Library page. For more information, see User Defined Functions Page.

Connectivity:

Build connections to accessible REST API endpoints.

This feature is disabled by default. For more information about enabling REST API connectivity in your environment, please contact Alteryx Support.

Feature Availability: This feature may not be available in all product editions.

For more information, see REST API Connections.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Dataset Schema Refresh:

You can now refresh your imported datasets with the current schema information from the source file or table. Schema refresh enables you to capture any changes to the columns in your dataset.

Feature Availability: This feature may not be available in all product editions.

Changes in System Behavior

None.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps.

New Known Issues

None.

January 27, 2022

Release 8.11 - push 2

What's New

None.

Changes in System Behavior

None.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps.

New Known Issues

None.

January 20, 2022

Release 8.11

What's New

BigQuery Running Environment:

Beginning in this release, sampling jobs can be executed in BigQuery.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Session Management:

You can view the current and recent sessions of the Trifacta application. You can review the devices that are authorized and revoke any unfamiliar devices.

Performance:

  • Improved performance during design time through intelligent caching of recipe steps. 
    NOTE: This feature is in Beta release.

  • Improvements in job execution performance, due to skipping some output validation steps for file-based outputs.

    NOTE: When scheduled or API jobs are executed, no validation is performed on any writesettings objects. Issues with these objects may cause failures during transformation or publishing stages of job execution.

Changes in System Behavior

Sample sizes can be increased up to 40MB:

Feature Availability: This feature may not be available in all product editions.

Prior to this release, the size of a sample was capped at 10MB. This size represented:

  • the actual size of the sample object stored in the base storage layer
  • the default maximum size of the sample displayed in the Trifacta application. This sample size can be reduced from 10MB, if needed.

Beginning in this release:

  • The actual size of the stored sample has increased to 40MB.

    NOTE: On backend storage, sample sizes are now four times larger than in previous releases. For datasources that require decompression or conversion, actual storage sizes may exceed this 40MB limit.

  • The size of the sample displayed for a recipe can be configured to be up to 40MB in size by individual users.

For more information, see Change Recipe Sample Size.

Data type mismatches can now be written out in CSV format:

Beginning in this release, for CSV outputs, mismatched values are written as regular values by default. In prior releases, mismatched values were written as null values in CSV outputs.

See Improvements to the Type System.
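
The difference between the two behaviors can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the `qty` column is assumed to be typed as Integer, a null is assumed to appear as an empty CSV field, and the function names and the `keep_mismatches` flag are hypothetical.

```python
import csv
import io


def is_integer(text):
    """Return True if the text is a valid value for an Integer column."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def write_csv(rows, keep_mismatches):
    """Write (id, qty) rows to CSV text.

    keep_mismatches=True  : mismatched qty values are written as-is.
    keep_mismatches=False : mismatched qty values are written as empty fields.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "qty"])
    for row_id, qty in rows:
        if is_integer(qty) or keep_mismatches:
            writer.writerow([row_id, qty])
        else:
            writer.writerow([row_id, ""])
    return buffer.getvalue()


rows = [("1", "10"), ("2", "abc"), ("3", "30")]

current_output = write_csv(rows, keep_mismatches=True)
prior_output = write_csv(rows, keep_mismatches=False)
print(current_output)
print(prior_output)
```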

Deprecated

None.

Key Bug Fixes

None.

New Known Issues

Ticket | Description
TD-68162 | Flow parameters cannot be displayed or edited in the Transformer page and cannot be embedded in recipe steps.

Workaround: To edit your flow parameters, select Parameters from the Flow View context menu.

NOTE: There is no current workaround for embedding in recipe steps. While your existing parameters should continue to work at execution time, avoid changing names of your flow parameters or editing recipe steps in which they are referenced. New flow parameters cannot be used in recipes at this time.


December 7, 2021

Release 8.10

What's New

User management:

Introducing user and role management. In the Admin console in the Trifacta application, you can enable and disable user access and determine access levels to individual object types, such as flows, connections, and plans. 

Feature Availability: This feature may not be available in all product editions.

Connectivity:

  • Enable connectivity between the Trifacta application and your cloud databases using SSH tunneling.

    NOTE: In this release, this feature must be enabled by request. For more information, please contact Alteryx Support.


    NOTE: SSH tunneling is enabled on a per-connection basis. For this release, SSH tunneling can be enabled on the following connection types: Oracle Database, PostgreSQL, MySQL, and Microsoft SQL Server.

    For more information, see Configure SSH Tunnel Connectivity.

  • Early Preview (read-only) connections available with this release:
    Feature Availability: This feature may not be available in all product editions.

Session Management:

You can view the current and recent sessions for your account in the Trifacta application. As needed, you can revoke any unfamiliar devices or sessions. For more information, see Sessions Page.

Changes in System Behavior

Ingestion:

Maximum permitted record length has been increased from 1 MB to 20 MB. For more information, see Working with JSON v2.

Split transformation:

When splitting a column based on positions, the positions no longer need to be listed in numeric order. See Changes to the Language.
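
One way to picture order-independent positions is the Python sketch below. It assumes that positions supplied out of order produce the same result as the same positions in ascending order, which these notes do not spell out; the function name and sample value are hypothetical, and the code is not the product's implementation.

```python
def split_at_positions(value, positions):
    """Split a string at the given character positions, in any order."""
    # Sorting (and de-duplicating) makes the result independent of input order.
    ordered = sorted(set(positions))
    bounds = [0] + ordered + [len(value)]
    return [value[start:end] for start, end in zip(bounds, bounds[1:])]


unordered = split_at_positions("abcdefgh", [5, 2])
ordered = split_at_positions("abcdefgh", [2, 5])
print(unordered)
print(ordered)
```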

Deprecated

None.

Key Bug Fixes

None.

New Known Issues

Ticket | Description
TD-66185 | Flatten transformation cannot handle multi-character delimiters.

Workaround: When a column of arrays is flattened using the Trifacta Photon running environment, multi-character String delimiters are not supported. As a workaround, you can create a regular expression delimiter, as in the following, which uses either left bracket or right bracket as the delimiter:

/\[|\]/
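
To show how a delimiter pattern that matches either bracket behaves, the following Python sketch uses the standard `re` module. Python's regular expression syntax is only a stand-in here, so details may differ from the pattern engine in the Trifacta application, and the sample values are made up. Note that both brackets are escaped so that they are treated as literal characters.

```python
import re
from typing import List

# Matches a literal left bracket or a literal right bracket.
BRACKET_DELIMITER = re.compile(r"\[|\]")


def split_on_brackets(value: str) -> List[str]:
    """Split a string wherever '[' or ']' occurs, dropping empty pieces."""
    return [piece for piece in BRACKET_DELIMITER.split(value) if piece]


print(split_on_brackets("a[b]c"))
print(split_on_brackets("[x][y]"))
```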


November 23, 2021

Release 8.9

What's New

Refer and Earn:

Beginning in this release, for every new sign-up you refer, you get a reward of your choice. For more information, see Referrals Page.

Self-serve upgrades from your free trial:

Through the trial expiration page, you can review and select the preferred plan that suits you. Provide the required card details through the application and subscribe to your preferred plan.

Feature Availability: This feature may not be available in all product editions.

For more information, see Start a Subscription.

BigQuery Running Environment:

Beginning in this release, imported datasets created with custom SQL are supported for execution in the BigQuery running environment. For more information, see BigQuery Running Environment.

Connectivity:

Early Preview (read-only) connections available with this release:

Feature Availability: This feature may not be available in all product editions.

Plans:

  • Create plan tasks to delete files and folders from file-based backend storage.

    Feature Availability: This feature may not be available in all product editions.

    For more information, see Create Delete Task.

  • You can now reference output metadata from within your plans. See Plan Metadata References.

Collaboration:

You can view the list of collaborators and their corresponding avatars on the pages for shareable objects, such as the Flows, Plans, and Connections pages.

Sampling:

  • Adjust the size of samples loaded in the browser for your current recipe to improve performance and address low-memory conditions. See Change Recipe Sample Size.

Changes in System Behavior

None.

Deprecated

None.

Key Bug Fixes

Ticket | Description
TD-65502 | Datasets from parameters are improperly permitted to be referenced in recipes, which returns an error during job execution.

New Known Issues

None.

Earlier Releases

For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.
