These release notes apply to the following product tiers of Dataprep by Trifacta®:
- Dataprep by Trifacta Enterprise Edition
- Dataprep by Trifacta Professional Edition
- Dataprep by Trifacta Starter Edition
- Dataprep by Trifacta Premium
- Dataprep by Trifacta Standard
- Dataprep by Trifacta Legacy
Tip: You can see your product tier in the Trifacta application. Select Help menu > About Cloud Dataprep.
For more information, see Product Editions.
For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.
November 23, 2021
Release 8.9
What's New
Refer and Earn:
Beginning in this release, for every new sign-up you refer, you get a reward of your choice. For more information, see Referrals Page.
Self-serve upgrades from your free trial:
Through the trial expiration page, you can review and select the preferred plan that suits you. Provide the required card details through the application and subscribe to your preferred plan.
For more information, see Start a Subscription.
BigQuery Running Environment:
Beginning in this release, imported datasets created with custom SQL are supported for execution in the BigQuery running environment. For more information, see BigQuery Running Environment.
Connectivity:
Early Preview (read-only) connections available with this release:
- Presto
- Microsoft Advertising
- For more information, see Early Preview Connection Types.
Plans:
Create plan tasks to delete files and folders from file-based backend storage.
For more information, see Create Delete Task.
- You can now reference output metadata from within your plans. See Plan Metadata References.
Collaboration:
You can view the list of collaborators and their corresponding avatars on shareable objects, such as Flows, Plans, and Connections pages.
- For more information, see Flows Page.
- For more information, see Connections Page.
- For more information, see Plans Page.
Sampling:
- Adjust the size of samples loaded in the browser for your current recipe to improve performance and address low-memory conditions. See Change Recipe Sample Size.
Changes in System Behavior
None.
Deprecated
None.
Key Bug Fixes
Ticket | Description |
---|---|
TD-65502 | Datasets with parameters are improperly permitted to be referenced in recipes, which causes an error during job execution. |
New Known Issues
None.
October 12, 2021
Release 8.8
What's New
Project Usage:
- VCU usage and active users are now displayed in the Trifacta application for administrators. For more information, see Usage Page.
Trifacta Photon:
- You can now configure the Trifacta application to execute Trifacta Photon jobs in your VPC.
NOTE: This feature is in Beta release.
For more information, please contact Trifacta Support.
Changes
Cancellation of jobs is temporarily disabled:
In previous releases, you could cancel in-progress flow and sampling jobs through the Trifacta application. As of this release, cancellation of all job types, including sampling, transformation, and profiling jobs, is temporarily disabled.
NOTE: This change applies to all types of jobs executed across all running environments, including BigQuery. For plan runs, some jobs, such as flow tasks, may continue to completion before the plan is canceled.
Tip: For Dataflow jobs, you can still cancel them through the Dataflow interface.
Job cancellation may be re-enabled in the future.
Billing:
Charges for your project and user usage of Dataprep by Trifacta are applied to your account based on the UTC (Greenwich) time zone. However, Google Marketplace tracks and reports usage based on the Pacific (U.S. West Coast) time zone, so some discrepancies in reporting have been observed.
Beginning at the end of October 2021, these discrepancies will be addressed. The daily reporting interval will be changed to start and end at midnight Pacific time to match how Google Marketplace reports. However, the usage tracking will remain based on the UTC time zone.
NOTE: The offset between Pacific time and UTC changes during the year:
- Pacific time is UTC-07:00 during daylight saving time.
- Pacific time is UTC-08:00 during standard time.
vCPU usage is tracked on an hourly basis; this is unchanged.
For more information, see Usage Page.
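The reporting-interval boundary described above can be illustrated with standard-library datetime arithmetic (Python shown purely as an illustration; the date and offset are example values):

```python
from datetime import datetime, timedelta, timezone

# Midnight Pacific (during daylight saving time, UTC-07:00) expressed
# in UTC, illustrating the daily reporting boundary described above.
pdt = timezone(timedelta(hours=-7))
midnight_pacific = datetime(2021, 10, 1, 0, 0, tzinfo=pdt)
in_utc = midnight_pacific.astimezone(timezone.utc)
print(in_utc.isoformat())  # 2021-10-01T07:00:00+00:00
```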
Import:
Improvements have been made in how double quotes are handled in CSV files during import to align Dataprep by Trifacta with other systems that support CSV import.
Example values in source CSV file:
"""My product""",In stock,"16,000",0.05
Note that the value 16,000 must be double-quoted, since the value contains a comma, which is the field delimiter. Previously, these values appeared in the Transformer page as the following:
c1 | c2 | c3 | c4
---|---|---|---
"""My product""" | In stock | "16,000" | 0.05
As of this version, the Trifacta application handles these values better when displaying them in the Transformer page:

c1 | c2 | c3 | c4
---|---|---|---
"My product" | In stock | 16,000 | 0.05
c1: Escaped values (triple double-quotes) in the source no longer render in the application as triple double-quotes; they are represented as quoted values.
c3: Note that the double quotes in c3 have been stripped. Leading and trailing quotes are trimmed if the quotes are balanced within a cell.
NOTE: This change in behavior applies only to newly created imported datasets sourced from a CSV file. Existing imported datasets should not be affected. However, if a newly imported dataset is transformed by a previously existing recipe that compensated for the extra quotes in the Transformer page, the effects on output data could be unpredictable. These recipes and their steps should be reviewed.
This change does apply to any newly imported dataset sourced from CSV and may cause the data to change. For example, if you export an older flow and import into a new workspace or project, this change in parsing behavior applies to the datasets that are newly created in the new environment. Recipes may require review upon import.
When results are generated in CSV, output files should continue to reflect the formatting of the source data before import. See above.
Tip: You can also choose the Include quotes option when creating a CSV output.
When profiling is enabled, values that appear in CSV as "" are now marked as missing.
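The new parsing behavior matches standard (RFC 4180-style) CSV quoting. As an illustration, Python's csv module reads the sample source line the same way:

```python
import csv, io

# The sample source line from above: an escaped value ("" doubling)
# and a quoted value containing the field delimiter.
line = '"""My product""",In stock,"16,000",0.05\n'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['"My product"', 'In stock', '16,000', '0.05']
```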
API:
To prevent overloading mission-critical API endpoints, rate limiting on a select set of API endpoints has been implemented in the Trifacta platform. For more information, see Changes to the APIs.
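From the client side, rate limiting is typically handled with retries and exponential backoff. The sketch below is a generic illustration; it assumes (the notes do not state) that a limited request is signaled with HTTP status 429:

```python
import time, random

# Generic client-side sketch: retry with exponential backoff when a
# call is rate limited. The 429 status is an assumption for the sketch.
def with_backoff(call, max_tries=5):
    for attempt in range(max_tries):
        status, body = call()
        if status != 429:
            return status, body
        # back off: 0.1s, 0.2s, 0.4s, ... plus a little jitter
        time.sleep((2 ** attempt) * 0.1 + random.random() * 0.05)
    return status, body

# Stubbed endpoint that succeeds on the third try:
responses = iter([(429, None), (429, None), (200, "ok")])
result = with_backoff(lambda: next(responses))
print(result)  # (200, 'ok')
```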
BigQuery Running Environment:
When running jobs in BigQuery, some additional data types, functions, and transformations are now supported:
Data types: The following data types are now supported for execution in BigQuery:
- Arrays
- Objects (Maps)
- Aggregation functions:
- LIST
- LISTIF
- UNIQUE
- See Aggregate Functions.
- Date functions:
- WEEKNUM
- CONVERTFROMUTC
- CONVERTTOUTC
- CONVERTTIMEZONE
- DATEDIF: All unit types are now supported.
- See Date Functions.
String functions:
- SUBSTITUTE
- PROPER
- REMOVESYMBOLS
- DOUBLEMETAPHONE
- See String Functions.
Nested functions:
- ARRAYCONCAT
- ARRAYCROSS
- ARRAYINTERSECT
- ARRAYLEN
- ARRAYSTOMAP
- ARRAYUNIQUE
- ARRAYZIP
- FILTEROBJECT
- KEYS
- ARRAYELEMENTAT
- LISTAVERAGE
- LISTMAX
- LISTMIN
- LISTMODE
- LISTSTDEV
- LISTSUM
- LISTVAR
- ARRAYSORT
- ARRAYINDEXOF
- ARRAYMERGEELEMENTS
- ARRAYRIGHTINDEXOF
- ARRAYSLICE
- See Nested Functions.
- Other functions:
- IPTOINT
- IPFROMINT
- See Other Functions.
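What IPTOINT/IPFROMINT-style conversions compute can be shown with the Python standard library (an illustration, not Trifacta code):

```python
import ipaddress

# IPv4 address to integer and back, mirroring IPTOINT/IPFROMINT.
as_int = int(ipaddress.IPv4Address("192.168.0.1"))
as_str = str(ipaddress.IPv4Address(3232235521))
print(as_int, as_str)  # 3232235521 192.168.0.1
```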
Transformations:
Search term | Transform
---|---
Unnest elements | unnest
Expand Array to rows | flatten
Extract between delimiters | extractbetweendelimiters
Unpivot | unpivot
Standardize column | standardize
Nest columns | nest
Extract matches to Array | extractlist
Replace between delimiters | replacebetweenpatterns
Scale to min max | scaleminmax
Scale to mean | scalestandardize
Convert key/value to Object | extractkv
Join datasets | join

For more information, see Join Types.
Legend:
- Search term: the value you enter in the Transform Builder
- Transform: name of the underlying transform
For more information, see Transformation Reference.
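The "Expand Array to rows" (flatten) transformation listed above can be sketched in plain Python (an illustration, not Trifacta code): each array element becomes its own output row.

```python
# Flatten sketch: one output row per array element.
rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
flattened = [{"id": r["id"], "tag": t} for r in rows for t in r["tags"]]
print(len(flattened))  # 3
```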
BigQuery Running Environment:
- Support for reading Google Cloud Storage files for execution in BigQuery.
- For more information, see Flow Optimization Settings Dialog.
- For more information, see Google Cloud Storage Access.
Deprecated
None.
Known Issues
None.
Fixes
Ticket | Description |
---|---|
TD-64383 | Dataflow jobs that use custom SQL to query an authorized view may fail when the Service Account in use has access to the authorized view but no access to underlying BigQuery table. |
September 15, 2021
Release 8.7
What's New
Templates:
From the Flows page, you can now access pre-configured templates directly from the templates gallery.
Tip: Click Templates in the Flows page. Select the template, and the template is opened in Flow View for you.
- For more information, see Flows Page.
- For more information on using a template in the product, see Start with a Template.
Browsers:
- Update to supported browsers:
- Mozilla Firefox is generally supported.
- Microsoft Edge is now supported.
NOTE: This feature is in Beta release.
- New versions of supported browsers are now supported.
- For more information, see Browser Requirements.
Plans:
Create plan tasks to deliver messages to a specified Slack channel.
For more information, see Create Slack Task.
Import data:
When you are importing from or writing to Base Storage, you can choose to display hidden files and folders for access to them.
Tip: Use this option to access files generated for your job's visual profile and then publish them to BigQuery for additional analysis.
For more information, see Import Data Page.
Sharing:
- Paste in a comma-separated list of email addresses to share flows, plans, or connections with multiple users at the same time.
- See Share Flow Dialog.
- See Share Connection Dialog.
Publishing:
Strict type matching for publishing to BigQuery Datetime columns.
Tip: You can enable or disable strict type matching during publication to BigQuery. Strict type matching is enabled by default for new flows. You can disable the flag to revert to previous BigQuery publishing behaviors. See BigQuery Table Settings.
For more information, see BigQuery Data Type Conversions.
Recipe panel:
- Some enhancements to flagging steps for review. See Flag for Review.
Changes
None.
Deprecated
API:
- Deprecated API endpoint to transfer assets between users has been removed from the platform. This endpoint was previously replaced by an improved method of transfer.
- Some connection-related endpoints have been deprecated. These endpoints have little value for public use.
- For more information, see Changes to the APIs.
Known Issues
Ticket | Description |
---|---|
TD-63517 | Unpivoting a String column preserves null values in Dataflow but converts them to empty strings in Photon. Running jobs on the different running environments generates different results. Workaround: After the unpivot step, you can add an Edit with formula step. Set the columns to all of the columns in the unpivot and add the following formula, which converts all missing values to null values: if(ismissing($col),NULL(),$col) |
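The effect of the workaround formula if(ismissing($col),NULL(),$col) can be sketched in plain Python (an illustration, not Trifacta code): missing values, whether empty strings or nulls, are normalized to nulls so that both running environments agree.

```python
# Normalize empty strings and nulls to nulls after an unpivot,
# mirroring if(ismissing($col),NULL(),$col).
values = ["x", "", None, "y"]
normalized = [None if v in ("", None) else v for v in values]
print(normalized)  # ['x', None, None, 'y']
```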
Fixes
Ticket | Description |
---|---|
TD-63564 | Schedules created by a flow collaborator with editor access stop working if the collaborator is removed from the flow. Collaborators with viewer access cannot create schedules. |
August 16, 2021
Release 8.6
What's New
Template Gallery:
Tip: You can start a trial account by selecting a pre-configured template from our templates gallery. See www.trifacta.com/templates.
Collaboration:
Flow editors and plan collaborators can be permitted to schedule jobs. See Dataprep Project Settings Page.
Connectivity:
Upload tabular data from PDF documents. See Import PDF Data.
Early Preview (read-only) connections available with this release:
- Google Ads
- NetSuite
- For more information, see Early Preview Connection Types.
Performance:
Conversion jobs are now processed asynchronously.
Better management of file locking and concurrency during job execution.
Better Handling of JSON files:
The Trifacta application now supports regularly formatted JSON files during import. You can now import flat JSON records contained in a single array object; each element of the array is treated as a single record and imported as a new row. For more information, see Working with JSON v2.
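The described import behavior can be sketched with the standard library (an illustration, not the Trifacta implementation): a single top-level array of flat records yields one row per element.

```python
import json

# A single top-level array of flat records: one row per element.
src = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
rows = json.loads(src)
print(len(rows))  # 2
```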
Usage reporting:
Detailed reporting on vCPU and active users is now available in the Trifacta application.
NOTE: Active user reporting may not be available until September 1, 2021 or later.
For more information, see Usage Page.
Changes
Dataflow machines:
The following machine types are now available when running a Dataflow job:
"e2-standard-2", "e2-standard-4", "e2-standard-8", "e2-standard-16", "e2-standard-32"
Deprecated
None.
Known Issues
TD-63564: Schedules created by a flow collaborator with editor access stop working if the collaborator is removed from the flow.
Tip: Flow owners can delete the schedule and create a new one. When this issue is fixed, the original schedule will continue to be executed under the flow owner's account.
Collaborators with viewer access cannot create schedules.
Fixes
- TD-61478: Time-based data types are imported as String type from BigQuery sources when type inference is disabled.
July 20, 2021
Release 8.5
What's New
Tip: When you complete your Dataprep by Trifacta Enterprise Edition or Dataprep by Trifacta Professional Edition trial, you can choose to license a higher or lower tier product edition. For more information, see Product Editions.
Parameterization:
Create environment parameters to ensure that all users of the project or workspace use consistent references.
NOTE: You must be a workspace administrator or project owner to create environment parameters.
Tip: Environment parameters can be exported from one project or workspace and imported into another, so that these references are consistent across the enterprise.
- For more information, see Environment Parameters Page.
- For more information on parameters in general, see Overview of Parameterization.
- Parameterize names of your storage buckets using environment parameters.
- See Create Dataset with Parameters.
- See Create Outputs.
- For more information on parameters, see Overview of Parameterization.
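Parameterizing a bucket name can be pictured as simple template substitution. The sketch below is a plain-Python illustration; the ${env.bucketName} syntax and parameter name are assumptions for the example, not confirmed product syntax:

```python
import re

# Hypothetical environment parameter table for the sketch.
env = {"bucketName": "analytics-prod"}

def resolve(path: str) -> str:
    # Substitute ${env.<name>} placeholders (illustrative syntax).
    return re.sub(r"\$\{env\.(\w+)\}", lambda m: env[m.group(1)], path)

resolved = resolve("gs://${env.bucketName}/exports/")
print(resolved)  # gs://analytics-prod/exports/
```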
Schedules:
Project owners and workspace administrators can review, enable, disable, and delete schedules through the application.
See Schedules Page.
Flow View:
- Click a node to see its lineage within the flow. See Flow View Page.
Job execution:
- Define SQL scripts to execute before data ingestion or after publication for file-based or table-based jobs.
This feature may need to be enabled in your environment. For more information, see Dataprep Project Settings Page.
For more information, see Create Output SQL Scripts.
Resource usage:
- Review the total vCPU hours consumed by job execution within your project across an arbitrary time period.
- For more information, see Usage Page.
- For more information, see Usage Metrics.
Connectivity:
Contribute to the future direction of connectivity: Click I'm interested on a connection card to upvote adding the connection type to the Trifacta application. See Create Connection Window.
Early Preview (read-only) connections available with this release:
- Apache Impala
- For more information, see Early Preview Connection Types.
Connectivity:
- Connect to your relational database systems hosted on Cloud SQL. In the Connections page, click the Cloud SQL card for your connection type.
For more information, see Create Connection Window.
Connectivity:
- Read-only support for Teradata connections. For more information, see Teradata Connections.
API:
Cancel in-progress Dataflow jobs via API.
See Changes to the APIs.
Job execution:
You can choose to ignore the recipe errors before job execution and then review any errors in the recipe through the Job Details page.
- For more information, see Run Job Page.
- For more information, see Job Details Page.
Language:
- NUMVALUE function can be used to convert a String value formatted as a number into an Integer or Decimal value.
- NUMFORMAT function now supports configurable grouping and decimal separators for localizing numeric values.
- For more information, see Changes to the Language.
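The kind of conversion NUMVALUE performs can be sketched in Python (an illustration, not Wrangle; the function name and defaults below are assumptions for the sketch): strip the grouping separator, then normalize the decimal separator.

```python
# NUMVALUE-style parsing with configurable grouping/decimal separators.
def numvalue(s: str, grouping: str = ",", decimal: str = ".") -> float:
    return float(s.replace(grouping, "").replace(decimal, "."))

result = numvalue("1.234,56", grouping=".", decimal=",")
print(result)  # 1234.56
```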
Performance:
- Improved performance when browsing folders containing a large number of files on Base Storage.
Resource usage:
- Review the total vCPU hours consumed by your datasets, recipes, and job execution within your project across an arbitrary time period.
- For more information, see Usage Page.
- For more information, see Usage Metrics.
Changes
None.
Deprecated
None.
Known Issues
None.
Fixes
- TD-62190: You may not be able to view the SQL that was used to execute a job within BigQuery. This issue is due to a regression in the new BigQuery console in which job identifiers containing dashes are not supported. A ticket has been filed with Google.
June 7, 2021
Release 8.4
What's New
Template Gallery:
- Check out the new gallery of flow templates, which can be imported into your workspace. These templates are pre-configured to solve the most compelling loading and transformation use cases in the product. For more information, see www.trifacta.com/templates.
- For more information on importing flows into your workspace, see Import Flow.
- For more information on using a template in the product, see Start with a Template.
Connectivity:
Early Preview (read-only) connections available with this release:
- Splunk
- YouTube Analytics
- For more information, see Early Preview Connection Types.
Collaboration:
- You can receive email notifications whenever a plan or a flow is shared with you by the owner.
This feature may need to be enabled in your environment. For more information, see Dataprep Project Settings Page.
- For more information, see Email Notifications Page.
Support for delete actions on merge (upsert) operations in BigQuery:
When publishing to a BigQuery table, you can choose to update or, with this release, to delete matching records during a merge option. For more information, see BigQuery Table Settings.
Job execution:
You can choose to ignore the recipe errors before job execution and then review any errors in the recipe through the Job Details page.
- For more information, see Run Job Page.
- For more information, see Job Details Page.
Language:
- New content on parsing functions. See Changes to the Language.
Changes
Trifacta Photon limits on execution time
Trifacta Photon is an in-memory running environment that is hosted on the same node as Dataprep by Trifacta, which allows for faster execution suitable for small- to medium-sized jobs.
NOTE: Jobs that are executed on Trifacta Photon may be limited to run for a maximum of 10 minutes, after which they fail with a timeout error. If your job fails due to this limit, please switch to running the job on Dataflow.
Trifacta Photon can be enabled or disabled by a project administrator. For more information, see Dataprep Project Settings Page.
Execution of scheduled jobs on Trifacta Photon is not supported
In conjunction with the previous change, execution of scheduled jobs is not supported on Trifacta Photon. Since Trifacta Photon jobs are now limited to 10 minutes of execution time, scheduled jobs have been automatically migrated to execution on Dataflow to provide better execution success. For more information, see Trifacta Photon Running Environment.
Deprecated
None.
Known Issues
- TD-62190: You may not be able to view the SQL that was used to execute a job within BigQuery. This issue is due to a regression in the new BigQuery console in which job identifiers containing dashes are not supported. A ticket has been filed with Google.
Fixes
- TD-60881: Incorrect file path and missing file extension in the application for parameterized outputs
- TD-60382: Date format M/d/yy is handled differently by the PARSEDATE function on Trifacta Photon and Spark.
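For context on what an M/d/yy-style pattern accepts, here is how Python's strptime (shown purely as an illustration; it is not the PARSEDATE implementation on either engine) parses such input:

```python
from datetime import datetime

# Single-digit month/day with a two-digit year, M/d/yy style.
parsed = datetime.strptime("3/7/21", "%m/%d/%y").date()
print(parsed.isoformat())  # 2021-03-07
```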
May 20, 2021
Release 8.3 - push 3
What's New
Connectivity:
Support for SFTP connections.
For more information, see SFTP Connections.
NOTE: This connection type is import only.
Changes
Trifacta Photon enabled by default
Trifacta Photon is an in-memory running environment that is hosted on the same node as Dataprep by Trifacta, which allows for faster execution suitable for small- to medium-sized jobs.
NOTE: Jobs executed in Trifacta Photon are executed within the Trifacta VPC. Data is temporarily streamed to the Trifacta VPC during job execution and is not persisted.
Beginning in this release, Trifacta Photon is enabled by default. Users can choose to run jobs on Trifacta Photon.
NOTE: For Dataprep by Trifacta Enterprise Edition, Trifacta Photon is enabled by default for new projects. For existing projects, a project administrator must still choose to enable it.
Trifacta Photon can be enabled or disabled by a project administrator. For more information, see Dataprep Project Settings Page.
Deprecated
None.
Known Issues
None.
Fixes
None.
May 10, 2021
Release 8.3
What's New
Running Environments:
- Support for full job execution on BigQuery.
- Support for visual profiling of jobs executed in BigQuery.
- For more information, see Configure Running Environments.
- For more information, see Overview of Job Execution.
Cancel Jobs in Dataflow:
You can cancel Dataflow jobs directly from the product.
NOTE: In some cases, the product is unable to cancel the job from the application. In these cases, click View in Dataflow Job, and from there you can cancel the job in progress.
- For more information, see Job Details Page.
- For more information, see Jobs Page.
Support for merge (upsert) operations in BigQuery:
When publishing to a BigQuery table, you can choose to write results using the merge option. When selected, you specify a primary key of fields and then decide how data is merged into the table. For more information, see BigQuery Table Settings.
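The merge (upsert) behavior can be sketched in plain Python (an illustration, not BigQuery SQL): records that match on the primary key are updated, and non-matching records are inserted.

```python
# Upsert sketch keyed on a primary key ("id" here is example data).
table = {1: {"name": "old"}, 2: {"name": "keep"}}
updates = [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}]

for rec in updates:
    table[rec["id"]] = {"name": rec["name"]}  # update if matched, else insert

print(sorted(table))  # [1, 2, 3]
```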
Connectivity:
Early Preview (read-only) connections available with this release:
- Authorize.net
- Cockroach DB
- DB2
- Google Data Catalog
- Google Spanner
- Magento
- Redis
- Shopify
- Smartsheet
- Trello
- QuickBase
- For more information, see Early Preview Connection Types.
Job execution:
Introducing new filter pushdowns to optimize the performance of your flows during job execution. For more information, see Flow Optimization Settings Dialog.
Job results:
You can now preview job results and download them from the Overview tab of the Job details page. For more information, see Job Details Page.
Tip: You can also preview job results in Flow View. See View for Outputs.
Changes
Improved method of JSON import
Beginning in this release, the Trifacta application now uses the conversion service to ingest JSON files during import. This improved method of ingestion can save significant time wrangling JSON into records.
NOTE: The new method of JSON import is enabled by default but can be disabled as needed.
For more information, see Working with JSON v2.
Flows that use imported datasets created using the old method continue to work without modification.
NOTE: Support for the v1 version of JSON import is likely to be deprecated in a future release. You should switch to the new version as soon as possible. For more information on migrating your flows and datasets to the new version, see Working with JSON v1.
Future work on support for JSON is targeted for the v2 version only.
Optionally, you can re-enable the old version, which is useful for migrating to the new version.
For more information on using the old version and migrating to the new version, see Working with JSON v1.
Deprecated
None.
Known Issues
- TD-61478: Time-based data types are imported as String type from BigQuery sources when type inference is disabled.
Fixes
- TD-60701: Most non-ASCII characters incorrectly represented in visual profile downloaded in PDF format.
- TD-59854: Datetime column from Parquet file incorrectly inferred to the wrong data type on import.
April 26, 2021
Release 8.2 push2
What's New
Upgrade: Trial customers can upgrade through the Admin console. See Admin Console.
This is the initial release for the following product tiers:
- Dataprep by Trifacta Enterprise Edition
- Dataprep by Trifacta Professional Edition
- Dataprep by Trifacta Starter Edition
Changes
None.
Deprecated
None.
Known Issues
None.
Fixes
None.
April 14, 2021
Release 8.2
This is the initial release for the following product tiers:
- Dataprep by Trifacta Enterprise Edition
- Dataprep by Trifacta Professional Edition
- Dataprep by Trifacta Starter Edition
What's New
Photon:
Introducing Trifacta Photon, an in-memory running environment for running jobs. Embedded in Dataprep by Trifacta, Trifacta Photon delivers improved performance in job execution and is best suited for small- to medium-sized jobs.
NOTE: Trifacta Photon must be enabled by a project owner. For more information, see Dataprep Project Settings Page.
- When you choose to run a job, you can now choose to run a job on Trifacta Photon.
- For more information, see Run Job Page .
Quick scan sampling:
- Trifacta Photon also enables quick scan sampling. A quick scan sample generates an appropriate selection of rows from the dataset from which the sample was initiated. These samples are faster to generate. For more information, see Overview of Sampling.
- For more information on generating samples, see Samples Panel.
Preferences:
- Re-organized user account, preferences, and storage settings to streamline the setup process. See Preferences Page.
Connectivity:
Early Preview (read-only) connections available with this release:
- Greenplum
- HubSpot
- MariaDB
- Microsoft Dynamics 365 Sales
- ServiceNow
- Smartsheet
- For more information, see Early Preview Connection Types.
Plan metadata references:
Use metadata values from other tasks and from the plan itself in your HTTP task definitions.
- For more information, see Create HTTP Task.
- For more information, see Plan Metadata References.
Improved accessibility of job results:
The Jobs tabs have been enhanced to display the latest and previous jobs that have been executed for the selected output.
For more information, see View for Outputs.
Sample Jobs Page:
Simplified output and destination experience:
From the Home page, a redesigned, step-by-step procedure streamlines the creation of outputs and destinations. For more information, see Start with a Template.
Changes
Improved methods for disabling the product:
Project owners can choose to disable Dataprep by Trifacta from within the product. For more information, see Enable or Disable Dataprep.
After the product has been disabled in a project, Trifacta data is placed in a hidden state for later purging. For more information on purging or restoring data, see Wipe Out Dataprep Data.
API:
The following API endpoints are scheduled for deprecation in a future release:
NOTE: Please avoid using the following endpoints.
- /v4/connections/vendors
- /v4/connections/credentialTypes
- /v4/connections/:id/publish/info
- /v4/connections/:id/import/info
These endpoints have little value for public use.
Deprecated
None.
Known Issues
- TD-60701: Most non-ASCII characters incorrectly represented in visual profile downloaded in PDF format.
Fixes
- TD-59236: Use of percent sign (%) in file names causes Transformer page to crash during preview.
- TD-59218: BOM characters at the beginning of a file causing multiple headers to appear in Transformer Page.
Earlier Releases
For release notes from previous releases, see Earlier Releases of Dataprep by Trifacta.