On April 28, 2021, Google is changing the required permissions for attaching IAM roles to service accounts. If you are using IAM roles for your Google service accounts, please see Changes to User Management.
Contents:
This section contains an archive of release notes for previous releases of Cloud Dataprep by TRIFACTA INC.
For the latest release notes, see Release Notes for Cloud Dataprep.
May 16, 2019
Features
None.
Changes
Cloud Dataprep by TRIFACTA INC. now supports WebAssembly: The product now uses the WebAssembly browser client, which is the default in-browser web client for Google Chrome.
- WebAssembly is available by default in Google Chrome version 68+. Please upgrade to a supported version of Google Chrome. No further installation or configuration is required. For more information, see Desktop Requirements.
- This change addresses multiple known issues involving web client crashes observed while previewing or transforming data.
- Previously, the product supported the PNaCl browser client. This client is still available for use.
- NOTE: In a future release, support for the PNaCl browser client will be deprecated.
- For more information on enabling the PNaCl client, see https://community.trifacta.com/s/article/Enable-PNaCl-web-client.
Cloud Dataflow templates support: Future versions of Cloud Dataprep by TRIFACTA INC. will contain a new method to execute jobs in a programmatic (API) manner. At that time, support for Cloud Dataflow templates will be revisited.
- Re-running a job using Cloud Dataflow templates is supported in the current version.
Deprecated
None.
Known Issues
None.
Fixes
None.
March 20, 2019
Features
Standardize: Standardize column values through a simple new interface. See Standardize Page.
- For more information on other methods of standardization, see Overview of Standardization.
Dataflow Job Execution: Customize the Cloud Dataflow regional endpoint, zone, and machine type for each Cloud Dataprep by TRIFACTA INC. job.
- You can use the Cloud Dataprep by TRIFACTA INC. UI to configure job execution in your local region / zone and choose larger machine types to improve performance of Cloud Dataprep by TRIFACTA INC. job execution.
- Tip: You can specify project-level defaults in your Project Settings page. See Project Settings Page.
- See Run Job Page.
For more information on these regions and zones, see https://cloud.google.com/dataflow/docs/concepts/regional-endpoints.
For more information on machine types, see https://cloud.google.com/compute/docs/machine-types.
File Lineage:
- Track file-based lineage using $filepath and $sourcerownumber references while transforming data.
- In addition to directly imported files, the $sourcerownumber reference also works for datasets with parameters. See Source Metadata References.
Job Management:
- Review details and monitor the status of in-progress jobs. See Job Details Page.
- Filter list of jobs by source of job execution or by date range. See Jobs Page.
New column selection model: In the data grid, you can select multiple columns before receiving suggestions and performing transformations on them. For more information, see Data Grid Panel.
- New Selection Details panel enables selection of values and groups of values within a selected column. See Selection Details Panel.
- Copy and paste columns and column values through the column menus. See Copy and Paste Columns.
New functions:
- ARRAYINDEXOF Function
- ARRAYMERGEELEMENTS Function
- ARRAYRIGHTINDEXOF Function
- ARRAYSLICE Function
- ARRAYSORT Function
- LISTAVERAGE Function
- LISTMAX Function
- LISTMIN Function
- LISTMODE Function
- LISTSTDEV Function
- LISTSUM Function
- LISTVAR Function
- TRANSLITERATE Function
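For intuition, the LIST* aggregations behave like ordinary statistics computed over the elements of a list value. The following Python sketch illustrates the assumed semantics of a few of them; consult each function's reference page for exact behavior (for example, whether LISTSTDEV is sample or population standard deviation, which is an assumption here):

```python
from collections import Counter

def list_sum(values):
    """LISTSUM (assumed semantics): sum of the numeric elements of a list."""
    return sum(values)

def list_average(values):
    """LISTAVERAGE (assumed semantics): arithmetic mean of the elements."""
    return sum(values) / len(values)

def list_mode(values):
    """LISTMODE (assumed semantics): most frequent element of the list."""
    return Counter(values).most_common(1)[0][0]

def list_stdev(values):
    """LISTSTDEV (assumed here to be the sample standard deviation)."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
```

For example, `list_average([1, 2, 3])` yields `2` and `list_mode([1, 2, 2, 3])` yields `2`.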
Changes
None.
Deprecated
None.
Known Issues
TD-39411: Cannot import BigQuery table or view when source is originally from Google Suite.
- Cloud Dataprep by TRIFACTA INC. only supports native BigQuery tables and views. Cloud Dataprep by TRIFACTA INC. does not support BigQuery sources that reference data stored in Google Suite, such as Google Sheets.
- Workaround: Create a copy of the BigQuery table linked to the Google Suite source within BigQuery. Then, import the native BigQuery table as a dataset in Cloud Dataprep by TRIFACTA INC. using the Import Dataset page.
TD-39386: Some users may not be able to edit datasets with parameters, receiving an HTTP 403 error (permission denied) on sources that should be accessible.
- Workaround: Create a replacement dataset with parameters from scratch and swap out the old dataset with the new dataset with parameters.
TD-39296: Cannot run Cloud Dataflow jobs on datasets with parameters sourced from one or more Parquet files.
- Workaround: Generate the source in another supported file format, or union all Parquet-sourced datasets as the first step.
TD-39295: Parquet jobs fail on Cloud Dataflow when the dataset contains columns of the INT96 data type.
- Workaround: The INT96 data type has been deprecated in the library used to convert Parquet data. Please change the source to another data type and re-import. For more information, see https://github.com/apache/parquet-mr/commit/6901a2040848c6b37fa61f4b0a76246445f396db.
TD-39173: Cannot preview imported datasets when source is Avro file.
- Workaround: The file can still be imported and wrangled.
TD-38869: Upload of Parquet files does not support nested values, which appear as null values in the Transformer page.
- Workaround: Unnest the values before importing into the platform.
TD-37688: Documentation for new Selection Details Panel was not updated.
- The Selection Details panel replaces and extends the Suggestion Cards Panel. The feature is present, while the documentation is outdated.
- Updated documentation will be available in the next release.
- Workaround: Documentation for the new Selection Details panel is available here: https://docs.trifacta.com/display/SS/Selection+Details+Panel
TD-37683: Send a copy does not create independent sets of recipes and datasets in new flow. If imported datasets are removed in the source flow, they disappear from the sent version.
- Workaround: Create new versions of the imported datasets in the sent flow.
Fixes
None.
November 19, 2018
Features
Variable overrides:
- For flow executions, you can apply override values to multiple variables. See Flow View Page.
- Apply variable overrides to scheduled job executions. See Add Schedule Dialog.
- Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.
New transformations:
- Bin column: Place numeric values into bins of equal or custom size for the range of values
- Scale column: Scale a column's values to a fixed range or to zero mean, unit variance distributions.
- One-hot encoding: Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.
- Group By: Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
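The column transformations above can be pictured with a small Python sketch. This is an illustration of the general techniques (equal-width binning, standardization, and one-hot encoding), not the product's implementation; the function names are hypothetical:

```python
def bin_column(values, num_bins):
    """Place numeric values into equal-width bins across their range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1   # guard against a zero-width range
    # Values at the upper edge fall into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

def scale_column(values):
    """Scale values to a zero mean, unit variance distribution."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5 or 1
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Encode each distinct value as its own 0/1 indicator column."""
    columns = {v: [0] * len(values) for v in set(values)}
    for i, v in enumerate(values):
        columns[v][i] = 1
    return columns
```

For example, `one_hot(['a', 'b', 'a'])` produces one indicator column per distinct value: `{'a': [1, 0, 1], 'b': [0, 1, 0]}`.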
New functions:
- ARRAYELEMENTAT Function: Returns the element value of the input array for the provided index value.
- DOUBLEMETAPHONE Function: Returns primary and secondary phonetic spellings of an input string using the Double Metaphone algorithm.
- DOUBLEMETAPHONEEQUALS Function: Returns true if two strings match phonetic spellings using the Double Metaphone algorithm. The tolerance threshold can be adjusted.
- UNIQUE Function: Generates a new column containing an array of the unique values from a source column.
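Two of these functions have semantics simple enough to sketch in Python. The out-of-range behavior for ARRAYELEMENTAT and the ordering of UNIQUE results are assumptions here; see the function reference pages for the actual behavior:

```python
def array_element_at(arr, index):
    """ARRAYELEMENTAT (assumed semantics): element of the input array at
    the given index; None (null) when the index is out of range."""
    return arr[index] if 0 <= index < len(arr) else None

def unique(values):
    """UNIQUE (assumed semantics): array of the distinct values from a
    source column, keeping first-occurrence order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```

For example, `unique(['a', 'b', 'a', 'c'])` yields `['a', 'b', 'c']`.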
Changes
None.
Deprecated
None.
Known Issues
TD-34822: Case-sensitive variations in date range values are not matched when creating a dataset with parameters.
- NOTE: Date range parameters are now case-insensitive.
Fixes
TD-33428: Job execution fails on a recipe with a high limit in the split transformation, due to a Java null pointer error during profiling.
- NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.
TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
- NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
September 21, 2018
This release of Cloud Dataprep by TRIFACTA® INC. is generally available (GA).
Features
Share flows within the same project: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use. For more information, see Overview of Sharing.
NOTE: If you try to share a flow with a known user of Cloud Dataprep by TRIFACTA INC. and receive a That is not a valid email error, please ask that user to log in again to Cloud Dataprep by TRIFACTA INC. in the same GCP project.
Changes
None.
Deprecated
None.
Known Issues
TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.
- Workaround: Cast the NUMERIC type to FLOAT, and the import should succeed.
- NOTE: Support for NUMERIC data type for BigQuery began on August 20, 2018. For details, see https://cloud.google.com/bigquery/docs/release-notes.
- Support for the NUMERIC data type in Cloud Dataprep by TRIFACTA INC. is planned for a future release.
TD-34061: Running jobs on datasets sourced from more than 6000 files may fail.
NOTE: Due to a limitation in Cloud Dataflow, when you run a job on a parameterized dataset containing more than 1000 files, the input paths data must be compressed, which results in non-readable location values in the Cloud Dataflow job details.
Workaround: For this and other performance reasons, try to limit your parameterized datasets to no more than 5000 source files.
TD-33428: Job execution fails on a recipe with a high limit in the split transformation, due to a Java null pointer error during profiling.
NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.
Fixes
None.
July 18, 2018
Features
Introducing pre-defined transformations for common tasks: Through the context panel, you can search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser.
- See Search Panel.
- See Transform Builder.
Match your recipe to the target: Assign a new target to your recipes to provide matching guidance during wrangling. See Overview of RapidTarget.
- Targets assigned to a recipe appear in a column header overlay to assist you in aligning your dataset to match the dataset schema to the target schema. See Data Grid Panel.
Share flows: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use.
NOTE: This feature may not be immediately available in your user account or in your collaborators' accounts. Please check again in a few days. For more information, see Overview of Sharing.
Import/Export Flows: Export flows created in Cloud Dataprep by TRIFACTA® INC. and import them into a GCP project.
- See Export Flow.
- See Import Flow.
- You can also export the dependencies of an executed job as a separate flow. See Flow View Page.
- You can only import flows that were exported from the same version of Cloud Dataprep by TRIFACTA® INC.
Changes
None.
Deprecated
Aggregate transform: The aggregate transform has been removed from the platform.
- Aggregate functionality has been integrated into the pivot transform, so you can accomplish the same tasks.
- NOTE: All prior functionality for the Aggregate transform is supported in the new release using the Pivot transform.
- In the Search panel, enter pivot. See Search Panel.
Known Issues
TD-33900: When headers use protected names, the columns may be renamed.
- Workaround: At the beginning of the recipe, you may be able to rename your source. For larger flows, this workaround may not be practical.
TD-33888: "Unable to load wrangled dataset. Script is malformed (Cannot read property 'push' of undefined)" error when opening a recipe with Case transformations.
- Workaround: The recipe should still be accessible. If so, click the broken step and select Copy to clipboard.... Delete the original step. Rebuild the Case transformation using the version you copied to the clipboard.
TD-33798: "Could not create dataset" error when importing Avro dataset from Google Cloud Storage.
- Workaround: Import the file into BigQuery. Then, import the dataset as a BigQuery table into Cloud Dataprep by TRIFACTA INC.
TD-31627: Prefixes added to column names in the Join page are not propagated to subsequent recipe steps that already existed.
- Workaround: Perform a batch rename of column names in a step after the join. See Rename Columns.
TD-31305: Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.
- This issue also applies to flows that were upgraded from a previous release.
- Workaround: Recreate the samples after the move or copy.
TD-31252: Assigning a target schema through the Column Browser does not refresh the page.
- Workaround: To update the page, reload the page through the browser.
TD-31165: Job results are incorrect when a sample is collected and then the last transform step is undone.
- Workaround: Recollect a sample after undoing the transform step.
TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.
- Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.
Fixes
- date(yr, month, 0) was returned as the first date of the previous month. It should return a null value.
May 23, 2018
Features
None.
Changes
GDPR: The product is now compliant with GDPR regulations in the European Union. This regulation provides enhanced data privacy requirements for users. For more information, see https://www.eugdpr.org/.
As part of this compliance, Cloud Dataprep by TRIFACTA INC. has updated Terms of Service and Privacy Policy for all users, effective immediately:
- Cloud Dataprep by TRIFACTA INC. Terms of Service
- Cloud Dataprep by TRIFACTA INC. Privacy Policy
Deprecated
None.
Known Issues
None.
Fixes
None.
April 25, 2018
Features
None.
Changes
None.
Deprecated
None.
Known Issues
None.
January 23, 2018
Features
BigQuery read/write access across projects:
Read from BigQuery tables associated with GCP projects other than the current one where Cloud Dataprep by TRIFACTA INC. was launched.
Write results into BigQuery tables associated with other projects.
You must configure the Cloud Dataprep by TRIFACTA INC. and Cloud Dataflow service accounts to have read or write access to BigQuery datasets and tables outside of the current GCP project.
Re-run job on Cloud Dataflow:
After you run a job in Cloud Dataprep by TRIFACTA INC., you can re-run the job directly from the Cloud Dataflow interface.
Inputs and outputs are parameters that you can modify.
Operationalize the job with a third-party scheduling tool.
Convert phone and date patterns:
In Column Details, you can select a phone number or date pattern to generate suggestions for standardizing the values in the column to a single format.
See Column Details Panel.
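Standardizing a column of phone numbers to a single format, as this feature suggests through the Column Details panel, resembles the following Python sketch. The target pattern and the function name are hypothetical, chosen only to illustrate the idea:

```python
import re

def standardize_us_phone(value):
    """Normalize common US phone number formats to (XXX) XXX-XXXX.

    Returns the value unchanged when no ten-digit number is found.
    """
    digits = re.sub(r"\D", "", value)       # strip punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop a leading country code
    if len(digits) != 10:
        return value
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

For example, `standardize_us_phone("415-555-0199")` and `standardize_us_phone("+1 (415) 555 0199")` both normalize to `(415) 555-0199`.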
Changes
New Flow Objects: The objects in your flow have been modified and expanded to provide greater flexibility in flow definition and re-use:
References: Create references to the outputs of your recipes and use them as inputs to other recipes.
Output object: Specify individual publishing outputs in a separate object associated with a recipe. Publishing options include format, location, and data type.
For more information, see Object Overview.
Deprecated
None.
Known Issues
date(yr, month, 0) is returned as the first date of the previous month. It should return a null value.
Fixes
None.
November 2, 2017
Features
New Transformer page: New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.
Transformation suggestions are now displayed in a right-side panel, instead of on the bottom of the page.
A preview for a transformation suggestion is displayed only when you hover over the suggestion.
Improved sampling: Enhanced sampling methods provide access to customizable, task-oriented subsets of your data. See Samples Panel.
Improved Transformer loading due to persistence of initial sample.
For more information on the new sampling methods, see Overview of Sampling.
Changes
None.
Deprecated
None.
Known Issues
TD-24312: Improved Error Messages for Trifacta users to identify pre-job run failures.
If an error is encountered during the launch of a job but before job execution, you can now view a detailed error message as to the cause in the failed job card.
Common errors that occur during the launch of a job include:
- Dataflow staging location is not writeable
- Dataflow cannot read from and write to different regions
- Insufficient workers for Dataflow, please check your quota
Fixes
TD-26177: Cloud Dataflow job fails for large Avro files.
Avro datasets that were imported before this release may still have failures during job execution on Cloud Dataflow. To fix these failures, you must re-import the dataset.
- The splitrows transform allows splitting even if the required on parameter is set to an empty value.
- Issue with an equals sign (=) in the output path.