
On April 28, 2021, Google is changing the required permissions for attaching IAM roles to service accounts. If you are using IAM roles for your Google service accounts, please see Changes to User Management.

   



This section contains an archive of release notes for previous releases of Cloud Dataprep by TRIFACTA INC.

For the latest release notes, see Release Notes for Cloud Dataprep.

May 16, 2019

Features

None.

Changes

Cloud Dataprep by TRIFACTA INC. now supports WebAssembly: The product now uses the WebAssembly browser client, which is the default in-browser web client for Google Chrome.

  • WebAssembly is available by default in Google Chrome version 68+. Please upgrade to a supported version of Google Chrome. No further installation or configuration is required. For more information, see Desktop Requirements.
  • This change addresses multiple known issues involving web client crashes observed while previewing or transforming data.
  • Previously, the product supported the PNaCl browser client. This client is still available for use.

 

Cloud Dataflow: Cloud Dataflow SDK has been updated to version 2.11.0 from 2.6.0.

 

Cloud Dataflow templates support: Future versions of Cloud Dataprep by TRIFACTA INC. will contain a new method to execute jobs in a programmatic (API) manner. At that time, support for Cloud Dataflow templates will be revisited.

  • Re-running a job through Cloud Dataflow templates is supported in the current version.

 

Deprecated

None.

Known Issues

None.

Fixes

TD-39386: Some users may not be able to edit datasets with parameters, receiving an HTTP 403 error (permission denied) on sources that should be accessible.

 

March 20, 2019

Features

Standardize: Standardize column values through a simple new interface. See Standardize Page.

 

Dataflow Job Execution: Customize the Cloud Dataflow regional endpoints, regional zones and machine type for each Cloud Dataprep by TRIFACTA INC. job.

 

File Lineage:

  • Track file-based lineage using $filepath and $sourcerownumber references while transforming data.
  • In addition to directly imported files, the $sourcerownumber reference also works for datasets with parameters. See Source Metadata References.
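Conceptually, file lineage attaches the source file path and source row number to every record as it is loaded. A minimal plain-Python sketch of this idea (not product code; the in-memory `files` dict stands in for real files, and the field names merely mirror the `$filepath` and `$sourcerownumber` references):

```python
# Conceptual sketch: tag each record with its source file path and source
# row number, similar in spirit to the $filepath and $sourcerownumber
# references. The in-memory `files` dict is a stand-in for real files.
def load_with_lineage(files):
    rows = []
    for path, lines in files.items():
        for i, line in enumerate(lines, start=1):
            rows.append({"filepath": path, "sourcerownumber": i, "value": line})
    return rows
```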

 

 

 

Job Management:

  • Review details and monitor the status of in-progress jobs. See Job Details Page.
  • Filter the list of jobs by source of job execution or by date range. See Jobs Page.

 

Flows: Organize your flows into folders. See Flows Page.

 

New column selection model: In the data grid, you can select multiple columns before receiving suggestions and performing transformations on them. For more information, see Data Grid Panel.

  • New Selection Details panel enables selection of values and groups of values within a selected column. See Selection Details Panel.
  • Copy and paste columns and column values through the column menus. See Copy and Paste Columns.

 

Changes

None.

Deprecated

None.

Known Issues

TD-39411: Cannot import BigQuery table or view when the source is originally from G Suite.

  • Cloud Dataprep by TRIFACTA INC. only supports native BigQuery tables and views. Cloud Dataprep by TRIFACTA INC. does not support BigQuery sources that reference data stored in G Suite, such as Google Sheets.
  • Workaround: Within BigQuery, create a copy of the BigQuery table linked to the G Suite source. Then, import the native BigQuery table as a dataset in Cloud Dataprep by TRIFACTA INC. using the Import Data page.

 

TD-39386: Some users may not be able to edit datasets with parameters, receiving an HTTP 403 error (permission denied) on sources that should be accessible.

  • Workaround: Create a replacement dataset with parameters from scratch and swap out the old dataset with the new dataset with parameters.

 

TD-39296: Cannot run Cloud Dataflow jobs on datasets with parameters sourced from one or more Parquet files.

  • Workaround: Generate the source using another supported file format, or union all Parquet-sourced datasets as the first step.

 

TD-39295: Parquet jobs fail on Cloud Dataflow when dataset contains columns of INT96 data type. 

 

TD-39173: Cannot preview imported datasets when source is Avro file.

  • Workaround: File can still be imported and wrangled.

 

TD-38869: Upload of Parquet files does not support nested values, which appear as null values in the Transformer page.

  • Workaround: Unnest the values before importing into the platform.

 

TD-37688: Documentation for new Selection Details Panel was not updated.

  • The Selection Details panel replaces and extends the Suggestion Cards panel. The feature is present, but the documentation is outdated.
  • Updated documentation will be available in the next release.
  • Workaround: Documentation for the new Selection Details panel is available here: https://docs.trifacta.com/display/SS/Selection+Details+Panel

 

TD-37683: Send a copy does not create independent sets of recipes and datasets in new flow. If imported datasets are removed in the source flow, they disappear from the sent version.

  • Workaround: Create new versions of the imported datasets in the sent flow.

 

Fixes

TD-36332: Data grid can display wrong results if a sample is collected and the dataset is unioned.

 

TD-36192: Canceling a step in recipe panel can result in column menus disappearing in the data grid.

 

TD-31252: Assigning a target schema through the Column Browser does not refresh the page.

 

DP-98: BigQuery does not support reading from tables stored in regions other than US or EU.

 

November 19, 2018

Features

Variable overrides:

  • For flow executions, you can apply override values to multiple variables. See Flow View Page.
  • Apply variable overrides to scheduled job executions. See Add Schedule Dialog.
  • Variable overrides can now be applied to samples taken from your datasets with parameters. See Samples Panel.

 

New transformations:

  • Bin column: Place numeric values into bins of equal or custom size for the range of values.
  • Scale column: Scale a column's values to a fixed range or to zero mean, unit variance distributions.
  • One-hot encoding: Encode values from one column into separate columns containing 0 or 1, depending on the absence or presence of the value in the corresponding row.
  • Group By: Generate new columns or replacement tables from aggregate functions applied to grouped values from one or more columns.
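The four transformations above correspond to familiar data-preparation operations. A minimal plain-Python sketch of their semantics (illustrative only; these are simplified assumptions, not the product's implementation):

```python
# Illustrative sketch of the four transformation concepts; not product code.

def bin_equal(vals, n_bins):
    # Bin column: place numeric values into equal-size bins across the range.
    lo, hi = min(vals), max(vals)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in vals]

def scale_minmax(vals):
    # Scale column: rescale values to a fixed 0..1 range (min-max scaling).
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) for v in vals]

def one_hot(vals):
    # One-hot encoding: one 0/1 entry per distinct value, per row.
    cats = sorted(set(vals))
    return [{c: int(v == c) for c in cats} for v in vals]

def group_sum(rows, key, col):
    # Group By: aggregate (here, sum) values grouped by a key column.
    out = {}
    for r in rows:
        out[r[key]] = out.get(r[key], 0) + r[col]
    return out
```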

 

New functions:

  • ARRAYELEMENTAT Function: Returns the element of the input array at the provided index value.
  • DOUBLEMETAPHONE Function: Returns primary and secondary phonetic spellings of an input string using the Double Metaphone algorithm.
  • DOUBLEMETAPHONEEQUALS Function: Returns true if two strings match phonetic spellings using the Double Metaphone algorithm. The tolerance threshold can be adjusted.
  • UNIQUE Function: Generates a new column containing an array of the unique values from a source column.
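A rough Python sketch of the semantics of two of these functions (illustrative only; the out-of-range behavior shown here is an assumption, and actual product semantics may differ):

```python
# Approximate semantics of ARRAYELEMENTAT and UNIQUE; illustrative only.

def array_element_at(arr, index):
    # ARRAYELEMENTAT: return the element at the given index.
    # Returning None for an out-of-range index is an assumption.
    return arr[index] if 0 <= index < len(arr) else None

def unique(values):
    # UNIQUE: return an array of distinct values, keeping first-seen order.
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen
```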

 

CSV publishing options: Quoting can now be configured as a CSV file publishing option. See Run Job Page.

 

Review and select patterns: Patterns are available for review and selection in the context panel, prompting suggestions.

 

Swap in dynamic datasets: Swap a static imported dataset with a dataset with parameters in Flow View. See Flow View Page.

 

Named samples: Generated samples can be named. See Samples Panel.

 

Changes

Join Panel: The Join page has been replaced by the new Join Panel in the context panel. See Join Window.

 

Nested expressions: Expressions can be nested within expressions in Wrangle. See Wrangle Language.

 

Deprecated

None.

Known Issues

TD-34840: Platform fails to provide suggestions for transformations when selecting keys from an object that contains many keys.

 

TD-34822: Case-sensitive variations in date range values are not matched when creating a dataset with parameters.

  • NOTE: Date range parameters are now case-insensitive.

 

DP-98: BigQuery does not support reading from tables stored in regions other than US or EU.

 

Fixes

TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.

 

TD-33428: Job execution fails on recipes that contain a split transformation with a high limit, due to a Java null pointer error during profiling.

  • NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.

 

TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.

  • NOTE: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

 

September 21, 2018

This release of Cloud Dataprep by TRIFACTA® INC. is generally available (GA).

Features

Share flows within the same project: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use. For more information, see Overview of Sharing.

NOTE: If you try to share a flow with a known user of Cloud Dataprep by TRIFACTA INC. and receive a That is not a valid email error, please ask that user to log in to Cloud Dataprep by TRIFACTA INC. again in the same GCP project.

 

Changes

None.

Deprecated

None.

Known Issues

TD-34574: BigQuery tables and views with NUMERIC data type cannot be imported.

  • Workaround: Cast the NUMERIC type to FLOAT, and the import should succeed.
  • NOTE: Support for NUMERIC data type for BigQuery began on August 20, 2018. For details, see https://cloud.google.com/bigquery/docs/release-notes.
  • Support for the NUMERIC data type is planned for a future release.

 

TD-34061: Running jobs on datasets sourced from more than 6000 files may fail.

  NOTE: Due to a limitation in Cloud Dataflow, when you run a job on a parameterized dataset containing more than 1000 files, the input paths data must be compressed, which results in non-readable location values in the Cloud Dataflow job details.

Workaround: For this and other performance reasons, try to limit your parameterized datasets to no more than 5000 source files.

 

TD-33428: Job execution fails on recipes that contain a split transformation with a high limit, due to a Java null pointer error during profiling.

NOTE: Avoid creating datasets that are wider than 2500 columns. Performance can degrade significantly on very wide datasets.

 

Fixes

TD-33901: Cannot sort flows by name in Flows page.

 

TD-33900: When headers use protected names, the columns may be renamed.

 

TD-33888: "Unable to load wrangled Dataset Script is malformed (Cannot read property 'push' of undefined)" error when opening recipe with Case transformations.

 

TD-33798: "Could not create dataset" error when importing Avro dataset from Google Cloud Storage.

 

TD-33797: Status icon for the active job in Jobs page flickers as you move the mouse.

 

TD-33108: Textbox for name of reference object in Flow View appears stretched.

 

TD-32123: Window transformation doesn't handle order parameter in descending order.

 

July 18, 2018

Features

New Home page and left nav bar: The new Home page and left nav bar provide more streamlined access to recent flows and jobs, as well as learning resources. See Home Page.

 

Updated onboarding tutorial: The expanded onboarding tutorial extends the existing workflow to include import and job result guides.

 

New Library page: Manage your datasets and references from the new Library page. See Library Page.

 

Redesigned Jobs page: In the new Jobs page, you can more easily locate and review all jobs to which you have access. See Jobs Page.

 

Introducing pre-defined transformations for common tasks: Through the context panel, you can search across dozens of pre-defined transformations. Select one, and the Transform Builder is pre-populated based on the current context in the data grid or column browser. 

 

New Transformer toolbar: New toolbar provides faster access to common transformations and operations. See Transformer Toolbar.

 

Match your recipe to the target: Assign a new target to your recipes to provide matching guidance during wrangling. See Overview of RapidTarget.

  • Targets assigned to a recipe appear in a column header overlay to assist you in aligning your dataset's schema to the target schema. See Data Grid Panel.

 

Improved column matching: Better intelligence for column matching during union operations. See Union Page.

 

Improved Join page: Numerous functional improvements to the Join page. See Join Window.

 

More flexible column names: Support for a broader range of characters in column names. See Rename Columns.

 

Share flows: Collaborate with other users through shared flows within the same GCP project. Or send them a copy for their own use.

NOTE: This feature may not be immediately available in your user account or in your collaborators' accounts. Please check again in a few days. For more information, see Overview of Sharing.

 

Import/Export Flows: Flows created in Cloud Dataprep by TRIFACTA® INC. can be exported and imported into a GCP project.

  • See Export Flow.
  • See Import Flow.
  • You can also export the dependencies of an executed job as a separate flow. See Flow View Page.
  • You can only import flows that are exported from Cloud Dataprep by TRIFACTA® INC. of the same version.

 

Introducing dynamic datasets with parameters: Use parameterized rules in imported dynamic datasets to allow scheduled jobs to automatically pick up the right input data. See Overview of Parameterization.

 

Changes

Datasets page: The Datasets page has been replaced by the new Library page. See Library Page.

 

Deprecated

Aggregate transform: The aggregate transform has been removed from the platform.

  • Aggregate functionality has been integrated into pivot, so you can accomplish the same tasks.


    NOTE: All prior functionality for the Aggregate transform is supported in the new release using the Pivot transform.

  • In the Search panel, enter pivot. See Search Panel.

 

Known Issues

TD-33900: When headers use protected names, the columns may be renamed.

  • Workaround: At the beginning of the recipe, you may be able to rename your source. For larger flows, this workaround may not be practical.

 

TD-33888: "Unable to load wrangled Dataset Script is malformed (Cannot read property 'push' of undefined)" error when opening recipe with Case transformations.

  • Workaround: The recipe should still be accessible. If so, click the broken step and select Copy to clipboard.... Delete the original step. Then, rebuild the Case transformation by pasting the version you copied to the clipboard.

 

TD-33798: "Could not create dataset" error when importing Avro dataset from Google Cloud Storage.

  • Workaround: Import the file into BigQuery. Then, import the dataset as a BigQuery table into Cloud Dataprep by TRIFACTA INC..

 

TD-32123: Window transformation doesn't handle order parameter in descending order.

 

TD-31627: Prefixes added to column names in the Join page are not propagated to subsequent recipe steps that already existed.

  • Workaround: Perform a batch rename of column names in a step after the join. See Rename Columns.

 

TD-31305: Copying a flow invalidates the samples in the new copy. Copying or moving a node within a flow invalidates the node's samples.

  • This issue also applies to flows that were upgraded from a previous release.
  • Workaround: Recreate the samples after the move or copy.

 

TD-31252: Assigning a target schema through the Column Browser does not refresh the page.

  • Workaround: To update the page, reload the page through the browser.

 

TD-31165: Job results are incorrect when a sample is collected and then the last transform step is undone.

  • Workaround: Recollect a sample after undoing the transform step.

 

TD-30857: Matching file path patterns in a large directory can be very slow, especially if using multiple patterns in a single dataset with parameters.

  • Workaround: To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

 

TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. Please verify with your BigQuery administrator that the service account in use has been properly set up and has the appropriate permissions, so that you can use the project.

 

Fixes

TD-31339: Writing to a single file in the top-level directory fails if the temporary output generates more than 32 files.

 

TD-29149: Columns containing String values with leading spaces are incorrectly type cast to Integer data type.

 

TD-28930: Delete other columns causes column lineage to be lost and reorders columns.

 

TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.

 

 

May 23, 2018

Features

None.

Changes

Product Name Change: As of this release, the product is now known as Cloud Dataprep by TRIFACTA® INC..

 

GDPR: The product is now compliant with GDPR regulations in the European Union. This regulation provides enhanced data privacy requirements for users. For more information, see https://www.eugdpr.org/.

As part of this compliance, Cloud Dataprep by TRIFACTA INC. has updated its Terms of Service and Privacy Policy for all users, effective immediately.

 

Deprecated

None.

Known Issues

TD-28807: You may receive a Nothing Found message when navigating to a BigQuery project that contains data. Please verify with your BigQuery administrator that the service account in use has been properly set up and has the appropriate permissions, so that you can use the project.

 

Fixes

None.

April 25, 2018

Features

None.

Changes

Disabling Cloud Dataprep by TRIFACTA INC.: When a user disables Cloud Dataprep by TRIFACTA INC., all metadata associated with Cloud Dataprep by TRIFACTA INC. will be deleted. This operation is not reversible (see Effect of disabling Cloud Dataprep).

 

Deprecated

None.

Known Issues

None.

January 23, 2018

Features

New Flow View page: New objects in Flow View and better organization of them. See Flow View Page.

 

BigQuery read/write access across projects:

Read from BigQuery tables associated with GCP projects other than the current one where Cloud Dataprep by TRIFACTA INC. was launched.

Write results into BigQuery tables associated with other projects.

You must configure the Cloud Dataprep by TRIFACTA INC. and Cloud Dataflow service accounts to have read or write access to BigQuery datasets and tables outside of the current GCP project.

 

Re-run job on Cloud Dataflow:

After you run a job in Cloud Dataprep by TRIFACTA INC., you can re-run the job directly from the Cloud Dataflow interface.

Inputs and outputs are parameters that you can modify.

Operationalize the job with a third-party scheduling tool.

 

Cross joins: Perform cross joins between datasets. See Join Window.

 

Enable or disable type inference on files and tables: Enable (default) or disable initial type inference for BigQuery tables or Avro files used as sources for individual datasets. See Import Data Page.

 

Batch column rename: Rename multiple columns in a single transformation step. See Rename Columns.

 

Reuse your common patterns: Browse and select patterns for re-use from your recent history.

 

Convert phone and date patterns:

In Column Details, you can select a phone number or date pattern to generate suggestions for standardizing the values in the column to a single format.

See Column Details Panel.

 

New SUBSTITUTE function: Replace string literals or patterns with a new literal or column value. See SUBSTITUTE Function.
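Conceptually, SUBSTITUTE performs literal or pattern-based replacement on a column's values. A rough Python analogue of the two modes (this is not the product's Wrangle syntax, only an illustration of the concept; a regular expression stands in for a pattern):

```python
import re

# Rough analogue of literal vs. pattern substitution; illustrative only.
text = "phone: 555-1234"

# Replace a string literal.
literal = text.replace("555", "000")

# Replace a pattern (a regular expression stands in for a pattern here).
redacted = re.sub(r"\d{3}-\d{4}", "<redacted>", text)
```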

 

Changes

New Flow Objects: The objects in your flow have been modified and expanded to provide greater flexibility in flow definition and re-use:

References: Create references to the outputs of your recipes and use them as inputs to other recipes.

Output object: Specify individual publishing outputs in a separate object associated with a recipe. Publishing options include format, location, and data type.

For more information, see Object Overview.

 

Deprecated

Wrangled Datasets: Wrangled datasets are no longer objects in Cloud Dataprep by TRIFACTA INC.. Their functionality has been moved to other and new objects. For more information, see Object Overview.

 

Known Issues

TD-28155: Sampling from an Avro file on Cloud Dataflow always scans the entire file. As a result, additional processing costs may be incurred.

 

TD-26069: Photon evaluates date(yr, month, 0) as first date of the previous month. It should return a null value.

 

Fixes

TD-27568: Cannot select BigQuery publishing destinations that are empty databases.

 

TD-25733: Attempting a union of 12 datasets crashes the UI.

 

TD-24793: BigQueryNotFoundException errors were incorrectly reported for output tables that had been moved or deleted by the user.

 

TD-24130: Cannot read recursive directory structures with files at different levels of folder depth in Cloud Dataflow.

 

November 2, 2017

Features

Interactive Getting Started Tutorial for New Users: New users to Cloud Dataprep by TRIFACTA INC. can review the "Getting Started 101" tutorial with pre-loaded data through the product.

 

Scheduling: Schedule execution of one or more wrangled datasets within a flow. Scheduled jobs must be configured from Flow View. See Flow View Page.

 

New Transformer page: New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.

Transformation suggestions are now displayed in a right-side panel, instead of on the bottom of the page.

A preview for a transformation suggestion is displayed only when you hover over the suggestion.

 

Improved sampling: Enhanced sampling methods provide access to customizable, task-oriented subsets of your data. See Samples Panel.

Improved Transformer loading due to persistence of initial sample. 

For more information on the new sampling methods, see Overview of Sampling.

 

Improved Flow View: Improved user experience with flows. See Flow View Page.

 

Disable steps: Disable individual steps in your recipes. See Recipe Panel.

 

Set encoding settings during import: You can define per-file import settings including file encoding type and automated structure detection. See Import Data Page.

 

Snappy compression: Read/write support for Snappy compression. See Supported File Formats.

 

Column lineage: Highlight the recipe steps where a specific column is referenced. See Column Menus.

 

Search for columns: Search for columns by name. See Data Grid Panel.

 

CASE Function: Build multi-conditional expressions with a single CASE statement. See CASE Function.
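The control-flow idea behind a multi-conditional CASE expression can be sketched in plain Python (this is not Wrangle syntax; the function name and labels are illustrative):

```python
# Rough analogue of a multi-conditional CASE expression; illustrative only.
def sign_label(value):
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    else:
        return "positive"
```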

 

Support for BQ Datetime: Publish Cloud Dataprep by TRIFACTA INC. Datetime values to BigQuery as Datetime or Timestamp values, depending on the data. See BigQuery Data Type Conversions.

 

 

Changes

Supported browser version required: You cannot log in to the application using an unsupported version of Google Chrome.

 

Supported encoding types: The list of supported encoding types has changed.

 

Dependencies Browser: The Dependencies browser has been replaced by the Recipe Navigator.

 

Deprecated

Transform Editor: The Transform Editor for entering raw text Wrangle steps has been removed. Please use the Transform Builder for creating transformation steps.

 

Known Issues

TD-27568: Cannot select BigQuery publishing destinations that are empty databases.

 

TD-24312: Improved error messages for Trifacta users to identify pre-job run failures.

If an error is encountered during the launch of a job but before job execution, you can now view a detailed error message as to the cause in the failed job card.

Common errors that occur during the launch of a job include:

  • Dataflow staging location is not writeable
  • Dataflow cannot read from and write to different regions
  • Insufficient workers for Dataflow, please check your quota

 

TD-24273: Circular reference in schema of Avro file causes job in Cloud Dataflow to fail.

 

TD-23635: Read-only BigQuery databases are listed as publishing destinations. Publish fails.

 

Fixes

TD-26177: Cloud Dataflow job fails for large Avro files.

Avro datasets that were imported before this release may still have failures during job execution on Cloud Dataflow. To fix these failures, you must re-import the dataset.

 

TD-25438: Deleting an upstream reference node does not propagate results correctly to the Transformer page.

 

TD-25419: When a pivot transform is applied, some column histograms may not be updated.

 

TD-23787: When publishing location is unavailable, spinning wheel hangs indefinitely without any error message.

 

TD-22467: Last active sample is not displayed during preview of multi-dataset operations.

 

TD-22128: Cannot read multi-file Avro stream if data is greater than 500 KB.

 

TD-19865: You cannot configure a publishing location to be a directory that does not already exist. See Run Job Page.

 

TD-17657: The splitrows transform allows splitting even if the required on parameter is set to an empty value.

 

TD-24464: 'Python Error' when opening a recipe with a large number of columns and a nest transform.

 

TD-24322: Nest transform creates a map with duplicate keys.

 

TD-23920: Support for equals sign (=) in output path.

 

TD-23646: Adding a specific comment appears to invalidate an earlier edit.

 

TD-23111: Long latency when loading complex flow views.

 

TD-23099: View Results button is missing on Job Cards even with profiling enabled.

 

TD-22889: Extremely slow UI performance for some actions.

 

 

 

