Trifacta Dataprep



If you licensed Dataprep by Trifacta before Oct. 14, 2020, you are using the Dataprep by Trifacta Legacy product edition. On October 14, 2022, this product edition will be decommissioned by Google and will no longer be available for use. Current customers of this product edition are encouraged to transition to one of the product editions hosted by Trifacta. See Product Editions.

   

This section contains an archive of release notes for previous releases of Dataprep by Trifacta®.

For the latest release notes, see Release Notes for Dataprep by Trifacta.

July 20, 2021

Release 8.5

What's New

Tip: When you complete your Dataprep by Trifacta Enterprise Edition or Dataprep by Trifacta Professional Edition trial, you can choose to license a higher or lower tier product edition. For more information, see Product Editions.

Parameterization:

  • Create environment parameters to ensure that all users of the project or workspace use consistent references.

    NOTE: You must be a workspace administrator or project owner to create environment parameters.

    Tip: Environment parameters can be exported from one project or workspace and imported into another, so that these references are consistent across the enterprise.

  • Parameterize names of your storage buckets using environment parameters.

Schedules:

  • Project owners and workspace administrators can review, enable, disable, and delete schedules through the application.

    Feature Availability: This feature is not available in Dataprep by Trifacta Starter Edition.

    See Schedules Page.

Flow View:

Job execution:

  • Define SQL scripts to execute before data ingestion or after publication for file-based or table-based jobs.

Resource usage:

  • Review the total vCPU hours consumed by job execution within your project across an arbitrary time period.

Connectivity:

Contribute to the future direction of connectivity: Click I'm interested on a connection card to upvote adding the connection type to the Trifacta application. See Create Connection Window.

  • Early Preview (read-only) connections available with this release:

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium

  • Apache Impala

Connectivity:

  • Connect to your relational database systems hosted on Cloud SQL. In the Connections page, click the Cloud SQL card for your connection type.
    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta® Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium


    For more information, see Create Connection Window.

API:

  • Cancel in-progress Dataflow jobs via API.

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium
    • Dataprep by Trifacta Standard

    See Changes to the APIs.

Job execution:

You can choose to ignore the recipe errors before job execution and then review any errors in the recipe through the Job Details page.

Language:

  • NUMVALUE function can be used to convert a String value formatted as a number into an Integer or Decimal value.
  • NUMFORMAT function now supports configurable grouping and decimal separators for localizing numeric values.
  • For more information, see Changes to the Language.
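The two behaviors described above can be sketched in Python. This is only an illustration of the idea: the names `num_value` and `num_format`, their parameters, and their defaults are hypothetical and are not the signatures of the NUMVALUE and NUMFORMAT functions. See Changes to the Language for the actual syntax.

```python
from decimal import Decimal


def num_value(text, grouping_sep=",", decimal_sep="."):
    """Parse a number-formatted string into an int or a Decimal (illustrative)."""
    cleaned = text.strip().replace(grouping_sep, "")
    if decimal_sep in cleaned:
        # Normalize the decimal separator before building the Decimal.
        return Decimal(cleaned.replace(decimal_sep, "."))
    return int(cleaned)


def num_format(value, grouping_sep=",", decimal_sep=".", places=2):
    """Format a number with caller-chosen grouping and decimal separators."""
    us_style = f"{value:,.{places}f}"  # comma grouping, dot decimal
    # Swap separators through a placeholder so they cannot collide.
    return (
        us_style.replace(",", "\x00")
        .replace(".", decimal_sep)
        .replace("\x00", grouping_sep)
    )
```

For example, under these assumptions a value written in a locale that groups with periods and uses a decimal comma can be parsed with `num_value("1.234.567,89", ".", ",")` and written back out with `num_format(value, ".", ",")`.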

Performance:

  • Improved performance when browsing folders containing a large number of files on Cloud Storage.

Resource usage:

  • Review the total vCPU hours consumed by your datasets, recipes, and job execution within your project across an arbitrary time period. 

Changes

None.

Deprecated

None.

Known Issues

None.

Fixes

  • TD-62190: You may not be able to view the SQL that was used to execute a job within BigQuery. This issue is due to a regression in the new BigQuery console in which job identifiers containing dashes are not supported. A ticket has been filed with Google.

June 7, 2021

Release 8.4

What's New

Template Gallery:

  • Check out the new gallery of flow templates, which can be imported into your workspace. These templates are pre-configured to solve the most compelling loading and transformation use cases in the product. For more information, see www.trifacta.com/templates.
    • For more information on importing flows into your workspace, see Import Flow.
    • For more information on using a template in the product, see Start with a Template.

Connectivity:

  • Early Preview (read-only) connections available with this release:

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium

  • Splunk
  • YouTube Analytics

Collaboration:


Support for delete actions on merge (upsert) operations in BigQuery:

When publishing to a BigQuery table, you can choose to update or, with this release, to delete matching records during a merge operation. For more information, see BigQuery Table Settings.
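The difference between the two actions on matching records can be sketched in plain Python over lists of row dictionaries. This is a conceptual model only, not how the product or BigQuery performs the merge. The function name, the `key_fields` and `matched_action` parameters, and the choice to insert unmatched incoming rows in both modes are assumptions made for illustration.

```python
def merge_into_table(target, incoming, key_fields, matched_action="update"):
    """Return the rows of `target` after merging `incoming` into it."""
    if matched_action not in ("update", "delete"):
        raise ValueError("matched_action must be 'update' or 'delete'")

    def key_of(row):
        return tuple(row[field] for field in key_fields)

    result = {key_of(row): dict(row) for row in target}
    existing_keys = set(result)  # keys present before the merge started

    for row in incoming:
        key = key_of(row)
        if key not in existing_keys:
            result[key] = dict(row)      # no match: insert (assumed behavior)
        elif matched_action == "delete":
            result.pop(key, None)        # match: remove the target record
        else:
            result[key].update(row)      # match: overwrite with incoming values
    return list(result.values())
```

With a target holding ids 1 and 2 and incoming rows for ids 2 and 3, the update action leaves ids 1, 2 (new values), and 3, while the delete action leaves ids 1 and 3.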

Job execution:

You can choose to ignore the recipe errors before job execution and then review any errors in the recipe through the Job Details page.

Language:

Changes

Trifacta Photon limits on execution time

Trifacta Photon is an in-memory running environment that is hosted on the same node as Dataprep by Trifacta, which allows for faster execution suitable for small- to medium-sized jobs.

Feature Availability: This feature is not available in Dataprep by Trifacta Legacy.

NOTE: Jobs that are executed on Trifacta Photon may be limited to run for a maximum of 10 minutes, after which they fail with a timeout error. If your job fails due to this limit, please switch to running the job on Dataflow.

Trifacta Photon can be enabled or disabled by a project administrator. For more information, see Dataprep Project Settings Page.

Execution of scheduled jobs on Trifacta Photon is not supported

In conjunction with the previous change, execution of scheduled jobs is not supported on Trifacta Photon. Since Trifacta Photon jobs are now limited to 10 minutes of execution time, scheduled jobs have been automatically migrated to execution on Dataflow to provide better execution success. For more information, see Trifacta Photon Running Environment.

Deprecated

None.

Known Issues

  • TD-62190: You may not be able to view the SQL that was used to execute a job within BigQuery. This issue is due to a regression in the new BigQuery console in which job identifiers containing dashes are not supported. A ticket has been filed with Google.

Fixes

  • TD-60881: Incorrect file path and missing file extension in the application for parameterized outputs.
  • TD-60382: Date format M/d/yy is handled differently by PARSEDATE function on Trifacta Photon and Spark.

May 20, 2021

Release 8.3 - push 3

What's New

Connectivity:

  • Support for SFTP connections.

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium


    NOTE: This connection type is import only.

    For more information, see SFTP Connections.

Changes

Trifacta Photon enabled by default

Trifacta Photon is an in-memory running environment that is hosted on the same node as Dataprep by Trifacta, which allows for faster execution suitable for small- to medium-sized jobs.

Feature Availability: This feature is not available in Dataprep by Trifacta Legacy.

NOTE: Jobs executed in Trifacta Photon are executed within the Trifacta VPC. Data is temporarily streamed to the Trifacta VPC during job execution and is not persisted.

Beginning in this release, Trifacta Photon is enabled by default. Users can choose to run jobs on Trifacta Photon.

NOTE: For Dataprep by Trifacta Enterprise Edition, Trifacta Photon is enabled by default for new projects. For existing projects, a project administrator must still choose to enable it.

Trifacta Photon can be enabled or disabled by a project administrator. For more information, see Dataprep Project Settings Page.

Deprecated

None.

Known Issues

None.

Fixes

None.

May 10, 2021

Release 8.3

What's New

Running Environments:

Cancel Jobs in Dataflow:

You can cancel Dataflow jobs directly from the product.

NOTE: In some cases, the product is unable to cancel the job from the application. In these cases, click View in Dataflow Job and cancel the in-progress job from there.

Support for merge (upsert) operations in BigQuery:

When publishing to a BigQuery table, you can choose to write results using the merge option. When selected, you specify a primary key of fields and then decide how data is merged into the table. For more information, see BigQuery Table Settings.

Connectivity:

  • Early Preview (read-only) connections available with this release:

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium

  • Authorize.net
  • Cockroach DB
  • DB2
  • Google Data Catalog
  • Google Spanner
  • Magento
  • Redis
  • Shopify
  • Smartsheet
  • Trello
  • QuickBase

Job execution:

Introducing new filter pushdowns to optimize the performance of your flows during job execution. For more information, see Flow Optimization Settings Dialog.

Job results:

You can now preview job results and download them from the Overview tab of the Job details page. For more information, see Job Details Page.

Tip: You can also preview job results in Flow View. See View for Outputs.

Changes

Improved method of JSON import

Beginning in this release, the Trifacta application now uses the conversion service to ingest JSON files during import. This improved method of ingestion can save significant time wrangling JSON into records.

NOTE: The new method of JSON import is enabled by default but can be disabled as needed.

For more information, see Working with JSON v2.

Flows that use imported datasets created using the old method continue to work without modification.

NOTE: Support for the v1 version of JSON import is likely to be deprecated in a future release. You should switch to using the new version as soon as possible. For more information on migrating your flows and datasets to use the new version, see Working with JSON v1.

Future work on support for JSON is targeted for the v2 version only.

Optionally, you can re-enable the old version, which is useful for migrating to the new version.

Feature Availability: This feature is not available in Dataprep by Trifacta Legacy.

For more information on using the old version and migrating to the new version, see Working with JSON v1.

Deprecated

None.

Known Issues

  • TD-61478: Time-based data types are imported as String type from BigQuery sources when type inference is disabled.

Fixes

  • TD-60701: Most non-ASCII characters incorrectly represented in visual profile downloaded in PDF format.
  • TD-59854: Datetime column from Parquet file incorrectly inferred to the wrong data type on import.

April 26, 2021

Release 8.2 push2

What's New

Upgrade: Trial customers can upgrade through the Admin console. See Admin Console.

This is the initial release of the following product tiers:

  • Dataprep by Trifacta Enterprise Edition
  • Dataprep by Trifacta Professional Edition
  • Dataprep by Trifacta Starter Edition

Changes

None.

Deprecated

None.

Known Issues

None.

Fixes

None.

April 14, 2021

Release 8.2

This is the initial release of the following product tiers:

  • Dataprep by Trifacta Enterprise Edition
  • Dataprep by Trifacta Professional Edition
  • Dataprep by Trifacta Starter Edition

What's New

Photon:

Introducing Trifacta Photon, an in-memory running environment for running jobs. Embedded in Dataprep by Trifacta, Trifacta Photon delivers improved performance in job execution and is best suited for small- to medium-sized jobs.

Feature Availability: This feature is not available in Dataprep by Trifacta Legacy.

NOTE: Trifacta Photon must be enabled by a project owner. For more information, see Dataprep Project Settings Page.

  • When you run a job, you can now choose to run it on Trifacta Photon.
  • For more information, see Run Job Page.

Quick scan sampling:

  • Trifacta Photon also enables quick scan sampling. A quick scan sample generates an appropriate selection of rows from the dataset from which the sample was initiated. These samples are faster to generate. For more information, see Overview of Sampling.
  • For more information on generating samples, see Samples Panel.

Preferences:

  • Re-organized user account, preferences, and storage settings to streamline the setup process. See Preferences Page.

Connectivity:

  • Early Preview (read-only) connections available with this release:

    Feature Availability: This feature is available in the following editions:

    • Dataprep by Trifacta Enterprise Edition
    • Dataprep by Trifacta Professional Edition
    • Dataprep by Trifacta Premium

Plan metadata references:

Feature Availability: This feature is available in the following editions:

  • Dataprep by Trifacta Enterprise Edition
  • Dataprep by Trifacta Professional Edition
  • Dataprep by Trifacta Premium

Use metadata values from other tasks and from the plan itself in your HTTP task definitions.


Improved accessibility of job results:

The Jobs tabs have been enhanced to display the list of the latest and previous jobs that have been executed for the selected output.

For more information, see View for Outputs.

Sample Jobs Page:

You can monitor the status of all sample jobs that you have generated. Project administrators can access all sample jobs in the workspace. For more information, see Sample Jobs Page.

Simplified output and destination experience:

From the Home Page, you can quickly redesign your output and destination experience. The step-by-step procedure guides you through a streamlined output creation experience. For more information, see Start with a Template.

Changes

Improved methods for disabling the product:

Project owners can choose to disable Dataprep by Trifacta from within the product. For more information, see Enable or Disable Dataprep.

After the product has been disabled in a project, Trifacta data is placed in a hidden state for later purging. For more information on purging or restoring data, see Wipe Out Dataprep Data.

API:

The following API endpoints are scheduled for deprecation in a future release:

NOTE: Please avoid using the following endpoints.

/v4/connections/vendors
/v4/connections/credentialTypes
/v4/connections/:id/publish/info
/v4/connections/:id/import/info

These endpoints have little value for public use.
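If you maintain scripts that call the REST APIs, a small check like the following can help locate calls to the endpoints listed above. This is a sketch, not a tool provided with the product: the function name is hypothetical, and it assumes that `:id` stands for a single path segment.

```python
import re

# Matches the four endpoint paths listed above; ":id" is assumed to be
# one path segment (for example, a numeric connection identifier).
DEPRECATED_ENDPOINT = re.compile(
    r"/v4/connections/"
    r"(?:vendors|credentialTypes|[^/?#]+/(?:publish|import)/info)"
    r"(?=[/?#]|$)"
)


def uses_deprecated_endpoint(url):
    """Return True if the URL targets one of the endpoints slated for deprecation."""
    return DEPRECATED_ENDPOINT.search(url) is not None
```

For instance, a URL ending in `/v4/connections/42/import/info` is flagged, while `/v4/connections/42` is not.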

Deprecated

None.

Known Issues

  • TD-60701: Most non-ASCII characters incorrectly represented in visual profile downloaded in PDF format.

Fixes

  • TD-59236: Use of percent sign (%) in file names causes Transformer page to crash during preview.
  • TD-59218: BOM characters at the beginning of a file cause multiple headers to appear in Transformer Page.


March 16, 2021

Release 8.1

What's New

Connectivity:

  • Introducing Early Preview connections. In each release of cloud-based product editions, new connection types may be made available in read-only mode for users to begin exploring their datasets stored in the connected datastores.

    NOTE: Early Preview connection types are read-only and are subject to change before they may be made generally available.

    Feature Availability: This feature is available in Dataprep by Trifacta Premium only.

  • Early Preview connections available with this release:
    • Airtable
    • Cassandra
    • Freshdesk
    • Google Analytics
    • MailChimp

Specify column headers during import:

You can specify the column headers for your dataset during import. For more information, see Import Data Page.

Sample Jobs Page:

You can monitor the status of all sample jobs that you have generated. Project administrators can access all sample jobs in the workspace. For more information, see Sample Jobs Page.

Job results:

Results of data quality checks are now part of the visual profile PDF available with your job results. In the PDF, you can download the data quality results over the entire dataset.

Feature Availability: This feature is available in Dataprep by Trifacta Premium only.

  • Visual profiling must be enabled for the job.
  • For more information, see Job Details Page.

Sharing:

  • Define permissions on individual objects when they are shared.

    NOTE: Fine-grained sharing permissions apply to flows and connections only.

    For more information, see Changes to User Management.

API:

  • You can now transfer ownership of assets created in Dataprep by Trifacta between users, based on their user identifiers or email addresses. For more information, see Changes to the APIs.
  • Customize connection types (connectors) to ensure consistency across all connections of the same type and to meet your enterprise requirements. For more information, see Changes to the APIs.

Macro updates:

You can replace an existing macro definition with a macro that you have exported to your local desktop.

NOTE: Before you replace the existing macro, you must export a macro to your local desktop. For more information, see Export Macro.

For more information, see Macros Page.

Changes

Freed IP address ranges:

The following IP address range is the only one in use by the Trifacta Service:

34.68.114.64/28

Please discontinue whitelisting any other IP address ranges for the Trifacta Service.

All other ranges that were previously used have been freed to the general Internet.
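When auditing firewall or allowlist rules against this change, it can help to confirm exactly which addresses the remaining range covers. The sketch below uses Python's standard `ipaddress` module; the function name is hypothetical.

```python
import ipaddress

# The single range stated above as still in use by the Trifacta Service.
TRIFACTA_RANGE = ipaddress.ip_network("34.68.114.64/28")


def is_trifacta_service_address(address):
    """Return True if the address falls inside the remaining range."""
    return ipaddress.ip_address(address) in TRIFACTA_RANGE
```

A /28 network contains 16 addresses, so this range spans 34.68.114.64 through 34.68.114.79. Any rule that references an address outside that span is a candidate for removal.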

Changes to Preferences:

The Preferences area of the Trifacta application has been changed. For more information, see Changes to Configuration.

Deprecated

None.

Known Issues

  • TD-58523: Cannot import dataset with filename in Korean alphabet from HDFS.

    • Workaround: You can upload files with Korean characters from your desktop. You can also add a 1 to the end of the file on HDFS, and it can then be imported.

  • TD-55299: Imported datasets with encodings other than UTF-8 and line delimiters other than \n may generate empty outputs on Spark or Dataflow running environments.

  • TD-51516: Input data containing BOM (byte order mark) characters may cause Spark or Dataflow running environments to read data improperly and/or generate invalid results.
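For TD-55299 and TD-51516, one possible pre-processing step is to normalize files to UTF-8 with \n line delimiters and no BOM before importing them. This is not a documented workaround, only a sketch of the idea; the function name and parameters are hypothetical, and you need to know the source encoding of the file.

```python
def normalize_for_import(raw, source_encoding="utf-8"):
    """Return the bytes re-encoded as UTF-8 without a BOM, using \\n line delimiters."""
    text = raw.decode(source_encoding)
    text = text.lstrip("\ufeff")                            # drop a leading byte order mark
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line delimiters
    return text.encode("utf-8")
```

For example, `normalize_for_import(data, "latin-1")` converts a Latin-1 file with \r\n or \r delimiters into UTF-8 with \n delimiters.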

Fixes

  • TD-56170: The Test Connection button for some relational connection types does not perform a test authentication of user credentials.
  • TD-54440: Header sizes at intermediate nodes for JDBC queries cannot be larger than 16K.
    • Previously, the column names for JDBC data sources were passed as part of a header in a GET request. For very wide datasets, these GET requests often exceeded 16K in size, which represented a security risk.

February 16, 2021

Release 8.0

Features

Tip: Add a profile picture to your account! For more information, see User Profile Page.

Flow templates:

Introducing flow templates, which are predefined flows with guidelines for creating the flow objects needed to solve a specific transformation and publication use case. These step-by-step guides leverage placeholders for flow objects to assist you in rapidly assembling your end-to-end flow pipeline.

The first available template covers Data Warehouse Onboarding: it simplifies ingesting datasets, transforming them, and loading them into your data warehouse. From the Home page, you can quickly set up a pipeline from data lakes into data warehouses:

  • GCS to BigQuery pipeline: Use this template to create a flow that imports data from Google Cloud Storage, transforms the data, and publishes the outputs to BigQuery. For more information, see Start with a Template.

Authorization:

APIs:

  • Individual workspace users can be permitted to create and use their own access tokens for use with the REST APIs. For more information, see Dataprep Project Settings Page.

Connectivity:

  • Support for connections to SharePoint Lists. See SharePoint Connections.
  • Support for using OAuth2 authentication for Salesforce connections. See Salesforce Connections.

  • Support for re-authenticating through connections that were first authenticated using OAuth2.

Import:

  • Improved method for conversion and ingestion of XLS/XLSX files. For more information, see Import Excel Data.

Recipe development:

  • The Flag for Review feature enables you to set review checkpoints in your recipes. You can flag recipe steps for review and approval by other collaborators. For more information, see Flag for Review.

Metric-based data quality rules:

Update Macros:

  • Replace / overwrite an existing macro's steps and inputs with a newly created macro.
  • Map new macro parameters to the existing parameters before replacing.
  • Edit macro input names and default values as needed. 

Job execution:

  • You can enable the Trifacta application to apply SQL filter pushdowns to your relational datasources to remove unused rows before their data is imported for a job execution. This optimization can significantly improve performance as less data is transferred during the job run. For more information, see Flow Optimization Settings Dialog.
  • Optimizations that were applied during the job run now appear in the Job Details Page. See Job Details Page.
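The effect of a filter pushdown can be illustrated with a small, self-contained example that uses SQLite in place of a real relational source. This is a conceptual sketch only and does not show how the Trifacta application generates its queries; the `orders` table and all function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with 100 rows whose totals run from 1.0 to 100.0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, float(i)) for i in range(1, 101)],
    )
    return conn


def fetch_without_pushdown(conn, min_total):
    """Transfer every row, then filter on the client side."""
    rows = conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()
    kept = [row for row in rows if row[1] >= min_total]
    return kept, len(rows)  # (result, rows transferred)


def fetch_with_pushdown(conn, min_total):
    """Push the filter into the SQL so only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE total >= ? ORDER BY id",
        (min_total,),
    ).fetchall()
    return rows, len(rows)  # (result, rows transferred)
```

Both functions return the same result rows, but with a threshold of 90 the first transfers all 100 rows from the source while the second transfers only the 11 that match.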

Changes

None.

Deprecated

None.

Known Issues

  • TD-56830: Receive "malformed_query: enter a filter criterion" error when importing a table from Salesforce.

    • Feature Availability: This feature is available in Dataprep by Trifacta Premium only.
    • NOTE: Some Salesforce tables require mandatory filters when they are queried. Mandatory filters are not currently supported for Salesforce connections.


  • TD-56170: The Test Connection button for some relational connection types does not perform a test authentication of user credentials.

    • Workaround: Append the following to your Connect String Options:

      ;ConnectOnOpen=true
    • This option forces the connection to validate user credentials as part of the connection. There may be a performance penalty when this option is used.
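If you manage connection definitions in scripts, the workaround can be applied consistently with a helper such as the one below. It is a minimal sketch that assumes Connect String Options is a semicolon-delimited list of key=value pairs, each preceded by a semicolon as in the value shown above, and that option names can be compared without regard to case. The function name and the `Timeout` option in the usage note are hypothetical.

```python
def with_connect_on_open(options):
    """Return the options string with ConnectOnOpen=true set exactly once."""
    kept = [
        part.strip()
        for part in options.split(";")
        # Drop empty pieces and any existing ConnectOnOpen setting.
        if part.strip() and not part.strip().lower().startswith("connectonopen=")
    ]
    kept.append("ConnectOnOpen=true")
    return "".join(f";{part}" for part in kept)
```

Under these assumptions, `with_connect_on_open(";Timeout=30")` yields `;Timeout=30;ConnectOnOpen=true`, and calling the helper on its own output leaves the string unchanged.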

Fixes

None.
