Using Databricks Tables
This section describes how to interact with your Azure or AWS Databricks Tables data through the Trifacta® platform.

NOTE: Use of Databricks Tables requires installation on Azure or AWS, integration with Databricks, and a Databricks Tables connection.

Uses of Databricks Tables

The Trifacta platform can use Databricks Tables for the following tasks:

  1. Create datasets by reading from tables in Databricks Tables.
  2. Write data to Databricks Tables.
The following table types are supported:

  • Databricks managed tables: Read/Write
  • Delta tables (managed and unmanaged): Read/Write
    NOTE: Versioning and rollback of Delta tables are not supported within the Trifacta platform. The latest version is always used. You must use external tools to manage versioning and rollback.
  • External tables: Read/Write
    NOTE: When writing to an external table, the TRUNCATE and DROP publishing actions are not supported.
  • Databricks unmanaged tables: Read/Write
  • Partitioned tables: Read only

The underlying format for Databricks Tables is Parquet.
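Because the platform always reads the latest version of a Delta table, versioning and rollback must be handled with external tools, such as a Databricks notebook. The following PySpark sketch is illustrative only and assumes a Delta table named sales.orders, which is a hypothetical placeholder:

```python
# Sketch for a Databricks notebook (PySpark). `sales.orders` is a
# hypothetical Delta table name -- substitute your own.

# Confirm the table's underlying storage format and location.
detail = spark.sql("DESCRIBE DETAIL sales.orders").collect()[0]
print(detail["format"], detail["location"])

# List the table's Delta version history.
spark.sql("DESCRIBE HISTORY sales.orders").select("version", "timestamp").show()

# Roll back outside the Trifacta platform with Delta's RESTORE command;
# the platform itself always reads the latest version.
spark.sql("RESTORE TABLE sales.orders TO VERSION AS OF 3")
```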

Limitations

  • Access to external Hive metastores is not supported.
  • Ad-hoc publishing to Databricks Tables is not supported.

Before You Begin Using Databricks Tables

  • Databricks Tables deployment: Your Trifacta administrator must enable use of Databricks Tables. See Create Databricks Tables Connections.

  • Databricks Personal Access Token: You must acquire a Databricks personal access token and save it to your user settings. For more information, see Databricks Settings Page. A sketch for verifying a token before saving it follows this list.
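One way to confirm that a personal access token is valid before saving it is to call the Databricks REST API with it. This minimal Python sketch assumes the requests library; the workspace URL and token values are placeholders:

```python
# Minimal sketch: smoke-test a Databricks personal access token before
# saving it in the Databricks Settings Page. URL and token are placeholders.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# Any authenticated, read-only endpoint works; clusters/list is a common choice.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()  # an HTTP 200 means the token is valid for this workspace
print("Token OK;", len(resp.json().get("clusters", [])), "clusters visible")
```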

Storing Data in Databricks Tables

NOTE: The Trifacta platform does not modify source data in Databricks Tables. Datasets sourced from Databricks Tables are read without modification from their source locations.

Reading from Databricks Tables

You can create a Trifacta dataset from a table or view stored in Databricks Tables.

NOTE: Custom SQL queries are supported, but each query must be a single SELECT statement; multi-statement custom SQL is not supported for Databricks Tables. For more information, see Create Dataset with SQL.

For more information on how data types are imported from Databricks Tables, see Databricks Tables Data Type Conversions.
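Because custom SQL must be a single SELECT statement, it can help to validate the query in a Databricks notebook before using it to create a dataset. A short PySpark sketch, using a hypothetical table name:

```python
# Sketch for a Databricks notebook: preview the single SELECT statement you
# plan to use as custom SQL. `sales.orders` is a hypothetical table name.
custom_sql = """
SELECT order_id, customer_id, order_total
FROM sales.orders
WHERE order_date >= '2021-01-01'
"""

# Not valid as custom SQL for Databricks Tables (multiple statements):
#   SET spark.sql.shuffle.partitions = 8; SELECT * FROM sales.orders

spark.sql(custom_sql).show(10)  # preview the rows the dataset would contain
```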

Writing to Databricks Tables

You can write data back to Databricks Tables using one of the following methods:

  • Write job results directly to Databricks Tables as part of normal job execution. Data is written as a managed table to DBFS in Parquet format; a notebook sketch of this behavior follows below.
  • Create a new publishing action to write to Databricks Tables. See Run Job Page.

For more information on how data is converted when written to Databricks Tables, see Databricks Tables Data Type Conversions.
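For reference, the direct-write behavior is roughly analogous to writing a managed Parquet table from a Databricks notebook. This PySpark sketch is an approximation under that assumption; the schema, table, and path names are all hypothetical:

```python
# Sketch: approximate the platform's direct write -- a managed table on DBFS
# in Parquet format. All names below are hypothetical placeholders.
spark.sql("CREATE SCHEMA IF NOT EXISTS trifacta_output")

df = spark.read.parquet("dbfs:/tmp/job_results/")  # hypothetical job output

(df.write
   .format("parquet")
   .mode("overwrite")  # roughly corresponds to a drop-and-replace publishing action
   .saveAsTable("trifacta_output.wrangled_orders"))

# Managed tables land under the metastore's warehouse location on DBFS.
spark.sql("DESCRIBE EXTENDED trifacta_output.wrangled_orders").show(truncate=False)
```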

Ad-hoc Publishing to Databricks Tables

Not supported.
