Databricks Connections
Create a connection to your Databricks workspace from Alteryx Analytics Cloud (AAC). Databricks offers a unified data analytics platform for data scientists, engineers, and business analysts. It provides a multi-cloud, managed offering for data processing, machine learning, and data governance.
This page provides high-level information on how to configure AAC to integrate with Databricks hosted on AWS or Azure.
For documentation on Databricks, go to the Databricks documentation.
Prerequisites
For AWS Databricks, configure your base storage environment to S3 and disable ADS before setting up Databricks workspaces. Go to AWS S3 as Private Data Storage to learn more.
For Azure Databricks, configure your base storage environment to ADLS and disable ADS before setting up Databricks workspaces. Go to ADLS as Private Data Storage to learn more.
Have a Databricks E2 workspace.
Configure one of these authentication methods:
Obtain a Personal Access Token (PAT) from Databricks. Learn how to generate your PAT.
Have an admin set up OAuth 2.0 for Databricks.
Have an admin configure the Databricks workspace in AAC. Learn how to configure a Databricks workspace.
Connect AAC to your Databricks workspace. Learn how to connect your Databricks workspace.
Limitations
General
AAC supports managed tables and external delta tables from Databricks. For external delta tables, AAC only supports append and truncate actions.
You can't query materialized views with job clusters.
Due to a Databricks limitation, use only date values after October 1, 1582. Earlier date values result in an error.
SQL Pushdown
Typecasting issues occur during integer division expressions. The output becomes Float even if you choose Integer for the output column data type. For example, the output of the expression [total_weight]/[quantity], where both columns are Integer data types, becomes a Float data type.
Precision errors occur with float data types. For example, the expression [weight]-1 evaluates to 1.099999 in Databricks SQL where {int weight=2.1}. The same expression using Photon as the engine results in 1.1.
There is no specific way to order rows except when you specify the order_by clause. This affects the Row ID tool and the Union tool.
For JSON/MAP data in inputs, SQL pushdown might not work. Databricks Spark is used as a fallback.
DateTime output ignores the millisecond values of the input DateTime value. For example, 2021-12-31 23:59:59.999 becomes 2021-12-31 23:59:59.000. Import jobs also ignore millisecond values when creating a dataset.
The substring function might give incorrect results when the position and length values are very large.
Precision issues occur with Double and Float data types. For example:
select 687399551400673279 + 5.35 returns 687399551400673284.35
select 687399551400673279 + double(5.35) returns 687399551400673300
select double(687399551400673279) + double(5.35) returns 687399551400673300
Note
Float and Double are base-2 numeric types. When given a literal which is base-10, the representation might not be exact. Use Decimal data types to accurately represent fractional or large base-10 numbers.
The exp function returns infinity for very large Float values.
The rpad function produces empty results as opposed to null. For example:
SELECT rpad('hi', 500000000000000000000000000000000, 'ab'); returns ""
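The base-2 precision behavior described in the note above can be reproduced in plain Python, whose float type is also an IEEE-754 double; the values are taken from the examples above, and this is only an illustration of the rounding, not of Databricks itself:

```python
from decimal import Decimal

# A 64-bit float has a 53-bit mantissa, so integers above 2**53
# lose their low digits when mixed into float arithmetic.
big = 687399551400673279
print(big + 5.35)  # low digits are lost, like double() in Databricks SQL

# Decimal is base-10 and keeps the exact result, like a Databricks Decimal.
exact = Decimal("687399551400673279") + Decimal("5.35")
print(exact)  # 687399551400673284.35
```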
Create and Configure a Databricks Connection
On the AAC landing page, select Connections from the global header.
Select Create Connection at the top of the page.
Search for Databricks and then select the Databricks connection.
Enter a unique Connection name and an optional Connection description.
Select an available Databricks workspace under Workspace Name.
If your admin configured Databricks with Personal Access Token authentication, the Personal Access Token for the selected workspace automatically populates.
If your admin configured Databricks with OAuth 2.0 authentication, select the OAuth 2.0 client for your Databricks workspace.
(Optional) Enter Connect String Options. For details, go to Connect String Options later on this page.
Select Test Connection to check if your workspace connects correctly.
Select Create.
You can now access your Databricks tables from the Data page.
Connect String Options
The connect string options are optional. If you are passing additional properties and values to complete the connection, you must structure the connect string options in this way:
;<prop1>=<val1>;<prop2>=<val2>...
Where:
<prop>: The name of the property.
<val>: The value for the property.
Delimiters:
;: A set of connect string options must begin with a semicolon. The semicolon at the end is optional.
=: Separate property names and values with an equal sign (=).
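As a sanity check, the format above can be sketched in Python. The helper below and the property names in the example ("ssl", "transportMode") are illustrative only, not actual driver options:

```python
def build_connect_options(props: dict) -> str:
    """Join properties into the ';<prop1>=<val1>;<prop2>=<val2>;' form
    described above. Returns an empty string when no properties are given."""
    if not props:
        return ""
    return ";" + ";".join(f"{key}={value}" for key, value in props.items()) + ";"

# "ssl" and "transportMode" are placeholder property names for illustration.
print(build_connect_options({"ssl": "1", "transportMode": "http"}))
# → ;ssl=1;transportMode=http;
```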