AWS Glue Connections
This section describes how to create a connection to your AWS Glue Data Catalog.
Supported Environments:
Operation | Designer Cloud Powered by Trifacta Enterprise Edition | Amazon | Microsoft Azure |
---|---|---|---|
Read | Supported | Supported | Not supported |
Write | Not supported | Not supported | Not supported |
Prerequisites
Before you create a connection, you must enable Designer Cloud Powered by Trifacta Enterprise Edition to access AWS Glue. For more information, see AWS Glue Access.
Limitations
For more information, see "Supported Deployments" in AWS Glue Access.
Create Connection
You can create one or more connections to databases in your AWS Glue deployment.
Create through application
Any user can create an AWS Glue connection through the application.
Steps:
Login to the application.
In the menu, click User menu > Preferences > Connections.
In the Create Connection page, click the AWS Glue connection card.
Specify the properties for your AWS Glue connection. The following parameters are specific to AWS Glue connections:
Property | Description |
---|---|
EMR Master Node DNS | This DNS value can be retrieved from the EMR console. |
Port | The port number through which to connect to the DNS master node |
Connection String Options | No values are required here. Additional information is provided below. |
For more information, see Create Connection Window.
Connection URL
The properties that you provide are inserted into the following URL, which connects Designer Cloud Powered by Trifacta Enterprise Edition to the connection:
jdbc:hive2://<host>:<port>/<database><connect-string-options>
where:
<database>
= name of the default database to which to connect. This value can be empty.
The connect string options are optional. If you are passing additional properties and values to complete the connection, the connect string options must be structured in the following manner:
;<prop1>=<val1>;<prop2>=<val2>...
where:
<prop>
: the name of the property<val>
: the value for the property
Delimiters:
;
: any set of connect string options must begin with a semi-colon.;
: sets of connect string options must separated by a semi-colon.=
: property names and values must be separated with an equal sign (=
).
Examples:
Designer Cloud Powered by Trifacta Enterprise Edition may insert additional authentication properties as part of the connect string options.
Driver Information
This connection uses the following driver:
Driver name:
org.apache.hive.jdbc.HiveDriver
Driver version: The driver depends on the version of EMR that is in use.
Driver documentation: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
Troubleshooting
For more information, see https://docs.aws.amazon.com/glue/latest/dg/troubleshooting-connection.html.
Use
After the integration has been made between the platform and AWS Glue, you can import datasets.
Import using custom SQL queries. For more information, see Create Dataset with SQL.
Testing
Import a dataset from AWS Glue. Add it to a flow, and run a job. Verify the results.For more information, see Verify Operations.
Using AWS Glue Connections
Enable
For more information, seeAWS Glue Access.
Uses of Glue
The Designer Cloud Powered by Trifacta platform can use Glue for the following tasks:
Create datasets by reading from Glue tables.
Before you begin using Glue
Read Access: Your Glue administrator must configure read permissions to Glue databases.
Write Access: Not supported.
Secure Access
For more information, see Configure for AWS.
Reading partitioned data
The Designer Cloud Powered by Trifacta platform can read in partitioned tables. However, it cannot read individual partitions of partitioned tables.
Tip
If you are reading data from a partitioned table, one of your early recipe steps in the Transformer page should filter out the unneeded table data so that you are reading only the records of the individual partition.
Storing data in Glue
Users should know where shared data is located and where personal data can be saved without interfering with or confusing other users.
Note
The Designer Cloud Powered by Trifacta platform does not modify source data in Glue. Datasets sourced from Glue are read without modification from their source locations.
Reading from Glue
You can create a Alteryx dataset from a table or view stored in Glue. For more information, see Database Browser.
Notes on reading from views using custom SQL
If you have enabled custom SQL and are reading data from a view, nested functions are written to a temporary filename, unless they are explicitly aliased.
Tip
If your custom SQL uses nested functions, you should create an explicit alias from the results. Otherwise, the job is likely to fail.
Problematic Example:
SELECT UPPER(`t1`.`colum1`), TRIM(`t1`.`column2`),...
When these are read from a Glue view, the temporary column names are: _c0
, _c1
, etc. During job execution, Spark ignores the column1
and column2
reference.
Improved Example:
SELECT UPPER(`t1`.`column1`) as col1, TRIM(`t1`.`column2`) as col2,...
In this improved example, the two Glue view columns are aliased to the explicit column names, which are correctly interpreted and used by the Spark running environment during job execution.
Writing to Glue
Not supported.
SQL Syntax
The following syntax requirements apply to this connection.
Object delimiter: backtick
Example syntax:
SELECT `column1`,`column2` FROM `databaseName`.`tableName`;
For more information on SQL in general, see Supported SQL Syntax.
Reference
Supported Versions: n/a
Supported Environments:
Note
S3 must be set as the base storage layer, and the platform must be integrated with EMR. See Set Base Storage Layer.
Operation | Designer Cloud Powered by Trifacta Enterprise Edition | Amazon | Microsoft Azure |
---|---|---|---|
Read | Supported | Supported | Not supported |
Write | Not supported | Not supported | Not supported |