Contents:
As needed, you can insert custom SQL statements as part of the data import process. These custom SQL statements allow you to pre-filter the rows and columns of relational source data, including Hive, within the database, where performance is faster. This query method can also be used for wider operations on relational sources from within Designer Cloud Enterprise Edition.
- This feature is enabled by default. For more information on disabling, see Enable Custom SQL Query.
Limitations
General
All queries are blindly executed. It is your responsibility to ensure that they are appropriate. Queries like DELETE
and DROP
can destroy data in the database. Please use caution.
NOTE: Column names in custom SQL statements are case-sensitive. Case mismatches between SQL statement and your datasource can cause jobs to fail.
- SQL statements are stored as part of the query instance for the object. If the same query is being made across multiple users using private connections, the SQL must be shared and entered by individual users.
- SQL statements must be valid for the syntax of the target relational system. Syntax examples are provided below.
- If you modify the custom SQL statement when reading from a source, all samples generated based on the previous SQL are invalidated.
Declared variables are not supported.
- When using custom SQL to read from a Hive view, the results of a nested function are saved to a temporary name, unless explicitly aliased.
- If aliases are not used, the temporary column names can cause jobs to fail, on Spark in particular.
- For more information, see Using Hive.
Single Statement
The following limitations apply to creating datasets from a single statement.
Selecting columns with the same name, even with
"*"
, is not supported and generates an ambiguous column name error.Tip: You should use fully qualified column names or proper aliasing. See Column Aliasing below.
Users are encouraged to provide fully qualified path to table being used. Example:
SELECT "id", "value" FROM "public"."my_table"
- You should use proper escaping in SQL.
- Vendor specific limitations:
- SQL Server:
- Due to SQL Server driver issues,
sql_variant
data types cannot be selected from a table. - If you have table with
sql_variant
, please build select statements without those columns.
- Due to SQL Server driver issues,
- SQL Server:
Multi-Statement
These limitations apply to creating datasets using a sequence of multiple SQL statements.
NOTE: Use of multiple SQL statements must be enabled. See Enable Custom SQL Query.
Repeatable: When using multi-statements, you must verify that the statements are repeatable without failure. These statements are run multiple times during validation, datasets creation, data preview, and opening the dataset in the Transformer page.
NOTE: To ensure repeatability, any creation or deletion of data in the database must occur before the final required SELECT statement.
Line Termination: Each query must terminate with a semi-colon and a new line.
Validation: All statements are run immediately when validating or creating dataset.
NOTE: No DROP or DELETE checking is done prior to statement execution. Statements are the responsibility of the user.
- SELECT requirement: In a multi-statement execution, the last statement must be a SELECT statement.
- Database transactions: All statements are run in a transaction. DDL statements in most dialects (vendors) can't be run within a transaction and might be automatically committed by the driver.
Use
To use, please complete the following steps.
Steps:
- In the Datasets page, click Import Data.
- In the Import Data page, select a relational connection or Hive connection.
- Hive and relational connections must be enabled and created.
- Within your source, locate the table from which you wish to import. Do not select the table.
Click the Preview icon to review the columns in the dataset.
Tip: You may wish to copy the database, table name, and column names to a text editor to facilitate generating your SQL statement.
Click Create Dataset with SQL. Enter or paste your SQL statement.
Through the custom SQL interface, it is possible to enter SQL statements that can delete data, change table schemas, or otherwise corrupt the targeted database. Please use this feature with caution.
Figure: Create Dataset with SQL dialog
See Examples below.
To test the SQL, click Validate SQL. For details, see below.
To apply the SQL to the import process, click Create Dataset.
The customized source is added to the right panel. To re-edit, click Custom SQL.
Complete the other steps to define your imported dataset.
When the data is imported, it is altered or filtered based on your SQL statement.
- After dataset creation, you can modify the SQL, if needed. See Dataset Details Page.
Create with Variables
If parameterization has been enabled, you can specify variables as part of your SQL statement. Suppose you had table names like the following:
publish_create_all_types_97912510 publish_create_all_types_97944183 publish_create_all_types_14202824
You can insert an inline variable as part of your custom SQL to capture all of these variations.
Figure: Insert variables in your custom SQL
Through the Run Job page, you can specify overrides for the default value, so the same job definition can be used across all matching tables without much modification. For more information, see Run Job Page.
For more information on this feature, see Overview of Parameterization.
SQL Validation
You cannot create a SQL-based dataset if any of your SQL statements do not pass validation. Errors must be corrected in the SQL or in the underlying database.
- All
SELECT
statements are planned, which includes syntactical validation. However, these statements are not executed. Validation should be a matter of a few seconds. For multi-line statements, all non-
SELECT
statements are planned and executed. The finalSELECT
statement is only planned.NOTE: For multi-line SQL statements, validation may take longer to complete if the non-
SELECT
statements require significant time to execute.
NOTE: Values for seconds in a SQL timestamp parameter are not supported. The finest supported granularity is at the minutes level.
Examples
Here are some basic SQL examples to get started.
Basic Syntax
Your SQL statements must be valid for the syntax expected by the target relational system. In particular, object delimiters may vary between systems.
NOTE: The proper syntax depends on your database system. Please consult the documentation for your product for details.
Tip: Although some relational systems do not require object delimiters around column names, it is recommended that you add them to all applicable objects.
Tip: Avoid using column type identifiers (e.g. int
) and other SQL keywords as object names. Some systems may generate invalid SQL errors.
Relational System | Object Delimiter | Example Syntax |
---|---|---|
Hive | backtick | SELECT `column1`,`column2` FROM `databaseName`.`tableName` |
Oracle | double-quote | Double quotes required around database and table names and not required around column names. SELECT "column1","column2" FROM "databaseName"."tableName" |
SQL Server | none | SELECT "column1","column2" FROM "databaseName"."tableName"
|
Postgres | double-quote | Double quotes required around database, table names, and column names. SELECT "column1","column2" FROM "databaseName"."tableName" |
Teradata | double-quote | Double quotes required around database and table names and not required around column names. SELECT "column1","column2" FROM "databaseName"."tableName" |
NOTE: In the following sections, Oracle syntax is used in the examples. Please modify the examples for your target system.
Column Aliasing
If your select statement results in multiple columns with same name, the query fails to validate or fails on execution, such as selecting all columns in a JOIN. In these cases, columns must be properly aliased.
NOTE: This error will be caught either during validating or during dataset import.
For example, in the following JOIN, the EMPLOYEE
and DEPARTMENT
tables have column names department_id
and department_id
.
SELECT * FROM EMPLOYEE INNER JOIN DEPARTMENT ON (department_id = department_id)
The above query generates an error. Columns must be properly aliased, as in the following:
SELECT e.id, e.department_id, e.first_name, e.last_name, d.department_name FROM EMPLOYEE AS E INNER JOIN DEPARTMENT d ON (e.department_id = d.department_id)
Collect Whole Table
SELECT * FROM "DB1"."table2"
Filter Columns
SELECT lastName,firstName FROM "DB1"."table2
Filter Rows
SELECT lastName,firstName FROM "DB1"."table2" WHERE invoiceAmt > 10000
Multi-line statement
The following example uses a multi-line SQL sequence:
NOTE: Multi-line SQL support is considered an advanced use case. This feature must be enabled.
The following example inserts values in the TABLE_INVENTORY
table and then queries the table. It utilizes Oracle syntax:
INSERT INTO "SALES"."TABLE_INVENTORY" ("ID", "AVAILABILITY") VALUES (1, 10); SELECT * FROM "SALES"."TABLE_INVENTORY"
This page has no comments.