...
NOTE: Column names in custom SQL statements are case-sensitive. Case mismatches between the SQL statement and your datasource can cause jobs to fail.
- SQL statements are stored as part of the query instance for the object. If multiple users run the same query through private connections, the SQL must be shared with those users, and each user must enter it individually.
- SQL statements must be valid for the syntax of the target relational system. Syntax examples are provided below.
- If you modify the custom SQL statement when reading from a source, all samples generated based on the previous SQL are invalidated.
- Declared variables are not supported.
- When using custom SQL to read from a Hive view, the results of a nested function are saved to a temporary name, unless explicitly aliased.
- If aliases are not used, the temporary column names can cause jobs to fail, on Spark in particular.
- For more information, see Using Hive.
- If you are using custom SQL to query an AWS Glue metadata store, you cannot apply the LIMIT keyword. For more information, see Enable AWS Glue Access.
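
To illustrate the aliasing point for Hive views, here is a minimal sketch that gives the result of a nested function an explicit alias. The view and column names (`sales`.`orders_view`, `order_id`, `customer_name`) are hypothetical and are not taken from the product documentation.

```sql
-- Hypothetical Hive view and columns, for illustration only.
-- Without the AS clause, Hive assigns a generated name (such as _c1)
-- to the result of the nested function.
SELECT
  `order_id`,
  UPPER(TRIM(`customer_name`)) AS `customer_name_clean`
FROM `sales`.`orders_view`
```

Because the alias is explicit, downstream jobs see a stable, predictable column name.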
...
The following limitations apply to creating datasets from a single statement.
- All single-statement SQL queries must begin with a SELECT keyword.
- Selecting columns with the same name, even with "*", is not supported and generates an ambiguous column name error.
  Tip: You should use fully qualified column names or proper aliasing. See Column Aliasing below.
- Users are encouraged to provide the fully qualified path to the table being used. Example:
    SELECT "id", "value" FROM "public"."my_table"
- You should use proper escaping in SQL.
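
As a sketch of how aliasing avoids the ambiguous column name error, the following query joins two tables that both contain an "id" column. The tables, columns, and aliases are hypothetical, and the double-quote delimiters assume a Postgres-style system.

```sql
-- Hypothetical tables "public"."orders" and "public"."customers";
-- both are assumed to contain a column named "id".
-- Selecting "*" across this join would return two columns named "id".
SELECT
  "o"."id"   AS "order_id",
  "c"."id"   AS "customer_id",
  "c"."name" AS "customer_name"
FROM "public"."orders" AS "o"
JOIN "public"."customers" AS "c"
  ON "o"."customer_id" = "c"."id"
```

Each output column has a unique name, and every table reference uses its fully qualified path.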
...
NOTE: Use of multiple SQL statements must be enabled. See Enable Custom SQL Query.
...
- Repeatable: When using multi-statements, you must verify that the statements are repeatable without failure. These statements are run multiple times during validation, dataset creation, data preview, and opening the dataset in the Transformer page.
  NOTE: To ensure repeatability, any creation or deletion of data in the database must occur before the final required SELECT statement.
- Line Termination: Each query must terminate with a semicolon and a new line.
- Validation: All statements are run immediately when validating or creating the dataset.
  NOTE: No DROP or DELETE checking is done prior to statement execution. Statements are the responsibility of the user.
- SELECT requirement: In a multi-statement execution, the last statement must be a SELECT statement.
- Database transactions: All statements are run in a transaction. DDL statements in most dialects (vendors) cannot be run within a transaction and might be automatically committed by the driver.
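
The requirements above can be sketched as a short multi-statement script. This example assumes PostgreSQL-style syntax, and the table name "tmp_high_values" and the filter value are hypothetical.

```sql
-- Hypothetical objects; PostgreSQL-style syntax is assumed.
-- IF EXISTS keeps the DROP from failing on the first run,
-- so the script can be executed repeatedly.
DROP TABLE IF EXISTS "public"."tmp_high_values";
-- All creation of data happens before the final SELECT.
CREATE TABLE "public"."tmp_high_values" AS
  SELECT "id", "value" FROM "public"."my_table" WHERE "value" > 100;
-- The last statement must be a SELECT.
SELECT "id", "value" FROM "public"."tmp_high_values";
```

Each statement ends with a semicolon and a new line, and rerunning the script does not fail because the temporary table is dropped and recreated each time.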
...
- In the Library page, click Import Data.
- In the Import Data page, select a relational connection or Hive connection.
- Hive and relational connections must be enabled and created.
- Within your source, locate the table from which you wish to import. Do not select the table.
- Click the Preview icon to review the columns in the dataset.
  Tip: You may wish to copy the database, table name, and column names to a text editor to facilitate generating your SQL statement.
- Click Create Dataset with SQL. Enter or paste your SQL statement.
  Warning: Through the custom SQL interface, it is possible to enter SQL statements that can delete data, change table schemas, or otherwise corrupt the targeted database. Please use this feature with caution.
  Figure: Create Dataset with SQL dialog
- The customized source is added to the right panel. To re-edit, click Custom SQL.
- Complete the other steps to define your imported dataset.
- When the data is imported, it is altered or filtered based on your SQL statement.
- After dataset creation, you can modify the SQL, if needed. See Dataset Details Page.
...
Your SQL statements must be valid for the syntax expected by the target relational system. In particular, object delimiters may vary between systems.
NOTE: The proper syntax depends on your database system. Please consult the documentation for your product for details.
Tip: Although some relational systems do not require object delimiters around column names, it is recommended that you add them to all applicable objects.
...
Relational System | Object Delimiter | Example Syntax
---|---|---
Hive | backtick |
AWS Glue | See Hive. |
Oracle | double-quote | Double quotes are required around database and table names but not around column names.
SQL Server | none |
Postgres | double-quote | Double quotes are required around database names, table names, and column names.
Teradata | double-quote | Double quotes are required around database and table names but not around column names.
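
The delimiter rules in the table might look as follows in practice. These statements are illustrative sketches, not the original examples from the table. The names `databaseName`, `tableName`, `column1`, and `column2` are placeholders, and depending on the system the qualifier in front of the table name may be a schema and not a database.

```sql
-- Hive (see also AWS Glue): backticks around all objects.
SELECT `column1`, `column2` FROM `databaseName`.`tableName`

-- Oracle and Teradata: double quotes around database and table names;
-- column names may be left unquoted.
SELECT column1, column2 FROM "databaseName"."tableName"

-- SQL Server: no object delimiters required.
SELECT column1, column2 FROM databaseName.tableName

-- Postgres: double quotes around database, table, and column names.
SELECT "column1", "column2" FROM "databaseName"."tableName"
```

Consult the documentation for your database system to confirm the exact quoting and qualification rules before using any of these patterns.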
...