Tip: For HDFS and S3, you can select folders, which selects each file within the directory as a separate dataset.
See S3 Browser.
Redshift: If connected to a Redshift data warehouse, you can import sources from the connected database. See Redshift Browser.
Hive: If connected to a Hive instance, you can load datasets from individual tables within the set of Hive databases. See Hive Browser.
ADL: If enabled, you can import data into your Azure deployment from ADLS. The ADLS browser is very similar to the one for HDFS. See HDFS Browser.
For more information on the supported input formats, see Supported File Formats.
New/Edit: Click to create or edit a connection.
NOTE: This feature may be disabled in your environment. For more information, contact your administrator.
2. Add datasets
When you have found your source directory or file:
If custom SQL queries are enabled, you can click Create Dataset with SQL to enter a custom SQL statement that pre-filters the table so that only the rows and columns of interest are imported.
WARNING: Through this interface, it is possible to enter SQL statements that can delete data, change table schemas, or otherwise corrupt the targeted database. Please use this feature with caution.
For more information, see Create Dataset with SQL.
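To illustrate what pre-filtering with SQL means, the following Python sketch runs a read-only `SELECT` against an in-memory SQLite database. The table `orders`, its columns, and the helper `prefiltered_rows` are hypothetical, and SQLite only stands in for the database you are actually connected to; the SQL dialect that Create Dataset with SQL accepts depends on that database.

```python
import sqlite3


def prefiltered_rows(conn):
    # Hypothetical read-only query: keep only the columns of interest
    # (id, amount) and only the rows of interest (EU orders over 100).
    query = (
        "SELECT id, amount FROM orders "
        "WHERE region = ? AND amount > ? "
        "ORDER BY id"
    )
    # Values are bound through ? placeholders instead of string pasting.
    return conn.execute(query, ("EU", 100)).fetchall()


# Build a small sample table in memory to run the query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "EU", 250.0, "a"), (2, "US", 300.0, "b"), (3, "EU", 50.0, "c"), (4, "EU", 120.0, "d")],
)
result = prefiltered_rows(conn)
print(result)  # [(1, 250.0), (4, 120.0)]
```

Selecting only `id` and `amount` drops the unneeded columns, and the `WHERE` clause drops the unneeded rows before anything is imported. Because the statement is a plain `SELECT`, it does not modify the source table.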
If parameterization has been enabled, you can apply parameters to the source paths of your datasets to capture a wider set of sources. Click Create Dataset with Parameters. See Create Dataset with Parameters.
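Conceptually, a parameterized source path works like a pattern that matches many concrete paths, so one dataset can cover several source files. The sketch below shows that idea with Python's standard `fnmatch` wildcards. The paths and the `match_sources` helper are hypothetical, and the parameter syntax the product actually supports is described in Create Dataset with Parameters; it is not assumed to match `fnmatch`.

```python
from fnmatch import fnmatch


def match_sources(paths, pattern):
    # Keep every path that the wildcard pattern matches, in sorted order.
    return sorted(p for p in paths if fnmatch(p, pattern))


# Hypothetical source files in one directory.
paths = [
    "/data/orders/orders_2023-01.csv",
    "/data/orders/orders_2023-02.csv",
    "/data/orders/returns_2023-01.csv",
]

# One pattern captures both monthly order files but not the returns file.
print(match_sources(paths, "/data/orders/orders_*.csv"))
```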
3. Configure selections
When a dataset has been selected, the following fields appear on the right side of the screen. Modify as needed:
- Detect structure: By default, the application attempts to interpret the structure of your data during import. This structuring attempts to apply an initial tabular structure to the dataset.
- Unless you have specific problems with the initial structure, you should leave the Detect structure setting enabled. Recipes created from these imported datasets automatically include the structuring as the first, hidden steps. These steps are not available for editing, although you can remove them through the Recipe panel. See Recipe Panel.
- When Detect structure is disabled, imported datasets whose schema has not been detected are labeled as unstructured datasets. When recipes are created for these unstructured datasets, the structuring steps are added to the recipe and can be edited as needed.
- For more information, see Initial Parsing Steps.
- Remove special characters from column names: When selected, characters that are not alphanumeric or underscores are stripped, and space characters are converted to underscores.
- Tip: This feature matches the column renaming behavior in Release 5.0 and earlier.
- For more information, see Sanitize Column Names.
- Infer column data types: You can choose whether or not to apply column data type inference to your individual dataset.
- In the preview panel, you can see the data type that is to be applied after the dataset is imported. This data type may change depending on whether column data type inference is enabled or disabled for the dataset.
- To enable column data type inference, select the Infer column data types checkbox.
- Tip: To see the effects of column data type inference, you can toggle the checkbox and review the data type listed at the top of individual columns. To override an individual column's data type, click the data type name and select a new value.
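The three options above can be approximated by a small Python pipeline: split delimited text into rows and columns (a stand-in for structure detection), sanitize column names as described for Remove special characters from column names, and infer one type per column. This is a sketch under stated assumptions, not the product's implementation. It assumes a comma-delimited file whose first row is a header, treats "alphanumeric" as ASCII letters and digits, converts spaces before stripping other characters, and uses the type labels `Integer`, `Decimal`, `Boolean`, and `String` for illustration only. All function names are hypothetical.

```python
import csv
import io
import re


def detect_structure(raw, delimiter=","):
    # Assumption: comma-delimited text with a header row; split it into
    # rows, then split each row into columns.
    rows = list(csv.reader(io.StringIO(raw), delimiter=delimiter))
    return rows[0], rows[1:]


def sanitize_column_name(name):
    # Spaces become underscores first; then every character that is not
    # an ASCII letter, digit, or underscore is stripped.
    return re.sub(r"[^A-Za-z0-9_]", "", name.replace(" ", "_"))


def infer_type(values):
    # Pick the first type that every non-empty value parses as.
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"

    def all_parse(parse):
        try:
            for v in non_empty:
                parse(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Decimal"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "Boolean"
    return "String"


raw = "Order ID,Unit Price ($),In Stock\n1,9.99,true\n2,12.50,false\n"
header, rows = detect_structure(raw)
columns = [sanitize_column_name(h) for h in header]
types = [infer_type(col) for col in zip(*rows)]
print(columns)  # ['Order_ID', 'Unit_Price_', 'In_Stock']
print(types)    # ['Integer', 'Decimal', 'Boolean']
```

Because every non-empty value in a column must parse for a type to be chosen, a single stray value such as `N/A` makes a column fall back to `String` in this sketch; the product's own inference may handle such values differently.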