Through the Import Data page, you can upload datasets or select datasets from sources that are stored on connected datastores. From the Library page, click Import Data.

Import Data page

General Limitations

NOTE: For file-based sources, each row of data in the import file is expected to be terminated with a consistent newline character, including the last row in the file.

  • For single files lacking this final newline character, the final record may be dropped.

  • For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the running environment.
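One way to guard against dropped final records is to check each file for a trailing newline before importing it. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes the file uses `\n` or `\r\n` line endings, so a file terminated with a bare `\r` would be reported as missing its final newline.

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline (b"\\n")."""
    if os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as handle:
        # Read only the final byte so large files are not loaded into memory.
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def ensure_trailing_newline(path, newline=b"\n"):
    """Append `newline` if the final record lacks one.

    Returns True when the file was modified, False otherwise.
    """
    if ends_with_newline(path):
        return False
    with open(path, "ab") as handle:
        handle.write(newline)
    return True
```

If the file uses Windows line endings, pass `newline=b"\r\n"` so that the appended terminator stays consistent with the rest of the file.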

File and path limitations:

Basic Workflow

1. Connect to sources

NOTE: Compressed files are recognized and can be imported based on their file extensions.

Upload: You can also load files from your local file system.

Tip: You can drag and drop files from your desktop to upload them.

NOTE: You can upload a file up to 1 GB in size.
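A quick local check can flag files that exceed the upload limit before you try to upload them. This is a minimal sketch with a hypothetical function name. The source does not say whether "1 GB" means 10^9 or 2^30 bytes, so the sketch assumes the stricter value of 10^9 bytes; adjust `limit` if your environment differs.

```python
import os

# Assumption: "1 GB" is read as 10**9 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 10**9


def within_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```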

NOTE: When you upload an updated version of a previously uploaded file, the new file is stored as a separate upload altogether. In your flow, you must swap out the old dataset to point to the new one.

HDFS: If connected to a Hadoop cluster, you can select file(s) or folders to import. See HDFS Browser.

Hive: If connected to a Hive instance, you can load datasets from individual tables within the set of Hive databases. See Hive Browser.

S3: If connected to an S3 instance, you can browse your S3 buckets to select source files.

Tip: For HDFS and S3, you can select folders, which selects each file within the directory as a separate dataset.

See S3 Browser.

Redshift: If connected to a Redshift data warehouse, you can import sources from the connected database. See Redshift Browser.

WASB: If enabled, you can import data into your Azure deployment from WASB. For more information, see WASB Browser.

ADL: If enabled, you can import data into your Azure deployment from ADLS Gen1. See ADLS Gen1 Browser.

ADLS Gen2: If enabled, you can import data into your Azure deployment from ADLS Gen2. See ADLS Gen2 Browser.

Alation: If connected to Alation, you can search for and import Hive tables as imported datasets. For more information, see Using Alation.

Waterline: If connected to Waterline, you can search for and import datasets through the data catalog. For more information, see Using Waterline.

Databases: If connected to a relational datastore, you can load tables or views from your database. See Database Browser.

NOTE: For long-loading relational sources, you can monitor progress through each stage of ingestion. After these sources are ingested, subsequent steps to import and wrangle the data may be faster.

For more information, see Configure JDBC Ingestion.

For more information, see Overview of Job Monitoring.

For more information on the supported input formats, see Supported File Formats.

New/Edit: Click to create or edit a connection. By default, the displayed connections support import.

Search: Enter a search term to locate a specific connection.

NOTE: This feature may be disabled in your environment. For more information, contact your administrator.

See Create Connection Window.

2. Add datasets

When you have found your source directory or file:

If parameterization has been enabled, you can apply parameters to the source paths of your datasets to capture a wider set of sources. Click Create Dataset with Parameters. See Create Dataset with Parameters.

3. Configure selections

When a dataset has been selected, the following fields appear on the right side of the screen. Modify as needed:

You can select a single dataset or multiple datasets for import.

You can modify settings used during import for individual files. In the card for an individual dataset, click Edit Settings.

NOTE: In some cases, there may be discrepancies between row counts in the previewed data versus the data grid after the dataset has been imported, due to rounding in row counts performed in the preview.

For more information on supported encodings, see Configure Global File Encoding Type.

4. Import selections

Single dataset

If you have selected a single dataset for import:

  1. To immediately wrangle it, click Import & Wrangle. The dataset is imported. A recipe is created for it, added to a flow, and loaded in the Transformer page for wrangling. See Transformer Page.
  2. To import the dataset, click Import. The imported dataset is created. You can add it to a flow and create a recipe for it later. See Library Page.

Multiple datasets

You can import multiple datasets from multiple sources at the same time. In the Import Data page, continue selecting sources, and additional dataset cards are added to the right panel.

NOTE: If you are importing from multiple files at the same time, the files are not necessarily read in a regular or predictable order.
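If row order across files matters for your use case, one possible workaround is to combine the files locally in a known order and import the result as a single file. The sketch below is an illustration only, not a product feature: the function name is hypothetical, it assumes the files share the same structure and carry no per-file header rows, and the combined file must still fit within any upload size limit.

```python
CHUNK_SIZE = 1024 * 1024  # Copy in 1 MiB chunks to keep memory use low.


def concatenate_in_order(paths, destination):
    """Write the files in `paths` to `destination` in the order given.

    A newline is added after any non-empty file that does not end with one,
    so the last record of one file is not joined to the first record of the next.
    """
    with open(destination, "wb") as out:
        for path in paths:
            last_byte = b""
            with open(path, "rb") as src:
                for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte and last_byte != b"\n":
                out.write(b"\n")
```

For example, `concatenate_in_order(sorted(file_list), "combined.csv")` would combine the files in filename order, where `file_list` and `combined.csv` are placeholder names.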

NOTE: When you import a dataset with parameters from multiple files, only the first matching file is displayed in the right panel.

In the right panel, you can see a preview of each dataset and make changes as needed.

Import Multiple Datasets

If you have selected multiple datasets for import: