Through the Import Data page, you can upload datasets or select datasets from sources that are stored on connected datastores. From the Library page, click Import Data.
Figure: Import Data page
General Limitations
Info | |||||
---|---|---|---|---|---|
NOTE: For file-based sources,
|
File and path limitations:
- The colon character (:) cannot appear in a filename or a file path.
- Filenames cannot begin with special characters like dot (.) or underscore (_).
- Input file or table paths can have a maximum length of 1024 characters.
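The limitations above can be checked before you attempt an import. The following is a minimal sketch, not part of the product: the function name `check_source_path` is hypothetical, it assumes forward-slash path separators, it assumes the path is given without a URI scheme, and it treats only the two leading characters named above (dot and underscore) as forbidden.

```python
from typing import List

MAX_PATH_LENGTH = 1024  # maximum path length stated in the limitations above
FORBIDDEN_LEADING_CHARS = (".", "_")  # assumption: only the two characters named above


def check_source_path(path: str) -> List[str]:
    """Return a list of limitation violations for a path (empty if none)."""
    problems = []
    if ":" in path:
        problems.append("path contains a colon")
    # The filename is the last path segment; a trailing slash is ignored.
    filename = path.rstrip("/").rsplit("/", 1)[-1]
    if filename.startswith(FORBIDDEN_LEADING_CHARS):
        problems.append("filename begins with a dot or underscore")
    if len(path) > MAX_PATH_LENGTH:
        problems.append("path is longer than 1024 characters")
    return problems
```

For example, `check_source_path("/data/_tmp.csv")` reports the leading underscore, while `check_source_path("/data/sales/2019.csv")` returns an empty list.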
Basic Workflow
1. Connect to sources
During import:
- Compressed files are recognized and can be imported based on their file extensions.
- Filenames that do not have an extension are treated as TXT files.
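The extension-based behavior described above can be sketched as follows. This is an illustration only, not the product's detection logic: `classify_filename` is a hypothetical helper, and the set of compression extensions is an assumption, since this page does not list which ones are recognized.

```python
import os

# Assumption: these compression extensions are illustrative only; this
# page does not list which ones the product recognizes.
COMPRESSION_EXTENSIONS = {".gz", ".bz2"}


def classify_filename(filename: str) -> dict:
    """Infer a format label and a compression flag from a filename's extensions."""
    base, ext = os.path.splitext(filename)
    compressed = ext.lower() in COMPRESSION_EXTENSIONS
    if compressed:
        # Look at the extension underneath the compression suffix.
        base, ext = os.path.splitext(base)
    # Filenames without an extension are treated as TXT files.
    file_format = ext.lstrip(".").upper() if ext else "TXT"
    return {"format": file_format, "compressed": compressed}
```

Under these assumptions, `classify_filename("orders.csv.gz")` yields a compressed CSV, and `classify_filename("README")` falls back to TXT.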
Upload:
Tip |
---|
Tip: You can drag and drop files from your desktop to upload them. |
Info |
---|
NOTE: You can upload a file up to 1 GB in size. |
Info |
---|
NOTE: When you upload an updated version of a previously uploaded file, the new file is stored as a separate upload altogether. In your flow, you must swap out the old dataset to point to the new one. |
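If you script your uploads, you may want to check a file against the 1 GB limit first. The sketch below is a local pre-check only and does not call any product API. Both function names are hypothetical, and it assumes that "1 GB" means 1024**3 bytes, which this page does not specify.

```python
import os

# Assumption: "1 GB" is read as 1024**3 bytes; this page does not say
# whether the limit is binary or decimal.
MAX_UPLOAD_BYTES = 1024 ** 3


def within_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a file of size_bytes fits within the upload limit."""
    return 0 <= size_bytes <= limit


def file_within_upload_limit(path: str) -> bool:
    """Check a local file's size on disk against the upload limit."""
    return within_upload_limit(os.path.getsize(path))
```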
HDFS: If connected to a Hadoop cluster, you can select files or folders to import. See HDFS Browser.
Hive: If connected to a Hive instance, you can load datasets from individual tables within the set of Hive databases. See Hive Connections.
S3: If connected to an S3 instance, you can browse your S3 buckets to select source files.
Tip |
---|
Tip: For HDFS and S3, you can select folders, which selects each file within the directory as a separate dataset. |
Redshift: If connected to an Amazon Redshift data warehouse, you can import sources from the connected database. See Amazon Redshift Connections.
WASB: If enabled, you can import data into your Azure deployment from WASB.
ADL: If enabled, you can import data into your Azure deployment from ADLS Gen1.
ADLS Gen2: If enabled, you can import data into your Azure deployment from ADLS Gen2.
Databases: If connected to a relational datastore, you can load tables or views from your database. See Database Browser.
Info |
---|
NOTE: For long-loading relational sources, you can monitor progress through each stage of ingestion. After these sources are ingested, subsequent steps to import and wrangle the data may be faster. For more information, see Configure JDBC Ingestion and Overview of Job Monitoring. |
For more information on the supported input formats, see Supported File Formats.
New/Edit: Click to create or edit a connection. By default, the displayed connections support import.
Search: Enter a search term to locate a specific connection.
Info |
---|
NOTE: This feature may be disabled in your environment. For more information, contact your administrator. |
2. Add datasets
When you have found your source directory or file:
You can hover over the name of a file to preview its contents.
NOTE: Preview may not be available for some sources, such as Parquet.
Click the Plus icon next to the directory or filename to add it as a dataset.
Tip: You can import multiple datasets at the same time. See Multiple datasets below.
Excel files: Click the Plus icon next to the parent workbook to add all of the worksheets as a single dataset, or you can add individual sheets as individual datasets. See Import Excel Data.
If custom SQL queries are enabled, you can click Create Dataset with SQL to enter a customized SQL statement that pre-filters the table within the database to include only the rows and columns of interest.
WARNING: Through this interface, it is possible to enter SQL statements that can delete data, change table schemas, or otherwise corrupt the targeted database. Use this feature with caution.
For more information, see Create Dataset with SQL.
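As an illustration of a pre-filtering statement, the sketch below runs a SELECT that keeps two columns and a subset of rows. It uses Python's built-in `sqlite3` module against an in-memory table rather than a connected datastore, and the table, columns, and the `is_single_select` guard are all hypothetical. The guard is a rough check only: it would also reject a valid query that contains a semicolon inside a string literal, and it is no substitute for restricting database permissions.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "EMEA", 120.0, "a"), (2, "APAC", 75.5, "b"), (3, "EMEA", 40.0, "c")],
)

# A read-only pre-filter: keep two columns and only the EMEA rows.
PREFILTER_SQL = "SELECT id, amount FROM orders WHERE region = 'EMEA' ORDER BY id"


def is_single_select(sql: str) -> bool:
    """Rough guard: accept one statement that starts with SELECT."""
    statement = sql.strip().rstrip(";")
    return statement.upper().startswith("SELECT") and ";" not in statement


rows = conn.execute(PREFILTER_SQL).fetchall() if is_single_select(PREFILTER_SQL) else []
print(rows)  # [(1, 120.0), (3, 40.0)]
```

Only the `id` and `amount` columns of the two matching rows are returned, so the imported dataset would be narrower and shorter than the source table.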
If parameterization has been enabled, you can apply parameters to the source paths of your datasets to capture a wider set of sources. Click Create Dataset with Parameters. See Create Dataset with Parameters.
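To show how a parameterized path can capture several sources at once, the sketch below matches a wildcard pattern against a list of paths. The product's own parameter syntax is not shown on this page, so this uses shell-style wildcards from Python's `fnmatch` module as a stand-in. The file listing and the `match_sources` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing from a connected datastore.
available_files = [
    "/sales/2019-01.csv",
    "/sales/2019-02.csv",
    "/sales/2020-01.csv",
    "/inventory/2019-01.csv",
]


def match_sources(pattern: str, paths: list) -> list:
    """Return the paths that match a wildcard pattern, sorted for stable output."""
    return sorted(p for p in paths if fnmatchcase(p, pattern))


matches = match_sources("/sales/2019-*.csv", available_files)
print(matches)  # ['/sales/2019-01.csv', '/sales/2019-02.csv']
```

A single pattern here stands in for two monthly files, which is the kind of wider capture that a dataset with parameters is meant to provide.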
3. Configure selections
When a dataset has been selected, the following fields appear on the right side of the screen. Modify as needed:
- Dataset Name: This name appears in the interface.
- Dataset Description: You may add an optional description that provides additional detail about the dataset. This information is visible in some areas of the interface.
Tip |
---|
Tip: Click the Eye icon to inspect the contents of the dataset prior to importing. |
Tip |
---|
Tip: You can select a single dataset or multiple datasets for import. |
Edit settings
You can edit any additional or optional settings for an individual dataset. Perform the following:
Steps:
- Click Edit Settings from the card for an individual dataset in the right panel. The dialog box is displayed.
- In the dialog box, select the required options and modify the settings.
- File Import Settings: For more information, see File Import Settings.
- Table Import Settings: For more information, see Table Import Settings.
4. Import selections
Single dataset
If you have selected a single dataset for import:
Tip |
---|
Tip: If present, you can click the Add to new flow checkbox, which adds the imported dataset to a new flow. |
- Click Continue. The dataset is imported.
- A recipe is created for it, added to a new flow, and loaded in the Transformer page for wrangling. See Transformer Page.
Multiple datasets
You can import multiple datasets from multiple sources at the same time. In the Import Data page, continue selecting sources, and additional dataset cards are added to the right panel.
Info |
---|
NOTE: If you are importing from multiple files at the same time, the files are not necessarily read in a regular or predictable order. |
Info |
---|
NOTE: When you import a dataset with parameters from multiple files, only the first matching file is displayed in the right panel. |
In the right panel, you can see a preview of each dataset and make changes as needed.
Figure: Import Multiple Datasets
If you have selected multiple datasets for import:
Tip |
---|
Tip: If present, you can click the Add to new flow checkbox, which adds the imported datasets to a new flow. |
To import the selected datasets, click Continue.
To begin transforming one of these datasets in Flow View, select it. From its context menu, select Add new recipe. Select the recipe. In the context panel on the right, select Edit Recipe. See Transformer Page.
- To remove a dataset from import, click the X in the dataset card.