Comment: Published by Scroll Versions from space DEV and version r097

...

Through the Import Data page, you can upload datasets or select datasets from sources that are stored on connected datastores. From the Library page, click Import Data.




Figure: Import Data page

...

Info

NOTE: For file-based sources, the platform expects that each row of data in the import file is terminated with a consistent newline character, including the last one in the file.

  • For single files lacking this final newline character, the final record may be dropped.

  • For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in the Photon running environment.
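The newline requirement above can be checked before upload. The following is a minimal sketch, not part of the product: the function names `check_line_endings` and `ensure_final_newline` are hypothetical, and the check assumes that "consistent" means a single newline style (LF, CRLF, or CR) throughout the file.

```python
def check_line_endings(data: bytes) -> dict:
    """Report whether raw file bytes end with a newline and use one newline style."""
    crlf = data.count(b"\r\n")
    # Bare LF count excludes the LF that belongs to a CRLF pair.
    lf = data.count(b"\n") - crlf
    # Bare CR count excludes the CR that belongs to a CRLF pair.
    cr = data.count(b"\r") - crlf
    styles = {name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n > 0}
    return {
        "ends_with_newline": data.endswith((b"\n", b"\r")),
        "consistent": len(styles) <= 1,
        "styles": sorted(styles),
    }


def ensure_final_newline(data: bytes, newline: bytes = b"\n") -> bytes:
    """Append a newline to non-empty data that does not already end with one."""
    if data and not data.endswith((b"\n", b"\r")):
        return data + newline
    return data
```

Running `ensure_final_newline` over each file of a multi-file import before upload keeps the last record of one file from running into the first record of the next.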



Info

NOTE: For file-based external datastores, the following limitations apply:

  • Only the first 10,000 files can be retrieved.
  • The first sample is pulled from a maximum of the first 100 files in the directory. If the size of these 100 files is less than 10MB, the Transformer page indicates that this represents the full dataset.
  • When imported, the size computation reported in the Flow View page is over the first 10,000 files.
  • Jobs are run across all files in the directory, even if there are more than 10,000 files.
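As a rough illustration of how these limits interact, the sketch below applies them to a list of file sizes. It is a literal reading of the note, not the product's implementation: `summarize_directory` and its result keys are hypothetical, and "10MB" is assumed to mean 10 * 1024 * 1024 bytes.

```python
LISTING_FILE_LIMIT = 10_000             # from the note: only the first 10,000 files are retrieved
SAMPLE_FILE_LIMIT = 100                 # from the note: the first sample uses at most 100 files
FULL_DATASET_BYTES = 10 * 1024 * 1024   # assumption: "10MB" read as 10 MiB


def summarize_directory(file_sizes):
    """Apply the documented limits to a list of file sizes in bytes (directory order)."""
    listed = file_sizes[:LISTING_FILE_LIMIT]
    sampled = listed[:SAMPLE_FILE_LIMIT]
    return {
        "files_listed": len(listed),
        # Size shown in Flow View covers only the listed files.
        "reported_size": sum(listed),
        "files_sampled": len(sampled),
        # Per the note, a small sample is presented as the full dataset.
        "sample_shown_as_full_dataset": sum(sampled) < FULL_DATASET_BYTES,
        # Jobs still run across every file in the directory.
        "files_processed_by_job": len(file_sizes),
    }
```

For a directory of 12,000 files of 1 KB each, this sketch lists 10,000 files, samples 100, and marks the sample as the full dataset, while the job count remains 12,000. That gap is worth keeping in mind when interpreting sample sizes.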

File and path limitations:

...

1. Connect to sources

During import, the application identifies file formats based on the extension of the filename.

  • Compressed files are recognized and can be imported based on their file extensions.
  • Filenames that do not have an extension are treated as TXT files.
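The extension rules above can be sketched as a small lookup. This is an illustration only: `detect_format` is a hypothetical helper, and the set of compression extensions is an assumption, since the list of supported compression formats is not given here.

```python
from pathlib import PurePosixPath

# Assumption: illustrative compression extensions, not the product's actual list.
COMPRESSION_EXTENSIONS = {".gz", ".bz2", ".zip"}


def detect_format(filename):
    """Return (file_format, compression) inferred only from the filename's extensions."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    compression = None
    # A trailing compression extension is peeled off first.
    if suffixes and suffixes[-1] in COMPRESSION_EXTENSIONS:
        compression = suffixes.pop()[1:]
    # Filenames with no remaining extension are treated as TXT.
    file_format = suffixes[-1][1:] if suffixes else "txt"
    return file_format, compression
```

For example, `detect_format("sales.csv.gz")` yields a CSV format with gzip compression, and `detect_format("README")` falls back to TXT. Renaming a file so that its extension matches its content is therefore the simplest way to control how it is read.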

Upload: The platform can also load files from your local file system.

...

Info

NOTE: When you upload an updated version of a previously uploaded file, the new file is stored as a separate upload altogether. Wherever the imported dataset based on the previous version is used in your flows, you must swap out the old dataset to point to the new one.

...

  • You can hover over the name of a file to preview its contents.

    Info

    NOTE: Preview may not be available for some sources, such as Parquet.

  • Click the Plus icon next to the directory or filename to add it as a dataset.

    Tip

    Tip: You can import multiple datasets at the same time. See below.

  • Excel files: Click the Plus icon next to the parent workbook to add all of the worksheets as a single dataset, or you can add individual sheets as individual datasets. See Import Excel Data. 

  • If custom SQL query is enabled, you can click Create Dataset with SQL to enter a customized SQL statement to pre-filter the table within the database to include only the rows and columns of interest.

    For more information, see Create Dataset with SQL.
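To show what pre-filtering with a custom SQL statement means in practice, the sketch below runs a filtering query against an in-memory SQLite database from Python's standard library. The `orders` table, its columns, and the query are hypothetical and are not taken from the product; the SQL dialect of your connected database may differ.

```python
import sqlite3

# Hypothetical source table standing in for a table on a connected database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "west", 120.0, "a"), (2, "east", 80.0, "b"), (3, "west", 45.5, "c")],
)

# The kind of statement you might enter: keep two columns and only the 'west' rows.
custom_sql = "SELECT id, amount FROM orders WHERE region = 'west' ORDER BY id"
rows = conn.execute(custom_sql).fetchall()
conn.close()
```

Here the imported dataset would contain two columns and two rows instead of the full table, so less data is transferred and sampled.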

...

Info

NOTE: Hidden folder names begin with a dot (.) or underscore (_). In general, these folders are hidden for a reason. File structures may change without notice.
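The naming rule for hidden folders is simple enough to express directly. The helpers below are a hypothetical sketch for scripts that list a directory outside the application and need to skip the same folders.

```python
def is_hidden_folder(name: str) -> bool:
    """True if a folder name starts with a dot or underscore, per the note above."""
    # "." and ".." are navigation entries, not hidden folders.
    return name.startswith((".", "_")) and name not in (".", "..")


def visible_folders(names):
    """Filter a list of folder names down to those that are not hidden."""
    return [n for n in names if not is_hidden_folder(n)]
```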

Tip

Tip: If you have run a job with profiling enabled, you can import your profile files as datasets and then publish them to other datastores, such as BigQuery, for additional analysis. These files are stored in the .profiler folder beneath your job results folder in jobrun. For more information on these files, see Overview of Visual Profiling.

3. Configure selections

When a dataset has been selected, the following fields appear on the right side of the screen. Modify as needed:

...

4. Import selections

Single dataset

If you have selected a single dataset for import:

Tip

Tip: If present, you can click the Add to new flow checkbox, which adds the imported datasets to an Untitled flow.

  • Click Continue. The dataset is imported. 
  • A recipe is created for it, added to a new flow, and loaded in the Transformer page for wrangling. See Transformer Page.

...

You can import multiple datasets from multiple sources at the same time. In the Import Data page, continue selecting sources. Each dataset that you select is added as a card in the right panel.

...

In the right panel, you can see a preview of each dataset and make changes as needed.

Figure: Importing Multiple Datasets

...

Tip

Tip: If present, you can click the Add to new flow checkbox, which adds the imported datasets to an Untitled flow. For more information, see Flow View Page.

  • To import the selected datasets, click Continue.
  • To begin transforming one of these datasets in Flow View, select it. From its context menu, select Add new recipe. Select the recipe. In the context panel on the right, select Edit Recipe. See Transformer Page.
  • To remove a dataset from import, click the X in the dataset card.

...