Skip to main content

Import Data Page

Through the Import Data page, you can upload datasets or select datasets from sources that are stored on connected datastores. From the Library for Data page, click Import Data.

ImportDatasetPage-G.png

Figure: Import Data page

General Limitations

Note

For file-based sources, the Dataprep by Trifacta platform expects that each row of data in the import file is terminated with a consistent newline character, including the last one in the file.

  • For single files lacking this final newline character, the final record may be dropped.

  • For multi-file imports lacking a newline in the final record of a file, this final record may be merged with the first one in the next file and then dropped in theTrifacta Photonrunning environment.

Note

For file-based external datastores, the following limitations apply:

  • Only the first 10,000 files can be retrieved.

  • The first sample is pulled from a maximum of the first 100 files in the directory. If the size of these 100 files is less than 10MB, the Transformer page indicates that this represents the full dataset.

  • When imported, the size computation reported in the Flow View page is over the first 10,000 files.

  • Jobs are run across all files in the directory, even if there are more than 10,000 files.

Note

An imported dataset requires about 15 rows to properly infer column data types and the row, if any, to use for column headers.

File and path limitations:

  • The colon character ( :) cannot appear in a filename or a file path.

  • Filenames cannot begin with special characters like dot ( .) or underscore( _).

  • Input file or table paths can have a maximum length of 1024 characters.

Basic Task

1. Connect to sources

During import, the Trifacta Application identifies file formats based on theextensionof the filename.

  • Compressedfiles are recognized and can be imported based on their file extensions.

  • Filenames that do not have an extension are treated as TXT files.

Upload: The Dataprep by Trifacta platform can also load files from your local file system.

Tip

You can drag and drop files from your desktop to to upload them.

Note

You can upload a file up to 1 GB in size.

Note

When you upload an updated version of a previously uploaded file, the new file is stored as a separate upload altogether. Where the imported dataset based on the previous version is used, you must swap out the old dataset to point to the new one.

To change your upload location, click Edit. Navigate to the preferred Cloud Storage location.

Cloud Storage: Browse or search your Cloud Storage files to select content for import. See Google Cloud Storage Access.

BigQuery: Browse your BigQuery instance for tables to import as datasets.

Note

For long-loading BigQuery sources, you can monitor progress through each stage of ingestion. After these sources are ingested, subsequent steps to import and wrangle the data may be faster. For more information, see Overview of Job Monitoring.

Tip

A timeout of 2 minutes is enforced on all queries for data to import, after which an error occurs. As a workaround, you can use custom SQL to submit a more streamlined query. If that query still times out, you can reference a materialized view instead.

If you have read permissions for multiple projects, you can read from tables that are part of other projects. See BigQuery Connections.

Note

When working with datasets sourced from Avro files, lineage information and the SOURCEROWNUMBER function are not supported.

Databases: If connected to a relational datastore, you can load tables or views from your database. SeeDatabase Browser.

Note

For long-loading relational sources, you can monitor progress through each stage of ingestion. After these sources are ingested, subsequent steps to import and wrangle the data may be faster.

For more information, see Overview of Job Monitoring.

Google Sheets: Browse or search your Google Drive for Google Sheets to import.

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

Note

You must enable access to your Google Drive. No other Google Drive data is read for import.

See Import Google Sheets Data.

For more information on the supported input formats, see Supported File Formats.

New/Edit: Click to create or edit a connection. By default, the displayed connections support import.

Search:Enter a search term to locate a specific connection.

SeeCreate Connection Window.

2. Add datasets

When you have found your source directory or file:

  • You can hover over the name of a file to preview its contents.

    Note

    Preview may not be available for some sources, such as Parquet.

  • Click the Plus icon next to the directory or filename to add it as a dataset.

    Tip

    You can import multiple datasets at the same time. See below.

  • Excel files: Click the Plus icon next to the parent workbook to add all of the worksheets as a single dataset, or you can add individual sheets as individual datasets.

  • If custom SQL query is enabled, you can click Create Dataset with SQL to enter a customized SQL statement to pre-filter the table within the database to include only the rows and columns of interest.

    For more information, seeCreate Dataset with SQL.

If parameterization has been enabled, you can apply parameters to the source paths of your datasets to capture a wider set of sources. Click Create Dataset with Parameters. SeeCreate Dataset with Parameters.To show hidden files or folders, select Show hidden.

Note

Hidden folder names begin with a dot (.) or underscore (_). In general, these folders are hidden for a reason. File structures may change without notice.

Tip

If you have run a job with profiling enabled, you can import your profile files as datasets and then publish them to other datastores, such as BigQuery, for additional analysis. These files are stored in the .profiler folder beneath your job results folder in jobrun. For more information on these files, see Overview of Visual Profiling.

3. Configure selections

When a dataset has been selected, the following fields appear on the right side of the screen. Modify as needed:

  • Dataset Name: This name appears in the interface.

  • Dataset Description: You may add an optional description that provides additional detail about the dataset. This information is visible in some areas of the interface.

Tip

Click the Eye icon to inspect the contents of the dataset prior to importing.

Tip

You can select a single dataset or multiple datasets for import.

Edit settings

You can edit any additional or optional settings for an individual dataset. Perform the following:

Steps:

  1. Click Edit Settings from the card for an individual dataset in the right panel. The dialog box is displayed.

  2. In the dialog box, select the required options and modify the settings.

4. Import selections

You can import one or more datasets. Continue selecting sources, and additional dataset cards are added to the right panel.

Note

If you are importing from multiple files at the same time, the files are not necessarily read in a regular or predictable order.

Note

When you import a dataset with parameters from multiple files, only the first matching file is displayed in the right panel.

In the right panel, you can see a preview of each dataset and make changes as needed.ImportMultipleDatasets.png

Figure: Importing Multiple Datasets

  • To import the selected datasets, click Continue.

  • To remove a dataset from import, click the X in the dataset card.