In addition to CSV and other formats,  can directly import Microsoft® Excel® workbooks and folders containing workbooks. The worksheets of a workbook can be imported as:

Limitations

Import Sources


Configure

When Excel data is imported into , each sheet in an imported file must be converted to a CSV and then ingested for use. The product's conversion service supports two methods:

Conversion methodDescription
Apache POI (Java)(default) This new method improves performance, scalability, and reliability of the Excel conversion process.
PythonThe Python method has been available in the product for some time.

Tip: The Java-based Apache POI conversion method is recommended. However, if you encounter issues while using this method, you may choose to use the following settings to enable the Python method.

Settings:

Administrators can modify the following settings to enable either conversion method.

SettingDescription
Enable Apache POI based converter for Excel data conversion

Set this value to Enabled to use the Apache POI (Java) conversion method.

Set this value to Disabled to use the Python conversion method.

Detect maximum column count in XLSX sheet

When the above setting is enabled, set this value to Enabled to force the converter to identify the maximum number of columns in each XLSX sheet before beginning the conversion, which assists in detecting the sheet's structure.

This setting does not apply to files in XLS format.

For more information, see Dataprep Project Settings Page.

Use

Steps:

  1. In the menu bar, click Library
  2. In the Library page, click Import Data. Select the connection to use. See Import Data Page.

    Import Excel workbook

    Tip: If you experience issues uploading large XLS/XLSX files, you can convert the files to CSV files and then upload them.

  3. After you select the workbook, it is uploaded and converted to CSV format and stored by the platform. Depending on the size of the workbook, this process may take a while.
  4. By default, all worksheets in the workbook are imported as individual datasets. To change how the data is imported, click Edit in the right panel.

    Import settings for Excel datasets
  5. Dataset creation:
    1. 1 dataset per sheet: (Default) Each selected sheet in the workbook is imported as a separate dataset. 
      Specify the base name of the datasets that you are creating. If you are creating a single dataset, the name of the workbook is used. 
    2. Selected sheets into 1 dataset: All selected sheets in the workbook are combined and imported as a single dataset.

      NOTE: The schemas of each dataset must match. Columns must be listed in the same order in each dataset. The column headers are taken from the first selected dataset.

    3. All and future sheets into 1 dataset: If the workbook is updated periodically with new sheets that you would like to add in the future, select this option. After initial selection of sheets, all sheets that are added to the workbook in the future are automatically added as part of the imported dataset.

      NOTE: This option is available only if you are connected to a backend file storage system.

  6. Selected sheets: 
    1. You can select the sheets to import. 

      NOTE: If you are importing a folder of Excel files, data preview and initial sampling are executed against the first file found in the folder.

    2. To preview the data of an individual sheet, mouse over a dataset and click Jump to.

  7. Remove special characters from column names: Select this option to remove any special characters from the inferred column headers during import.
  8. From the drop-down, you can specify how you want the application to parse the data for column headers. 
  9. To save changes, click Save.
  10. After your datasets have been added, you can edit the name and description information for each in the right navigation panel.
  11. Optionally, you can assign the new dataset(s) to an existing flow or create a new one to contain them