When a dataset is initially loaded into the Transformer page, one or more steps may be automatically added to the new recipe in order to assist in parsing the data. The added steps are based on the type of data that is being loaded and the ability of the application to recognize the structure of the data.

File Encoding

When a text file is used as an imported dataset, the application assumes by default that the file is encoded in UTF-8.


NOTE: The encoding of a file cannot be reliably determined by parsing its contents. Instead, the application assumes that the file uses the default encoding. If it does not, you must specify the appropriate encoding type in the application.


NOTE: In some cases, imported files are not properly parsed due to issues with encryption types or encryption keys in the source datastore. For more information, please contact your datastore administrator.

  • As needed, you can change the encoding to use when parsing individual files. In the Import Data page, click Edit Settings in the right-hand panel.  For more information, see Import Data Page.
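
The effect of a wrong encoding assumption can be illustrated outside the application. The following is a minimal Python sketch, not the application's own import logic; the function name decode_text is hypothetical. It shows that bytes written in another encoding fail to decode as UTF-8, which is why the encoding must be specified explicitly.

```python
def decode_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw file bytes, raising UnicodeDecodeError on a mismatch.

    Hypothetical helper for illustration; UTF-8 is the assumed default,
    mirroring the default described in this section.
    """
    return raw.decode(encoding)


# Bytes written as Latin-1 are not valid UTF-8 here.
raw = "café".encode("latin-1")

try:
    text = decode_text(raw)
except UnicodeDecodeError:
    # Fall back to the encoding the file was actually written in.
    text = decode_text(raw, encoding="latin-1")
```

In this sketch the fallback encoding is known in advance. In practice, the person who produced the file is usually the most reliable source for that value.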

Automatic Structure Detection


Step 1: Split the rows. In most cases, the first step added to your recipe is a Splitrows transformation, which breaks the data into individual rows based on a consistently recognized pattern at the end of each line. Often, this value is a carriage return or a carriage return-newline pair. These values are written in the application's transformation language as \r and \r\n, respectively.


NOTE: The maximum permitted length of any individual record on input is 20 MB.
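
The row-splitting step can be approximated in Python. This is a sketch under stated assumptions, not the Splitrows implementation: the function name split_rows is hypothetical, the terminator is passed in rather than detected, and the 20 MB limit is assumed to mean 20 * 1024 * 1024 bytes of UTF-8 text.

```python
# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_RECORD_BYTES = 20 * 1024 * 1024


def split_rows(text: str, terminator: str = "\r\n") -> list[str]:
    """Split text into rows on a fixed terminator such as \\r\\n or \\r."""
    rows = text.split(terminator)
    # A trailing terminator leaves one empty string at the end; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    for number, row in enumerate(rows, start=1):
        if len(row.encode("utf-8")) > MAX_RECORD_BYTES:
            raise ValueError(f"row {number} exceeds the maximum record length")
    return rows


rows = split_rows("id,name\r\n1,Ann\r\n2,Bob\r\n")
```

If the terminator never appears in the data, the whole input is returned as a single row, which is one way an oversized record can arise.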

Step 2: Split the columns. Next, the application attempts to break up individual rows into columns.



Tip: If you select the delimiter in a column with a very large number of delimiters, any suggestion card limits the split to a maximum of 250 columns. You can edit the suggested transformation to increase the number of split columns as needed. Increasing the limit can impact browser performance.
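
The column limit described in the tip can be sketched in Python. The function name split_columns is hypothetical, and keeping the unsplit remainder in the last column is an assumption made for illustration; the application's actual handling of excess delimiters is not described here.

```python
def split_columns(row: str, delimiter: str = ",", max_columns: int = 250) -> list[str]:
    """Split a row on a delimiter, producing at most max_columns values.

    Assumption: any text beyond the limit stays, unsplit, in the last column.
    """
    if max_columns < 1:
        raise ValueError("max_columns must be at least 1")
    # str.split performs at most maxsplit splits, giving maxsplit + 1 pieces.
    return row.split(delimiter, max_columns - 1)


wide_row = ",".join(str(i) for i in range(300))
columns = split_columns(wide_row)
```

Here columns contains 250 entries, and the final entry still holds the remaining comma-separated values.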

Header Row

When a dataset is imported, the application may infer the names of your columns from the first row of the dataset. 


Tip: Avoid importing data that contains missing or empty values in the first row. These gaps can cause problems in your headers.

  • In some cases, the application may be unable to create this header row. Instead, the columns are titled column1, column2, column3, and so on.
  • If the column names are split across multiple rows in your dataset, you may need to modify the header transformation step. For more information, see Rename Columns.
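
The header behavior can be sketched in Python. The rule used here (accept the first row as a header only if every value is non-empty and unique) is an assumption chosen to illustrate why gaps in the first row cause problems; it is not the application's actual inference logic, and infer_header is a hypothetical name.

```python
def infer_header(rows: list[list[str]]) -> tuple[list[str], list[list[str]]]:
    """Return (column_names, data_rows) for a list of already-split rows.

    Assumed rule: the first row becomes the header only when all of its
    values are non-empty and unique. Otherwise generic names are generated
    and the first row is kept as data.
    """
    if not rows:
        return [], []
    first = rows[0]
    usable = all(value.strip() for value in first) and len(set(first)) == len(first)
    if usable:
        return list(first), rows[1:]
    names = [f"column{i}" for i in range(1, len(first) + 1)]
    return names, rows


names, data = infer_header([["id", ""], ["1", "Ann"]])
```

With the empty value in the first row, names falls back to column1 and column2, and the first row stays in the data.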

Excel, CSV

Microsoft Excel files are internally converted to CSV files and then loaded into the Transformer page. CSV files are parsed using the general steps described in the previous sections.

JSON


  • For JSON files, it is important to import them in unstructured format.
  • The application requires that JSON files be submitted with one valid JSON object per line.
    • Multi-line JSON import is not supported.
    • Consistently malformed JSON objects or objects that overlap linebreaks might cause import to fail.


For more information, see Working with JSON v1 and Working with JSON v2.
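
Before import, a file can be checked against the one-object-per-line requirement. The following Python sketch uses only the standard json module; the function name invalid_json_lines is hypothetical, and skipping blank lines is an assumption, since this page does not state how the application treats them.

```python
import json


def invalid_json_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that are not a single JSON object."""
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # Assumption: blank lines are ignored.
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        # Valid JSON that is not an object (array, string, number) also fails.
        if not isinstance(value, dict):
            bad.append(number)
    return bad


multi_line = '{\n  "a": 1\n}\n'
problems = invalid_json_lines(multi_line)
```

A pretty-printed object that spans several lines is reported line by line, which matches the restriction on multi-line JSON.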

Known Issues

  • Some characters in imported datasets, such as NUL (ASCII character 0) characters, may cause problems with recognizing line breaks. If initial parsing is having trouble with line breaks, you may need to fix the issue in the source data prior to import, since the Splitrows transformation must be the first step in your recipe.
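
One way to fix the source data prior to import is to remove NUL bytes from the file. This Python sketch assumes a single-byte or UTF-8 encoded file; it should not be applied to UTF-16 or UTF-32 data, where zero bytes are a normal part of the encoding. The function name strip_nul_bytes is hypothetical.

```python
def strip_nul_bytes(data: bytes) -> bytes:
    """Remove NUL (0x00) bytes from raw file content.

    Assumption: the content is single-byte or UTF-8 encoded text.
    """
    return data.replace(b"\x00", b"")


raw = b"id,name\x00\r\n1,Ann\r\n"
cleaned = strip_nul_bytes(raw)
removed = len(raw) - len(cleaned)  # number of NUL bytes that were dropped
```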