Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

For datasets that have a fixed width for each row, determining the column breaks can be more challenging, due to the uncertain number of spaces and tabs between each data element. With enhanced pattern matching, the application can help you identify the appropriate locations to break columns and then trim down the data to eliminate the whitespace padding. 

Steps:

  1. Import your fixed-width dataset through the application and begin wrangling. 
  2. The data should now look similar to the following:



    D caption
    Fixed-width dataset after import
  3. From the drop-down to the right of the column name, select Column Details.

  4. In the Column Details panel, click the Patterns tab. 

  5. Click in the All Patterns area. 

    Info

    NOTE: Selecting a specific pattern token will generate suggestions for only that particular token.

    Info

    NOTE: If the application has inferred that the dataset is fixed-width, then the All Patterns area is the only available selection. If the dataset is not inferred as fixed-width, you should see multiple categories of patterns.

  6. From the suggestion cards, click the Split one.
  7. Close the Column Details panel. 
  8. In the Transform preview window, verify that the column splits look ok. 
    1. If a column contains multiple columns of data, click Edit.
    2. Verify that you are splitting based on position numbers, which means that column splits are done based on the number of characters from the left side of each line.  
    3. Your recipe step might look similar to the following:

      D trans
      p03Value7, 67, 117, 167, 217, 221, 239, 251, 253, 303, 315, 317, 329, 341, 391, 400, 512, 560, 610, 630, 650, 660
      Typestep
      p01NameColumn to split
      p01Valuecolumn1
      p02NameOption
      p02ValueBy positions
      p03NamePositions
      SearchTermSplit columns by positions

    4. In the list of values for positions, insert a new position number for the column or columns that contain multiple columns of data.
    5. Verify your changes in the Transform Preview panel.
  9. Click Add.
  10. Verify that the columns are split correctly. 
  11. You can use the following step to remove the whitespace from each cell value. 

    D trans
    Typestep
    p01NameColumn
    p01Value*
    p02NameFormula
    p02ValueTRIM($col)
    SearchTermEdit column with formula

  12. Click Add.