Comment: Published by Scroll Versions from space DEV and version next

...

Through simple selections, you can identify columns to remove, values on which to base row deletion, or strings to remove from your dataset. As needed, these transformations can be modified for more sophisticated removal.

...

  • When data is removed from your dataset, no actual deletion is performed.
    • The application does not modify source data. All recipe executions generate new sets of data based on the transformations you define, which are applied to a generated version of the source data.
    • Transformation steps are previewed and can be undone on sampled data in the Transformer page, so you should feel free to experiment with data removal.
  • In large-volume datasets, be careful when applying patterns or regular expressions to your data. Limit these pattern-based changes to the minimum range of columns, rows, or strings required to complete the task.

...

To drop multiple columns, you can specify comma-separated column names in your Delete Columns transformation:

Transformation Name: Delete columns
Columns: ColA,ColC,ColE
Action: Delete selected columns
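
As a rough illustration outside the application, the effect of this step can be sketched in plain Python. The sample rows and the `delete_columns` helper are hypothetical and are not part of the product. Only the column names ColA, ColC, and ColE come from the example above.

```python
def delete_columns(rows, columns):
    """Return new rows without the named columns; the input rows are not modified."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Hypothetical sample data with five columns.
rows = [
    {"ColA": 1, "ColB": 2, "ColC": 3, "ColD": 4, "ColE": 5},
    {"ColA": 6, "ColB": 7, "ColC": 8, "ColD": 9, "ColE": 10},
]

# The column list is given as a comma-separated string, as in the step above.
result = delete_columns(rows, "ColA,ColC,ColE".split(","))
print(result)  # [{'ColB': 2, 'ColD': 4}, {'ColB': 7, 'ColD': 9}]
```

The helper builds new rows instead of editing the originals, which mirrors the point that source data is never modified.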

...

For more information, see Remove Data.

Tip: You can also drop multiple columns through the Column Browser. See Column Browser Panel.

...

  1. In this case, you could use the column histogram to select the value Tempe in the city column, or you can use the Filters panel to filter for rows containing the value Tempe.
  2. Then, select the Delete suggestion card.  

    Figure: Select Tempe in the City column to remove all entries for that city
  3. After you select Delete, the application evaluates your selected value and attempts to determine your intention with the selection. Is it a string literal or a pattern? If it's a pattern, what does the pattern represent? You may select one of the variants in the Delete card to find the right match.

    NOTE: Be sure to scroll up and down in the data grid to review the values that are affected. In some cases, your selection may turn into a pattern, which could apply to more than just the desired values. In the previous example, selecting Tempe may yield a matching pattern of {alpha}{5}, which would match any five-letter city name, including Tempe. Select other variants in the Delete card to change the matching pattern. Click Modify to review the matching string.

  4. After defining and modifying your Filter Rows transformation, you can use the preview to see the rows that will be removed, prior to adding the transformation to your recipe.
Tip: You can also use the Filter Rows transformation to retain rows based on a specified condition, effectively deleting the rows that do not match. See Filter Data.
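
The relationship between deleting and keeping rows, and the risk of a selection turning into a broader pattern, can be sketched in plain Python. This is only an analogy: the sample rows and the `delete_rows` and `keep_rows` helpers are hypothetical, and the Python regex `[A-Za-z]{5}` is an assumed stand-in for the `{alpha}{5}` pattern, based on its description as matching any five-letter city name.

```python
import re

# Hypothetical sample rows.
rows = [
    {"city": "Tempe", "qty": 3},
    {"city": "Mesa", "qty": 5},
    {"city": "Yuma", "qty": 1},
    {"city": "Miami", "qty": 8},
]


def delete_rows(rows, condition):
    """Return the rows for which the condition is false."""
    return [r for r in rows if not condition(r)]


def keep_rows(rows, condition):
    """Return the rows for which the condition is true."""
    return [r for r in rows if condition(r)]


# Deleting on a condition equals keeping on the opposite condition.
after_delete = delete_rows(rows, lambda r: r["city"] == "Tempe")
after_keep = keep_rows(rows, lambda r: r["city"] != "Tempe")

# Assumed equivalent of a five-letter pattern: it also matches Miami.
five_letters = re.compile(r"[A-Za-z]{5}")
after_pattern = delete_rows(
    rows, lambda r: five_letters.fullmatch(r["city"]) is not None
)

print([r["city"] for r in after_delete])   # ['Mesa', 'Yuma', 'Miami']
print([r["city"] for r in after_pattern])  # ['Mesa', 'Yuma']
```

The second result loses Miami as well as Tempe, which is why reviewing the affected values in the data grid matters before adding the step.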

Filter rows based on matching conditions

...

For more information, see Filter Data.

Delete rows based on multiple blank cells

...

For more information, see Filter Data.

Remove values

To delete values from a column, select the values in the data grid. In the suggestion cards, select the Replace card. In the following example, all values matching Tempe are removed from the city column:

Transformation Name: Replace text or patterns
Column: city
Find: 'Tempe'
Replace with: ''
Match all occurrences: true

The Replace transformation applies only to string values. The rest of a matching row is unaffected.

The above transformation matches all values in the column, including partial values. The matched string is removed from the column value. For example, an entry Tempest would be turned into st if the above transformation were added.
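
The partial-match behavior described above can be reproduced with Python's built-in `str.replace`, which also removes every occurrence of a substring. This is an analogy and not the product's implementation, and the sample values are hypothetical.

```python
# Hypothetical values for the city column.
cities = ["Tempe", "Tempest", "Mesa", "North Tempe"]

# Replace every occurrence of the substring Tempe with an empty string.
cleaned = [value.replace("Tempe", "") for value in cities]
print(cleaned)  # ['', 'st', 'Mesa', 'North ']
```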

To ensure that only full-column value matches are applied, you can add patterns to indicate the start and end of the column value, as in the following:

...

In the above case, only values of Tempe that are the entire column value are matched. For more information on this pattern-based matching, see Text Matching.

...
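
The idea of matching only when the value is the entire column value can be illustrated with a Python regular expression, where `^` and `$` anchor the match to the start and end of the string. This sketch uses Python's `re` syntax as an assumption for illustration. It does not show the product's own pattern syntax, and the sample values are hypothetical.

```python
import re

# Hypothetical values for the city column.
cities = ["Tempe", "Tempest", "North Tempe"]

# Anchors require Tempe to span the whole value.
anchored = re.compile(r"^Tempe$")
cleaned = [anchored.sub("", value) for value in cities]
print(cleaned)  # ['', 'Tempest', 'North Tempe']
```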

Using regular expressions

For more sophisticated matching, you can apply regular expressions in your Replace transformation. In the following example, all integers from 0 to 99 are matched in the qty column. Because the replacement value is empty, they are deleted.

Warning: Regular expressions are very powerful pattern-matching tools. You should be careful in your use of them. See Text Matching.

Transformation Name: Replace text or patterns
Column: qty
Find: /^\d$|^\d\d$/
Replace with: ''
Match all occurrences: true

 

 

Character   Definition
^           Beginning of string. Required to prevent matching on the last digit of any numeric value.
$           End of string. Required to prevent a 2-digit match on three-digit numbers.
\d          A single digit.
|           Logical or. In this case, it is used to define separate regexes for 1- and 2-digit values.
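
To see why both anchors matter, the expression can be tried in Python's `re` module, which supports the same `^`, `$`, `\d`, and `|` constructs. This is a sketch under the assumption that the product's regex engine treats these constructs the same way. The sample values and the `strip_matches` helper are hypothetical.

```python
import re

# Hypothetical values for the qty column, stored as strings.
values = ["7", "42", "100", "x9"]

full = re.compile(r"^\d$|^\d\d$")   # the expression from the example
no_start = re.compile(r"\d$|\d\d$")  # beginning-of-string anchors removed
no_end = re.compile(r"^\d|^\d\d")    # end-of-string anchors removed


def strip_matches(pattern, items):
    """Replace every match of the pattern with an empty string."""
    return [pattern.sub("", item) for item in items]


print(strip_matches(full, values))      # ['', '', '100', 'x9']
print(strip_matches(no_start, values))  # ['', '', '1', 'x']
print(strip_matches(no_end, values))    # ['', '2', '00', 'x9']
```

Only the fully anchored expression leaves 100 and x9 intact. Removing either anchor causes digits to be stripped from values that should not be touched.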