Skip to main content

Transform a Row

You can transform rows in your dataset based on conditions you specify or by using a custom formula.

Deduplicate Rows

You can use the Remove duplicate rows transformation to remove identical rows from the dataset.

Steps:

  1. In the Transformer page, click the Recipe icon. The Recipe Panel is displayed.

  2. In the Search Transformations panel, enter Remove duplicate rows. The duplicated rows are highlighted in the Transformer page.

  3. To remove the duplicate rows, click Add.

Limitations:

  • This transformation is case-sensitive. So, if a column has values Hash and HASH, the rows containing those values are not considered duplicates and cannot be removed with this transformation.

  • Whitespace and the beginning and ending of values is not ignored.

Delete Rows on a Condition

By selection

You can delete the rows for any column based on a condition that you select. When you select values in the data quality bar that for a column, a set of predictive suggestions is displayed in the right panel. Locate the Delete rowstransformation, which contains a where clause to indicate the condition on which a row is deleted.

For example, if your data contains mismatched values, select the red bar in the data quality bar and use the Delete rows transformation to delete the mismatched values from the dataset:

CS-InvalidDataSuggestions.png

Figure: Deleting rows based on values that are invalid for the column's data type

By custom condition

You can also delete a set of rows based on a condition expression that you specify. If the conditional expression evaluates to true, then the row is deleted.

In the following example, you can delete a set of POS_Sales values based on a conditional parameter. All POS_Sales where the value is less than or equal to 5 are deleted, and the rest are retained.

Store_Nbr

Item_Nbr

POS_Sales

POS_Qty

POS_Cost

200

322000

22.00

44

23.76

200

323000

1.75

1

2.43

200

325000

0.00

0

0.00

200

325000

9.44

8

7.44

200

326000

2.00

2

1.64

200

327000

5.00

10

5.40

200

328000

7.00

7

6.51

200

341000

0.00

0

0.00

200

341000

7.78

21

19.53

200

342000

0.00

0

0.00

200

350000

5.00

10

5.40

Steps:

  1. In the Transformer page, click the Recipe icon. The Recipe panel is displayed.

  2. In the Search Transformations panel, enter Filter in.

  3. In the Filter rows transformation, enter the required details:

    1. Condition: Filter based on the condition type that you select in the drop-down. Some condition types do not support specifying the condition by formula.

    2. Column: The column containing the values to filter. For example, POS_Sales.

    3. Values or Formula: Specify the values or the formula used to determine the condition.

      1. If these values are present, then the condition evaluates to true.

      2. The formula must evaluate to true or false.

    4. Action: The action to be performed to the rows based on the specified conditions. In this case, select Delete matching rows.

      Transformation Name

      Filter rows

      Parameter: Condition

      Less than or equal to

      Parameter: Column

      POS_Sales

      Parameter: Value

      POS_Sales <= 5

      Parameter: Action

      Delete matching rows
  4. To add the recipe to the step, click Add. The selected rows are deleted from the dataset.

Results:

Store_Nbr

Item_Nbr

POS_Sales

POS_Qty

POS_Cost

200

322000

22.00

44

23.76

200

325000

9.44

8

7.44

200

328000

7.00

7

6.51

200

341000

7.78

21

19.53

Filter the Data Grid

You can filter the sampled data to display only a subset of rows in the Transformer page. You can filter the dataset by rows, columns, and data types.

Columns or rows that are filtered are hidden only from the view. The hidden data is part of the sample and the dataset and is included in the output.

Steps:

  1. In the Transformer page, click the Filter icon.

  2. In the Filter by name panel, enter the required options:

    CS-FilterRowsColumns.png

    Figure: Filter by name

    1. Columns: Search for individual columns or filter columns of a specific type. Filtered columns are displayed, and the rest are hidden.

    2. Rows: Highlight search term matches found in any column for a row.

      Note

      You can also filter the rows and columns by data type.

  3. To clear all filters, click Clear all filters.

Hide and show columns

In the Transformer page, click the Eye icon in the status bar at the bottom of the page. Click one or more columns to show or hide them.

CS-TransformRows-VisibleColumnsPanel.png

Figure: Visible Columns panel

Tip

You can also use the Columns menu in the Transformer page to show or hide columns.

Note

When a column is hidden from a dataset, it is hidden for all users. You should check the column browser for hidden columns in shared datasets.

Filter during previews

When you are creating or editing transformations, the data grid displays a preview of results of the transformation in development. As needed, you can narrow the display to show only the affected rows, columns, or both. Select the Show only affected rows checkbox in the status bar at the bottom of the Transformer page.