Page tree


Contents:

   

Contents:


A key task in cleaning up your data is to remove unwanted columns and rows, which can simplify future transformations and improve job execution performance. Dataprep by Trifacta provides multiple mechanisms for removing data from your dataset.

Tip: When you are deleting data, you should consider if that data may have other uses in the future or for other users. If so, you should consider doing the data removal through a separate recipe off of your current recipe, which preserves the data for other uses in the current recipe.

Delete Columns

You can delete one or more columns based on the following:

  • By selection
  • Through transformation

    Tip: When you delete through transformation steps, you have additional controls at your disposal.

By selection

You can delete a single column or multiple columns: 

  • To delete a column from your dataset, click the column and select Delete from the column drop-down. 
  • If you select Delete others, all other remaining columns are deleted except the selected column.

TipTo delete multiple columns, select them in the data grid or column browser. Then select Delete from the column menu.

The column or columns are removed from the data grid, and a new step is added to your recipe. 

Through transformation

You can delete columns through the transformation steps.

Steps:

  1. In the Transformer page, click Delete columns
  2. The Delete columns transformation is populated in the Transformer Builder.
  3. Select one or more columns, as required:
    1. Multiple: Select one or more columns from the drop-down list. 
    2. All: Select all columns in the dataset.

      NOTE: This step removes all columns in your dataset.

    3. Range: Specify a start and ending columns. All columns inclusive of start and end are deleted. 
    4. Advanced: Specify the columns using a comma-separated list.  Ranges of columns can be specified using the tilde (~) character. Examples:

      EntryDescription
      Store_Nbr ~ DailyColumns from Store_Nbr to Daily in the dataset are deleted.
      Store_Name,Store_Manager,Store_Nbr ~ DailyThe following columns are deleted: Store_Name Store_Manager Store_Nbr  to  Daily


  4. From the Action area, select one of the following options:
    1. Delete selected columns: Deletes only the selected columns.
    2. Delete unselected columns: Deletes all other remaining columns except the selected columns.
  5. To delete columns, click Add

Example transformation:

The following transformation deletes the columns between Store_Nbr and Daily, inclusive. 

Transformation Name Delete columns
Parameter: Columns Advanced
Parameter: Column Store_Nbr~Daily
Parameter: Action Delete selected columns

Delete Rows

Since rows do not have an identifying header, you must identify the rows to remove in your dataset based on a specified condition.  You can delete rows based on the following:

  • By selection
  • By custom conditions

By selection

You can delete rows by selecting values. You are prompted for data filtering suggestions when you select values in:

  • column histograms
  • column data quality bars
  • cells or values within a cell 

When you make a selection, select the Delete rows transformation in the context panel. The Transform Builder contains a transformation to filter rows based on the the condition that you have selected. For example, if you selected the value California in the State column, then the transformation is specified to filter out rows in which State=California.

In the Transform Builder, you must decide if the transformation keeps matching rows (deleting all others) or deletes matching rows. In the following example, rows in which State=California are selected for deletion:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition State == "California"
Parameter: Action Delete matching rows

By custom conditions

You can delete a set of rows based on a condition specified in the condition column . If the conditional expression is  true , then the selected rows are  deleted. 

  1. In the Transformer page, click the Recipe icon. The Recipe panel is displayed.
  2. In the Search Transformations panel, enter Filter in.
  3. In the Filter rows transformation, enter the required details:
    1. Condition: Filter based on the condition type that you select in the drop-down. Some condition types do not support specifying the condition by formula.
    2. Column: The column containing the values to filter. For example, action_count.
    3. Values or Formula: Specify the values or the formula used to determine the condition. 
      1. If these values are present, then the condition evaluates to true.
      2. The formula must evaluate to true or false
    4. Action: The action to be performed to the rows based on the specified conditions. 
    5. In the following example, the rows where the action_count column values fall between 1 and 10 are deleted:

      Transformation Name Filter rows
      Parameter: Condition Custom formula
      Parameter: Type of formula Custom single
      Parameter: Condition (action_count >= 1) && (action_count <= 10)
      Parameter: Action Delete matching rows

      Tip: You can apply logical operators such as && (logical AND) above to build more sophisticated logical tests.

  4. To add the recipe to the step, click Add. The dataset rows are filtered based on the configured transformation.

This page has no comments.