Page tree

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »



Contents:

The cloud-based version of Trifacta Wrangler is now available! Read all about it, and register for your free account.

Retains a set of rows in your dataset, which are specified by the conditional in the row expression. All other rows are removed from the dataset. The keep transform is the opposite of the delete transform. See Delete Transform.

Basic Usage

keep row:(customerStatus == 'active')

Output: For each row in the dataset, if the value of the customerStatus column is active, then the row is retained. Otherwise, the row is deleted from the dataset.

Parameters

keep row:(expression)

TokenRequired?Data TypeDescription
keepYtransformName of the transform
rowYstringExpression identifying the row or rows to keep. If expression evaluates to true for a row, the row is retained.

For more information on syntax standards, see Language Documentation Syntax Notes.

row

Expression to identify the row or rows on which to perform the transform. Expression must evaluate to true or false.

Examples:

ExpressionDescription
Score >= 50
true if the value in the Score column is greater than 50.
LEN(LastName) > 8
true if the length of the value in the LastName column is greater than 8.
ISMISSING([Title])
true if the row value in the Title column is missing.
ISMISMATCHED(Score,['Integer'])
true if the row value in the Score column is mismatched against the Integer data type.

For the keep transform, if the expression for the row parameter evaluates to true for a row, it is kept in the dataset. Otherwise, it is removed.

Example:

keep row: (lastOrder >= 10000 && status == 'Active')

Output: Retains all rows in the dataset where the lastOrder value is greater than or equal to 10,000 and the customer status is Active

Usage Notes:

Required?Data Type
YesExpression that evaluates to true or false


Examples

Example - Remove old products and keep new orders

This examples illustrates how you can keep and delete rows from your dataset using the following transforms:

  • delete - Deletes a set of rows as evaluated by the conditional expression in the row parameter. See Delete Transform.
  • keep - Retains a set of rows as evaluated by the conditional expression in the row parameter. All other rows are deleted from the dataset. See Keep Transform.

Source:

Your dataset includes the following order information. You want to edit your dataset so that:

  • All orders for products that are no longer available are removed. These include the following product IDs: P100, P101, P102, P103.
  • All orders that were placed within the last 90 days are retained.

 

OrderIdOrderDateProdIdProductNameProductColorQtyOrderValue
10016/14/2015P100HatBrown190
10021/15/2016P101HatBlack2180
100311/11/2015P103SweaterBlack3255
10048/6/2015P105CardiganRed4320
10057/29/2015P103SweeterBlack5375
100612/1/2015P102PantsWhite6420
100712/28/2015P107T-shirtWhite7390
10081/15/2016P105CardiganRed8420
10091/31/2016P108CoatNavy9495

 

Transformation:

First, you remove the orders for old products. Since the set of products is relatively small, you can start first by adding the following:

NOTE: Just preview this transformation. Do not add it to your recipe yet.

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition (ProdId == 'P100')
Parameter: Action Delete matching rows

When this step is previewed, you should notice that the top row in the above table is highlighted for removal. Notice how the transformation relies on the ProdId value. If you look at the ProductName value, you might notice that there is a misspelling in one of the affected rows, so that column is not a good one for comparison purposes.

You can add the other product IDs to the transformation in the following expansion of the transformation, in which any row that has a matching ProdId value is removed:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition (ProdId == 'P100' || ProdId == 'P101' || ProdId == 'P102' || ProdId == 'P103')
Parameter: Action Delete matching rows

When the above step is added to your recipe, you should see data that looks like the following:

OrderIdOrderDateProdIdProductNameProductColorQtyOrderValue
10048/6/2015P105CardiganRed4320
100712/28/2015P107T-shirtWhite7390
10081/15/2016P105CardiganRed8420
10091/31/2016P108CoatNavy9495

Now, you can filter out of the dataset orders that are older than 90 days. First, add a column with today's date:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula '2/25/16'
Parameter: New column name 'today'

Keep the rows that are within 90 days of this date using the following:

Transformation Name Filter rows
Parameter: Condition Custom formula
Parameter: Type of formula Custom single
Parameter: Condition datedif(OrderDate,today,day) <= 90
Parameter: Action Keep matching rows

Don't forget to delete the today column, which is no longer needed:

Transformation Name Delete columns
Parameter: Columns today
Parameter: Action Delete selected columns

Results:

OrderIdOrderDateProdIdProductNameProductColorQtyOrderValue
100712/28/2015P107T-shirtWhite7390
10081/15/2016P105CardiganRed8420
10091/31/2016P108CoatNavy9495

Your Rating: Results: 1 Star2 Star3 Star4 Star5 Star 0 rates

  • No labels

This page has no comments.