Excerpt |
---|
Removes or keeps the matching rows of data, based on a condition that you specify or a custom formula that you apply. |
You can filter your dataset based on the following condition types:
Example - is missing
D code |
---|
filter missing: qty action: Drop |
Output: Deletes all rows in which the value in the
qty
column is missing.
Example - is mismatched
D code |
---|
filter col: CoName mismatched: 'String' action: Drop |
Output: Deletes all rows in which the value in the
CoName
column does not match the String data type.
Example - is exactly
D code |
---|
filter col: basic exactly: find(basic, '545', true, 1) == 8 action: Keep |
Output: Keeps all rows where
545
appears at the eighth character in the
basic
column.
Example - is one of
D code |
---|
filter col: zipCode oneOf: '94104','94105' action: Keep |
Output: Keeps all rows in which the value of the
zipCode
column is either
94104
or
94105
and deletes all other rows in the dataset.
Example - Less than (or equal to)
D code |
---|
filter col: row_number lessThanEqual: 5 action: Keep |
Output: Keeps all rows in the dataset where the value in the
row_number
column is less than or equal to
5
. All other rows are deleted.
Example - Greater than (or equal to)
D code |
---|
filter col: row_number greaterThanEqual: 10 action: Drop |
Output: Deletes all rows in the dataset where the value in
row_number
is greater than or equal to
10
.
Example - Is Between
D code |
---|
filter col: row_number greaterThan: 5 lessThanEqual: 15 action: Keep |
Output: Keeps all rows where the
row_number
value is greater than
5
or less than or equal to
15
. All other rows are deleted.
Example - Contains
D code |
---|
filter col: phoneNum contains: `\({digit}{3}\)` action: Keep |
Output: Keeps all rows where the
phoneNum
value contains a three-digit pattern surrounded by parentheses
(XXX)
. All other rows are deleted.
Example - Starts with
D code |
---|
filter col: phoneNum startsWith: '(981)' action: Keep |
Output: Keeps all rows where the
phoneNum
value begins with
(981)
. All other rows are deleted.
Example - Ends with
D code |
---|
filter col: zipCode endsWith: `\-{digit}{4}` action: Drop |
Output: Deletes all rows where the
zipCode
value ends with a four-digit extension.
Example - custom formula
D code |
---|
filter row: (row_number >= 25 && firstName == 'Steve') action: Keep |
Output: Keeps all rows where the
row_number
value is greater than or equal to 25 and the
firstName
value is
Steve
. All other rows are deleted.
D code |
---|
filter col:column_ref [missing: column_ref] [exactly: expression_ref] [mismatched: 'data_type_str'] [exactly: expression] [oneOf: 'string_1','string_2'] [lessthan | lessThanEqual: numVal] [greaterthan | greaterThanEqual: numVal] [contains: string_or_pattern] [startsWith|endsWith: string_or_pattern] action: [Drop|Keep] |
Token | Required? | Data Type | Description |
---|
filter | Y | transform | Name of the transform |
row | N | string | Expression identifying the row or rows to filter. If expression evaluates to true for a row, the row is either kept or deleted. |
col | N | string | Name of the column or expression for columns to drop |
missing | N | string | Name of column to evaluate for missing values. |
mismatched | N | string | String literal for the data type to check for mismatches. |
exactly | N | string | String literal, , or regular expression that evaluates to an exact match for a row value in the specified column. |
oneOf | N | string | List of string literals, any of which can be matched. |
lessThan or lessThanEqual | N | integer, decimal, or expression | Integer or decimal literal or expression evaluating to numeric value below which results in a match. Can also match on the specified expression exactly. Parameter is also used for the Between condition type. |
greaterThan or greaterThanEqual | N | integer, decimal, or expression | Integer or decimal literal or expression evaluating to numeric value above which results in a match. Can also match on the specified expression exactly. Parameter is also used for the Between condition type. |
contains | N | string | String literal, , or regular expression to be matches somewhere within the specified column values. |
startsWith | N | string | String literal, , or regular expression to match the beginnings of the values in the specified column. |
endsWith | N | string | String literal, , or regular expression to match the endings of the values in the specified column. |
action | Y | string | Drop or Keep the listed columns |
Include Page |
---|
| row Parameter |
---|
| row Parameter |
---|
|
Example:
D code |
---|
delete row: (lastContactDate < 01/01/2010 || status == 'Inactive') |
Output: Deletes any row in the dataset where the lastContactDate
is before January 1, 2010 or the status is Inactive
.
Required? | Data Type |
---|
Yes | Expression that evaluates to true or false |
Identifies the column or columns to which to apply the transform. You can specify one column or more columns.
Required? | Data Type |
---|
Yes | String (column name) |
For the Is Missing condition type, this value specifies the column to check for missing values.
Required? | Data Type |
---|
Required for Is Missing condition type only | String (column name) |
For the Mismatched condition type, this value specifies string for the data type identifier value to check for mismatches. For more information, see Valid Data Type Strings.
The col
parameter is also required.
Required? | Data Type |
---|
Required for Mismatched condition type only | String (data type identifier) |
For the Exactly condition type, this value is a String literal,
, or regular expression that exactly matches row values in the specified column.The col
parameter is also required.
Required? | Data Type |
---|
Required for Exactly condition type only | String (expression) |
For the One Of condition type, this value is a list of string literals,
, or regular expressions. If a row value for the specified column matches one of these expressions, the row is either deleted or kept.The col
parameter is also required.
Required? | Data Type |
---|
Required for One Of condition type only | List of string literals, , or regular expressions |
For the Less Than conditional types, this value is an Integer or Decimal literal or an expression that evaluates to an Integer or Decimal literal. If the value in the specified column is less than (or optionally equal to) this value, then the row is either deleted or kept.
The col
parameter is also required.
Required? | Data Type |
---|
Required for Less than (or equal to) condition type only | Integer or Decimal literal or expression evaluating to one of these data types |
For the Less Than conditional types, this value is an Integer or Decimal literal or an expression that evaluates to an Integer or Decimal literal. If the value in the specified column is greater than (or optionally equal to) this value, then the row is either deleted or kept.
The col
parameter is also required.
Required? | Data Type |
---|
Required for Greater than (or equal to) condition type only | Integer or Decimal type or expression evaluating to one of these data types |
For the Contains condition type, this value identifies a String literal,
, or regular expression, which is used to evaluate partial or full matches to row values in the specified column.The col
parameter is also required.
Required? | Data Type |
---|
Required for Contains condition type only | String literal, , or regular expression |
For the Starts With condition type, this value identifies the String literal,
, or regular expression with which a value must start in the specified column to match.The col
parameter is also required.
Required? | Data Type |
---|
Required for Starts with condition type only | String literal, , or regular expression |
For the Ends With condition type, this value identifies the String literal,
, or regular expression with which a value must end in the specified column to match.
The col
parameter is also required.
Required? | Data Type |
---|
Required for Ends with condition type only | String literal, , or regular expression |
Identifies whether the action performed by the transform:
Drop
- Listed columns are dropped from the dataset.Keep
- Listed columns are retained in the dataset, and all other columns are dropped.
Required? | Data Type |
---|
Yes | String (Drop or Keep ) |
See above.
D s also |
---|
label | wrangle_transform_filter |
---|
|