|
Tip: Patterns can also be used to extract values from cell values into a new column. The |
You can use the Replace text or patterns transformation to replace values in one or more columns with literal values, , or regular expressions through any of the following methods. You can use this transformation to replace missing, mismatched, or bad data using the following methods.
When you select a piece of text in the data grid, the replace suggestion card displayed in the Selection Details panel on the right side may contain Pattern-based options for finding the selected value and similar values in the column of data. You can use these suggestions to replace column values.
Steps:
In the Transformer toolbar at the top of the grid, click Replace > Text or Pattern . The Replace text or pattern transformation is displayed in the Transform Builder. For more information, see Transformer Toolbar.
For more information on procedures, see "Replace using Transform Builder" below.
You can review sets of patterns for the selected column in the Column Details panel. When you select a column in the Column Details panel, you are prompted with a set of suggested patterns.
For more information on suggestions, see Overview of Predictive Transformation.
Replace using Transform Builder
When a pattern suggestion is selected, it is specified in the Transform Builder for review and addition to your recipe. In the Transform Builder, you can select one or more columns to replace text or patterns.
Steps:
The following steps describe how to build a pattern-based replacement transformation from scratch in the Transform Builder.
Tip: Some selections in the data grid or related tools can lead to suggestions or pre-configured transformations in the Transform Builder. |
Replace text or pattern
in the Search panel.Advanced: Specify the columns using a comma-separated list. You can combine multiple and range options under Advanced.
Ranges of columns can be specified using the tilde (~
) character.
The following example range selects from the dataset as displayed in the data grid column1
, column3
, and the range of columns between column5
and column8
, inclusive:
column1,column3,column5~column8 |
The Replace with text or pattern
transformation enables you to replace values within the specified column or columns based on a string literal or . When you specify the transformation in the Transform Builder, the Find textbox can be populated with one of the following types of values:
Find type | Description | Delimiter | Example | ||
---|---|---|---|---|---|
Literal | A literal pieces of text | single quotes |
| ||
A
| back-ticks |
| |||
Regular expression | forward slashes |
|
The following examples demonstrate how can be used to find and replace values within a column or set of columns.
This example uses to find the first three characters. In this example, the first three characters of the
Customer ID
column are replaced with the value CustID-
for the selected column in the dataset.
Transformation:
Results:
Before | After |
---|---|
Tri02468 | CustID-02468
|
Mul2239 |
CustID-2239
|
Zev5521 |
CustID-5521
|
This example is based on the search and replace content in your dataset using literals. In the following example, the value ##CLT_NAME##
is replaced with Our Customer, Inc.
across all columns in the dataset.
Transformation:
Tip: For privacy reasons or sensitivity reasons, you can mask the sensitive data with the following replacements. |
The following example uses to find a string of four digits. The replacement is based on the structure of the data, not on the type of data. If you have data that are not credit card numbers yet follows the four-digit pattern, those values can also be replaced. In this example, the
myCreditCardNumbers
column is masked with XXXX
.
Transformation:
Results:
Before | After |
---|---|
1234-1234-1234-1234 | XXXX-XXXX-XXXX-1234 |
1111-1111-1111-1111 |
XXXX-XXXX-XXXX-1111 |
4321-4321-4321-4321 |
XXXX-XXXX-XXXX-4321
|
The previous example captures aspects of the found pattern for use during replacement. A capture group is a mechanism in or regular expressions to capture one or more parts of the matched values into variables.
In the example, the last four-digit segment of the is surrounded by parentheses:
({digit}{4}){end} |
This group of digits is captured as the first (and only) capture group. In the replacement string, it is referenced as:
$1 |
You can have multiple capture groups in a single pattern. In the replacement, these capture groups can be referenced sequentially left-to-right from the pattern: $1
, $2
, and so on.
For more information, see Capture Group References.
Tip: You can use both |
The following example is based on replacing the date and time using the pre-configured suggestions displayed in the search context panel. In this example, the date
yy/mm/dd
is replaced with mm/dd/yy
.
Transformation:
Results:
Before | After |
---|---|
20/11/02 | 11/02/20 |
20/11/22 | 11/22/20 |
20/11/26 | 11/26/20 |
You can specify replacements based on the character position of values in your source column values. This method of finding and replacing values is useful if the source column data is consistently structured.
For example, suppose you have dates in the following format:
Before |
---|
2020-05-01 |
2020-05-02 |
2020-05-03 |
Transformation:
Suppose you wanted to replace the value for the month with Month
, you could add the following transformation step:
Results:
After |
---|
2020-Month-01 |
2020-Month-02 |
2020-Month-03 |
To replace the four digits of the year, you could perform a basic replace text or pattern transformation with a pattern to find of the following:
`{start}{digit}{4}` |
You can use alpha-numeric and position for replacing the customer's address in the the dataset. In this example,
{alpha-numeric}
pattern is applied to find the customer's addresses and used {start}
and {end}
pattern to mention the position of replacement. For more information on Pattern Syntax, see Text Matching .
Transformation:
Results:
Before | After |
---|---|
3298, Church Street |
32##, Church Street
|
4132, Park Avenue | 41##, Park Avenue |
1234, McGrath Road |
12##, McGrath Road
|
You can use the following special tokens to search for matches in your source values. In some cases, these
are consistent with the patterns used for specific data types.
Pattern | Description | |
---|---|---|
| Matches values that begin with an at-sign, such as | |
| Matches values that begin with a hashtag, such as | |
| Matches values that are valid hexadecimal (base-16) numbers. These values contain a string of numerals, letters A-F, and combinations of them, without spaces. Examples: AE00 , 1F2F , 100 . | |
| Matches valid phone numbers within a set of values. For more information on this data type pattern, see Phone Number Data Type. | |
| Matches valid email addresses within a set of values. For more information on this data type pattern, see Email Address Data Type. | |
| Matches valid URL addresses within a set of values. For more information, on this data type pattern, see URL Data Type. |