Page tree

Release 8.2



Contents:

   

Contents:


In Trifacta® Wrangler Enterprise, Trifacta patternsenable you to identify patterns in cell values and to perform replacements on those found elements of text. This section describes how to use patterns to find text and replace them with preferred values.

Tip: Patterns can also be used to extract values from cell values into a new column. The Trifacta patterns listed on this page can also be applied to the Extract text or pattern transformation. For additional example Trifacta patterns, see Extract Values.

Replace Methods

You can use the Replace text or patterns  transformation to replace values in one or more columns with literal values, Trifacta patterns, or regular expressions through any of the following methods. You can use this transformation to replace missing, mismatched, or bad data using the following methods.

Replace by selection

When you select a piece of text in the data grid, the replace suggestion card displayed in the Selection Details panel on the right side may contain Pattern-based options for finding the selected value and similar values in the column of data. You can use these suggestions to replace column values.

Steps:

  1. Select the data you want to replace. The suggestion cards are displayed.
  2. In the Selection Details panel on the right side, select the Replace pattern suggestion card and click Edit.
  3. The Replace text or patterns transformation is specified for you in the Transform Builder, where you can modify the Find value and other parameters as needed. See example below. 

Replace using Transformer toolbar 

In the Transformer toolbar at the top of the grid, click Replace > Text or Pattern . The Replace text or pattern transformation is displayed in the Transform Builder. For more information, see Transformer Toolbar.

For more information on procedures, see "Replace using Transform Builder" below.

Replace using Column Details panel 

You can review sets of patterns for the selected column in the Column Details panel. When you select a column in the Column Details panel, you are prompted with a set of suggested patterns. For more information, see Column Details Panel.

When a pattern suggestion is selected, it is specified in the Transform Builder for review and addition to your recipe. For more information, see "Replace using Transform Builder" below.

Replace using Transform Builder

In the Transform Builder, you can select one or more columns to replace text or patterns. 

Steps:

The following steps describe how to build a pattern-based replacement transformation from scratch in the Transform Builder.

Tip: Some selections in the data grid or related tools can lead to suggestions or pre-configured transformations in the Transform Builder.

  1. Enter Replace text or pattern in the Search panel. For more information, see Search Panel.
  2. Select an individual column or multiple columns from the following options:
    • Multiple: Select one or more columns from the drop-down list. 
    • All: Select all columns in the dataset. See below for an example.
    • Range: Specify a start column and an ending column. All columns in between are selected. 
    • Advanced: Specify the columns using a comma-separated list. You can combine multiple and range options under Advanced. 

      • Ranges of columns can be specified using the tilde (~) character.

      • The following example range selects from the dataset as displayed in the data grid column1, column3, and the range of columns between column5  and column8, inclusive:

        column1,column3,column5~column8
  3. In the Find text box, enter the text value or pattern that matches the value you want to replace. For more information, see "Find Values in a Column" below.
  4. In the Replace text box, enter the value to replace the found text.
  5. For additional controls, click Advanced Options:
    1. Start search after: Enter a text or pattern that precedes the value you want to replace. See below example.
    2. Start search before: Enter a text or pattern that follows the value you want to replace. See below example.
    3. Ignore case: If selected, case is ignored when matching.
    4. Match all occurrences: If selected, all occurrences of the found text in the column are matched and replaced.
  6. Click Add. The transformation is added to your recipe, and the selected columns are replaced with appropriate patterns in the data grid.

Find Values in a Column

The Replace with text or pattern transformation enables you to replace values within the specified column or columns based on a string literal or Trifacta patterns. When you specify the transformation in the Transform Builder, the Find textbox can be populated with one of the following types of values:

Find typeDescriptionDelimiterExample
LiteralA literal pieces of textsingle quotes

'My piece of text'

Trifacta pattern

A Trifacta pattern represents zero or more characters that match a pattern. In Trifacta Wrangler Enterprise, Patterns are a simplified means of expressing regular expressions. For more information on Trifacta pattern syntax, see  Text Matching .

Tip: The examples in this section use Trifacta Patterns, which are simpler to use than regular expressions.

back-ticks

`{start}{digit}{3}`

Regular expression

Regular expressions are a standard-based method of describing patterns in values.

NOTE: Regular expressions are considered a developer-level skill. For more information on regular expression, see on  RE2  and  PCRE  regular expressions.

forward slashes

/^.{0,3}/

Examples

The following examples demonstrate how Trifacta Patterns can be used to find and replace values within a column or set of columns.

Replace first three characters 

This example uses  Trifacta Pattern to find the first three characters. In this example, the first three characters of the Customer ID column are replaced with the value CustID- for the selected column in the dataset.

Transformation:

Transformation Name Replace text or patterns
Parameter: Column CustomerID
Parameter: Find `{start}%{3}`
Parameter: Replace with CustID-

Results:

BeforeAfter
Tri02468CustID-02468
Mul2239 CustID-2239
Zev5521 CustID-5521

Replace using literal expressions 

This example is based on the search and replace content in your dataset using literals. In the following example, the value ##CLT_NAME## is replaced with Our Customer, Inc. across all columns in the dataset.

Transformation:

Transformation Name Replace text or patterns
Parameter: Column All
Parameter: Find '##CLT_NAME##'
Parameter: Replace with 'Our Customer, Inc.'
Parameter: Match all occurrences true

Replace string of four digits

Tip: For privacy reasons or sensitivity reasons, you can mask the sensitive data with the following replacements.

The following example uses  Trifacta Patterns to find a string of four digits. The replacement is based on the structure of the data, not on the type of data. If you have data that are not credit card numbers yet follows the four-digit pattern, those values can also be replaced. In this example, the myCreditCardNumbers column is masked with XXXX.

Transformation:

Transformation Name Replace text or patterns
Parameter: Columns myCreditCardNumbers
Parameter: Find `{start}{digit}{4}{any}{digit}{4}{any}{digit}{4}{any}({digit}{4}){end}`
Parameter: Replace with XXXX-XXXX-XXXX-$1

Results:

BeforeAfter
1234-1234-1234-1234XXXX-XXXX-XXXX-1234
1111-1111-1111-1111 XXXX-XXXX-XXXX-1111
4321-4321-4321-4321 XXXX-XXXX-XXXX-4321

Using capture groups

The previous example captures aspects of the found pattern for use during replacement. A capture group is a mechanism in Trifacta Patterns or regular expressions to capture one or more parts of the matched values into variables. 

In the example, the last four-digit segment of the Trifacta Pattern is surrounded by parentheses:

({digit}{4}){end}

This group of digits is captured as the first (and only) capture group. In the replacement string, it is referenced as: 

$1

You can have multiple capture groups in a single pattern. In the replacement, these capture groups can be referenced sequentially left-to-right from the pattern: $1$2, and so on.

For more information, see Capture Group References. See below example.

Tip: You can use both {digit} and {#} Trifacta patterns for columns containing numeric values.

Replace date and time patterns

The following example is based on replacing the date and time using the pre-configured suggestions displayed in the search context panel. In this example, the date Trifacta Patterns  yy/mm/dd is replaced with mm/dd/yy.

Transformation:

Transformation Name Replace text or patterns
Parameter: Column ORDER_DATE
Parameter: Find `({yy}){delim}({MM}){delim}({dd})`
Parameter: Replace with $2-$1-$3

Results:

BeforeAfter
20/11/0211/02/20
20/11/2211/22/20
20/11/2611/26/20

Replace based on position

You can specify replacements based on the character position of values in your source column values. This method of finding and replacing values is useful if the source column data is consistently structured.

For example, suppose you have dates in the following format:

Before
2020-05-01
2020-05-02
2020-05-03

Transformation:

Suppose you wanted to replace the value for the month with Month, you could add the following transformation step:

Transformation Name Replace between positions
Parameter: Column Before
Parameter: Start position 6
Parameter: End position 8
Parameter: Replace with Month

Results:

After
2020-Month-01
2020-Month-02
2020-Month-03

To replace the four digits of the year, you could perform a basic replace text or pattern transformation with a pattern to find of the following:

`{start}{digit}{4}`

Replace alpha-numeric and position patterns

You can use alpha-numeric and position Trifacta patterns for replacing the customer's address in the  the dataset. In this example, {alpha-numeric} pattern is applied to find the customer's addresses and used {start} and {end} pattern to mention the position of replacement. For more information on Pattern Syntax, see  Text Matching .

Transformation:

Transformation Name Replace text or patterns
Parameter: Column address_street_number
Parameter: Find `{alpha-numeric}`
Parameter: Replace with ##
Parameter: Start search after `{start}{digit}{2}`
Parameter: Stop search before `{any}`

Results:

BeforeAfter
3298, Church Street 32##, Church Street
4132, Park Avenue41##, Park Avenue
1234, McGrath Road 12##, McGrath Road

Replace using special patterns

You can use the following special Trifacta Pattern tokens to search for matches in your source values. In some cases, these Trifacta Patterns are consistent with the patterns used for specific data types.

PatternDescription
`{at-username}`

Matches values that begin with an at-sign, such as @trifacta. This Trifacta Pattern can be useful if you need to remap or mask username values.

`{hashtag}`

Matches values that begin with a hashtag, such as #dataprep. For an example of this Trifacta Pattern, see Extract Values.

`{hex}`
Matches values that are valid hexadecimal (base-16) numbers. These values contain a string of numerals, letters A-F, and combinations of them, without spaces. Examples: AE00, 1F2F, 100.
`{phone}`
Matches valid phone numbers within a set of values. For more information on this data type pattern, see Phone Number Data Type.
`{email}`
Matches valid email addresses within a set of values. For more information on this data type pattern, see Email Address Data Type.
`{url}`
Matches valid URL addresses within a set of values. For more information, on this data type pattern, see URL Data Type.

This page has no comments.