Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0821

D toc

In

D s product
rtrue
,
D s item
itempatterns
enable you to identify patterns in cell values and to perform replacements on those found elements of text. This section describes how to use patterns to find text and replace them with preferred values.

Tip

Tip: Patterns can also be used to extract values from cell values into a new column. The

D s item
itempatterns
listed on this page can also be applied to the Extract text or pattern transformation. For additional example
D s item
itempatterns
, see Extract Values.

Replace Methods

You can use the Replace text or patterns  transformation to replace values in one or more columns with literal values,

D s item
itempatterns
, or regular expressions through any of the following methods. You can use this transformation to replace missing, mismatched, or bad data using the following methods.

Replace by selection

When you select a piece of text in the data grid, the replace suggestion card displayed in the Selection Details panel on the right side may contain Pattern-based options for finding the selected value and similar values in the column of data. You can use these suggestions to replace column values.

Steps:

  1. Select the data you want to replace. The suggestion cards are displayed.
  2. In the Selection Details panel on the right side, select the Replace pattern suggestion card and click Edit.
  3. The Replace text or patterns transformation is specified for you in the Transform Builder, where you can modify the Find value and other parameters as needed. See example below. 

Replace using Transformer toolbar 

In the Transformer toolbar at the top of the grid, click Replace > Text or Pattern . The Replace text or pattern transformation is displayed in the Transform Builder. For more information, see Transformer Toolbar.

For more information on procedures, see "Replace using Transform Builder" below.

Replace using Column Details panel 

You can review sets of patterns for the selected column in the Column Details panel. When you select a column in the Column Details panel, you are prompted with a set of suggested patterns. For more information, see Column Details Panel.

When a pattern suggestion is selected, it is specified in the Transform Builder for review and addition to your recipe. For more information, see "Replace using Transform Builder" below.

Replace using Transform Builder

In the Transform Builder, you can select one or more columns to replace text or patterns. 

Steps:

The following steps describe how to build a pattern-based replacement transformation from scratch in the Transform Builder.

Tip

Tip: Some selections in the data grid or related tools can lead to suggestions or pre-configured transformations in the Transform Builder.

  1. Enter Replace text or pattern in the Search panel. For more information, see Search Panel.
  2. Select an individual column or multiple columns from the following options:
    • Multiple: Select one or more columns from the drop-down list. 
    • All: Select all columns in the dataset. See below for an example.
    • Range: Specify a start column and an ending column. All columns in between are selected. 
    • Advanced: Specify the columns using a comma-separated list. You can combine multiple and range options under Advanced. 

      • Ranges of columns can be specified using the tilde (~) character.

      • The following example range selects from the dataset as displayed in the data grid column1, column3, and the range of columns between column5  and column8, inclusive:

        Code Block
        column1,column3,column5~column8
  3. In the Find text box, enter the text value or pattern that matches the value you want to replace. For more information, see "Find Values in a Column" below.
  4. In the Replace text box, enter the value to replace the found text.
  5. For additional controls, click Advanced Options:
    1. Start search after: Enter a text or pattern that precedes the value you want to replace. See below example.
    2. Start search before: Enter a text or pattern that follows the value you want to replace. See below example.
    3. Ignore case: If selected, case is ignored when matching.
    4. Match all occurrences: If selected, all occurrences of the found text in the column are matched and replaced.
  6. Click Add. The transformation is added to your recipe, and the selected columns are replaced with appropriate patterns in the data grid.

Find Values in a Column

The Replace with text or pattern transformation enables you to replace values within the specified column or columns based on a string literal or

D s item
itempatterns
. When you specify the transformation in the Transform Builder, the Find textbox can be populated with one of the following types of values:

Find typeDescriptionDelimiterExample
LiteralA literal pieces of textsingle quotes
D lang syntax

'My piece of text'

D s item
itempattern

A

D s item
itempattern
represents zero or more characters that match a pattern. In
D s product
, Patterns are a simplified means of expressing regular expressions. For more information on
D s item
itempattern
syntax, see  Text Matching .

Tip

Tip: The examples in this section use

D s item
itemPatterns
, which are simpler to use than regular expressions.

back-ticks
D lang syntax

`{start}{digit}{3}`

Regular expression

Regular expressions are a standard-based method of describing patterns in values.

Info

NOTE: Regular expressions are considered a developer-level skill. For more information on regular expression, see on  RE2  and  PCRE  regular expressions.

forward slashes
D lang syntax

/^.{0,3}/

Examples

The following examples demonstrate how

D s item
itemPatterns
 can be used to find and replace values within a column or set of columns.

Replace first three characters 

This example uses  

D s item
itemPattern
 to find the first three characters. In this example, the first three characters of the Customer ID column are replaced with the value CustID- for the selected column in the dataset.

Transformation:

D trans
p03ValueCustID-
Typeref
p01NameColumn
p01ValueCustomerID
p02NameFind
p02Value`{start}%{3}`
p03NameReplace with
SearchTermReplace text or patterns

Results:

BeforeAfter
Tri02468CustID-02468
Mul2239 CustID-2239
Zev5521 CustID-5521

Replace using literal expressions 

This example is based on the search and replace content in your dataset using literals. In the following example, the value ##CLT_NAME## is replaced with Our Customer, Inc. across all columns in the dataset.

Transformation:

D trans
p03Value'Our Customer, Inc.'
Typestep
p01NameColumn
p01ValueAll
p02NameFind
p02Value'##CLT_NAME##'
p03NameReplace with
p04Valuetrue
p04NameMatch all occurrences
SearchTermReplace text or patterns

Replace string of four digits

Tip

Tip: For privacy reasons or sensitivity reasons, you can mask the sensitive data with the following replacements.

The following example uses  

D s item
itemPatterns
 to find a string of four digits. The replacement is based on the structure of the data, not on the type of data. If you have data that are not credit card numbers yet follows the four-digit pattern, those values can also be replaced. In this example, the myCreditCardNumbers column is masked with XXXX.

Transformation:

D trans
p03ValueXXXX-XXXX-XXXX-$1
Typeref
p01NameColumns
p01ValuemyCreditCardNumbers
p02NameFind
p02Value`{start}{digit}{4}{any}{digit}{4}{any}{digit}{4}{any}({digit}{4}){end}`
p03NameReplace with
SearchTermReplace text or patterns

Results:

BeforeAfter
1234-1234-1234-1234XXXX-XXXX-XXXX-1234
1111-1111-1111-1111 XXXX-XXXX-XXXX-1111
4321-4321-4321-4321 XXXX-XXXX-XXXX-4321

Using capture groups

The previous example captures aspects of the found pattern for use during replacement. A capture group is a mechanism in 

D s item
itemPatterns
 or regular expressions to capture one or more parts of the matched values into variables. 

In the example, the last four-digit segment of the 

D s item
itemPattern
 is surrounded by parentheses:

Code Block
({digit}{4}){end}

This group of digits is captured as the first (and only) capture group. In the replacement string, it is referenced as: 

Code Block
$1

You can have multiple capture groups in a single pattern. In the replacement, these capture groups can be referenced sequentially left-to-right from the pattern: $1$2, and so on.

For more information, see Capture Group References. See below example.

Tip

Tip: You can use both {digit} and {#}

D s item
itempatterns
for columns containing numeric values.

Replace date and time patterns

The following example is based on replacing the date and time using the pre-configured suggestions displayed in the search context panel. In this example, the date

D s item
itemPatterns
  yy/mm/dd is replaced with mm/dd/yy.

Transformation:

D trans
p03Value$2-$1-$3
Typestep
p01NameColumn
p01ValueORDER_DATE
p02NameFind
p02Value`({yy}){delim}({MM}){delim}({dd})`
p03NameReplace with
SearchTermReplace text or patterns

Results:

BeforeAfter
20/11/0211/02/20
20/11/2211/22/20
20/11/2611/26/20

Replace based on position

You can specify replacements based on the character position of values in your source column values. This method of finding and replacing values is useful if the source column data is consistently structured.

For example, suppose you have dates in the following format:

Before
2020-05-01
2020-05-02
2020-05-03

Transformation:

Suppose you wanted to replace the value for the month with Month, you could add the following transformation step:

D trans
p03Value8
Typestep
WrangleTextreplacepositions col: Before with: 'Month' start: 6 end: 8
p01NameColumn
p01ValueBefore
p02NameStart position
p02Value6
p03NameEnd position
p04ValueMonth
p04NameReplace with
SearchTermReplace between positions

Results:

After
2020-Month-01
2020-Month-02
2020-Month-03

To replace the four digits of the year, you could perform a basic replace text or pattern transformation with a pattern to find of the following:

D lang syntax
Typeref
showNotetrue

`{start}{digit}{4}`

Replace alpha-numeric and position patterns

You can use alpha-numeric and position

D s item
itempatterns
 for replacing the customer's address in the  the dataset. In this example, {alpha-numeric} pattern is applied to find the customer's addresses and used {start} and {end} pattern to mention the position of replacement. For more information on Pattern Syntax, see  Text Matching .

Transformation:

D trans
p03Value##
Typestep
p05NameStop search before
p01NameColumn
p01Valueaddress_street_number
p02NameFind
p02Value`{alpha-numeric}`
p05Value`{any}`
p03NameReplace with
p04Value`{start}{digit}{2}`
p04NameStart search after
SearchTermReplace text or patterns

Results:

BeforeAfter
3298, Church Street 32##, Church Street
4132, Park Avenue41##, Park Avenue
1234, McGrath Road 12##, McGrath Road

Replace using special patterns

You can use the following special 

D s item
itemPattern
 tokens to search for matches in your source values. In some cases, these
D s item
itemPatterns
 are consistent with the patterns used for specific data types.

PatternDescription
Code Block
`{at-username}`

Matches values that begin with an at-sign, such as @trifacta. This

D s item
itemPattern
can be useful if you need to remap or mask username values.

Code Block
`{hashtag}`

Matches values that begin with a hashtag, such as #dataprep. For an example of this

D s item
itemPattern
, see Extract Values.

Code Block
`{hex}`
Matches values that are valid hexadecimal (base-16) numbers. These values contain a string of numerals, letters A-F, and combinations of them, without spaces. Examples: AE00, 1F2F, 100.
Code Block
`{phone}`
Matches valid phone numbers within a set of values. For more information on this data type pattern, see Phone Number Data Type.
Code Block
`{email}`
Matches valid email addresses within a set of values. For more information on this data type pattern, see Email Address Data Type.
Code Block
`{url}`
Matches valid URL addresses within a set of values. For more information, on this data type pattern, see URL Data Type.