Match Types

The application supports the following types of text matching clauses: patterns, regular expressions, and string literals.

Tip: You can create patterns to match source values in a column by example. By providing example matches for values in your source column, you can rapidly build complex pattern-based matches. For more information on transformation by example, see Overview of TBE.

Pattern Syntax

The following tables contain syntax information about patterns:

Tip: After using patterns, regular expressions, or string literals in a recipe step, you can reuse them in your transformations where applicable.


Character patterns

These patterns apply to single characters and strings of characters.

| Pattern | Description |
| --- | --- |
| `%` | Match any character, exactly once |
| `%?` | Match any character, zero or one times |
| `%*` | Match any character, zero or more times |
| `%+` | Match any character, one or more times |
| `%{3}` | Match any character, exactly three times |
| `%{3,5}` | Match any character, 3, 4, or 5 times |
| `#` | Digit character [0-9] |
| `{any}` | Match any character, exactly once |
| `{alpha}` | Alpha character [A-Za-z_] |
| `{upper}` | Uppercase alpha character [A-Z_] |
| `{lower}` | Lowercase alpha character [a-z_] |
| `{digit}` | Digit character [0-9] |
| `{delim}` | Single delimiter character, e.g. `:`, `,`, `\|`, `/`, `-`, `.`, `\s` |
| `{delim-ws}` | Single delimiter and all the whitespace around it |
| `{alpha-numeric}` | Match a single alphanumeric character |
| `{alphanum-underscore}` | Match a single alphanumeric character or underscore character |
| `{at-username}` | Match @username values |
| `{hashtag}` | Match #hashtag values |
| `{hex}` | Match hexadecimal number (e.g. 2FA3) |
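To make the character patterns above more concrete, the following Python sketch translates a handful of them into approximate regular-expression equivalents. The mapping is inferred from the descriptions in the table and is an assumption, not the application's actual implementation. The names `TOKEN_TO_REGEX` and `pattern_to_regex` are hypothetical, and only the listed tokens are handled; quantifiers such as `?`, `*`, `+`, `{3}`, and `{3,5}` pass through unchanged because regular expressions use the same notation.

```python
import re

# Assumed regex equivalents for a few character patterns (illustrative only).
TOKEN_TO_REGEX = {
    "%": ".",                # any character, exactly once
    "#": "[0-9]",            # digit character
    "{any}": ".",            # any character, exactly once
    "{alpha}": "[A-Za-z_]",  # alpha character, underscore included per the table
    "{upper}": "[A-Z_]",
    "{lower}": "[a-z_]",
    "{digit}": "[0-9]",
}

# One alternation that finds any of the known tokens in a pattern string.
_TOKEN_RE = re.compile("|".join(re.escape(token) for token in TOKEN_TO_REGEX))


def pattern_to_regex(pattern: str) -> str:
    """Replace each known token with its assumed regex equivalent."""
    return _TOKEN_RE.sub(lambda m: TOKEN_TO_REGEX[m.group(0)], pattern)


if __name__ == "__main__":
    print(pattern_to_regex("%{3,5}"))               # .{3,5}
    print(pattern_to_regex("{upper}{lower}+#{2}"))  # [A-Z_][a-z_]+[0-9]{2}
```

Note that `.` in Python does not match a newline by default, so the equivalence with `%` is only approximate.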

Position patterns

These patterns describe positions relative to the entire string.

| Pattern | Description |
| --- | --- |
| `{start}` | Match the start of the line |
| `{end}` | Match the end of the line |

Type patterns

These patterns can be used to match strings that fit a particular data type, except for Datetime patterns.

| Pattern | Description |
| --- | --- |
| `{phone}` | Match a valid U.S. phone number. See Phone Number Data Type. |
| `{email}` | Match a valid email address. See Email Address Data Type. |
| `{url}` | Match a valid URL. See URL Data Type. |
| `{ip-address}` | Match a valid IP address. See IP Address Data Type. |
| `{hex-ip-address}` | Match a valid hexadecimal IP address (e.g. 0x0CA40012). |
| `{bool}` | Match a valid Boolean value. See Boolean Data Type. |
| `{street}` | Match a U.S.-formatted street address (e.g. 123 Main Street). |
| `{occupancy}` | Match a valid U.S.-formatted occupancy address value (e.g. Apt 2D). |
| `{city}` | Match a city name within a U.S.-formatted address value. |
| `{state}` | Match a valid U.S. state value (e.g. California). |
| `{state-abbrev}` | Match a valid two-letter U.S. state abbreviation value (e.g. CA). |
| `{zip}` | Match a valid five-digit zip code. |

Datetime patterns

| Pattern | Description |
| --- | --- |
| `{month}` | Match full name of month (e.g. January) |
| `{month-abbrev}` | Match short name of month (e.g. Jan) |
| `{time}` | Match time value in HOUR:MINUTE:SECOND format (e.g. 11:59:23) |
| `{period}` | Match time period of the day: AM/PM |
| `{dayofweek}` | Match long name for day of the week (e.g. Sunday) |
| `{dayofweek-abbrev}` | Match short name for day of the week (e.g. Sun) |
| `{utcoffset}` | Match a valid UTC offset value (e.g. -0500, +0400, Z) |

NOTE: You can use the Datetime data type formatting tokens as part of your patterns to build a variety of matching patterns for date and time values. See Datetime Data Type.


Grouping patterns

| Pattern | Description |
| --- | --- |
| `{[...]}` | Character class: matches characters in brackets |
| `{![...]}` | Negated class: matches characters not in brackets |
| `(...)` | Grouping, including captures |
| `#`, `%`, `?`, `*`, `+`, `{`, `}`, `(`, `)`, `\`, `'`, `\n`, `\t` | Escaped characters or pattern modifiers. Use a double backslash (`\\`) to denote an escaped string literal. For more information, see Escaping Strings in Transformations. |
| `\|` | Logical OR |

See also Capture Group References.
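For readers more familiar with regular expressions, the grouping constructs above have close analogues in Python's `re` module. The sketch below is only an analogy under that assumption. It does not show how the application evaluates patterns, and the sample strings are made up for illustration.

```python
import re

# Character class, analogous to {[aeiou]}: matches characters in the brackets.
vowel = re.compile(r"[aeiou]")

# Negated class, analogous to {![aeiou]}: matches characters not in the brackets.
non_vowel = re.compile(r"[^aeiou]")

# Grouping with a capture. The outer parentheses are escaped so that they
# match literal "(" and ")" characters; the inner pair captures the digits.
area_code = re.compile(r"\(([0-9]{3})\)")

# Logical OR between two alternatives.
weekend = re.compile(r"Sat|Sun")

if __name__ == "__main__":
    print(vowel.findall("pattern"))                       # ['a', 'e']
    print(non_vowel.findall("idea"))                      # ['d']
    print(area_code.search("(415) 555-0100").group(1))    # 415
    print(weekend.findall("Sat Sun Mon"))                 # ['Sat', 'Sun']
```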

Pattern Examples

Basic

Match first three characters:

`{start}%{3}`

Match last four letters (numeric or other character types do not match):

`{alpha}{4}{end}`

Match first word:

`{start}{alpha}+`
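The three basic patterns above can be approximated with Python regular expressions, which may help when checking what a pattern is expected to return. This is a sketch under the assumption that `{start}` and `{end}` behave like `^` and `$` and that `{alpha}` corresponds to `[A-Za-z_]`, as the syntax tables suggest. The helper `find` and the sample values are hypothetical.

```python
import re
from typing import Optional

first_three = re.compile(r"^.{3}")                # like {start}%{3}
last_four_letters = re.compile(r"[A-Za-z_]{4}$")  # like {alpha}{4}{end}
first_word = re.compile(r"^[A-Za-z_]+")           # like {start}{alpha}+


def find(regex: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the matched text, or None when the regex does not match."""
    match = regex.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find(first_three, "invoice-2024"))      # inv
    print(find(last_four_letters, "order_item"))  # item
    print(find(last_four_letters, "item1234"))    # None (value ends in digits)
    print(find(first_word, "hello world"))        # hello
```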

Match date values in the general yyyy*MM*dd format, where * stands for a single delimiter:

`{yyyy}{delim}{MM}{delim}{dd}`
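As a rough illustration of what the date pattern above accepts, here is an approximate Python equivalent. It assumes that `{yyyy}` is a four-digit year, that `{MM}` and `{dd}` are zero-padded two-digit month and day values, and that `{delim}` is one of the delimiter characters listed in the character patterns table. These are assumptions made for the sketch, and `looks_like_date` is a hypothetical helper.

```python
import re

# Assumed delimiter set: colon, comma, pipe, slash, hyphen, period, whitespace.
DELIM = r"[:,|/\-.\s]"

DATE_RE = re.compile(
    r"\d{4}"                        # {yyyy}: four-digit year
    + DELIM
    + r"(?:0[1-9]|1[0-2])"          # {MM}: 01 through 12
    + DELIM
    + r"(?:0[1-9]|[12]\d|3[01])"    # {dd}: 01 through 31
)


def looks_like_date(value: str) -> bool:
    """Return True when the whole value fits the assumed date structure."""
    return DATE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["2024-03-15", "2024/03.15", "2024-13-15", "20240315"]:
        print(sample, looks_like_date(sample))
```

Because each delimiter is matched independently, a value with mixed delimiters such as `2024/03.15` also fits this structure.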

Match time values in 12-hour format:

`{h}{delim}{mm}{delim}{s}`

In transformations

The following transformation masks credit card number patterns, except for the last four digits:

Notes:

The above transformation matches values based on the structure of the data, instead of the data type, so it can also mask values that merely look like credit card numbers.

To be safe, you can use the following set of transformations to ensure that you are matching only on valid credit card values.

Step 1: If the number in your source column is valid, write it to a new column.

Notes:

Step 2: The myCreditCardNumbersMasked column now contains values that are valid credit card numbers from your source column. You can now apply the masking step.

Step 3: If needed, you can move the masked values back to the source column. 

The myCreditCardNumbers column now contains only valid credit card numbers that have been masked. The application is likely to infer the data type of the column as String.

Delete the myCreditCardNumbersMasked column.
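
The validate-then-mask flow described in the steps above can be sketched outside the application in Python. This is an illustration only, not the application's transformations. It assumes that a "valid" credit card number has 13 to 19 digits, contains only digits, spaces, and hyphens, and passes the Luhn checksum. The function names and sample rows are hypothetical; only the column names come from the steps above.

```python
from typing import Optional


def is_plausible_card_number(value: str) -> bool:
    """Assumed validity check: allowed characters, 13-19 digits, Luhn checksum."""
    if any(not (ch.isdigit() or ch in " -") for ch in value):
        return False
    digits = [int(ch) for ch in value if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:       # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_all_but_last_four(value: str) -> str:
    """Replace every digit except the last four with X, keeping separators."""
    digits_to_mask = max(sum(ch.isdigit() for ch in value) - 4, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append("X")
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_valid_cards(rows: list) -> list:
    for row in rows:
        source = row["myCreditCardNumbers"]
        # Step 1: copy only valid numbers to the new column.
        staged: Optional[str] = source if is_plausible_card_number(source) else None
        # Step 2: apply the masking step to the staged value.
        row["myCreditCardNumbersMasked"] = (
            mask_all_but_last_four(staged) if staged is not None else None
        )
        # Step 3: move the masked value back and drop the helper column.
        row["myCreditCardNumbers"] = row.pop("myCreditCardNumbersMasked")
    return rows


if __name__ == "__main__":
    sample = [
        {"myCreditCardNumbers": "4111-1111-1111-1111"},  # common test number
        {"myCreditCardNumbers": "1234-5678-9012-3456"},  # fails the checksum
    ]
    print(mask_valid_cards(sample))
```

In this sketch, values that fail the check end up empty (`None`) rather than being passed through unmasked, which mirrors the statement that the final column contains only valid, masked numbers.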