Skip to main content

RegEx Tool

Use RegEx to use regular-expression syntax to parse, match, or replace data.

Tip

This tool has a One Tool Example. Visit Access Sample Workflows to learn how to access this and many other examples directly in Designer Cloud.

RegEx Support

Although regular expressions are supported in Designer Cloud, users are responsible for their own expressions and how the expressions affect their data.

For more resources on how to write regular expressions, visit https://regex101.com/, a site unaffiliated with Alteryx, or the RegEx Coach, an unaffiliated graphical application for Windows that you can use to experiment with (Perl-compatible) regular expressions interactively.

For additional information about Boost RegEx, visit these resources:

Configure the Tool

  1. Select a Column to Parse.

  2. Enter your Regular Expression. Use the plus-sign button to see common regular expressions. Find more information on the Perl Regular Expression Syntax website.

    • Check Case Sensitive to have searches distinguish between uppercase and lowercase letters.

  3. Select the Output Method to use when parsing, then configure the related properties.

    • Match: Append a column containing a number: 1 if the expression matched, 0 if it did not.

      • Column Name for Match Status: Provide a name for the appended column.

    • Parse: Separate the expression into new columns, and set the Name and Type of the new columns. A new column is created in the Output Columns table with these parameters:

      • Name: Select column name to enter a new name.

      • Type: Use the dropdown to select the new data type.

      • Expression: Populated automatically.

    • Replace: Replace the expression you searched for with a 2nd expression.

      • Replace With: Enter an expression to replace your original regular expression by identifying the marked group to replace the expression with. Use the plus-sign button to see common regular expressions.

      • Check Copy unmatched text to output to copy the unmatched text to the output.

    • Tokenize: Split the incoming data using a regular expression. This option works similarly to the Text To Columns tool, except instead of matching and removing what you don't want to keep, you match for what you want to keep. You want to match to the whole token, and if you have a marked group, only that part is returned. Go to Tokenize Method Examples below.

      • Split into Columns: Split a single column of data at each instance of the specified delimiter into multiple columns.

        • Number of Output Columns: Set how many columns are created.

        • Output Column Header: Enter the name that the newly generated columns should be based on. The new columns are named as the root name with a serially increasing integer appended.

        • If Extra Columns in Output: Select the behavior that is applied to extra columns.

          • Do Not Include: Data that extends past the split is dropped and no warning is generated.

          • Do Not Include and Show Warning: Data that extends past the split is dropped and a warning is generated indicating that there was excess information.

          • Error (Stop Processing Data): Data the extends past the split causes an error and the workflow stops processing.

      • Split into Rows: Split a single column of data at each instance of the specified delimiter into multiple rows. Use a key column in your record so you don't lose track of which value came from which row.

Tokenize Method Examples

These use cases rely on the Regex tool's Tokenize method.

  • Parse a 9 character string 123456789 into 3 fields. The regular expression is ....

  • Parse a 9 character string into 3 fields, returning only the 2nd character. The regular expression is .(.)..

  • Parse a field with the delimiter Ctrl-A. The regular expression for tokenizing a Ctrl-A delimited string is [^\cA]+.

    • [^...] The brackets specify a match to a single character in a set of characters. Starting the set with ^ changes it to match any character not in the set.

    • \cA This simply matches the Ctrl-A character.

    • + This means match 1 or more of the previous.

  • Allow blank tokens to preserve entries: abc, ,def. The regular expression is ([^,]*) (?:,|$).

    • (...) Parenthesis create a marked group of the expression. The tokenize mode allows you to match a larger part of the input field, but only return a subset that was marked. That way, you avoid returning the delimiter. You can only have 1 marked expression.

    • [^,] Starting the set with ^ changes it to match any character not in the set, in this case a ,.

    • * match 0 or more of the previous, which allows for an empty set. You can't end here because the regex engine doesn't like a match of 0 characters because there is an infinite number of matches, so we have to terminate the match in another way.

    • (?:....)is an unmarked group. We need that for the or we use a | for.

    • |matches what comes before or after, but not both. That almost always needs to be used with a marked or unmarked group.

    • $ matches the end of the string. For example, (?:,|$) matches up to a , or the end of the string.

  • Parse HTML links from a home page. The regular expression is <a .*?>.*?</a>. That pulls every link out of a large HTML document into a series of rows.

    • <ais a literal match for the text "<a."

    • .*?. is any character, * is 0 or more. The ? modifies the * to make it match to the shortest possible match. Without that, the expression might find a single token from the beginning of the first link to the end of the last.

    • > This is a literal match for the text ">."

    • .*? The shortest possible match of any characters that still satisfies the entire regex.

    • </a> This is a literal match for the text "</a>." That ends the match.