Page tree

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

Release 8.2.2


Contents:

   

Contents:


NOTE: Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.

Extracts a subset of data from one column and inserts it into a new column, based on a specified string or pattern. The source column in unmodified. 

Tip: Use the extract transform if you need to retain the source column. Otherwise, you might be able to use the split transform. See Split Transform.

Basic Usage

extract col: text on: 'honda' limit: 10

Output: Extracts the value honda from the source column text up to 10 times and insert into a new column. The source column text is unmodified.

Syntax and Parameters

extract col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:max_count] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)]

NOTE: At least one of the following parameters must be included to specify the pattern to extract: after, at, before, from, on, to.

TokenRequired?Data TypeDescription
extractYtransformName of the transform
colYstringSource column name
quoteNstringSpecifies a quoted object that is omitted from pattern matching
ignoreCaseNbooleanIf true, matching is case-insensitive.
limitNinteger (positive)Identifies the number of extractions that can be performed from a single value. Default is 1.

Matching parameters:

ParameterRequired?Data TypeDescription
afterNstringString literal or pattern that precedes the pattern to match
atNArrayTwo-integer array identifying the character indexes of start and end characters to match
beforeNstringString literal or pattern that appears after the pattern to match
fromNstringString literal or pattern that identifies the start of the pattern to match
onNstringString literal or pattern that identifies the pattern to match.
toNstringString literal or pattern that identifies the end of the pattern to match

For more information on syntax standards, see Language Documentation Syntax Notes.

col

Identifies the column to which to apply the transform. You can specify only one column.

extract col: MyCol on: 'MyString'

Output: Extracts value My String in new column if it is present in MyCol. Otherwise, new column value is blank.

Usage Notes:

Required?Data Type
YesString (column name)

after

extract col: MyCol after: 'Important:'

Output: Extracts value in MyCol that appears after the string Important:. If the after value does not appear in the column, the output value is blank.

A pattern identifier that precedes the value or pattern to match. Define the after parameter value using string literals, regular expressions, or Patterns .

Usage Notes:

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar. from includes the matching value as part of the extracted string.
  • after can be used with either toon, or before. See  Pattern Clause Position Matching.

at

extract col: MyCol at: 2,6

Output: Extracts contents of MyCol that starts at the second character in the column and extends to the sixth character of the column.

Identifies the start and end point of the pattern to interest.

Parameter inputs are in the form of x,y where x and y are positive integers indicating the starting character and ending character, respectively, of the pattern of interest.

  • x must be less than y.
  • If y is greater than the length of the value, the pattern is defined to the end of the value, and a match is made.

Usage Notes:

Required?Data Type
NoArray of two Integers ( X,Y )

The at parameter cannot be combined with any of the following: onafterbeforefromto, and quote. See  Pattern Clause Position Matching.

before

extract col: MyCol before: '|'

Output: Extracts contents of MyCol that occur before the pipe character (|). If the before value does not appear in the column, the output value is blank.

A pattern identifier that occurs after the value or pattern to match. Define the pattern using string literals, regular expressions, or Patterns .

Usage Notes:

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the extracted string.
  • before can be used with either fromon, or afterSee  Pattern Clause Position Matching .

from

extract col: MyCol from: 'go:'

Output: Extracts contents from MyCol that occur after go:, including go:. If the from value does not appear in the column, the output value is blank.

Identifies the pattern that marks the beginning of the value to match. It can be a string literal, Pattern , or regular expression. The from value is included in the match.

Usage Notes:

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar.  from includes the matching value as part of the extracted string.
  • from can be used with either to or beforeSee  Pattern Clause Position Matching .

on

extract col: MyCol on: `###ERROR`

Identifies the pattern to match, which can be a string literal, Pattern , or regular expression.

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

Usage Notes:

Required?Data Type
No

String (literal, regular expression, or Alteryx pattern )

to

extract col:MyCol from:'note:' to: `{end}`

Output: Extracts from MyCol column all values that begin with note: up to the end of the value. 

Identifies the pattern that marks the ending of the value to match. Pattern can be a string literal,  Patterns , or regular expression. The to value is included in the match.

Usage Notes:

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the extracted string.
  • to can be used with either from or after. See Pattern Clause Position Matching.

quote

extract col: MyCol quote: 'First' after: `{start}%?`

Output: Extracts each cell value from the MyCol column, starting at the second character in the cell, as long as the string First does not appear in the cell.

Can be used to specify a string as a single quoted object. This parameter value can be one or more characters.

Usage Notes:

Required?Data Type
NoString
  • Parameter value is the quoted object.
  • The quote value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.

ignoreCase

extract col: MyCol on: 'My String' ignoreCase: true

Output: Extracts the following values if they appear in the MyCol column: My Stringmy stringMy string, etc.

Indicates whether the match should ignore case or not.

  • Set to true to ignore case matching.
  • (Default) Set to false to perform case-sensitive matching.

Usage Notes:

Required?Data Type
NoBoolean

limit

extract col: MyCol on: 'z' limit: 3

Output: Extracts each instance of the letter z in the MyCol column into a separate column, generating up to 3 new columns.

The limit parameter defines the maximum number of times that a pattern can be matched within a column.

NOTE: The limit parameter cannot be used with the following parameters: at, positions, or delimiters.

A set of new columns is generated, as defined by the limit parameter. Each matched instance populates a separate column, until there are no more matches or all of the limit-generated new columns are filled.

Usage Notes:

Required?Data Type
NoInteger (positive)
  • Defines the maximum number of columns that can be created by the extract transform. 
  • If not specified, exactly one column is created. 

 

Examples


Tip: For additional examples, see Common Tasks.

Example - Extract First Name

Source:

Name
Mr. Mike Smith
Dr Jane Jones
Miss Meg Moon

Transformation:

The following transformation extracts the second word in the above dataset. Content is extracted after the first space and before the next space.

Tip: If you want to break out salutation, first name, and last name at the same time, you should use the Split Column transformation instead.

Transformation Name Extract text or pattern
Parameter: Column to extract from Name
Parameter: Option Custom text or pattern
Parameter: Start extracting after ' '
Parameter: End extracting before ' '

Results:

NameName2
Mr. Mike SmithMike
Dr Jane JonesJane
Miss Meg MoonMeg

Example - Extract Log Levels

Source:

The following represents raw log messages extracted from an application. You want to extract the error level for each message: INFOWARNING, or ERROR.

app_log
20115-10-30T15:43:37:874Z INFO Client env:started
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB
20115-10-30T15:43:38:515Z INFO Client env:launched application

Transformation:

The text of interest appears after the timestamp and before the message.

  • In the after clause, a pattern is required. In this case, the selection rule identifies the last segment of the timestamp, with the three pound signs (#) identifying three digits of unknown value. The "Z " value gives the selection rule an extra bit of specificity. Note the backticks to denote the selection rule. 
  • In the before clause, you can use a simple space character string, since it is consistent across all of the data.

Transformation Name Extract text or pattern
Parameter: Column to extract from app_log
Parameter: Option Custom text or pattern
Parameter: Start extracting after `###Z `
Parameter: End extracting before ' '

Results:

app_logapp_log_2
20115-10-30T15:43:37:874Z INFO Client env:startedINFO
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium componentINFO
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DBERROR
20115-10-30T15:43:38:515Z INFO Client env:launched applicationINFO

Example - Clean up marketing contact data with replace, set, and extract

This example illustrates the different uses of the following transformations to replace or extract cell data:
  • set - defines the values to use in a predefined column. See Set Transform.

    Tip: Use the derive transform to generate a new column containing a defined set of values. See Derive Transform.

  • replace - replaces a string literal or pattern appearing in the values of a column with a specific string. See Replace Transform.
  • extract - extracts a pattern-based value from a column and stores it in a new column. See Extract Transform.

Source:

The following dataset contains contact information that has been gathered by your marketing platform from actions taken by visitors on your website. You must clean up this data and prepare it for use in an analytics platform.

LeadIdLastNameFirstNameTitlePhoneRequest
LE160301001JonesCharlesChief Technical Officer415-555-1212reg
LE160301002LyonsEdward 415-012-3456download whitepaper
LE160301003MartinMaryCEO510-555-5555delete account
LE160301004SmithTaliaEngineer510-123-4567free trial

 

Transformation:

Title column: For example, you first notice that some data is missing. Your analytics platform recognizes the string value, "#MISSING#" as an indicator of a missing value. So, you click the missing values bar in the Title column. Then, you select the Replace suggestion card. Note that the default replacement is a null value, so you click Edit and update it:

Transformation Name Edit column with formula
Parameter: Columns Title
Parameter: Formula if(ismissing([Title]),'#MISSING#',Title)

Request column: In the Request column, you notice that the reg entry should be cleaned up. Add the following transformation, which replaces that value:

Transformation Name Replace text or pattern
Parameter: Column Request
Parameter: Find `{start}reg{end}`
Parameter: Replace with Registration

The above transformation uses a Pattern  as the expression of the on: parameter. This expression indicates to match from the start of the cell value, the string literal reg, and then the end of the cell value, which matches on complete cell values of reg only.

This transformation works great on the sample, but what happens if the value is Reg with a capital R? That value might not be replaced. To improve the transformation, you can modify the transformation with the following Pattern  in the on parameter, which captures differences in capitalization:

Transformation Name Replace text or pattern
Parameter: Column Request
Parameter: Find `{start}{[R|r]}eg{end}`
Parameter: Replace with 'Registration'

Add the above transformation to your recipe. Then, it occurs to you that all of the values in the Request column should be capitalized in title or proper case:

Transformation Name Edit column with formula
Parameter: Columns Request
Parameter: Formula proper(Request)

Now, all values are capitalized as titles.

Phone column: You might have noticed some issues with the values in the Phone column. In the United States, the prefix 555 is only used for gathering information; these are invalid phone numbers.

In the data grid, you select the first instance of 555 in the column. However, it selects all instances of that pattern, including ones that you don't want to modify. In this case, continue your selection by selecting the similar instance of 555 in the other row. In the suggestion cards, you click the Replace Text or Pattern transformation.

Notice, however, that the default Replace Text or Pattern transformation has also highlighted the second 555 pattern in one instance, which could be a problem in other phone numbers not displayed in the sample. You must modify the selection pattern for this transformation. In the on: parameter below, the Pattern  has been modified to match only the instances of 555 that appear in the second segment in the phone number format:

Transformation Name Replace text or pattern
Parameter: Column Phone
Parameter: Find `{start}%{3}-555-%*{end}`
Parameter: Replace with '#INVALID#'
Parameter: Match all occurrences true

Note the wildcard construct has been added (%*). While it might be possible to add a pattern that matches on the last four characters exactly (%{4}), that matching pattern would not capture the possibility of a phone number having an extension at the end of it. The above expression does.

NOTE: The above transformation creates values that are mismatched with the Phone Number data type. In this example, however, these mismatches are understood to be for the benefit of the system consuming your Alteryx output.

LeadId column: You might have noticed that the lead identifier column (LeadId) contains some embedded information: a date value and an identifier for the instance within the day. The following steps can be used to break out this information. The first one creates a separate working column with this information, which allows us to preserve the original, unmodified column:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula LeadId
Parameter: New column name 'LeadIdworking'

You can now work off of this column to create your new ones. First, you can use the following replace transformation to remove the leading two characters, which are not required for the new columns:

Transformation Name Replace text or pattern
Parameter: Column LeadIdworking
Parameter: Find 'LE'
Parameter: Replace with ''

Notice that the date information is now neatly contained in the first characters of the working column. Use the following to extract these values to a new column:

Transformation Name Extract text or pattern
Parameter: Column to extract from LeadIdworking
Parameter: Option Custom text or pattern
Parameter: Text to extract `{start}%{6}`

The new LeadIdworking2 column now contains only the date information. Cleaning up this column requires reformatting the data, retyping it as a Datetime type, and then applying the dateformat function to format it to your satisfaction. These steps are left as a separate exercise.

For now, let's just rename the column:

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column LeadIdworking1
Parameter: New column name 'LeadIdDate'

In the first working column, you can now remove the date information using the following:

Transformation Name Replace text or pattern
Parameter: Column LeadIdworking
Parameter: Find `{start}%{6}`
Parameter: Replace with ''

You can rename this column to indicate it is a daily identifier:

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column LeadIdworking
Parameter: New column name 'LeadIdDaily'

Results:

LeadIdLeadIdDailyLeadIdDateLastNameFirstNameTitlePhoneRequest
LE160301001001160301JonesCharlesChief Technical Officer#INVALID#Registration
LE160301002002160301LyonsEdward#MISSING#415-012-3456Download Whitepaper
LE160301003003160301MartinMaryCEO#INVALID#Delete Account
LE160301004004160301SmithTaliaEngineer510-123-4567Free Trial

See Also for Extract Transform:

Error rendering macro 'contentbylabel'

parameters should not be empty

 

  • No labels

This page has no comments.