Countpattern Transform
Note
Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.
Counts the number of instances of a specified pattern in a column and writes that value into a newly generated column. Source column is unchanged.
Basic Usage
countpattern col: myCol on: 'honda'
Output: Generates a new column containing the number of instances of the string honda that appear in each row of the column, myCol.
Syntax and Parameters
countpattern col:column_ref [ignoreCase:true|false] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match']
Token | Required? | Data Type | Description |
|---|---|---|---|
countpattern | Y | transform | Name of the transform |
col | Y | string | Source column name |
ignoreCase | N | boolean | If |
Matching parameters:
Note
At least one of the following parameters must be included to specify the pattern to count: after, before, from, on, to.
Token | Required? | Data Type | Description |
|---|---|---|---|
after | N | string | String literal or pattern that precedes the pattern to match |
before | N | string | String literal or pattern that appears after the pattern to match |
from | N | string | String literal or pattern that identifies the start of the pattern to match |
on | N | string | String literal or pattern that identifies the pattern to match. |
to | N | string | String literal or pattern that identifies the end of the pattern to match |
For more information on syntax standards, see Language Documentation Syntax Notes.
col
Identifies the column to which to apply the transform. You can specify only one column.
countpattern col: MyCol on: 'MyString'
Output: Counts the number of instances of the value MyString in the MyCol column and writes this value to a new column.
Usage Notes:
Required? | Data Type |
|---|---|
Yes | String (column name) |
after
countpattern col: MyCol after: 'entry:'
Output: Counts 1 if there is anything that appears in the MyCol column value after the string entry:. If the value entry: does not appear in the column, the output value is 0.
A pattern identifier that precedes the value or pattern to match. Define the after parameter value using string literals, regular expressions, or Wrangle .
Usage Notes:
Required? | Data Type |
|---|---|
No | String (string literal or pattern) |
The
afterandfromparameters are very similar.fromincludes the matching value as part of the extracted string.aftercan be used with eitherto,on, orbefore. See Pattern Clause Position Matching
before
A pattern identifier that occurs after the value or pattern to match. Define the pattern using string literals, regular expressions, or Wrangle .
countpattern col: MyCol before: '|'
Output:
Counts
1if there is a value that appears before the pipe character (|) in theMyColcolumn, and no other pattern parameter is specified. If thebeforevalue does not appear in the column, the output value is0.If another pattern parameter such as
afteris specified, the total count of instances is written to the new column.
Usage Notes:
Required? | Data Type |
|---|---|
No | String or pattern |
The
beforeandtoparameters are very similar.toincludes the matching value as part of the extracted string.beforecan be used with eitherfrom,on, orafter. See Pattern Clause Position Matching.
from
Identifies the pattern that marks the beginning of the value to match. It can be a string literal, Wrangle , or regular expression. The from value is included in the match.
countpattern col: MyCol from: 'go:'
Output:
Counts
1if contents fromMyColthat occur fromgo:, to the end of the cell when no other pattern parameter is specified. Ifgo:does not appear in the column, the output value is blank.If another pattern parameter such as
tois specified, the total count of instances is written to the new column.
Usage Notes:
Required? | Data Type |
|---|---|
No | String or pattern |
The
afterandfromparameters are very similar.fromincludes the matching value as part of the extracted string.fromcan be used with eithertoorbefore. See Pattern Clause Position Matching.
on
Identifies the pattern to match, which can be a string literal, Wrangle , or regular expression.
countpattern col: MyCol on: `###ERROR`
Tip
You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.
Usage Notes:
Required? | Data Type |
|---|---|
No | String (literal, regular expression, or Wrangle |
to
Identifies the pattern that marks the ending of the value to match. Pattern can be a string literal, Wrangle , or regular expression. The to value is included in the match.
countpattern col:MyCol from:'note:' to: `/`
Output:
Counts instances from
MyColcolumn of all values that begin withnote:up to a backslash character.If a second pattern parameter is not specified, then this value is either
0or1.
Usage Notes:
Required? | Data Type |
|---|---|
No | String or pattern |
The
beforeandtoparameters are very similar.toincludes the matching value as part of the extracted string.tocan be used with eitherfromorafter. See Pattern Clause Position Matching.
ignoreCase
Indicates whether the match should ignore case or not.
Set to
trueto ignore case matching.(Default) Set to
falseto perform case-sensitive matching.
countpattern col: MyCol on: 'My String' ignoreCase: true
Output: Counts the instances of the following values if they appear in the MyCol column: My String, my string, My string, etc.
Usage Notes:
Required? | Data Type |
|---|---|
No | Boolean |
Examples
Tip
For additional examples, see Common Tasks.
Example - counting patterns in tweets
This example demonstrates how to count the number of occurrences of text patterns in a column.
Functions:
Item | Description |
|---|---|
IF Function | The |
Source:
The dataset below contains fictitious tweet information shortly after the release of an application called, "Myco ExampleApp".
Date | twitterId | isEmployee | tweet |
|---|---|---|---|
11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! |
11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. |
11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. |
11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. |
11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! |
11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? |
There are two areas of analysis:
For non-employees, you want to know if they are mentioning the new product by name.
For employees, you want to know if they are including cross-references to the web site as part of their tweet.
Transformation:
The following counts the occurrences of the string ExampleApp in the tweet column. Note the use of the ignoreCase parameter to capture capitalization differences:
Transformation Name |
|
|---|---|
Parameter: Column | tweet |
Parameter: Option | Text or pattern |
Parameter: Text or pattern to count | 'ExampleApp' |
Parameter: Ignore case | true |
For non-employees, you want to track if they have mentioned the product in their tweet:
Transformation Name |
|
|---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | if(isEmployee=='FALSE' && countpattern_tweet=='1',true,false) |
Parameter: New column name | 'nonEmployeeExampleAppMentions' |
The following counts the occurrences of example.com in their tweets:
Transformation Name |
|
|---|---|
Parameter: Column | tweet |
Parameter: Option | Text or pattern |
Parameter: Text or pattern to count | 'example.com' |
Parameter: Ignore case | true |
For employees, you want to track if they included the above cross-reference in their tweets:
Transformation Name |
|
|---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | if(isEmployee=='TRUE' && countpattern_tweet1 == 1, true, false) |
Parameter: New column name | 'employeeWebsiteCrossRefs' |
Results:
After you delete the two columns tabulating the counts, you end up with the following:
Date | twitterId | isEmployee | tweet | employeeWebsiteCrossRefs | nonEmployeeExampleAppMentions |
|---|---|---|---|---|---|
11/5/15 | lawrencetlu38141 | FALSE | Just downloaded Myco ExampleApp! Transforming data in 5 mins! | false | true |
11/5/15 | petramktng024 | TRUE | Try Myco ExampleApp, our new free data wrangling app! See www.example.com. | true | false |
11/5/15 | joetri221 | TRUE | Proud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com. | true | false |
11/5/15 | datadaemon994 | FALSE | Great start with Myco ExampleApp. Super easy to use, and actually fun. | false | true |
11/5/15 | 99redballoons99 | FALSE | Liking this new ExampleApp! Good job, guys! | false | true |
11/5/15 | bigdatadan7182 | FALSE | @support, how can I find example datasets for use with your product? | false | false |