Page tree

Trifacta SaaS



Contents:

   

Contents:


Returns the position of the nth occurrence of a letter or pattern in the input string where a specified matching string is located in the provided column. You can search either from left or right.
  • A column reference can refer to a column of String, Object, or Array type, which makes the FINDNTH function useful for filtering data before it has been completely un-nested into tabular data.
  • Returned value is from the beginning of the string, regardless of the string index value.
  • If no match is found, the function returns a null value.

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

Column reference example:

findnth(Message, `{hashtag}`, 2, left, true)

Output: Searches the Message column value for second occurrence of the hashtag pattern from the left of the column by considering the case-sensitive matching. If a match is found, returns the index value of the location in the string where the pattern appears.

String literal example:

findnth('Hello, World','o', 2, left ,false)

Output: Searches the string Hello, World for the 2nd occurrence of the value o from the left of the column by ignoring the case. In this case, the returned value is 8

Syntax and Arguments

findnth(string_to_search, search_for, match_number, direction, ignore_case)


ArgumentRequired?Data TypeDescription
string_to_searchYstringName of the column, function returning a string, or string literal to be applied to the function
search_forYstringName of column, function returning a string, or string literal or pattern to find
match_numberYintegerCount of characters from the start of the value to include in the match
directionNstringThe direction to search the string from. This can be either left or right. Default is left.
ignore_caseNbooleanIf true, a case-insensitive match is performed. Default is false.

For more information on syntax standards, see Language Documentation Syntax Notes.

Item to be searched. Valid values can be:

  • String literals must be quoted ( 'Hello, World' ).
  • Column reference to any type that can be inferred as a string, which encompasses all values.
  • Function returning a string value.
  • Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal, function returning a string, or column reference (String, Array, or Object)myColumn

search_for

Column of strings,  function returning a string, string literal or pattern to find. An input value can be a literal,  Pattern , or a regular expression.

  • Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or pattern'Hello'

match_number

The number of string or pattern matches to find.  For example, a value of 2 instructs the FINDNTH function to begin searching from the second occurrence of the character or pattern in the column or a string value.

Usage Notes:

Required?Data TypeExample Value
Yesinteger (non-negative)'2'
  • Value must be a non-negative integer value.

  • Value must a non-negative integer. If the value is 0, then the match fails for all strings.

direction

The direction to search the string from. The direction can be either left or right. By default, it is left.

Usage Notes:

Required?Data TypeExample Value
Nostringleft

ignore_case

If true, the FINDNTH function ignores case when trying to match the string literal or pattern value.

Default value is false, which means that case-sensitive matching is performed by default.

Usage Notes:

Required?Data TypeExample Value
NoBooleantrue


Examples

Tip: For additional examples, see Common Tasks.

Example - Filter Hashtag messages 

Source:

The table below contains your customer's tweet messages. You are interested in the second hashtag that is listed in each message.

Tip: You can use either the FIND or the FINDNTH function for the first value in the string. However, you must use the FINDNTH function to find the second or later values in the string.


User NameLocationMessages

Eugenie

U.K

#dataprep  #businessintelligence #CommitToCleanData #London

JenifferNewYork

Learn how #NewYorklife#  #bigdata  #dataprep  #NewYork #

PatrickBerlin

#bigdata  #machine learning #datawrangling #Berlin

ChristySanFrancisco

#predictivetransformation, #businessintelligence, #startwiththeuser, #machinelearning #SFO

DaveParis

#commitocleandata, #pivot, #aggregation, #bigdata, #dataprep, #machinelearning

Transformation:

First, you must identify the records that contain the pattern hashtag. The following transform generates a new column containing the position of the 2nd hashtag in the Messages column. This value can be retrieved using the {hashtag} Trifacta pattern.

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula FINDNTH(Message, `{hashtag}`, 2, left, true)
Parameter: New column name find2ndhashtag_Message

User NameLocationMessagesfind2ndhashtag_Message

Eugenie

U.K

#dataprep  #businessintelligence #CommitToCleanData #London

11
JenifferNewYork

Learn how #NewYorklife#  #bigdata  #dataprep  #NewYork #

26
PatrickBerlin

#bigdata  #machinelearning #datawrangling #Berlin

10
ChristySanFrancisco

#predictivetransformation, #businessintelligence, #startwiththeuser, #machinelearning #SFO

28
DaveParis

#commitocleandata, #pivot, #aggregation, #bigdata, #dataprep, #machinelearning

20

When the step is added to the recipe, the find2ndhashtag_Message column is generated, containing the index value returned by the FINDNTH function. In this case, each row contains at least two instances of the {hashtag} Trifacta pattern, and the generated column contains the index position where it occurs in the Message column.

The next step is to extract the hashtag from the Messages column to convert into a meaningful data.

Transformation Name Extract patterns
Parameter: Column to extract from Messages
Parameter: Option Custom text or pattern
Parameter: Text to extract `{hashtag}`
Parameter: Number of matches to extract 2

NOTE: The number of matches to extract parameter defaults to 1, meaning that the transformation extracts a maximum of one value from each cell. This value can be set from 1-50. In this case, you should set to the value to 2, since you are interested in the second hashtag.

The above transformation generates two new columns with the first two extracted hashtag values from the Messages column. Next steps are:

  • Delete the column containing the first hashtag value. 
  • Rename the Messages2 column to be Second_hashtag_value.

Results:

User NameLocationMessagesSecond_hashtag_valuefind2ndhashtag_Message

Eugenie

U.K

#dataprep  #businessintelligence #CommitToCleanData #London

#businessintelligence 11
JenifferNewYork

Learn how #NewYorklife#  #bigdata  #dataprep  #NewYork #

#bigdata26
PatrickBerlin

#bigdata  #machine #learning #datawrangling #Berlin

#machinelearning10
ChristySanFrancisco

#predictivetransformation, #businessintelligence, #startwiththeuser, #machinelearning #SFO

#businessintelligence28
DaveParis

#commitocleandata, #pivot, #aggregation, #bigdata, #dataprep, #machinelearning

#pivot20

This page has no comments.