Excerpt |
---|
Returns the index value in the input string where a specified matching string is located in provided column, string literal, or function returning a string. Search is conducted left-to-right. |
- A column reference can refer to a column of String, Object, or Array type, which makes the
FIND
function useful for filtering data before it has been completely un-nested into tabular data. - Returned value is from the beginning of the string, regardless of the string index value.
- If no match is found, the function returns a null value.
- If you need to determine if a value is in an array or not, you can use the
MATCHES
function, which returns a true/false response. See MATCHES Function.
You can also search a string from the right. For more information, see RIGHTFIND Function.
Column reference example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value: find(MyName,'find this',true,0) |
---|
|
find(MyName,'find this',true,0) |
Output: Searches the MyName
column value for the string find this
from the beginning of the value, ignoring case. If a match is found, the index value where the string is located is returned.
String literal example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value: find('Hello, World','lo',false,2) |
---|
|
find('Hello, World','lo',false,2) |
Output: Searches the string Hello, World
for the string lo
, in a case-sensitive search, beginning at the third character in the string. Since the match is found at the fourth character, the value 3
is returned.
If example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value: if(find(SearchPool,'FindIt') >= 0, 'found it', '') as: 'Finds' |
---|
|
if(find(SearchPool,'FindIt') >= 0, 'found it', '') |
Output: Searches the SearchPool
column value for the string FindIt
from the beginning of the value (default). Default behavior is to not ignore case. If the string is found, the value found it
is returned. Otherwise, the column is empty.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | derive type:single value:find(input_string,string_pattern,[ignore_case], [start_index]) |
---|
|
find(string_to_search, search_for,[ignore_case], [start_at]) |
Argument | Required? | Data Type | Description |
---|
string_to_search | Y | string | Name of the column, function returning a string, or string literal to be applied to the function. |
search_for | Y | string | The string or pattern you want to look for. This can be a string, function returning a string, or string literal or pattern or regular expression. |
ignore_case | N | boolean | Indicates if the Find function ignores case when trying to match the string or pattern. The default value is false . |
start_at | N | integer (non-negative) | Indicates the position in the column or string literal value at which to begin the search. This value can be an integer, a function returning an integer, or a column containing integers. The default value is 0. If not specified, the entire string is searched. |
string_to_search
Name of the item to be searched. Valid values can be:
- String literals must be quoted (
'Hello, World'
). - Column reference to any type that can be inferred as a string, which encompasses all values.
- Function returning a string value.
Missing values generate the start-at parameter value.
- Multiple values and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String literal, function returning a string, or column reference (String, Array, or Object) | myColumn |
search_for
Column of strings, function returning a string, string literal or pattern to find. An input value can be a literal,
, or a regular expression.
- Missing string or column values generate the start-at parameter value.
- String literals must be quoted (
'Hello, World'
).
- Multiple values and wildcards are not supported.
- Column names are not supported.
Required? | Data Type | Example Value |
---|
Yes | String literal or pattern | 'Hello' |
ignore_case
If true
, the FIND
function ignores case when trying to match the string literal or pattern value.
Default value is false
, which means that case-sensitive matching is performed by default.
Required? | Data Type | Example Value |
---|
No | Boolean | true |
start_at
Indicates the position in the column or string literal value at which to begin the search. For example, a value of 2
instructs the FIND
function to begin searching from the third character in the column or string value.
Info |
---|
NOTE: Index values begin at 0 . If not specified, the default value is 0 , which searches the entire string. |
Value can be an integer, a function returning an integer, or a column containing integers.
Value must be a non-negative integer value.
- If this value is greater than the length of the string, then no match is possible.
Required? | Data Type | Example Value |
---|
No | Integer (non-negative) | 2 |
Example - Locate product purchases in transaction stream
Source:
You have the simplified transaction stream listed below in which master information about a transaction (TransactionId
and CustomerId
) is paired with order detail information that is brought into the application as an array in the OrderDetail
column. The array column contains information about product ID, quantity, and the type of transaction.
TransactionId | CustomerId | OrderDetail |
---|
12312312 | 100023 | [{"ProdId":"54321","Qty":"5","TransType":"PURCHASE"}] |
12312313 | 100045 | [{"ProdId":"94105","Qty":"12","TransType":"PURCHASE"}] |
12312314 | 100066 | [{"ProdId":"54321","Qty":"1","TransType":"TEST"}] |
12312315 | 100068 | [{"ProdId":"85858","Qty":"9","TransType":"PURCHASE"}] |
The transaction stream includes test transactions, which are identified by the value TEST
for TransType
in the detail column. You want to remove these transactions early in the process, which should simplify your dataset and speed up its processing.
Transformation:
First, you must identify the records that contain the test transaction value. The following transform generates a new column containing true/false values for whether the value "TEST"
appears in the OrderDetail
transform.
Tip |
---|
Tip: You should include the double-quotes around the value, in case the other fields in the array could contain some version of the value TEST . Note that the double quotes need to be escaped, as in the value below. |
D trans |
---|
RawWrangle | true |
---|
p03Value | find_OrderDetail |
---|
Type | step |
---|
WrangleText | derive type: single value: find(OrderDetail, '\"TEST\"', false, 0) as: 'find_OrderDetail' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | find(OrderDetail, '\"TEST\"', false, 0) |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
When the step is added to the recipe, the find_OrderDetail
column is generated, containing the index value returned by the FIND
function. In this case, there is only one row that contains a value: 53
for the third transaction.
TransactionId | CustomerId | OrderDetail | find_OrderDetail |
---|
12312312 | 100023 | [{"ProdId":"54321","Qty":"5","TransType":"PURCHASE"}] |
|
12312313 | 100045 | [{"ProdId":"94105","Qty":"12","TransType":"PURCHASE"}] |
|
12312314 | 100066 | [{"ProdId":"54321","Qty":"1","TransType":"TEST"}] | 53 |
12312315 | 100068 | [{"ProdId":"85858","Qty":"9","TransType":"PURCHASE"}] |
|
You can then add the following step to keep the rows where the FIND
function returned a null value in the find_OrderDetail
column:
D trans |
---|
RawWrangle | true |
---|
p03Value | Keep matching rows |
---|
Type | step |
---|
WrangleText | filter type: missing missing: find_OrderDetail action: Keep |
---|
p01Name | Condition |
---|
p01Value | Is missing |
---|
p02Name | Column |
---|
p02Value | find_OrderDetail |
---|
p03Name | Action |
---|
SearchTerm | Filter rows |
---|
|
Results:
TransactionId | CustomerId | OrderDetail | find_OrderDetail |
---|
12312312 | 100023 | [{"ProdId":"54321","Qty":"5","TransType":"PURCHASE"}] | |
12312313 | 100045 | [{"ProdId":"94105","Qty":"12","TransType":"PURCHASE"}] | |
12312315 | 100068 | [{"ProdId":"85858","Qty":"9","TransType":"PURCHASE"}] | |
You can delete the find_OrderDetail
column at this time.