Excerpt |
---|
Unlike other types of data, text data has very few restrictions on the kinds of values that appear in a cell. In the application, this data is typically inferred as String data type. As a result, finding string values that mean the same thing can be a challenge, as minor differences in their content or structure can invalidate a match. |
This section provides some methods for comparing strings.
- Some target systems may impose limits on the lengths of imported values. For more information on managing the lengths of your strings, see Manage String Lengths.
Find Substrings
You can use the following functions to locate sub-strings that are part of a column's value.
The following transformation checks the left five values of the lowercase version of the ProdId column to see if it matches xxxx-
. If the value is detected, then the ProdName
value is set to NO_NAME
:
D trans |
---|
Type | step |
---|
p01Name | Columns |
---|
p01Value | ProdName |
---|
p02Name | Formula |
---|
p02Value | IF(LEFT(LOWER(ProdId,5))=='xxxx-','NO_NAME' ,ProdName) |
---|
SearchTerm | Edit with formula |
---|
|
Compare String Ends by Pattern
You can use the STARTSWITH
and ENDSWITH
functions to determine if a string begins or ends with a specified pattern.
Tip |
---|
Tip: These functions are most useful for performing pattern-based checks on strings. For string literals, you can use the LEFT and RIGHT functions. See below. |
The following transformation inserts the value error
in the custCodeStatus
column if the custCode
value begins with six digits in a row:
D trans |
---|
Type | step |
---|
p01Name | Columns |
---|
p01Value | custCodeStatus |
---|
p02Name | Formula |
---|
p02Value | IF(STARTSWITH(custCode,`{digit}{6}`), 'error',custCodeStatus) |
---|
SearchTerm | Edit with formula |
---|
|
Match Strings
Exact matching
You can use the EXACT
function to compare if two strings are exact matches. String inputs can be literals, column references, or expressions that evaluate to strings.
Info |
---|
NOTE: The EXACT function evaluates for exact matches. Whitespace or capitalization differences return false . |
You can nest function expressions inside of the EXACT
reference to eliminate common and perhaps not useful differences between strings. In the following transformation, a value of true
is inserted into the matches
column, if colA
and colB
are exact matches, after whitespace and case differences have been removed:
D trans |
---|
Type | step |
---|
p01Name | Formula |
---|
p01Value | IF(EXACT(LOWER(REMOVEWHITESPACE(colA)))==EXACT(LOWER(REMOVEWHITESPACE(colB))),'true','false') |
---|
p02Name | New column name |
---|
p02Value | matches |
---|
SearchTerm | New formula |
---|
|
The platform also supports the doublemetaphone algorithm for fuzzy matching. This algorithm provides mechanism for proximity matching; the DOUBLEMETAPHONEEQUALS
function supports an optional second parameter to define the strength of the algorithm.
This algorithm works by generating two separate encodings for each string: a primary encoding and a secondary encoding. You can experiment with these encodings using the DOUBLEMETAPHONE
function. See DOUBLEMETAPHONE Function.
This algorithm can be applied to compare two strings, as in the following transformation.
D trans |
---|
Type | step |
---|
p01Name | Formula |
---|
p01Value | DOUBLEMETAPHONEEQUALS(colA,colB,'strong') |
---|
p02Name | New column name |
---|
p02Value | matches |
---|
SearchTerm | New formula |
---|
|
Compare Strings
For string values, you can use the string comparison functions to check how strings compare using Latin collation settings.
Tip |
---|
Tip: Any column can be converted to String data type to use these functions. |
Include Page |
---|
| String Collation Rules |
---|
| String Collation Rules |
---|
|
Available functions:
D s also |
---|
inCQL | true |
---|
label | ((label = "transformation_ui") OR (label = "string_functions")) |
---|
|