Excerpt |
---|
Returns the number of characters in a specified string. String value can be a column reference or string literal. |
Column reference example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value:len(MyName) |
---|
|
len(MyName) |
Output: Returns the number of characters in the value in column MyName
.
String literal example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value:len('Hello, World') |
---|
|
len('Hello, World') |
Output: Returns the value 12
.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | derive type:single value:len(column_string) |
---|
|
len(column_string) |
Argument | Required? | Data Type | Description |
---|
column_string | Y | string | Name of the column or string literal to be applied to the function |
column_string
Name of the column or string constant to be searched.
- Missing string or column values generate missing string results.
- String constants must be quoted (
'Hello, World'
). - Multiple columns and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String literal or column reference | myColumn |
Example - Fixed Length Strings
Source:
Your product identifiers follow a specific structure that you'd like to validate in your recipe. In the following example data, the productId
column should contain values of length 6
.
You can see that there is already a column containing validation errors for the ProductName
column. Values in the ProductId
column that are not this length should be flagged in a new column. Then, you should merge the two columns together to create a ValidationError
column.
ProductName | ProductId | ErrProductName |
---|
Chocolate Bunnie | 123456 | Error-ProductName |
Chocolate Squirl | 88442286 | Error-ProductName |
Chocolate Gopher | 12345 | |
Transformation:
To validate the length of the values in ProductId
, enter the following transform. Note that the as
parameter enables you to rename the column as part of the transform.
D trans |
---|
RawWrangle | true |
---|
p03Value | 'ErrProductIdLength' |
---|
Type | step |
---|
WrangleText | derive type:single value: if(len(ProductId) <> 6, 'Error-length-ProductId','') '' as: 'ErrProductIdLength' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | if(len(ProductId) <> 6, 'Error-length-ProductId','') |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
The dataset now looks like the following:
ProductName | ProductId | ErrProductName | ErrProductIdLength |
---|
Chocolate Bunnie | 123456 | Error-ProductName | |
Chocolate Squirrel | 88442286 | Error-ProductName | Error-length-ProductId |
Chocolate Gopher | 12345 | | Error-length-ProductId |
You can blend the two error columns into a single DataValidationErrors
error column using the following merge
transform. Note again the use of the as
parameter:
D trans |
---|
RawWrangle | true |
---|
p03Value | 'DataValidationErrors' |
---|
Type | step |
---|
WrangleText | merge col:ErrProductName,ErrProductIdlength with:' ' as:'DataValidationErrors' |
---|
p01Name | Columns |
---|
p01Value | ErrProductName,ErrProductIdlength |
---|
p02Name | Separator |
---|
p02Value | '' |
---|
p03Name | New column name |
---|
SearchTerm | Merge columns |
---|
|
To clean up the data, you might want to do the following, which trims out the whitespace in the DataValidationErrors
column and removes the two individual error columns:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | set col:DataValidationErrors value:trim(DataValidationErrors) |
---|
p01Name | Columns |
---|
p01Value | DataValidationErrors |
---|
p02Name | Formula |
---|
p02Value | trim(DataValidationErrors) |
---|
SearchTerm | Edit column with formula |
---|
|
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | drop col:ErrProductName,ErrProductIdLength |
---|
p01Name | Columns |
---|
p01Value | ErrProductName,ErrProductIdLength |
---|
p02Name | Action |
---|
p02Value | Delete selected columns |
---|
SearchTerm | Delete columns |
---|
|
Results:
The final dataset should look like the following:
ProductName | ProductId | DataValidationErrors |
---|
Chocolate Bunnie | 123456 | Error-ProductName |
Chocolate Squirrel | 88442286 | Error-ProductName Error-length-ProductId |
Chocolate Gopher | 12345 | Error-length-ProductId |