Excerpt |
---|
Evaluates a String input against the Array datatype. If the input matches, the function outputs an Array value. Input can be a literal, a column of values, or a function returning String values. |
After you have converted your strings to arrays, if a sufficient percentage of input strings from a column are successfully converted to the other data type, the column may be retyped.
Tip |
---|
Tip: If the column is not automatically retyped as a result of this function, you can manually set the type to Array in a subsequent recipe step. |
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | derive type:single value: parsearray(strInput) as: 'convertedArray' |
---|
|
parsearray(strInput) |
Output: Returns the Array data type value for strInput
String values.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | derive type:single value:parsearray(str_input) |
---|
|
parsearray(str_input) |
Argument | Required? | Data Type | Description |
---|
str_input | Y | String | Literal, name of a column, or a function returning String values to match |
Literal, column name, or function returning String values that are to be evaluated for conversion to Array values.
- Missing values for this function in the source data result in null values in the output.
- Multiple columns and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String | '[1,2,3]' |
Example - parsing strings as an array
Source:
The following table represents raw imported CSV data:
setId | itemsA | itemsB |
---|
s01 | "1,2,3" | 4 |
s02 | "2,3,4" | 4 |
s03 | "3,4,5" | 4 |
s04 | "4,5,6" | 4 |
s05 | "5,6,7" | 4 |
s06 | "6,7,8" | 4 |
In the above table, you can see that the two items columns are interpreted differently. In the following steps, you can see how you can parse the data as array values before producing a new column intersecting the two arrays.
Transformation:
Create a new column to store the array version of itemsA
:
D trans |
---|
RawWrangle | true |
---|
p03Value | arrA |
---|
Type | step |
---|
WrangleText | derive type: single value: itemsA as: 'arrA' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | itemsA |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
Remove the quotes from the column:
D trans |
---|
RawWrangle | true |
---|
p03Value | '' |
---|
Type | step |
---|
WrangleText | replacepatterns col: arrA with: '' on: `"` global: true |
---|
p01Name | Column |
---|
p01Value | arrA |
---|
p02Name | Find |
---|
p02Value | `"` |
---|
p03Name | Replace with |
---|
p04Value | true |
---|
p04Name | Match all occurrences |
---|
SearchTerm | Replace text or pattern |
---|
|
Now create the array by merging the array text value with square brackets and then using the PARSEARRAY function to evaluate the merged value as an array:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | set col: arrA value: parsearray(merge(['[',arrA,']'])) |
---|
p01Name | Columns |
---|
p01Value | arrA |
---|
p02Name | Formula |
---|
p02Value | parsearray(merge(['[',arrA,']'])) |
---|
SearchTerm | Edit column with formula |
---|
|
You can create the second array column using a similar construction in a new column:
D trans |
---|
RawWrangle | true |
---|
p03Value | arrB |
---|
Type | step |
---|
WrangleText | derive type: single value: parsearray(merge(['[',itemsB,']'])) as: 'arrB' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | parsearray(merge(['[',itemsB,']'])) |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
Since both columns have been parsed as array values, you can use the ARRAYINTERSECT function to find the common values:
D trans |
---|
RawWrangle | true |
---|
p03Value | arrIntersectAB |
---|
Type | step |
---|
WrangleText | derive type: single value: arrayintersect([arrA,arrB]) as: 'arrIntersectAB' |
---|
p01Name | Formula type |
---|
p01Value | Single row formula |
---|
p02Name | Formula |
---|
p02Value | arrayintersect([arrA,arrB]) |
---|
p03Name | New column name |
---|
SearchTerm | New formula |
---|
|
Results:
setId | itemsA | itemsB | arrA | arrB | arrIntersectAB |
---|
s01 | "1,2,3" | 4 | [1,2,3] | [4] | [] |
s02 | "2,3,4" | 4 | [2,3,4] | [4] | [4] |
s03 | "3,4,5" | 4 | [3,4,5] | [4] | [4] |
s04 | "4,5,6" | 4 | [4,5,6] | [4] | [4] |
s05 | "5,6,7" | 4 | [5,6,7] | [4] | [] |
s06 | "6,7,8" | 4 | [6,7,8] | [4] | [] |