Unpacks nested data from an Array or Object column to create new rows or columns based on the keys in the source data.
This transform works differently on columns of Array or Object type.
The unnest transform must include keys that you specify as part of the transform step. To unnest a column of array data that contains no keys, use the flatten transform. See Flatten Transform.
This transform might be automatically applied as one of the first steps of your recipe. See Initial Parsing Steps.
D code |
---|
unnest col: myObj keys:'sourceA','sourceB' pluck:true markLineage:true |
Output:
- Extracts from the myObj column the corresponding values for the keys sourceA and sourceB into two new columns.
- Since markLineage is true, the new column names are prepended with the source column name: myObj_sourceA and myObj_sourceB.
- Any non-missing values from the source column are added to the corresponding new columns and are removed from the source column, since pluck is true.
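The behavior described above can be approximated in plain Python to make the roles of pluck and markLineage concrete. This is only a sketch, not the product's implementation: the `unnest` function, its parameter names, and the `other` key in the sample row are hypothetical, and it assumes the lineage prefix is the source column name, as in the Sizes_ example later on this page.

```python
def unnest(row, col, keys, pluck=False, mark_lineage=False):
    """Return a new row with one new column per requested top-level key."""
    source = dict(row[col])  # copy so the input row is not mutated
    out = dict(row)
    for key in keys:
        # mark_lineage mirrors markLineage:true (prefix with the source column name)
        name = f"{col}_{key}" if mark_lineage else key
        out[name] = source.get(key)  # assumption: a missing key yields None
        if pluck:
            # pluck mirrors pluck:true (remove the value from the source object)
            source.pop(key, None)
    out[col] = source
    return out


row = {"myObj": {"sourceA": 1, "sourceB": 2, "other": 3}}
result = unnest(row, "myObj", ["sourceA", "sourceB"], pluck=True, mark_lineage=True)
print(result)
```

With pluck enabled, only the unrequested `other` key remains in the source object.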
D code |
---|
unnest col:column_ref keys:'key1','key2' [pluck:true|false] [markLineage:true|false] |
Token | Required? | Data Type | Description |
---|
unnest | Y | transform | Name of the transform |
col | Y | string | Source column name |
keys | Y | string | Comma-separated list of quoted key names. See below for examples. |
pluck | N | boolean | If true, any values unnested from the source are also removed from the source. Default is false. |
markLineage | N | boolean | If true, the names of new columns are prepended with the name of the source column. |
Identifies the column to which to apply the transform. You can specify only one column.
Required? | Data Type |
---|
Yes | String (column name) |
keys Parameter
Info |
---|
NOTE: Keys that contain non-alphanumeric characters, such as spaces, must be enclosed in square brackets and quotes. Keys that contain underscores do not require this bracketing. |
The comma-separated list of keys determines the columns to generate from the source data. If you specify three values for keys, the three new columns contain the corresponding values from the source column.
The syntax of this parameter differs for single-level and multi-level nested data. It also varies between the Object and Array data types.
Required? | Data Type |
---|
Yes | Comma-separated String values. Syntax examples are provided below. |
Info |
---|
NOTE: Key names are case-sensitive. |
Keys for Object data - single level
For a single, top-level key in an Object field, you can specify the key as a simple quoted string:
D code |
---|
unnest col:myCol keys: 'myObjKey' |
The above looks for the key myObjKey among the top-level keys in the Object and returns the corresponding value for the new column. You can also enclose this key in square brackets:
D code |
---|
unnest col:myCol keys: '[myObjKey]' |
To specify multiple first-level keys, use the following:
D code |
---|
unnest col:myCol keys:'myObjKey','my2ndObjKey' |
The above generates two new columns (myObjKey and my2ndObjKey) containing the corresponding values for the keys.
Keys for Object data - multi-level
You can also reference keys that are below the first level in the Object.
Example data:
Code Block |
---|
{ "Key1" :
{ "Key1A" :
{ "Key1A1" : "Value1" }
}
}
{ "Key2" :
{ "Key2A" :
{ "Key2A1" : "Value2" }
}
}
{ "Key3" :
{ "Key3A" :
{ "Key3A1" : "Value3" }
}
} |
To acquire the data for the Key1A key, use the following:
D code |
---|
unnest col: myCol keys: 'Key1[Key1A]' |
In the new column, the displayed value is the following:
Code Block |
---|
{ "Key1A1" : "Value1" } |
To unnest a third-layer value, use a transform similar to the following:
D code |
---|
unnest col: myCol keys: 'Key2[Key2A][Key2A1]' |
In the new column, this transform generates a value of Value2.
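The bracketed path syntax for nested Object keys can be illustrated with a small Python sketch. It is not how the product resolves paths internally; the `parse_path` and `resolve` helpers are hypothetical, and returning None when a path is absent from a row is an assumption made for the illustration.

```python
import re


def parse_path(key):
    """Split 'Key2[Key2A][Key2A1]' into ['Key2', 'Key2A', 'Key2A1']."""
    # Every run of characters that is not a bracket is one path segment.
    return re.findall(r"[^\[\]]+", key)


def resolve(obj, key):
    """Walk the nested dicts along the parsed path; None if any segment is missing."""
    current = obj
    for part in parse_path(key):
        if not isinstance(current, dict) or part not in current:
            return None  # assumption: a missing path yields an empty value
        current = current[part]
    return current


# The three example rows from the sample data above.
rows = [
    {"Key1": {"Key1A": {"Key1A1": "Value1"}}},
    {"Key2": {"Key2A": {"Key2A1": "Value2"}}},
    {"Key3": {"Key3A": {"Key3A1": "Value3"}}},
]

second_level = resolve(rows[0], "Key1[Key1A]")
third_level = resolve(rows[1], "Key2[Key2A][Key2A1]")
print(second_level)
print(third_level)
```

Because key names are case-sensitive, a path such as 'key1[Key1A]' would not match any row in this sketch.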
Keys for Array data - single level
You can reference array elements using zero-based indexes.
Info |
---|
NOTE: All references to Array keys must be bracketed. Array keys can be referenced by index number only. |
Example array data:
Code Block |
---|
["red","orange","yellow","green","blue","indigo","violet"] |
D code |
---|
unnest col: myCol keys:'[1]' |
The above transform retrieves the value orange from the array.
D code |
---|
unnest col: myCol keys:'[1]','[3]' |
Returned values: orange and green.
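The zero-based, bracketed index references above map directly onto list indexing. The following Python sketch is illustrative only: the `unnest_array` helper is hypothetical, and returning None for an out-of-range index is an assumption rather than documented product behavior.

```python
def unnest_array(values, keys):
    """Return the elements referenced by bracketed zero-based keys such as '[1]'."""
    picked = []
    for key in keys:
        # All references to Array keys must be bracketed.
        if not (key.startswith("[") and key.endswith("]")):
            raise ValueError(f"Array key must be bracketed: {key!r}")
        index = int(key[1:-1])
        # Assumption: an index outside the array yields None.
        picked.append(values[index] if 0 <= index < len(values) else None)
    return picked


colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
picked = unnest_array(colors, ["[1]", "[3]"])
print(picked)
```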
Keys for Array data - multi-level
The following example nested Array data matches the structure of the Object data in the previous example:
Code Block |
---|
[ [ "Item1", ["Item1A", ["Item1A1","Value1"] ] ], [ "Item2", ["Item2A", ["Item2A1","Value2"] ] ], [ "Item3", ["Item3A",["Item3A1","Value3"] ] ] ] |
To unnest the value for Item2A:
D code |
---|
unnest col:myCol keys:'[1][1][1]' |
The value inserted into the new column is the following:
Code Block |
---|
["Item2A1","Value2"] |
To unnest from the third level:
D code |
---|
unnest col:myCol keys:'[2][1][0]' |
The inserted value is Item3A.
pluck Parameter
Required? | Data Type |
---|
No | Boolean |
markLineage Parameter
Info |
---|
NOTE: If your unnest transform does not change the number of rows, you can still access source row number information in the data grid, assuming it was still available when the transform was executed. |
Required? | Data Type |
---|
No | Boolean |
You have the following dataset. The Sizes column contains Object data on available sizes.
Source:
ProdId | ProdName | Sizes |
---|
1001 | Hat | {'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'Y'} |
1002 | Shirt | {'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'N'} |
1003 | Pants | {'Small':'Y','Medium':'Y','Large':'Y','Extra-Large':'N'} |
Transformation:
Info |
---|
NOTE: Depending on the format of your source data, you might need to perform some replacements in the Sizes column so that its values are inferred as proper Object type values. The final format should look like the above. |
If it is not inferred already, set the type of the Sizes column to Object:
D code |
---|
settype col: Sizes type: 'Object' |
Parameter | Value |
---|
Search term | Change column data type |
Columns | Sizes |
New type | Object |
Unnest the data into separate columns. The following prepends Sizes_ to the newly generated column names.
D code |
---|
unnest col:Sizes keys:'Small','Medium','Large','Extra-Large' markLineage:true |
Parameter | Value |
---|
Search term | Unnest Objects into columns |
Column | Sizes |
Paths to elements | 'Small','Medium','Large','Extra-Large' |
Include original column name | true |
You might find it useful to add pluck:true to the above transform. When added, values that are unnested are removed from the source, leaving only the values that were not processed:
D code |
---|
unnest col:Sizes keys:'Small','Medium','Large','Extra-Large' markLineage:true pluck:true |
Parameter | Value |
---|
Search term | Unnest Objects into columns |
Column | Sizes |
Paths to elements | 'Small','Medium','Large','Extra-Large' |
Remove elements from original | true |
Include original column name | true |
If all values have been processed, the Sizes column now contains only empty maps. Because an empty map ({}) is two characters long, you can use the following to flag rows where the remaining data is longer than two characters, which indicates unprocessed values. This transform is a good one to just preview:
D code |
---|
derive type:single value:(len(Sizes) > 2) as:'len_Sizes' |
Parameter | Value |
---|
Search term | New formula |
Formula type | Single row formula |
Formula | (len(Sizes) > 2) |
New column name | 'len_Sizes' |
You can delete the source column:
D code |
---|
drop col:Sizes |
Parameter | Value |
---|
Search term | Delete columns |
Columns | Sizes |
Action | Delete selected columns |
Results:
When you are finished, the dataset should look like the following:
ProdId | ProdName | Sizes_Small | Sizes_Medium | Sizes_Large | Sizes_Extra-Large |
---|
1001 | Hat | N | Y | Y | Y |
1002 | Shirt | N | Y | Y | N |
1003 | Pants | Y | Y | Y | N |
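To cross-check the results table, the whole walkthrough can be approximated in Python. This is a sketch under stated assumptions, not the product's logic: the variable names `keys`, `results`, and `leftovers` are illustrative, and `json.dumps` stands in for however the product measures the length of the remaining Object.

```python
import json

# Source rows copied from the example dataset.
rows = [
    {"ProdId": 1001, "ProdName": "Hat",
     "Sizes": {"Small": "N", "Medium": "Y", "Large": "Y", "Extra-Large": "Y"}},
    {"ProdId": 1002, "ProdName": "Shirt",
     "Sizes": {"Small": "N", "Medium": "Y", "Large": "Y", "Extra-Large": "N"}},
    {"ProdId": 1003, "ProdName": "Pants",
     "Sizes": {"Small": "Y", "Medium": "Y", "Large": "Y", "Extra-Large": "N"}},
]
keys = ["Small", "Medium", "Large", "Extra-Large"]

results = []
leftovers = []
for row in rows:
    sizes = dict(row["Sizes"])  # copy so the source rows stay untouched
    out = {"ProdId": row["ProdId"], "ProdName": row["ProdName"]}
    for key in keys:
        # markLineage:true -> prefix the new column with the source column name
        # pluck:true       -> remove the value from the source object
        out["Sizes_" + key] = sizes.pop(key, None)
    # An empty object serializes to "{}", which is two characters long,
    # so a longer string means some values were not unnested.
    leftovers.append(len(json.dumps(sizes)) > 2)
    results.append(out)  # the Sizes column is dropped by not copying it

for out in results:
    print(out)
print(leftovers)
```

Every entry in `leftovers` is False here because all four keys were unnested, which is the condition under which deleting the Sizes column loses no data.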
The following example demonstrates differences between the unnest and the flatten transforms, including how you use unnest to flatten array data based on specified keys.
EXAMPLE - Flatten and Unnest Transforms
EXAMPLE - Extractkv and Unnest Transforms