Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0822

D toc

D s transforms

Excerpt

Unpacks nested data from an Array or Object column to create new rows or columns based on the keys in the source data.   

This transform works differently on columns of Array or Object type.

The  unnest  transform must include keys that you specify as part of the transform step. To unnest a column of array data that contains no keys, use the flatten transform. See Flatten Transform.

This transform might be automatically applied as one of the first steps of your recipe. See Initial Parsing Steps.

 

D s
snippetBasic

D code

unnest col: myObj keys:'sourceA','sourceB' pluck:true markLineage:true

Output:

  • Extracts from the myObj column the corresponding values for the keys sourceA and sourceB into two new columns. 
  • Since markLineage is true, these new column names are prepended with the source name: sourceA_column1 and sourceB_column2
  • Any non-missing values from the source columns are added to the corresponding new columns and are removed from the source column, since pluck is true.

D s
snippetParameters

D code

unnest col:column_ref keys:'key1','key2' [pluck:true|false] [markLineage:true|false]

TokenRequired?Data TypeDescription
unnestYtransformName of the transform
colYstringSource column name
keysYstringComma-separated list of quoted key names. See below for examples.
pluckNbooleanIf true, any values unnested from the source are also removed from the source. Default is false.
markLineageNbooleanIf true, the names of new columns are prepended with the name of the source column.

D s lang notes

col

Identifies the column to which to apply the transform. You can specify only one column.

D s
snippetusage

Required?Data Type
YesString (column name)

keys

Include Page
keys Parameter
keys Parameter

Info

NOTE: Keys that contain non-alphanumeric values, such as spaces, must be enclosed in square brackets and quotes. Values with underscores do not require this bracketing.

The comma-separated list of keys determines the columns to generate from the source data. If you specify three values for keys, the three new columns contain the corresponding values from the source column.

This parameter has different syntax to use for single-level and multi-level nested data. There are also variations in syntax between Object and Array data type.

D s
snippetusage

Required?Data Type
Yes

Comma-separated String values.

Syntax examples are provided below.

Keys for Object data - single-level

Info

NOTE: Key names are case-sensitive.

For a single, top-level key in an Object field, you can specify the key as a simple quoted string:

D code

unnest col:myCol keys: 'myObjKey'

The above looks for the key myObjKey among the top-level keys in the Object and returns the corresponding value for the new column. You can also bracket this key in square brackets:

D code

unnest col:myCol keys: '[myObjKey]'

To specify multiple first-level keys, use the following:

D code

unnest col:myCol keys:'myObjKey','my2ndObjKey'

The above generates two new columns ( myObjKey and my2ndObjKey) containing the corresponding values for the keys.

Keys for Object data - multi-level

You can also reference keys that are below the first level in the Object. 

Example data:

Code Block
{ "Key1" :
  { "Key1A" :
    { "Key1A1" : "Value1" }
  }
}
{ "Key2" :
  { "Key2A" :
    { "Key2A1" : "Value2" }
  }
}
{ "Key3" :
  { "Key3A" :
    { "Key3A1" : "Value3" }
  }
}

To acquire the data for the Key1A key, use the following:

D code

unnest col: myCol keys: 'Key1[Key1A]'

In the new column, the displayed value is the following:

Code Block
{ "Key1A1" : "Value1" }

To unnest a third-layer value, use a transform similar to the following:

D code

unnest col: myCol keys: 'Key2[Key2A][Key2A1]'

In the new column, this transform generates a value of Value2.

Keys for Array data - single level

You can reference array elements using zero-based indexes or key names.

Info

NOTE: All references to Array keys must be bracketed. Array keys can be referenced by index number only.

Example array data:

Code Block
["red","orange","yellow","green","blue","indigo","violet"]
D code

unnest col: myCol keys:'[1]'

The above transform retrieves the value orange from the array.  

D code

unnest col: myCol keys:'[1]','[3]'

Returned values: orange and green.

Keys for Array data - multi-level

The following example nested Array data matches the structure of the Object data in the previous example:

Code Block
[ [ "Item1", ["Item1A", ["Item1A1","Value1"] ] ], [ "Item2", ["Item2A",  ["Item2A1","Value2"] ] ], [ "Item3", ["Item3A",["Item3A1","Value3"] ] ] ] 

To unnest the value for Items2A:

D code

unnest col:myCol keys:'[1][0]'

The value inserted into the new column is the following:

Code Block
["Item2A1","Value2"]

To unnest from the third level:

D code

unnest col:myCol keys:'[2][0][0]'

The inserted value is Item3A.

pluck

Include Page
pluck Parameter
pluck Parameter

D s
snippetusage

Required?Data Type
NoBoolean

markLineage

Include Page
markLineage Parameter
markLineage Parameter

Info

NOTE: If your unnest transform does not change the number of rows, you can still access source row number information in the data grid, assuming it was still available when the transform was executed.

D s
snippetusage

Required?Data Type
NoBoolean


D s
snippetExamples

Example - Unnest an Object

You have the following dataset. The Sizes column contains Object data on available sizes. 

Source:

ProdIdProdNameSizes
1001Hat{'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'Y'}
1002Shirt{'Small':'N','Medium':'Y','Large':'Y','Extra-Large':'N'}
1003Pants{'Small':'Y','Medium':'Y','Large':'Y','Extra-Large':'N'}

Transformation:

Info

NOTE: Depending on the format of your source data, you might need to perform some replacements in the Sizes column in order to make it inferred as proper Object type values. The final format should look like the above.

If it is not inferred already, set the type of the Sizes column to Object:

D trans
RawWrangletrue
Typestep
WrangleTextsettype col: Sizes type: 'Object'
p01NameColumns
p01ValueSizes
p02NameNew type
p02ValueObject
SearchTermChange column data type

Unnest the data into separate columns. The following prepends Sizes_ to the newly generated column name.

D trans
RawWrangletrue
p03Valuetest
Typestep
WrangleTextunnest col:Sizes keys:'Small','Medium','Large','Extra-Large' markLineage:true
p01NameColumn
p01ValueSizes
p02NamePaths to elements
p02Value'Small','Medium','Large','Extra-Large'
p03NameInclude original column name
SearchTermUnnest Objects into columns

You might find it useful to add pluck:true to the above transform. When added, values that are un-nested are removed from the source, leaving only the values that weren't processed:

D trans
RawWrangletrue
p03Valuetrue
Typestep
WrangleTextunnest col:Sizes keys:'Small','Medium','Large','Extra-Large' markLineage:true pluck:true
p01NameColumn
p01ValueSizes
p02NamePaths to elements
p02Value'Small','Medium','Large','Extra-Large'
p03NameRemove elements from original
p04Valuetrue
p04NameInclude original column name
SearchTermUnnest Objects into columns

If all values have been processed, the  Sizes column now contains a set of maps missing data. You can use the following to determine if the length of the remaining data is longer than two characters. This transform is a good one to just preview:

D trans
RawWrangletrue
p03Value'len_Sizes'
Typestep
WrangleTextderive type:single value:(len(Sizes) > 2) as:'len_Sizes'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Value(len(Sizes) > 2)
p03NameNew column name
SearchTermNew formula

You can delete the source column:

D trans
RawWrangletrue
Typestep
WrangleTextdrop col:Sizes
p01NameColumns
p01ValueSizes
p02NameAction
p02ValueDelete selected columns
SearchTermDelete columns

Results:

When you are finished, the dataset should look like the following:

ProdIdProdNameSizes_SmallSizes_MediumSizes_LargeSizes_Extra-Large
1001HatNYYY
1002ShirtNYYN
1003PantsYYYN

Example - Unnest an array

The following example demonstrates differences between the unnest and the flatten transform, including how you use unnest to flatten array data based on specified keys.

Include Page
EXAMPLE - Flatten and Unnest Transforms
EXAMPLE - Flatten and Unnest Transforms
 

Example - extracting key values from car data and then unnesting into separate columns

Include Page
EXAMPLE - Extractkv and Unnest Transforms
EXAMPLE - Extractkv and Unnest Transforms

D s also
labelwrangle_transform_unnest