Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0822

This example shows how you can unpack data nested in an Object into separate columns using the following transforms:

Source:

You have the following information on used cars. The VIN column contains vehicle identifiers, and the Properties column contains key-value pairs describing characteristics of each vehicle. You want to unpack this data into separate columns.

VINProperties
XX3 JT4522year=2004,make=Subaru,model=Impreza,color=green,mileage=125422,cost=3199
HT4 UJ9122year=2006,make=VW,model=Passat,color=silver,mileage=102941,cost=4599
KC2 WZ9231year=2009,make=GMC,model=Yukon,color=black,mileage=68213,cost=12899
LL8 UH4921year=2011,make=BMW,model=328i,color=brown,mileage=57212,cost=16999

Transformation:

Add the following transformation, which identifies all of the key values in the column as beginning with alphabetical characters.

  • The valueafter string identifies where the corresponding value begins after the key.
  • The delimiter string indicates the end of each key-value pair.

D trans
RawWrangletrue
p03Value`=`
Typestep
WrangleTextextractkv col:Properties key:`{alpha}+` valueafter:`=` delimiter:`,`
p01NameColumn
p01ValueProperties
p02NameKey
p02Value`{alpha}+`
p03NameSeparator between key and value
p04Value','
p04NameDelimiter between pair
SearchTermConvert keys/values into Objects

Now that the Object of values has been created, you can use the unnest transform to unpack this mapped data. In the following, each key is specified, which results in separate columns headed by the named key:

D trans
RawWrangletrue
Typestep
WrangleTextunnest col:extractkv_Properties keys:'year','make','model','color','mileage','cost'
p01NameColumn
p01Valueextractkv_Properties
p02NamePaths to elements
p02Value'year','make','model','color','mileage','cost'
SearchTermUnnest Objects into columns

Results:

When you delete the unnecessary Properties columns, the dataset now looks like the following:

VINyearmakemodelcolormileagecost
XX3 JT45222004SubaruImprezagreen1254223199
HT4 UJ91222006VWPassatsilver1029414599
KC2 WZ92312009GMCYukonblack6821312899
LL8 UH49212011BMW328ibrown5721216999