Unpacks nested data from an Array or Object column to create new rows or columns based on the keys in the source data. This transform works differently on columns of Array or Object type.
The unnest transform must include keys that you specify as part of the transform step. To unnest a column of array data that contains no keys, use the flatten transform. See Flatten Transform.
This transform might be automatically applied as one of the first steps of your recipe. See Initial Parsing Steps.
- Extracts from the myObj column the values for the specified keys (such as sourceB) into two new columns.
- Since markLineage is true, these new column names are prepended with the source name (for example, myObj_sourceB).
- Any non-missing values from the source columns are added to the corresponding new columns and are removed from the source column, since pluck is true.
|Parameter||Required?||Transform Builder||Data Type||Description|
|col||Y||Column||string||Source column name|
|keys||Y||Paths||string||Comma-separated list of quoted key names. See below for examples.|
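|pluck||N||Remove original||boolean||If true, values added to the new columns are removed from the source column.|
|markLineage||N||Prepend original column name||boolean||If true, the names of new columns are prepended with the source column name.|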
For more information on syntax standards, see Language Documentation Syntax Notes.
The col parameter identifies the column to which to apply the transform. You can specify only one column.
|Required?||Data Type|
|Yes||String (column name)|
The keys parameter is a comma-separated list of keys to use to extract data from the specified source column.
- Key values must be quoted (for example, 'key1','key2'). Any quoted value is considered the path to a single key.
- Key values are case-sensitive.
- Each key must be listed. A range of keys cannot be specified.
NOTE: Keys that contain non-alphanumeric characters, such as spaces, must be enclosed in square brackets and quotes. Keys that contain underscores do not require this bracketing.
The comma-separated list of keys determines the columns to generate from the source data. If you specify three keys, three new columns are generated, each containing the values from the source column for the corresponding key.
This parameter has different syntax for single-level and multi-level nested data. There are also variations in syntax between the Object and Array data types.
|Required?||Data Type|
|Yes||Comma-separated String values|
Syntax examples are provided below.
Keys for Object data - single-level
NOTE: Key names are case-sensitive.
For a single, top-level key in an Object field, you can specify the key as a simple quoted string:
The above looks for the key myObjKey among the top-level keys in the Object and returns the corresponding value for the new column. You can also enclose this key in square brackets:
To specify multiple first-level keys, use the following:
The above generates two new columns, one of which is my2ndObjKey, containing the corresponding values for the keys.
Keys for Object data - multi-level
You can also reference keys that are below the first level in the Object.
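Conceptually, a multi-level key is a path: each level selects a key inside the map returned by the previous level. The following Python sketch walks such a path through nested dictionaries. The list-of-keys path, the function name resolve_path, and the key Key1A1 are hypothetical; the sketch does not reproduce the Wrangle key syntax.

```python
from typing import Any, Sequence


def resolve_path(obj: Any, path: Sequence[str]) -> Any:
    """Follow each key in `path` through nested dicts; return None if any level is missing."""
    current = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None  # treat a broken path as a missing value
        current = current[key]
    return current


data = {"Key1": {"Key1A": {"Key1A1": "deep"}, "Key1B": 2}}
print(resolve_path(data, ["Key1", "Key1A"]))            # {'Key1A1': 'deep'}
print(resolve_path(data, ["Key1", "Key1A", "Key1A1"]))  # deep
```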
To acquire the data for the Key1A key, use the following:
In the new column, the displayed value is the following:
To unnest a third-layer value, use a transform similar to the following:
In the new column, this transform generates a value of
Keys for Array data - single level
You reference array elements using zero-based indexes.
NOTE: All references to Array keys must be bracketed. Array keys can be referenced by index number only.
Example array data:
The above transform retrieves the value orange from the array.
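Because array elements have no keys, each level of a reference is a zero-based position. The following Python sketch models single-level and multi-level index references on nested lists. The function name resolve_index_path and the sample arrays are hypothetical, and returning None for an out-of-range index is an assumption of the sketch.

```python
from typing import Any, Sequence


def resolve_index_path(arr: Any, path: Sequence[int]) -> Any:
    """Follow zero-based indexes through nested lists; None if any index is out of range."""
    current = arr
    for index in path:
        # Arrays are addressed by position only; negative indexes are rejected here.
        if not isinstance(current, list) or not 0 <= index < len(current):
            return None
        current = current[index]
    return current


fruits = ["apple", "orange", "pear"]
print(resolve_index_path(fruits, [1]))  # orange

nested = [["a", ["b", "c"]], "d"]
print(resolve_index_path(nested, [0, 1]))     # ['b', 'c']
print(resolve_index_path(nested, [0, 1, 0]))  # b
```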
Keys for Array data - multi-level
The following example nested Array data matches the structure of the Object data in the previous example:
To unnest the value for
The value inserted into the new column is the following:
To unnest from the third level:
The inserted value is
The pluck parameter indicates whether values added from the source to output columns are removed from the source.
- Set to true to remove values from the source after they have been added to output columns.
- (Default) Set to false to leave source columns untouched.
When markLineage is set to true, the names of new columns are prepended with the name of the source column. Example:
|Source Column||Output Column|
Nested key references are appended to the column name:
|Source Column||Key Value||Output Column|
NOTE: If your unnest transform does not change the number of rows, you can still access source row number information in the data grid, assuming it was still available when the transform was executed.
Example - Unnest an Object
You have the following dataset. The Sizes column contains Object data on available sizes.
NOTE: Depending on the format of your source data, you might need to perform some replacements in the Sizes column so that its values are inferred as proper Object type values. The final format should look like the above.
If it is not inferred already, set the type of the Sizes column to Object:
Unnest the data into separate columns. The following prepends Sizes_ to the names of the newly generated columns.
You might find it useful to add pluck:true to the above transform. When added, values that are unnested are removed from the source, leaving only the values that weren't processed:
If all values have been processed, the Sizes column now contains a set of empty maps. Because an empty map ({}) is two characters long, you can use the following to determine whether the length of the remaining data is longer than two characters, which indicates unprocessed values. This transform is a good one to preview without adding it to your recipe:
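The same leftover-data check can be sketched in Python, assuming the remaining Object is rendered as compact JSON text. That rendering is an assumption; the application's own text form of an Object may differ. The function name has_leftover_data and the sample map are hypothetical.

```python
import json
from typing import Any


def has_leftover_data(remaining: dict[str, Any]) -> bool:
    """True if the plucked source still holds unprocessed keys.

    An empty map serializes to "{}" (two characters), so anything longer
    means at least one key was not unnested.
    """
    return len(json.dumps(remaining, separators=(",", ":"))) > 2


print(has_leftover_data({}))            # False
print(has_leftover_data({"XL": "no"}))  # True
```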
If you sort the values in the generated column, you can review the true values to see if you need to modify your preceding unnest transform. You can then drop the source column:
When you are finished, the dataset should look like the following:
Example - Unnest an array
The following example demonstrates differences between the unnest and flatten transforms, including how you use unnest to flatten array data based on specified keys.
- For more information, see Flatten Transform.
You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals:
- One row for each student test
- Unique identifier for each student-score combination
When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column.
You might also want to check whether each student has the proper number of test scores by comparing the expected number of elements in the Scores array (4) with the actual number:
When the transform is previewed, you can see in the sample dataset that all tests are included. You might or might not want to include this column in the final dataset, as you might identify missing tests when the recipe is run at scale.
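As a rough Python model of this check, the difference between the expected count of four and the actual array length gives the number of missing tests per student. The function name missing_tests and the sample scores are hypothetical.

```python
EXPECTED_TESTS = 4  # the example expects four scores per student


def missing_tests(scores: list[int]) -> int:
    """Difference between the expected and actual number of elements in Scores."""
    return EXPECTED_TESTS - len(scores)


print(missing_tests([82, 91, 77, 88]))  # 0
print(missing_tests([82, 91]))          # 2
```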
Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset has duplicate rows. In the following transform, you create a parallel array called Tests, which contains an index array for the number of values in the Scores column. Index values start at
To identify each source row, you also generate a row identifier by using the SOURCEROWNUMBER function:
One row for each student test: Your data should look like the following:
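The following Python sketch models these two derived values on a list of row dictionaries: a Tests index array parallel to Scores, plus a per-row identifier standing in for SOURCEROWNUMBER. Zero-based test indexes and 1-based row numbers are assumptions of the sketch, and the function name and sample rows are hypothetical. Note that the first student recorded 88 twice, which is why score values alone cannot identify a test.

```python
from typing import Any


def add_test_index(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Add a parallel Tests array and a source-row identifier to each row."""
    out = []
    for row_number, row in enumerate(rows, start=1):
        new_row = dict(row)
        # Parallel index array with one entry per score (zero-based: an assumption)
        new_row["Tests"] = list(range(len(row["Scores"])))
        # Stand-in for SOURCEROWNUMBER; 1-based numbering is assumed here
        new_row["OrderIndex"] = row_number
        out.append(new_row)
    return out


students = [
    {"LastName": "Smith", "FirstName": "Ana", "Scores": [88, 91, 88, 75]},
    {"LastName": "Jones", "FirstName": "Ben", "Scores": [70, 82]},
]
for r in add_test_index(students):
    print(r["OrderIndex"], r["Tests"])
# 1 [0, 1, 2, 3]
# 2 [0, 1]
```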
Now, you want to bring together the Tests and Scores arrays into a single nested array. Then, using the flatten transform, you can unpack the nested array:
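In Python terms, combining the two arrays pairs each test index with its score, and flattening emits one row per pair while repeating the other columns. This is a sketch of the data movement only; the function name zip_and_flatten and the sample row are hypothetical, and column1 follows the column name used in this example.

```python
from typing import Any


def zip_and_flatten(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pair Tests with Scores into a nested array, then emit one row per pair."""
    out = []
    for row in rows:
        nested = [[test, score] for test, score in zip(row["Tests"], row["Scores"])]
        for pair in nested:
            new_row = dict(row)        # other columns are repeated on every new row
            new_row["column1"] = pair  # the flattened [test, score] element
            out.append(new_row)
    return out


rows = [{"OrderIndex": 1, "Tests": [0, 1], "Scores": [88, 88]}]
for r in zip_and_flatten(rows):
    print(r["column1"])
# [0, 88]
# [1, 88]
```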
Each nested test-score pair can then be broken out into separate columns using unnest:
After you drop column1, which is no longer needed, you should rename the two generated columns:
Unique row identifier: You can do one more step to create unique test identifiers, which identify the specific test for each student. The following uses the original row identifier OrderIndex as an identifier for the student and the TestNumber value to create the TestId column value:
The above are integer values. To make your identifiers look prettier, you might add the following:
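The following Python sketch shows the shape of these steps on one flattened row: split column1 into two named columns, drop column1, and derive an integer TestId from OrderIndex and TestNumber. The column name TestScore and the formula OrderIndex * 10 + TestNumber are hypothetical choices for illustration; that formula stays unique only while each student has fewer than ten tests.

```python
from typing import Any


def unnest_pair(row: dict[str, Any]) -> dict[str, Any]:
    """Split column1 ([test, score]) into two named columns and derive TestId."""
    new_row = {k: v for k, v in row.items() if k != "column1"}  # drop column1
    # Equivalent of unnesting elements 0 and 1, then renaming the generated columns
    new_row["TestNumber"] = row["column1"][0]
    new_row["TestScore"] = row["column1"][1]
    # Hypothetical integer scheme: unique while each student has fewer than 10 tests
    new_row["TestId"] = new_row["OrderIndex"] * 10 + new_row["TestNumber"]
    return new_row


print(unnest_pair({"OrderIndex": 2, "column1": [1, 82]}))
# {'OrderIndex': 2, 'TestNumber': 1, 'TestScore': 82, 'TestId': 21}
```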
Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. In this case, you cannot group by the LastName value, and when this recipe is run at scale, there might be collisions between first names. So, you might need to create a kind of primary key using the following:
You can now use this as a grouping parameter for your calculation:
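A minimal Python sketch of the grouping idea, assuming the primary key is a LastName-FirstName string. The key format, the function name average_by_student, and the sample rows are hypothetical. In the sample, two students share a last name, so grouping by LastName alone would merge their scores.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def average_by_student(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Group TestScore values by a LastName-FirstName key and average each group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        student_id = f"{row['LastName']}-{row['FirstName']}"  # composite grouping key
        groups[student_id].append(row["TestScore"])
    return {sid: mean(scores) for sid, scores in groups.items()}


rows = [
    {"LastName": "Smith", "FirstName": "Ana", "TestScore": 88},
    {"LastName": "Smith", "FirstName": "Ana", "TestScore": 90},
    {"LastName": "Smith", "FirstName": "Bo", "TestScore": 70},
]
print(average_by_student(rows))
```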
After you drop unnecessary columns and move your columns around, the dataset should look like the following:
Example - extracting key values from car data and then unnesting into separate columns
This example shows how you can unpack data nested in an Object into separate columns using the following transforms:
- extractkv - Extracts key-value pairs from a source string. See Extract Transform.
- unnest - Unpacks nested data into separate rows and columns. See Unnest Transform.
You have the following information on used cars. The VIN column contains vehicle identifiers, and the Properties column contains key-value pairs describing characteristics of each vehicle. You want to unpack this data into separate columns.
Add the following transform, which identifies all of the keys in the column as beginning with alphabetical characters.
- The valueafter string identifies where the corresponding value begins after the key.
- The delimiter string indicates the end of each key-value pair.
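The two transforms in this example can be modeled roughly in Python: parse the string into a map of key-value pairs, then produce one column per named key. This is a sketch under assumptions. The function names are hypothetical, the parameters valueafter and delimiter only mirror the roles described above, and the sample Properties string is invented because the actual data format is not shown.

```python
from typing import Optional


def extract_kv(text: str, valueafter: str = ":", delimiter: str = ",") -> dict[str, str]:
    """Parse key-value pairs; keep only keys that begin with an alphabetical character."""
    result: dict[str, str] = {}
    for pair in text.split(delimiter):  # delimiter marks the end of each pair
        key, sep, value = pair.partition(valueafter)  # value begins after `valueafter`
        key = key.strip()
        if sep and key[:1].isalpha():
            result[key] = value.strip()
    return result


def unnest_keys(obj: dict[str, str], keys: list[str]) -> dict[str, Optional[str]]:
    """One output column per named key; None when the key is absent."""
    return {key: obj.get(key) for key in keys}


props = extract_kv("Make:Honda, Model:Civic, Year:2012")
print(unnest_keys(props, ["Make", "Model", "Year", "Color"]))
# {'Make': 'Honda', 'Model': 'Civic', 'Year': '2012', 'Color': None}
```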
Now that the Object of values has been created, you can use the unnest transform to unpack this mapped data. In the following, each key is specified, which results in separate columns headed by the named key:
Results:
When you drop the unnecessary Properties columns, the dataset now looks like the following: