Comment: Published by Scroll Versions from space DEV and version r0411

...

Transform categories:

Initial Parsing: When your dataset is initially loaded into the Transformer page, one or more of these transforms may be automatically added to your recipe to transform it into easy-to-use tabular data. For more information, see Initial Parsing Steps.

Manage Columns: These transforms assist in adding, removing, or changing the contents of the columns in your dataset.

Manage Rows: Row-based transforms allow you to remove duplicate rows or keep or delete rows based on conditional expressions.

Search and Replace: Use these transforms to locate patterns in your data and, if needed, to replace them.

Nested Data: These transforms can be used to nest or unnest your data.

Aggregation: These transforms enable you to perform aggregated analysis on your dataset using group-based aggregation functions.

Other: Miscellaneous transforms that do not fit into any of the other categories.

The following are the available transforms.

...

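Each entry lists the transform, its category, a description, and, where one exists, the equivalent action in the user interface (UI equivalent).
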
Aggregate Transform
Category: Aggregation
Description: The aggregate transform performs summary calculations across a set of values in a column, grouped by the values in another column. For example, you can compute the average and standard deviation of test scores by student, by gender, by classroom number, or by all of those groups.
UI equivalent: In the Transform Editor, click the Tools icon. Then, select Aggregate.
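To make the grouping concrete, here is a conceptual sketch in plain Python (not Wrangle syntax) of what an aggregate over one group column computes. It assumes rows are modeled as a list of dicts; the `aggregate` function and the column names are hypothetical. It uses the population standard deviation (`statistics.pstdev`); whether the product's standard-deviation function uses the population or the sample formula is not stated here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(rows, group_col, value_col):
    """Return one summary row per distinct value of group_col (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return [
        {group_col: key, "average": mean(values), "stdev": pstdev(values)}
        for key, values in groups.items()
    ]


# Hypothetical test scores; column names are illustrative only.
scores = [
    {"student": "ana", "score": 80},
    {"student": "ana", "score": 90},
    {"student": "ben", "score": 70},
    {"student": "ben", "score": 70},
]

for summary in aggregate(scores, "student", "score"):
    print(summary)
```

Grouping by several columns (for example student and classroom) would only change the grouping key to a tuple of values.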

 

Comment Transform
Category: Other
Description: Inserts a non-functional comment as a recipe step.

Countpattern Transform
Category: Search and Replace
Description: Counts the number of instances of a specified pattern in a column and writes that count into a newly generated column. The source column is unchanged.

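A minimal sketch of the countpattern behavior, assuming rows are dicts and that Python regular expressions stand in for the transform's pattern (the product's own pattern syntax may differ). The `countpattern` helper and the `digit_runs` column name are hypothetical. The sketch counts non-overlapping matches and copies each row so the source column is left untouched.

```python
import re


def countpattern(rows, col, pattern, new_col):
    """Add new_col holding the number of non-overlapping matches of pattern in col."""
    compiled = re.compile(pattern)
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the source column is left unchanged
        new_row[new_col] = sum(1 for _ in compiled.finditer(str(row[col])))
        out.append(new_row)
    return out


rows = [{"text": "a1b22c333"}, {"text": "no digits"}]
result = countpattern(rows, "text", r"\d+", "digit_runs")
print(result)
```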
Deduplicate Transform
Category: Manage Rows
Description: Removes exact duplicate rows from your dataset. Duplicate rows are identified by exact matches between values. For example, two strings with different capitalization do not match.

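The exact-match rule can be illustrated with a short Python sketch that fingerprints each whole row and keeps the first copy. The `deduplicate` function is hypothetical and assumes every cell value is hashable; it shows why rows that differ only in capitalization both survive.

```python
def deduplicate(rows):
    """Drop rows that exactly repeat an earlier row (case-sensitive comparison)."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


rows = [
    {"name": "Ana", "city": "Paris"},
    {"name": "Ana", "city": "Paris"},  # exact duplicate: removed
    {"name": "ana", "city": "Paris"},  # differs only in case: kept
]
result = deduplicate(rows)
print(result)
```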
Delete Transform
Category: Manage Rows
Description: Deletes a set of rows in your dataset, based on a condition specified in the row expression. If the conditional expression is true, then the row is deleted.
UI equivalent: Select one or more rows in the data grid. The delete transform should be one of the suggestions.

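Conceptually, delete and keep are complementary row filters: delete removes the rows where the condition is true, and keep retains only those rows. In this hypothetical Python sketch, a Python predicate stands in for the row expression (an assumption: the product evaluates its own expression language), and rows are dicts.

```python
def delete_rows(rows, condition):
    """Remove every row for which condition(row) is true."""
    return [row for row in rows if not condition(row)]


def keep_rows(rows, condition):
    """Retain only the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


rows = [{"id": 1, "qty": 0}, {"id": 2, "qty": 5}, {"id": 3, "qty": 12}]


def is_empty(row):
    # Hypothetical stand-in for a row expression such as "qty is zero".
    return row["qty"] == 0


print(delete_rows(rows, is_empty))
print(keep_rows(rows, is_empty))
```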
Derive Transform
Category: Manage Columns
Description: Generates a new column whose values are the output of the value expression. The expression can be calculated over groups of rows defined by the group parameter. The output column can be named as needed.

Drop Transform
Category: Manage Columns
Description: Removes the specified column or columns permanently from your dataset.
UI equivalent: Select Drop from the column drop-down. Alternatively, select one or more columns in the data grid; the drop transform should be one of the suggestions.

Extract Transform
Category: Search and Replace
Description: Extracts a subset of data from one column and inserts it into a new column, based on a specified string or pattern. The source column is unmodified.

Extractkv Transform
Category: Search and Replace
Description: Extracts key-value pairs from a source column and writes them to a new column. The source column must be of String type, although the data can be formatted as other data types.

Extractlist Transform
Category: Search and Replace
Description: Extracts a set of values based on a specified pattern from a source column of any data type. The generated column contains an array of occurrences of the specified pattern. While the new column contains array data, the data type of the new column is sometimes inferred as String.

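A hedged sketch of extractlist in Python, assuming Python regular expressions stand in for the transform's pattern. The `extractlist` helper and the `tag_list` column are hypothetical. This sketch writes an empty list for rows with no match; what the product writes in that case is not stated here.

```python
import re


def extractlist(rows, col, pattern, new_col):
    """Add new_col holding a list of every match of pattern found in col."""
    compiled = re.compile(pattern)
    out = []
    for row in rows:
        new_row = dict(row)  # source column stays as it was
        new_row[new_col] = compiled.findall(str(row[col]))
        out.append(new_row)
    return out


rows = [{"tags": "#sale #new item"}, {"tags": "plain"}]
result = extractlist(rows, "tags", r"#\w+", "tag_list")
print(result)
```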
Flatten Transform
Category: Nested Data
Description: Unpacks array data into separate rows for each value.

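The following hypothetical Python sketch shows the row expansion that flatten performs: each array element becomes its own row, and the other column values are copied. Rows are modeled as dicts. A row whose array is empty produces no output in this sketch; the product's handling of empty arrays is not described here.

```python
def flatten(rows, col):
    """Emit one output row per element of the array stored in col."""
    out = []
    for row in rows:
        for value in row[col]:
            new_row = dict(row)   # copy the other columns unchanged
            new_row[col] = value  # replace the array with a single element
            out.append(new_row)
    return out


orders = [{"order": 1, "items": ["pen", "ink"]}, {"order": 2, "items": ["pad"]}]
print(flatten(orders, "items"))
```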
Header Transform
Category: Initial Parsing
Description: Uses one row from the dataset sample as the header row for the table. Each value in this row becomes the name of the column in which it is located.
UI equivalent: This transform might be automatically added to the beginning of your recipe. See Initial Parsing Steps.

Keep Transform
Category: Manage Rows
Description: Retains a set of rows in your dataset, which are specified by the conditional in the row expression. All other rows are removed from the dataset.
UI equivalent: Select one or more rows in the data grid. The keep transform should be one of the suggestions.

Merge Transform
Category: Manage Columns
Description: Merges two or more columns in your dataset to create a new column of String type. Optionally, you can insert a delimiter between the merged values.

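A minimal sketch of merge semantics, assuming rows are dicts. The hypothetical `merge` helper converts each value to a string before joining, reflecting that the result is of String type, and keeps the source columns. Null handling is not modeled.

```python
def merge(rows, cols, new_col, delimiter=""):
    """Add new_col holding the string values of cols joined by delimiter."""
    out = []
    for row in rows:
        new_row = dict(row)  # source columns are kept as they are
        new_row[new_col] = delimiter.join(str(row[c]) for c in cols)
        out.append(new_row)
    return out


people = [{"first": "Ada", "last": "Lovelace", "born": 1815}]
print(merge(people, ["first", "last", "born"], "label", delimiter="-"))
```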
Move Transform
Category: Manage Columns
Description: Moves the specified column or columns before or after another column in your dataset.

Nest Transform
Category: Nested Data
Description: Creates a map (Object) or an array (Array) of values for one or more columns, using the column names and their values as key-value pairs. The type of the generated column is determined by the into parameter.

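A hypothetical Python sketch of nest, with rows as dicts. The `into` argument only mimics the transform's into parameter; the accepted values `"object"` and `"array"` are assumptions for illustration. In this sketch the Object case uses the column names as keys, while the Array case keeps only the values, in column order.

```python
def nest(rows, cols, new_col, into="object"):
    """Add new_col holding the values of cols as a dict ("object") or list ("array")."""
    out = []
    for row in rows:
        new_row = dict(row)
        if into == "object":
            new_row[new_col] = {c: row[c] for c in cols}  # column name -> value
        elif into == "array":
            new_row[new_col] = [row[c] for c in cols]     # values only
        else:
            raise ValueError("into must be 'object' or 'array'")
        out.append(new_row)
    return out


points = [{"x": 1, "y": 2}]
print(nest(points, ["x", "y"], "point"))
print(nest(points, ["x", "y"], "point", into="array"))
```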
Pivot Transform
Category: Nested Data
Description: The pivot transform pivots your data into columns and aggregates the results, reshaping your dataset into summary information. When you pivot data, the values of a selected column become new columns in the dataset, each of which contains a summary calculation that you specify. This calculation can be based on all rows, for totals across the dataset, or on groups of rows that you define in the transform.
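The reshaping can be sketched in plain Python, assuming rows are dicts and a single aggregation function (sum by default). The `pivot` helper and the sample columns are hypothetical. Passing no group column produces one totals row for the whole dataset; passing one produces a row per group. Combinations of pivot value and group with no source rows are written as None here, which is an assumption about how gaps are filled.

```python
from collections import defaultdict


def pivot(rows, pivot_col, value_col, group_col=None, agg=sum):
    """Turn each distinct pivot_col value into a column holding agg(value_col).

    With group_col=None, one summary row covers the whole dataset.
    """
    pivot_values = list(dict.fromkeys(row[pivot_col] for row in rows))
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        group_key = row[group_col] if group_col is not None else None
        buckets[group_key][row[pivot_col]].append(row[value_col])
    out = []
    for group_key, by_pivot in buckets.items():
        summary = {group_col: group_key} if group_col is not None else {}
        for pv in pivot_values:
            # Combinations with no source rows get None in this sketch.
            summary[pv] = agg(by_pivot[pv]) if pv in by_pivot else None
        out.append(summary)
    return out


sales = [
    {"region": "east", "quarter": "Q1", "amount": 10},
    {"region": "east", "quarter": "Q2", "amount": 15},
    {"region": "west", "quarter": "Q1", "amount": 7},
    {"region": "east", "quarter": "Q1", "amount": 5},
]
print(pivot(sales, "quarter", "amount"))                      # totals across the dataset
print(pivot(sales, "quarter", "amount", group_col="region"))  # one row per region
```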

 
Rename Transform
Category: Manage Columns
Description: Renames a column to a specified name.
UI equivalent: Select Rename from the column drop-down.

Replace Transform
Category: Search and Replace
Description: Replaces values within the specified column or columns based on the string literal, pattern, or location within the cell value, as specified in the transform.
UI equivalent: Select a value in a cell in the data grid. This transform is typically one of the suggestions.

Set Transform
Category: Search and Replace
Description: Replaces all values in the specified column with the specified value, which can be a literal or an expression. You can specify an optional row: parameter containing a conditional test that identifies the rows within the column where the replacement is made.
UI equivalent: Select a value in a cell in the data grid. This transform is typically one of the suggestions.

Settype Transform
Category: Manage Columns
Description: Sets the data type of the specified column. This transform does not modify the source values. The data in the column is re-inferred against the specified data type, which can change the results of column profiling.
UI equivalent: Select a new data type from the icon on the left side of the column header.

Split Transform
Category: Initial Parsing
Description: Splits the specified column into separate columns of data based on the delimiters in the transform. Delimiters can be specified in a number of ways.
UI equivalent: This transform might be automatically added to the beginning of your recipe. See Initial Parsing Steps.

Splitrows Transform
Category: Initial Parsing
Description: Splits a column of values into separate rows of data based on the specified delimiter. You can split rows only on String literal values. Pattern-based row splitting is not supported.
UI equivalent: This transform might be automatically added to the beginning of your recipe. See Initial Parsing Steps.

Unnest Transform
Category: Nested Data
Description: Unpacks nested data from an Array or Object (map) column to create new rows or columns based on the keys in the source data. This transform works differently depending on whether the source column is of Array or Object (map) type.
UI equivalent: This transform might be automatically added to the beginning of your recipe. See Initial Parsing Steps.
Unpivot Transform
Category: Nested Data
Description: Reshapes the layout of data by merging one or more columns into key and value columns. Keys are the names of input columns, and the values are the cell values from the source columns. Rows of data are duplicated, once for each input column.

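A hedged sketch of unpivot in Python with rows as dicts. The output column names `key` and `value` are hypothetical defaults chosen for this example, not necessarily the names the transform generates. Each input row is repeated once per unpivoted column, as the description states.

```python
def unpivot(rows, cols, key_col="key", value_col="value"):
    """Fold cols into key/value columns, repeating each row once per folded column."""
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in cols}
        for col in cols:
            new_row = dict(kept)
            new_row[key_col] = col         # key: name of the input column
            new_row[value_col] = row[col]  # value: that column's cell value
            out.append(new_row)
    return out


wide = [{"store": "A", "jan": 3, "feb": 4}]
print(unpivot(wide, ["jan", "feb"]))
```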
Valuestocols Transform
Category: Manage Columns
Description: For each unique value in a column, a separate column is created. For each row that contains the value in the source column, an indicator value is inserted in the new column. This value can be a literal value or the output of a function. If no indicator value is generated, a null value is written.

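A hypothetical Python sketch of valuestocols. Rows are dicts, the generated column names (`<column>_<value>`) are an assumption for illustration, and the indicator can be a literal or a function of the row, echoing the description. Rows that do not contain a given value get None, standing in for null.

```python
def valuestocols(rows, col, indicator=1):
    """Add one column per distinct value of col.

    The new column holds the indicator when the row's source value matches
    and None otherwise. indicator may be a literal or a function of the row.
    """
    distinct = list(dict.fromkeys(row[col] for row in rows))
    out = []
    for row in rows:
        new_row = dict(row)
        for value in distinct:
            name = f"{col}_{value}"  # hypothetical naming scheme
            if row[col] == value:
                new_row[name] = indicator(row) if callable(indicator) else indicator
            else:
                new_row[name] = None
        out.append(new_row)
    return out


colors = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
result = valuestocols(colors, "color")
print(result)
```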
Window Transform
Category: Aggregation
Description: The window transform enables you to perform summations and calculations based on a rolling window of data relative to the current row. For example, you can compute the rolling average of a specified column over the current row and the nine preceding rows. This transform is particularly useful for processing time-series or otherwise sequential data.
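The rolling-average example can be sketched in Python over a list of values that is assumed to be already sorted, for example by timestamp. The `rolling_average` helper is hypothetical. At the start of the data, where fewer than nine preceding rows exist, this sketch averages whatever rows are available; how the product treats such incomplete windows is not stated here.

```python
def rolling_average(values, preceding=9):
    """For each position, average the current value and up to `preceding` earlier values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        out.append(sum(window) / len(window))
    return out


readings = list(range(1, 13))  # 1..12, assumed already ordered by time
print(rolling_average(readings))
```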

...