Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should login to Product Docs through the application.

Contents:


For each unique value in a column, a separate column is created. For each row that contains the value in the source column, an indicator value is inserted in the new column.

  • This indicator value can be a literal value or the output of a function. 
  • If no indicator value is generated, a null value is written. 

This transform is used to generate indicator columns, which can be used in statistical analysis.

  • It evaluates entire cell values for uniqueness. It does not scan for individual elements in Object or Array data.
  • If a row in the source column contains a missing value, an indicator value is added in a new Empty column.
  • It is not appropriate for tabulating counts of strings or patterns in a column. See Countpattern Transform.

Optionally, you can specify a default value, which is applied to all non-indicator value cells in the new column.

NOTE: When this transform is applied in the data grid, it only identifies the unique values in the current sample. If there are other unique values in the entire dataset, new columns are not created for them when the transform is executed across the entire dataset.

Basic Usage

Source:

Data
Happy
Dog
Happy Happy Dog

Transform:

valuestocols col:Data value:'X'

Results:

DataHappyDogHappy_Happy_Dog
HappyX  
Dog X 
Happy Happy Dog  X

Parameters

valuestocols col:column_ref value:(expression) default:(expression) [limit:int_num]

ParameterRequired?Transform BuilderData TypeDescription
colYColumnstringName of source column
valueYFormula if presentstringString literal, column, or function call that defines the value to use as the indicator value in any newly generated column
defaultNFormula if missingstringString literal, column, or function call that defines the value to use to indicate a false match in any newly generated column
limitNMax columns to createinteger (positive)Maximum number of columns to generate. Default is 50.

For more information on syntax standards, see Language Documentation Syntax Notes.

col

Identifies the column to which to apply the transform. You can specify only one column.

Usage Notes:

Required?Data Type
YesString (column name)

value

For the valuestocols transform, this parameter specifies the value to insert in each row of a generated column where the column name of the generated column appears in the same row of the source column. This value can be a string literal, a column reference, or a function.

Usage Notes:

Required?Data Type
YesString literal, column reference, or function call

default

Optionally, this parameter can be used to specify the value to insert in each row of a generated column where the column name of the generated column does not appear in the same row of the source column. This value can be a string literal, a column reference, or a function.

If this parameter is not specified, a missing value is inserted.

Usage Notes:

Required?Data Type
NoString literal, column reference, or function call

limit

The limit parameter defines the maximum number of columns to create from the unique values detected in the source column. If not specified, the limit is 50.

NOTE: Be careful setting this parameter too high. In some cases, the application can run out of memory generating the results, and your results can fail.

Usage Notes:

Required?Data Type
No. Default value is 50.Integer (positive)


Examples

Example - Basic valuestocols

Source:

This dataset contains onboarding milestones for three employees who joined the company at the same time. The milestones were recorded and organized by date as individual items, so it's not easy to verify that all five milestones have been checked off for each employee:

  • Orientation

  • Contact Info

  • Acquire Computer

  • HR Policies Training

  • Product Training

DateNameonboardingChecklist
4/4/16Bowie KuhnOrientation
4/4/16Happy ChandlerContact Info
4/4/16Bowie KuhnContact Info
4/4/16Bowie KuhnAcquire Computer
4/4/16Bud SeligProduct Training
4/4/16Bud SeligOrientation
4/5/16Happy ChandlerHR Policies Training
4/5/16Happy ChandlerOrientation
4/5/16Happy ChandlerAcquire Computer
4/5/16Bowie KuhnHR Policies Training
4/5/16Bud SeligHR Policies Training
4/5/16Bud SeligContact Info
4/6/16Happy ChandlerProduct Training

Transform:

The following transform creates columns for each of the values in the onboardingChecklist column and adds a yes value where there is a match for the row:

valuestocols col: onboardingChecklist value:'yes'

Results:

In the generated columns, you can quickly assess whether all three employees have completed an individual onboarding item:

  • Bud Selig has not acquired his computer.
  • Bowie Kuhn has not had product training.
DateNameonboardingChecklistOrientationContact_InfoAcquire_ComputerProduct_TrainingHR_Policies_Training
4/4/16Bowie KuhnOrientationyes    
4/4/16Happy ChandlerContact Info yes   
4/4/16Bowie KuhnContact Info yes   
4/4/16Bowie KuhnAcquire Computer  yes  
4/4/16Bud SeligProduct Training   yes 
4/4/16Bud SeligOrientationyes    
4/5/16Happy ChandlerHR Policies Training    yes
4/5/16Happy ChandlerOrientationyes    
4/5/16Happy ChandlerAcquire Computer  yes  
4/5/16Bowie KuhnHR Policies Training    yes
4/5/16Bud SeligHR Policies Training    yes
4/5/16Bud SeligContact Info yes   
4/6/16Happy ChandlerProduct Training   yes 

Example - Magazine subscriptions

This example shows how you can cross-reference columns of data using the following transforms:

  • flatten - Flatten values in an array into separate rows in the dataset. See Flatten Transform.
  • valuestocols - Extract unique instances of values into separate columns, with an indicator added to each row where the unique value is found. See Valuestocols Transform.

Source:

The following data covers magazine subscriptions for individual customers. Their subscriptions are stored in an array of values. You are interested in who is subscribing to each magazine. 

CustIdSubscriptions
Anne Aimes["Little House and Garden","Sporty Pants","Life on the Range"]
Barry Barnes["Sporty Pants","Investing for Tomorrow"]
Cindy Compton["Cakes and Pies","Powerlifting Plus","Running for Days"]
Darryl Diaz["Investing for Tomorrow","Cakes and Pies"]

Transform:

When this data is loaded into the Transformer, you might need to apply a header to it. If it is in CSV format, you might need to apply some replace transforms to clean up the Subscriptions column so it looks like the above. 

When the Subscriptions column contains cleanly formatted arrays, the column is re-typed as Array type. You can then apply the flatten transform:

 

flatten col:Subscriptions

Each CustId/Subscription combination is now written to a separate row. You can use this new data structure to break out instances of magazine subscriptions. Using the following transform, you can add the corresponding CustId value to the column:

valuestocols col:Subscriptions value:CustId

Drop the two source columns:

drop col:CustId,Subscriptions

Results:

Little_House_and_GardenSporty_PantsLife_on_the_RangeInvesting_for_TomorrowCakes_and_PiesPowerlifting_PlusRunning_for_Days
Anne Aimes      
 Anne Aimes     
  Anne Aimes    
 Barry Barnes     
   Barry Barnes   
    Cindy Compton  
     Cindy Compton 
      Cindy Compton
   Darryl Diaz   
    Darry Diaz  

 

 

 

Your Rating: Results: PatheticBadOKGoodOutstanding! 6 rates

This page has no comments.