Page tree

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 6 Next »


Contents:

   

Contents:


In the Dataprep by Trifacta application, you can rename individual columns through the column drop-down. Through transform steps, you can apply renaming to one or more columns.

NOTE: An imported dataset requires about 15 rows to properly infer column data types and the row, if any, to use for column headers.

Name Requirements

  • Column names are case-insensitive and cannot begin with whitespace.
  • Column names cannot contain escaped characters, such as \n.

NOTE: When publishing to Avro,

column names support alphanumeric characters and the underscore (_) character only. Column names cannot begin with a numeral. Other characters cause an error to occur.


NOTE: Column names with spaces or special characters in a transformation must be wrapped by curly braces. Example:

column1,{Column 2 with space},column3

Tip: To prevent potential issues with downstream systems, you should limit your column lengths to no more than 128 characters.

Rename Individual Columns

To rename a column, click the drop-down caret next to the column name. Click Rename.

Rename through Suggestions

Steps:

  1. If your column already exists, click the name of the column. 
  2. Click the Rename suggestion card.
  3. Click Modify.
  4. Replace the newColumnName value with your preferred column name.

Rename for a New Column

Columns that are generated through transform steps are given a default name. 

For the following types of transforms, however, you can specify the column name as part of the step:

  • derive
  • extractkv
  • merge
  • nest

When a transform is added to the recipe, an as: clause is automatically added to the transform step. You can modify your transform to change the value of the as: column.


For example, the following transform generates a new column with the first word from the Name column. The as: value renames this generated column as FirstName:

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula FIND(Name,`{start} `,false,0)
Parameter: New column name FirstName

Auto-Generated Column Names

When your transforms generate new columns, names are automatically assigned to these columns based on the following pattern. 

  1. If the transform includes a function reference, the function name is included in the new column. Example:

    Transformation Name New formula
    Parameter: Formula type Single row formula
    Parameter: Formula LEFT(city,3)


    New column name: left_city

     

  2. If the above step is applied again, a duplicate column is generated with the following name. Example:

    Transformation Name New formula
    Parameter: Formula type Single row formula
    Parameter: Formula LEFT(city,3)


    New column name: left_city1

  3. If the transform does not contain a function reference, the following convention is used:

    Transformation Name New formula
    Parameter: Formula type Single row formula
    Parameter: Formula 'A'

    New column name: column1

    Transformation Name New formula
    Parameter: Formula type Single row formula
    Parameter: Formula 'B'

    New column name: column2

Rename Multiple Columns

Dataprep by Trifacta enables to rename multiple columns using a single transformation. You can perform this batch renaming using one of the methods described in this section.

NOTE: In macros, Rename Columns transformations do not work. This is a known issue.

Tip: To prevent potential issues with downstream systems, you should limit your column lengths to no more than 128 characters.


Steps:

  1. Open the Transform Builder to add a new step to your recipe. 
  2. From the drop-down in the first textbox, select Rename columns.
  3. Select your method of renaming. See below.
  4. Select the column or columns to which to apply the rename. 
  5. To add the step to your recipe, click Add.

Batch rename methods

The following methods can be applied to renaming multiple columns.

Manual rename

For each column that you select, you must add the new name just below the old one. 

  • To add additional columns to the mapping, click Add.
  • To remove columns from the mapping, click Remove.

Add prefix

For the selected columns, you can apply a specific prefix value to the names. Example:

Old Column NamePrefixNew Column Name
column1pre_pre_column1
column2pre_pre_column2
column3pre_pre_column3

Add suffix

For the selected columns, you can apply a specific suffix value to the names. Example:

Old Column NameSuffixNew Column Name
column1_newcolumn1_new
column2_newcolumn2_new
column3_newcolumn3_new

Find and replace

You can apply literals, Dataprep by Trifacta patterns, or regular expressions to match patterns of text in the source column names. These matching values can then be replaced by a fixed value.

Tip: The default behavior is to replace the first instance. Use the Match all occurrences checkbox to apply the pattern matching across all columns in your set.

For more information on patterns, see Text Matching.

Use row(s) as column names

When this method is applied, all of the values in the specified row or rows are used as the new names for each column.

NOTE: This method applies to all columns in the dataset.

Types:

TypeDescription
Use a single row to rename columns

Specify the row number in the sample to use as the source for column names.

NOTE: Source row number information must be available. See below.

Use the first row in the sample to rename columns Use the first row in the sample as the name for all columns.
Combine multiple rows to rename columns

Specify two or more rows to combine into column names. Details are below.

NOTE: Source row number information must be available. See below.

Source row number information:

NOTE: If source row number information is no longer available, this method cannot be used for column rename.

  • If a value is not applied for the source row number, the next row of data is used.
  • Source row numbers apply. Current row numbers may not be the same. In the data grid, mouse over the leftmost column to see available row information.
  • Each value in the row or combination of values across rows must be unique within the set of new column names.
  • The row is removed from its original position.
  • If the product is unable to find unique multi-row headers for the column, the first row of the header set is used.

Combine multiple rows:

The following transformation renames the columns in the dataset based on the values in rows 3 and 4 of the data:

Transformation Name Rename columns
Parameter: Option Use row(s) as column names
Parameter: Type Combine multiple rows to name columns
Parameter: Row Numbers - row A 3
Parameter: Row Numbers - row B 4
Parameter: Choose your separator '_'
Parameter: Fill across? Selected

In the above:

  • The separator is defined as an underscore character (_). This value can be empty.
  • When Fill across is selected, if any row value is empty, the last non-empty value for the row in a previous column is used as part of the column header.

  • No labels

This page has no comments.