Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0822

D toc

Whether data is missing, mismatched, or simply wrong, you can use a variety of methods in the

D s webapp
rtrue
 to replace values in one or more columns with literal values or pattern-based replacements. 

Replacement methods

In the Transformer page, you can use the following methods to replace values: 

MethodDescription
By selection

Select a value in the data grid to prompt a series of suggestions on what to do with the data. Typically, replacement options are near the top of the suggestions.

Tip

Tip: You can replace specific values in a column with a preferred value. For more information, see Replace Cell Values.

By column menuFrom the menu to the right of the column, select Replace and a sub-menu item to begin configuring a replacement transformation. See Column Menus.
By Transformer toolbarAt the top of the data grid, click the Replace icon in the Transformer toolbar to begin configuring replacements. See Transformer Toolbar.
By Search panelIn the Search panel, enter replace to build a replacement transformation from scratch. See Search Panel.

Replace by selection

When you select data in the data grid, the replacement suggestions are pre-specified for you, including a number of variants available in the suggestion card. 

Notes:

  • Suggestions are typically conservative in the scope of their changes. Case-sensitive searches and matching of the first occurrence only are the default settings.
  • Order of listing of suggestions in a suggestion card:
    • Pattern-based replacements are listed first. These replacements use 
      D s lang
      itempatterns
      , instead of regular expressions. Regular expressions can be more difficult to control.
    • Literal value replacements are listed below the pattern-based ones. 

For more information, see Overview of Predictive Transformation.

Mask data

For privacy reasons or for sensitivity reasons, you may wish to mask sensitive data in one or more columns with fixed strings.

Delete whole column(s)

If you need to remove the data in an entire column, the easiest method is to delete a column. Select one or more columns and then select Delete from the column drop-down. See Remove Data.

Masking all values

You can use a transformation like the following to replace all values in a column with a simple string. In this case, the value #REDACTED# has been inserted in place of all values in the column. 

Info

NOTE: This replacement changes the data type of the column to String. If you must retain the original data type, the replacement value should be valid for the data type.


D trans
Typestep
p01NameColumns
p01ValuetransactionValue
p02NameFormula
p02Value'#REDACTED#'
SearchTermEdit column with formula

Partial masking of values

Suppose you wish to partially mask data in a column. In the following example, data for the AcctNum column is masked, except for the last four characters (digits):

D trans
Typestep
p01NameColumns
p01ValueAcctNum
p02NameFormula
p02Valuevalue: merge(['XXXX',right(AcctNum, 4)], '')
SearchTermEdit column with formula

Mask multiple columns based on data type

You can use the following type of transformation to hide data based on data type. In this example, the values in all columns with Social Security Number (SSN) are replaced with a masking value: XXX-XX-XXXX:

Warning

This method performs a simple text replacement of the data in the columns(s). After this transformation has been applied to the data, the source data is no longer available, unless you step back to a step before this one. For these kinds of operations, you may find it more secure to apply these kinds of masking operations to the source data in a single recipe and then make that output available to other users to use as an imported dataset.


D trans
Typestep
p01NameColumns
p01ValueAll
p02NameFormula
p02Valueif(isvalid($col, ['SSN']), 'XXX-XX-XXXX', $col)
SearchTermEdit column with formula

Replace with values from another column

Replace whole column

You can do simple replacements of data from one column into another with transformations like the following. In this example, the values of colB are replaced with the values of colA with 0.15 added to them:

D trans
Typestep
p01NameColumns
p01ValuecolB
p02NameFormula
p02ValuecolA + 0.15
SearchTermEdit with formula

Replace partial values from another column

You can use the MERGE function to blend full or partial sets of columns into a new column. In the following example, the newBrandId value is concatenated with the product code in the ProdId column to create a new product identifier:

D trans
Typestep
p01NameColumns
p01ValueProdId
p02NameFormula
p02Valuemerge([newBrandId, right(prodId, 5)], '-')
SearchTermEdit with formula

Replace between positions

You can perform replacements based on character positions that you specify as part of the transformation.

  • The beginning character value is specified as a number from 0, which starts on the left.
  • The ending character value must be equal to or greater than the beginning character value.

In the following example, the Whse_Name column values are prepended with the value old-

D trans
p03Value0
Typestep
p01NameColumn
p01ValueWhse_name
p02NameStart position
p02Value0
p03NameEnd position
p04Valueold-
p04NameReplace with
SearchTermReplace by position

Search and replace text or pattern

You can search and replace content in your dataset based on literals or patterns. In the following example, the value ##CLT_NAME## is replaced with Our Customer, Inc. across all columns in the dataset:

D trans
p03Value'Our Customer, Inc.'
Typestep
p01NameColumn
p01ValueAll
p02NameFind
p02Value'##CLT_NAME##'
p03NameReplace with
p04Valuetrue
p04NameMatch all occurrences
SearchTermReplace text or patterns

Replace missing values

Replace missing with zeroes

For numeric data, you may choose to replace values that are missing in a column with zeros. The following transformation sets missing values in the Qty and DiscountPct columns of Decimal data type to 0:

D trans
Typestep
p01NameColumns
p01ValueQty,DiscountPct
p02NameFormula
p02Valueif(ismissing([$col]), '0', $col)
SearchTermEdit column with formula

Replace missing with average values

One of the problems with the above method is that any statistical computations applied to the column are now affected by the zeroing of the missing values. For example, the computation for the AVERAGE function does not factor in missing values into the count of rows, which result in skewing of values for your purposes. 

The following example creates a new column from the DiscountPct column in which empty values are inserted as the average of the values in the source column:

D trans
p03ValueDiscountPct-0toAVG
Typestep
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valueif(ismissing([DiscountPct]), average(DiscountPct), DiscountPct)
p03NameNew column name
SearchTermNew formula

In this manner, the new column can be used for some statistical modeling, while preserving the original values in the original column.

Replace mismatched values

You can perform replacements based on the values in a column that are mismatched against a specified type. 

In the following example, Datetime values that do not match the yyyy*mm*dd, where the asterisk (*) is a wildcard value.

D trans
p03ValueDate/Time
p06NameNew value
WrangleTextreplacemismatched col: Daily datetimeType: 'Datetime','yy-mm-dd','yyyy*mm*dd' type: Datetime with: custom text: '##BAD_DATE##'
p01NameColumns
p06Value'##BAD_DATE##'
p03NameData type to evaluate
p04Valueyyyy*mm*dd
SearchTermReplace mismatched values
Typestep
p05NameReplace with
p01ValueMultiple
p02NameColumn 1
p02ValuemyDate
p05ValueCustom value
p04NameDate/Time type

Info

NOTE: In the above example, the Date/Time type parameter applies only to replacements that are mismatched against the Date/Time data type. This parameter is used to specify the Datetime format against which the source values are validated. The parameter does not appear in Replace mismatched values transformations for other data types.

Rename columns

For more information, see Rename Columns.